Kaichengalex committed on
Commit
cca7ba6
·
verified ·
1 Parent(s): 5562026

Update README.md

  1. README.md +5 -5
README.md CHANGED
@@ -27,7 +27,7 @@ Yingda Chen,</span>
27
 
28
 
29
  <p align="center">
30
- <img src="figures/fig1.png" width="85%" height="85">
31
  </p>
32
 
33
 
@@ -40,12 +40,12 @@ Yingda Chen,</span>
40
  To enhance the MLLM's embedding capability, we propose textual discriminative knowledge distillation. The training process involves decoupling the MLLM's LLM component and processing text with the prompt "Summarize the above sentences in one word.", followed by aligning the student (MLLM) and teacher (NV-Embed V2) embeddings via KL divergence on batch-wise similarity distributions. **Notably, only the LLM component is fine-tuned during this process, while all other parameters remain frozen**.
41
 
42
  <p align="center">
43
- <img src="figures/fig2.png" width="85%" >
44
  </p>
45
 
46
  After that, we propose hard negative enhanced instruction tuning, which improves visual sensitivity, strengthens cross-modal alignment, and boosts instruction-following capabilities. At its core are two key innovations: a false negative filtering mechanism that uses a similarity threshold to eliminate misleading samples, and an automatic hard negative sampling strategy that selects the top-k similar but non-matching examples to increase training difficulty.
47
  <p align="center">
48
- <img src="figures/fig3.png" width="85%" >
49
  </p>
50
 
51
 
@@ -102,12 +102,12 @@ print("Score: ", Score)
102
  ## 🔢 Results
103
  ### Diverse Retrieval
104
  <p align="center">
105
- <img src="figures/res1.png" width="85%" >
106
  </p>
107
 
108
  ### MMEB
109
  <p align="center">
110
- <img src="figures/res2.png" width="85%" >
111
  </p>
112
 
113
  ## 📖 Citation
 
27
 
28
 
29
  <p align="center">
30
+ <img src="figures/fig1.png">
31
  </p>
32
 
33
 
 
40
  To enhance the MLLM's embedding capability, we propose textual discriminative knowledge distillation. The training process involves decoupling the MLLM's LLM component and processing text with the prompt "Summarize the above sentences in one word.", followed by aligning the student (MLLM) and teacher (NV-Embed V2) embeddings via KL divergence on batch-wise similarity distributions. **Notably, only the LLM component is fine-tuned during this process, while all other parameters remain frozen**.
41
 
42
  <p align="center">
43
+ <img src="figures/fig2.png">
44
  </p>
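
The distillation objective described above (KL divergence between batch-wise similarity distributions of student and teacher) can be sketched as follows. This is a minimal, dependency-free illustration and not the repository's training code: the use of cosine similarity, the temperature value, the KL direction (teacher relative to student), and all function names are assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def softmax(row, temperature):
    """Numerically stable softmax over one row of similarities."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distributions(embeddings, temperature):
    """Turn a batch of embeddings into one distribution per sample
    over its cosine similarity to every sample in the batch."""
    normed = [l2_normalize(e) for e in embeddings]
    sims = [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]
    return [softmax(row, temperature) for row in sims]


def distillation_loss(student, teacher, temperature=0.05):
    """Mean KL(teacher || student) over the rows of the batch.

    The temperature of 0.05 is an assumed value, not one from the source.
    """
    p_teacher = similarity_distributions(teacher, temperature)
    p_student = similarity_distributions(student, temperature)
    total = 0.0
    for t_row, s_row in zip(p_teacher, p_student):
        for pt, ps in zip(t_row, s_row):
            if pt > 0.0:
                total += pt * (math.log(pt) - math.log(max(ps, 1e-12)))
    return total / len(p_teacher)
```

Because both distributions are batch-by-batch, the student and teacher embeddings in this sketch do not need to share a dimension; only the batch size must match.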
45
 
46
  After that, we propose hard negative enhanced instruction tuning, which improves visual sensitivity, strengthens cross-modal alignment, and boosts instruction-following capabilities. At its core are two key innovations: a false negative filtering mechanism that uses a similarity threshold to eliminate misleading samples, and an automatic hard negative sampling strategy that selects the top-k similar but non-matching examples to increase training difficulty.
47
  <p align="center">
48
+ <img src="figures/fig3.png">
49
  </p>
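
The two mechanisms named above, false negative filtering and top-k hard negative sampling, can be illustrated for a single query as follows. This is a hedged sketch rather than the project's implementation: the function name, its parameters, and the rule that any candidate scoring above a fixed threshold is treated as a likely false negative are all assumptions made for the example.

```python
def select_hard_negatives(query_sims, positive_index, false_negative_threshold, k=2):
    """Pick the k most similar non-matching candidates for one query.

    query_sims: similarity of the query to every candidate in the batch.
    positive_index: position of the true match, which is never a negative.
    false_negative_threshold: assumed cutoff; candidates scoring above it
        are treated as likely false negatives and dropped.
    """
    candidates = [
        (sim, idx)
        for idx, sim in enumerate(query_sims)
        if idx != positive_index and sim <= false_negative_threshold
    ]
    # Highest similarity first: these are the hardest remaining negatives.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in candidates[:k]]


# Candidate 1 scores above the threshold and is filtered out;
# the two most similar remaining candidates are kept.
hard = select_hard_negatives([0.9, 0.95, 0.7, 0.2, 0.6], positive_index=0,
                             false_negative_threshold=0.92, k=2)
```

With the inputs shown, `hard` is `[2, 4]`: index 1 is discarded as a probable false negative, and index 3 is too dissimilar to make the top two.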
50
 
51
 
 
102
  ## 🔢 Results
103
  ### Diverse Retrieval
104
  <p align="center">
105
+ <img src="figures/res1.png">
106
  </p>
107
 
108
  ### MMEB
109
  <p align="center">
110
+ <img src="figures/res2.png">
111
  </p>
112
 
113
  ## 📖 Citation