Fix paper link
This PR updates the model card by fixing the paper link and pointing it to arXiv.
README.md
CHANGED
@@ -1,17 +1,17 @@
 ---
-license: mit
+base_model:
+- microsoft/Phi-3.5-vision-instruct
 datasets:
 - TIGER-Lab/MMEB-train
 language:
 - en
-base_model:
-- microsoft/Phi-3.5-vision-instruct
 library_name: transformers
+license: mit
+pipeline_tag: image-text-to-text
 tags:
 - Retrieval
 - Multimodal
 - Embedding
-pipeline_tag: image-text-to-text
 ---
 
 # Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
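For reference, the card's full YAML front matter after this PR, assembled from the `+` and context lines of the hunk above (the keys are reordered; no values change besides the reordering):

```yaml
---
base_model:
- microsoft/Phi-3.5-vision-instruct
datasets:
- TIGER-Lab/MMEB-train
language:
- en
library_name: transformers
license: mit
pipeline_tag: image-text-to-text
tags:
- Retrieval
- Multimodal
- Embedding
---
```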
@@ -25,7 +25,7 @@ Yingda Chen,</span>
 <a href="https://weidong-tom-cai.github.io/">Weidong Cai</a>,</span>
 <a href="https://jiankangdeng.github.io">Jiankang Deng</a></span>
 
-[🏡 Project Page](https://garygutc.github.io/UniME) | [📄 Paper](https://arxiv.org/
+[🏡 Project Page](https://garygutc.github.io/UniME) | [📄 Paper](https://arxiv.org/abs/2504.17432) | [💻 Github](https://github.com/deepglint/UniME)
 
 
 <p align="center">
@@ -62,8 +62,16 @@ from torch.nn import functional as F
 from transformers import AutoProcessor, AutoModelForCausalLM
 
 base_model_path="DeepGlint-AI/UniME-Phi3.5-V-4.2B"
-img_prompt = '<|user|>\n<|image_1|>\nSummary above image in one word: <|end|>\n<|assistant|>\n'
-text_prompt = '<|user|>\n<sent>\nSummary above sentence in one word: <|end|>\n<|assistant|>\n'
+img_prompt = '<|user|>
+<|image_1|>
+Summary above image in one word: <|end|>
+<|assistant|>
+'
+text_prompt = '<|user|>
+<sent>
+Summary above sentence in one word: <|end|>
+<|assistant|>
+'
 
 text = "A man is crossing the street with a red car parked nearby."
 image_path = "figures/demo.png"
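The hunk above only reformats the two prompt strings; the surrounding README snippet (its context lines show `from torch.nn import functional as F`, `AutoProcessor`, `AutoModelForCausalLM`, and a final `print("Score: ", Score)`) embeds one image and one caption and scores their similarity. Below is a minimal sketch of that flow, assuming `trust_remote_code=True` loading, eager attention, and the final token's last-layer hidden state as the embedding; none of these choices is confirmed by the visible hunks:

```python
import torch
from PIL import Image
from torch.nn import functional as F
from transformers import AutoProcessor, AutoModelForCausalLM

base_model_path = "DeepGlint-AI/UniME-Phi3.5-V-4.2B"

# Prompt templates exactly as defined in the diff above.
img_prompt = '<|user|>\n<|image_1|>\nSummary above image in one word: <|end|>\n<|assistant|>\n'
text_prompt = '<|user|>\n<sent>\nSummary above sentence in one word: <|end|>\n<|assistant|>\n'

text = "A man is crossing the street with a red car parked nearby."
image_path = "figures/demo.png"

# Assumption: Phi-3.5-vision ships custom modeling code, hence trust_remote_code;
# eager attention avoids a hard dependency on flash-attn.
processor = AutoProcessor.from_pretrained(base_model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    _attn_implementation="eager",
).eval().to("cuda")

# One image+prompt input, one text-only input (<sent> is the caption placeholder).
image_inputs = processor(img_prompt, [Image.open(image_path)], return_tensors="pt").to("cuda")
text_inputs = processor(text_prompt.replace("<sent>", text), return_tensors="pt").to("cuda")

with torch.no_grad():
    # Assumption: the embedding is the last-layer hidden state of the final token.
    img_emb = model(**image_inputs, output_hidden_states=True).hidden_states[-1][:, -1, :]
    txt_emb = model(**text_inputs, output_hidden_states=True).hidden_states[-1][:, -1, :]

img_emb = F.normalize(img_emb, dim=-1)
txt_emb = F.normalize(txt_emb, dim=-1)
Score = img_emb @ txt_emb.T  # cosine similarity, shape (1, 1)
print("Score: ", Score)
```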
@@ -108,6 +116,8 @@ print("Score: ", Score)
 
 ## 📖 Citation
 If you find this repository useful, please use the following BibTeX entry for citation.
+
+[📄 Paper](https://arxiv.org/abs/2504.17432)
 ```latex
 @misc{gu2025breakingmodalitybarrieruniversal,
 title={Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs},