nielsr (HF Staff) committed on
Commit 1630397 · verified · 1 Parent(s): 08ebc82

Fix paper link


This PR updates the model card by fixing the paper link so that it points to the paper's arXiv abstract page.
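Both forms of the link appear in this PR: arXiv serves the same paper ID under `/pdf/` (direct PDF download) and `/abs/` (abstract landing page). The fix applied here is, in effect, a simple path substitution, sketched below:

```python
# Rewrite an arXiv /pdf/ URL to its /abs/ counterpart.
# The paper ID (2504.17432) is taken from the diff below.
pdf_url = "https://arxiv.org/pdf/2504.17432"
abs_url = pdf_url.replace("/pdf/", "/abs/")
print(abs_url)  # → https://arxiv.org/abs/2504.17432
```

Linking the abstract page is the usual model-card convention, since it offers the PDF alongside metadata and citation info.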

Files changed (1):
  1. README.md +17 -7
README.md CHANGED
@@ -1,17 +1,17 @@
 ---
-license: mit
+base_model:
+- microsoft/Phi-3.5-vision-instruct
 datasets:
 - TIGER-Lab/MMEB-train
 language:
 - en
-base_model:
-- microsoft/Phi-3.5-vision-instruct
 library_name: transformers
+license: mit
+pipeline_tag: image-text-to-text
 tags:
 - Retrieval
 - Multimodal
 - Embedding
-pipeline_tag: image-text-to-text
 ---
 
 # Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
@@ -25,7 +25,7 @@ Yingda Chen,</span>
 <a href="https://weidong-tom-cai.github.io/">Weidong Cai</a>,</span>
 <a href="https://jiankangdeng.github.io">Jiankang Deng</a></span>
 
-[🏡 Project Page](https://garygutc.github.io/UniME) | [📄 Paper](https://arxiv.org/pdf/2504.17432) | [💻 Github](https://github.com/deepglint/UniME)
+[🏡 Project Page](https://garygutc.github.io/UniME) | [📄 Paper](https://arxiv.org/abs/2504.17432) | [💻 Github](https://github.com/deepglint/UniME)
 
 
 <p align="center">
@@ -62,8 +62,16 @@ from torch.nn import functional as F
 from transformers import AutoProcessor, AutoModelForCausalLM
 
 base_model_path="DeepGlint-AI/UniME-Phi3.5-V-4.2B"
-img_prompt = '<|user|>\n<|image_1|>\nSummary above image in one word: <|end|>\n<|assistant|>\n'
-text_prompt = '<|user|>\n<sent>\nSummary above sentence in one word: <|end|>\n<|assistant|>\n'
+img_prompt = '''<|user|>
+<|image_1|>
+Summary above image in one word: <|end|>
+<|assistant|>
+'''
+text_prompt = '''<|user|>
+<sent>
+Summary above sentence in one word: <|end|>
+<|assistant|>
+'''
 
 text = "A man is crossing the street with a red car parked nearby."
 image_path = "figures/demo.png"
@@ -108,6 +116,8 @@ print("Score: ", Score)
 
 ## 📖 Citation
 If you find this repository useful, please use the following BibTeX entry for citation.
+
+[📄 Paper](https://arxiv.org/abs/2504.17432)
 ```latex
 @misc{gu2025breakingmodalitybarrieruniversal,
 title={Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs},