xiangan and lbourdois committed · Commit 0663758 (verified) · Parent: 4ec92e0

Improve language tag (#1)


Co-authored-by: Loïck BOURDOIS <[email protected]>

Files changed (1): README.md (+167 -156)
---
license: apache-2.0
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
metrics:
- bleu
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

[[Paper]](https://arxiv.org/abs/2407.17331) [[GitHub]](https://github.com/deepglint/unicom)

## Embodied Ability Evaluation: Performance on RoboVQA and OpenEQA

Red marks the best result in each row.

| Benchmark | Metric | MLCD<br>Embodied-7B | LLaVA<br>OneVision-7B | GPT-4V | RoboMamba |
| :-- | :-- | :-: | :-: | :-: | :-: |
| RoboVQA | BLEU1 | <span style="color:red">73.16</span> | 38.12 | - | 54.9 |
| | BLEU2 | <span style="color:red">66.39</span> | 33.56 | - | 44.2 |
| | BLEU3 | <span style="color:red">60.61</span> | 31.76 | - | 39.5 |
| | BLEU4 | <span style="color:red">56.56</span> | 30.97 | - | 36.3 |
| OpenEQA | Object State Recognition | <span style="color:red">71.83</span> | - | 63.2 | - |
| | Object Recognition | <span style="color:red">49.46</span> | - | 43.4 | - |
| | Functional Reasoning | 54.38 | - | <span style="color:red">57.4</span> | - |
| | Spatial Understanding | <span style="color:red">48.64</span> | - | 33.6 | - |
| | Attribute Recognition | <span style="color:red">67.08</span> | - | 57.2 | - |
| | World Knowledge | <span style="color:red">53.87</span> | - | 50.7 | - |
| | Object Localization | <span style="color:red">43.06</span> | - | 42.0 | - |

## General Ability Evaluation: Comparison with LLaVA OneVision-7B, GPT-4V, and GPT-4o

| Dataset | Split | MLCD<br>Embodied-7B | LLaVA<br>OneVision-7B | GPT-4V | GPT-4o |
| :-- | :-: | :-: | :-: | :-: | :-: |
| AI2D | test | 79.9 | 81.4 | 78.2 | 94.2 |
| ChartQA | test | 83.0 | 80.0 | 78.5 | 85.7 |
| DocVQA | test | 91.6 | 87.5 | 88.4 | 92.8 |
| InfoVQA | val | 73.9 | 70.7 | - | - |
| InfoVQA | test | 70.0 | 68.8 | - | - |
| MMMU | val | 47.3 | 48.8 | 56.8 | 69.1 |
| MMStar | test | 58.5 | 61.7 | 57.1 | 63.9 |
| OCRBench | - | 749.0 | 697.0 | 656.0 | 805.0 |
| RealWorldQA | test | 68.9 | 66.3 | 61.4 | 58.6 |
| SeedBench | image | 74.9 | 75.4 | 49.9 | 76.2 |
| MMBench | en-dev | 81.1 | 83.2 | 81.3 | 83.4 |
| MMBench | en-test | 80.1 | 80.8 | 75.0 | - |
| MME | test | 578/1603 | 418/1580 | 517/1409 | - |

MME results are reported as cognition/perception scores.

## Usage

### A. Installation

```bash
git clone https://github.com/deepglint/unicom
cd unicom

# Upgrade pip and install the necessary dependencies
pip install --upgrade pip
pip install -e ".[train]"
```
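
If the editable install worked, the repository's `llava` package (used by the benchmark and evaluation scripts below) should now be importable. A quick sanity check:

```bash
# print where the editable install resolved the package from
python -c "import llava; print(llava.__file__)"
```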

### B. Inference

```bash
git clone https://github.com/deepglint/unicom
cd unicom
pip install --upgrade pip
pip install -e ".[train]"
pip install flash-attn --no-build-isolation

CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir DeepGlint-AI/MLCD-Embodied-7B

# example:
# >> Enter 'exit' to end the conversation, 'reset' to clear the chat history.
# >> Enter image file paths (comma-separated): ./asserts/logo.png
# >> User: <image>What kind of animal is it in this picture?
# >> Assistant: The image features a stylized representation of a cat, characterized by its vibrant and abstract depiction.
# >> User: What color is this cat?
# >> Assistant: The cat in the image is primarily white with blue, orange and pink accents, creating a visually appealing and unique appearance.
# >> User: <image>请你介绍一下这个图片
# >> Assistant: 这是一幅充满创意的猫头艺术作品。它采用了多色渐变和抽象风格,将猫的头部描绘成一个充满活力和色彩的视觉冲击。猫的眼睛用金色渲染,显得非常有神采,
# 而粉色的鼻子则增添了一丝可爱感。整体设计融合了现代艺术与传统猫头图案,创造出一种既独特又引人入胜的视觉效果。
# (Translation of the Chinese turn: User: "Please describe this image." Assistant: "This is a creative piece of
#  cat-head artwork. It uses multicolor gradients and an abstract style to render the cat's head as a vibrant,
#  colorful visual statement. The golden eyes look lively, and the pink nose adds a touch of cuteness. The design
#  blends modern art with a traditional cat-head motif for a distinctive and engaging effect.")
```
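
`infer.py` reads the image paths and chat turns from standard input, so a session can also be scripted by piping the answers in. This is a sketch under that assumption rather than a documented mode of the script:

```bash
# hypothetical non-interactive run: supply the image path, two questions, and 'exit' to the prompts
printf '%s\n' \
  './asserts/logo.png' \
  '<image>What kind of animal is it in this picture?' \
  'What color is this cat?' \
  'exit' \
  | CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir DeepGlint-AI/MLCD-Embodied-7B
```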

### C. Evaluation for Embodied Ability

#### Step 1

Download the raw data by following [OpenEQA](https://github.com/facebookresearch/open-eqa/tree/main/data) and [RoboVQA](https://console.cloud.google.com/storage/browser/gdm-robovqa) (val split).

#### Step 2

Convert the raw data into the format required for model evaluation:

```bash
# Convert the OpenEQA benchmark. Note: replace the paths with your own.
python llava/benchmark/make_openeqa_bmk.py

# Convert the RoboVQA benchmark. Note: replace the paths with your own.
python llava/benchmark/make_robovqa_bmk.py
```
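
To sanity-check a converted file before evaluating, the parquet can be inspected directly. This assumes pandas with a parquet engine is installed; the column names are whatever the conversion scripts emit, so none are asserted here:

```bash
# print the row count and column names of a converted benchmark file (path is a placeholder)
python -c "import pandas as pd; df = pd.read_parquet('/path/to/your/benchmarks/RoboVQA/robovqa.parquet'); print(len(df), list(df.columns))"
```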

#### Step 3

Make sure your top-level directory structure looks like this:

```
|--/path/to/your/benchmarks
|  |--OpenEQA
|  |  |--openeqa_scannet.parquet
|  |  |--openeqa_hm3d.parquet
|  |--RoboVQA
|  |  |--robovqa.parquet
|--/path/to/your/images
|  |--openeqa_val
|  |  |--scannet-v0
|  |  |  |--002-scannet-scene0709_00
|  |  |  |--xxx-scannet-scenexxxx_xx
|  |  |--hm3d-v0
|  |  |  |--000-hm3d-BFRyYbPCCPE
|  |  |  |--xxx-hm3d-xxxxxxxxxxx
|  |--robovqa_val
|  |  |--robovqa_221911
|  |  |--robovqa_xxxxxx
```
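
A quick way to confirm the converted benchmark files are where the evaluation expects them (the root path is the placeholder from the listing above):

```bash
# check that each converted parquet exists under the benchmark root
BMK_ROOT=/path/to/your/benchmarks
for f in OpenEQA/openeqa_scannet.parquet OpenEQA/openeqa_hm3d.parquet RoboVQA/robovqa.parquet; do
  if [ -f "$BMK_ROOT/$f" ]; then echo "found:   $f"; else echo "MISSING: $f"; fi
done
```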

#### Step 4

Run the evaluation script:

```bash
# Note: replace 'YOUR_API_KEY', 'YOUR_ENDPOINT', 'bmk_root', and 'image_folder' with your own values.
bash scripts/eval/eval_robo.sh /path/to/your/model
```
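
The placeholders mentioned in the note presumably live in the evaluation scripts themselves, so it helps to locate every occurrence before launching. This is a generic search, not a documented step, and assumes the placeholder strings appear verbatim under `scripts/eval/`:

```bash
# list every place a placeholder appears so it can be replaced with your own value
grep -rn -E "YOUR_API_KEY|YOUR_ENDPOINT|bmk_root|image_folder" scripts/eval/
```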

### D. Evaluation for General Ability

Install the evaluation tool and execute the evaluation script:

```bash
pip install lmms-eval==0.2.0
PYTHONPATH=./ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m accelerate.commands.launch \
    --main_process_port=12444 \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained=DeepGlint-AI/MLCD-Embodied-7B,conv_template=qwen_1_5 \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix mlcd \
    --output_path ./eval_log/
```
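
The command above evaluates MME only. `lmms-eval` accepts a comma-separated task list, so several of the benchmarks reported earlier can be run in one launch; the task identifiers below are the names commonly registered in lmms-eval 0.2.0 and should be checked against the version you installed:

```bash
# same launch, covering several benchmarks in one pass (task names are assumptions; verify them first)
PYTHONPATH=./ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m accelerate.commands.launch \
    --main_process_port=12444 \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained=DeepGlint-AI/MLCD-Embodied-7B,conv_template=qwen_1_5 \
    --tasks mme,mmbench_en_dev,ocrbench,realworldqa \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix mlcd \
    --output_path ./eval_log/
```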

We would like to express our gratitude to [Huajie Tan](https://huggingface.co/tanhuajie2001), [Yumeng Wang](https://huggingface.co/devymex), and [Yin Xie](https://huggingface.co/Yin-Xie) for their significant contributions to the experimental validation of the MLLMs.