lbourdois committed (verified) · Commit 6c29e34 · 1 Parent(s): 54329c5

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
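For reference, the change amounts to a `language` list in the README's YAML front matter, one ISO 639-3 code per language; the complete 13-entry list appears in the diff below. A minimal sketch:

```yaml
language:
- zho   # Chinese
- eng   # English
- fra   # French
# ...the remaining 10 codes (spa, por, deu, ita, rus, jpn, kor, vie, tha, ara) follow the same pattern
```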

Files changed (1)
  1. README.md +82 -68
README.md CHANGED
@@ -1,69 +1,83 @@
- ---
- license: apache-2.0
- datasets:
- - liuhaotian/LLaVA-Pretrain
- - lmms-lab/LLaVA-NeXT-Data
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- ---
-
- [[Paper]](https://arxiv.org/abs/2407.17331) [[GitHub]](https://github.com/deepglint/unicom)
- ## Model
- We used [**MLCD**](https://huggingface.co/DeepGlint-AI/mlcd-vit-large-patch14-336) as the Vision Encoder in [LLaVA-Next](https://huggingface.co/lmms-lab/llava-next-qwen-32b).
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6478679d7b370854241b2ad8/8n_jBobanaLNAQjM5eZeg.png)
-
-
- ## Data
- Our model was trained on publicly available data from the [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) and [LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) datasets.
-
- ## How to eval
- ```shell
- pip install lmms-eval==0.2.0
-
- CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
- python -m accelerate.commands.launch \
- --main_process_port=12581 \
- --num_processes=8 \
- -m lmms_eval \
- --model llava \
- --model_args pretrained=DeepGlint-AI/llava-mlcd-qwen2.5-7b,conv_template=qwen_1_5 \
- --tasks mmbench,mme,mmmu,ocrbench,scienceqa,scienceqa_img,seedbench,gqa,pope,textvqa_val,ai2d,chartqa,docvqa_val,infovqa_val,mmstar \
- --batch_size 1 \
- --log_samples \
- --log_samples_suffix mlcd_llava_qwen2_7b \
- --output_path ./log
- ```
-
-
- ## Performance and Limitations
-
- In our experiments, we replaced the CLIP model in [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) with the MLCD model to demonstrate the performance of the MLCD model in Multimodal Large Language Models (MLLMs). For the language model, we used [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B). The evaluation results show that the modified model performs exceptionally well across multiple benchmarks, validating the effectiveness of the MLCD model within MLLMs.
-
- | Vision Tower | MLCD (ViT_L_14_336px) | CLIP (ViT_L_14_336px) |
- |:----------------|:-------------|:-------------|
- | LLM | Qwen2.5-7B | Qwen2.5-7B |
- | AI2D | **76.98** | 73.15 |
- | ScienceQA_img | **78.09** | 76.35 |
- | GQA | **64.17** | 63.31 |
- | InfoVQA_val | **43.48** | 38.88 |
- | MMBench_cn_dev | **74.83** | 72.51 |
- | MMBench_en_dev | **76.37** | 74.57 |
- | MME(cognition) | **432** | 384 |
- | MME(perception) | **1598** | 1512 |
- | SeedBench | **68.20** | 66.80 |
- | SeedBench_img | **73.75** | 72.72 |
- | MMStar | **50.98** | 48.98 |
- | MMMU | **44.30** | 44.20 |
- | OCRBench | **531.00** | 525.00 |
- | ChartQA | **67.84** | 66.52 |
- | DocVQA_val | **76.46** | 75.21 |
- | POPE | 88.69 | **88.83** |
- | TextVQA_val | 61.69 | **62.47** |
-
- ### C. Limitations
- Models with larger datasets will perform better on more tasks. We are currently training such models and will soon make them available.
-
-
- ## Acknowledgments
-
+ ---
+ license: apache-2.0
+ datasets:
+ - liuhaotian/LLaVA-Pretrain
+ - lmms-lab/LLaVA-NeXT-Data
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+ [[Paper]](https://arxiv.org/abs/2407.17331) [[GitHub]](https://github.com/deepglint/unicom)
+ ## Model
+ We used [**MLCD**](https://huggingface.co/DeepGlint-AI/mlcd-vit-large-patch14-336) as the Vision Encoder in [LLaVA-Next](https://huggingface.co/lmms-lab/llava-next-qwen-32b).
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6478679d7b370854241b2ad8/8n_jBobanaLNAQjM5eZeg.png)
+
+
+ ## Data
+ Our model was trained on publicly available data from the [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) and [LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) datasets.
+
+ ## How to eval
+ ```shell
+ pip install lmms-eval==0.2.0
+
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+ python -m accelerate.commands.launch \
+ --main_process_port=12581 \
+ --num_processes=8 \
+ -m lmms_eval \
+ --model llava \
+ --model_args pretrained=DeepGlint-AI/llava-mlcd-qwen2.5-7b,conv_template=qwen_1_5 \
+ --tasks mmbench,mme,mmmu,ocrbench,scienceqa,scienceqa_img,seedbench,gqa,pope,textvqa_val,ai2d,chartqa,docvqa_val,infovqa_val,mmstar \
+ --batch_size 1 \
+ --log_samples \
+ --log_samples_suffix mlcd_llava_qwen2_7b \
+ --output_path ./log
+ ```
+
+
+ ## Performance and Limitations
+
+ In our experiments, we replaced the CLIP model in [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) with the MLCD model to demonstrate the performance of the MLCD model in Multimodal Large Language Models (MLLMs). For the language model, we used [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B). The evaluation results show that the modified model performs exceptionally well across multiple benchmarks, validating the effectiveness of the MLCD model within MLLMs.
+
+ | Vision Tower | MLCD (ViT_L_14_336px) | CLIP (ViT_L_14_336px) |
+ |:----------------|:-------------|:-------------|
+ | LLM | Qwen2.5-7B | Qwen2.5-7B |
+ | AI2D | **76.98** | 73.15 |
+ | ScienceQA_img | **78.09** | 76.35 |
+ | GQA | **64.17** | 63.31 |
+ | InfoVQA_val | **43.48** | 38.88 |
+ | MMBench_cn_dev | **74.83** | 72.51 |
+ | MMBench_en_dev | **76.37** | 74.57 |
+ | MME(cognition) | **432** | 384 |
+ | MME(perception) | **1598** | 1512 |
+ | SeedBench | **68.20** | 66.80 |
+ | SeedBench_img | **73.75** | 72.72 |
+ | MMStar | **50.98** | 48.98 |
+ | MMMU | **44.30** | 44.20 |
+ | OCRBench | **531.00** | 525.00 |
+ | ChartQA | **67.84** | 66.52 |
+ | DocVQA_val | **76.46** | 75.21 |
+ | POPE | 88.69 | **88.83** |
+ | TextVQA_val | 61.69 | **62.47** |
+
+ ### C. Limitations
+ Models with larger datasets will perform better on more tasks. We are currently training such models and will soon make them available.
+
+
+ ## Acknowledgments
+
  We would like to express our gratitude to [Yumeng Wang](https://huggingface.co/devymex) for his significant contributions to the experimental validation in MLLMs.