---
base_model:
- Qwen/Qwen2.5-VL-72B-Instruct
language:
- en
license: apache-2.0
tags:
- transformers
- multimodal
pipeline_tag: visual-question-answering
---

# INFRL-Qwen2.5-VL-72B-Preview

## Model Overview

- **INFRL-Qwen2.5-VL-72B-Preview** improves on the visual reasoning of the [Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) model.
- As of March 25th, 2025, **INFRL-Qwen2.5-VL-72B-Preview** is the best-performing open-source VL model on several visual reasoning benchmarks ([MathVision](https://mathllm.github.io/mathvision/), [MathVista](https://mathvista.github.io/), [MathVerse](https://mathverse-cuhk.github.io/)).

## Evaluation

| Models            | MathVision (test) | MathVista (testmini) | MathVerse (testmini) |
|-------------------|-------------------|----------------------|----------------------|
| GPT-4o            | 30.6              | 60                   | 41.2                 |
| Gemini-2.0-Flash  | 41.3              | 70.1                 | 50.6                 |
| Claude 3.5 Sonnet | 33.5              | 67.7                 | 47.8                 |
| QvQ-72B           | 35.9              | 71.4                 | 48.6                 |
| InternVL2.5-78B   | 34.9              | 72.3                 | 51.7                 |
| Qwen2.5-VL-72B    | 38.1              | 74.8                 | 57.18                |
| INFRL-VL-Preview  | 41.9              | 77.8                 | 58.84                |

We will release a code repository for VLM evaluation. It supports RL training with simple rule-based rewards while staying aligned with LLM-judge results. Stay tuned!

## Contributors

### Supervisors

Wei Chu • Yuan Qi

### VL Team

Haozhe Wang • Zuming Huang

### RL Team

Haozhe Wang • Chao Qu • Long Li

## Thanks

Thanks to Jiaran Hao and Liuyihan Song for their support with the RL infrastructure.

## Citation

If you find our model useful, please consider citing:

```
@misc{INFRL_VL_Preview,
  author    = {Wang, Haozhe and Huang, Zuming and Qu, Chao and Chu, Wei and Qi, Yuan},
  title     = {INFRL-Qwen2.5-VL-72B-Preview},
  year      = {2025},
  url       = {https://huggingface.co/infly/INFRL-Qwen2.5-VL-72B-Preview},
  publisher = {Hugging Face}
}
```
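
## Quickstart

A minimal inference sketch, assuming this checkpoint loads with the same `transformers` interface as the base Qwen2.5-VL-72B-Instruct model (`Qwen2_5_VLForConditionalGeneration` plus the `qwen_vl_utils` helper package). The image path and prompt below are placeholders for illustration, not values from this repository.

```python
# Sketch under the assumption that this checkpoint follows the base Qwen2.5-VL
# loading convention in recent transformers; the image path is a placeholder.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "infly/INFRL-Qwen2.5-VL-72B-Preview"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A single-image visual-reasoning query; replace the image path with your own file.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/problem_figure.png"},
            {"type": "text", "text": "Solve the problem shown in the figure. Explain your reasoning step by step."},
        ],
    }
]

# Build the chat prompt and collect the vision inputs referenced in the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
generated_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```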