WaltonFuture/Qwen2.5VL-7b-RLCS

🐙 GitHub Repo: waltonfuture/RL-with-Cold-Start
📜 Paper (arXiv): Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start (arXiv:2505.22334)

Cold Start Stage

We conduct supervised fine-tuning on Qwen2.5-VL-3B and Qwen2.5-VL-7B using ms-swift. In this stage, please refer to this curated dataset distilled from Qwen2.5-VL-32B using rejection sampling.

Setup

git clone https://github.com/waltonfuture/RL-with-Cold-Start.git
cd RL-with-Cold-Start/SFT
pip install -e .

Prepare Data

python convert_data.py

SFT

bash qwen2.5vl_sft.sh

The checkpoint can be found in SFT/output.

RL Stage

We further conduct GRPO using EasyR1. Please refer to this dataset for the GRPO training.

Setup

git clone https://github.com/waltonfuture/RL-with-Cold-Start.git
cd RL-with-Cold-Start/GRPO
pip install -e .

GRPO Training (replace the checkpoint with the model after SFT)

bash examples/qwen2_5_vl_7b_grpo.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir checkpoints/easyr1/qwen2_5_vl_7b_grpo/global_step_80/actor

Data Access

Our two stage datasets are now available on Huggingface.

Stage	Data
Cold Start	Multimodal-Cold-Start
RL	Multimodal-RL-Data

Model Access

Our models are now available on Huggingface.

Backbone	Our model
Qwen2.5-VL-7b	Qwen2.5VL-7b-RL-with-Cold-Start
Qwen2.5-VL-3b	Qwen2.5VL-3b-RL-with-Cold-Start

Acknowledgment

Our models are built upon the amazing Qwen2.5-VL family. We thank EasyR1 and ms-swift for their training codes.

Contact

Please contact Lai Wei ([email protected]) if needed.

Citation

@article{wei2025advancing,
  title={Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start},
  author={Wei, Lai and Li, Yuting and Zheng, Kaipeng and Wang, Chen and Wang, Yue and Kong, Linghe and Sun, Lichao and Huang, Weiran},
  journal={arXiv preprint arXiv:2505.22334},
  year={2025}
}

WaltonFuture
/

Qwen2.5VL-7b-RLCS

Cold Start Stage

Setup

Prepare Data

SFT

RL Stage

Setup

GRPO Training (replace the checkpoint with the model after SFT)

Merge Checkpoint in Hugging Face Format

Data Access

Model Access

Acknowledgment

Contact

Citation

Model tree for WaltonFuture/Qwen2.5VL-7b-RLCS

Datasets used to train WaltonFuture/Qwen2.5VL-7b-RLCS