Align-DS-V-72B

🏠 Homepage | 🤗 Align-Anything Dataset | 🤗 T2T_Instruction-tuning Dataset | 🤗 TI2T_Instruction-tuning Dataset | 👍 Our Official Code Repo

Introduction

Align-DS-V-72B is an experimental vision-language model distilled from [DeepSeek-R1-671B] using ZeroRL methods, based on [Qwen2-VL-72B-Instruct], developed by the PKU-Alignment team, focusing on enhancing reasoning capabilities by all-modality alignment.

Citation

The reproduction script for Align-DS-V will be released in the align-anything repository.

Please cite the repo if you find the model or code in this repo useful 😊

@inproceedings{ji2024align,
  title={Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback},
  author={Jiaming Ji and Jiayi Zhou and Hantao Lou and Boyuan Chen and Donghai Hong and Xuyao Wang and Wenqi Chen and Kaile Wang and Rui Pan and Jiahao Li and Mohan Wang and Josef Dai and Tianyi Qiu and Hua Xu and Dong Li and Weipeng Chen and Jun Song and Bo Zheng and Yaodong Yang},
  year={2024},
  url={https://arxiv.org/abs/2412.15838}
}

Deployment Scripts for Align-DS-V (Built with Gradio)

This document provides instructions for deploying the Align-DS-V model for inference using Gradio.

Set up the Conda environment: Follow the instructions in the PKU-Alignment/align-anything repository to configure your Conda environment.
Configure the model path: After setting up the environment, update the BASE_MODEL_PATH variable in deploy_align_ds_v.sh to point to your local Align-DS-V model directory.
Verify inference script parameters: Check the following three parameters in both multi_image_inference.py and stream_inference.py:
```
openai_api_key = "pku"  # Or your specific API key if needed
openai_api_base = "http://0.0.0.0:8231/v1" # Ensure this matches the deployment port
# NOTE: Replace with your own model path if not loaded via the API base
model = ''
```
These scripts utilize an OpenAI-compatible server approach. The deploy_align_ds_v.sh script launches the Align-DS-V model locally and exposes it on port 8231 for external access via the specified API base URL.

Running Inference:

Streamed Output:

bash deploy_align_ds_v.sh
python stream_inference.py

Multi-Image Output:

bash deploy_align_ds_v.sh
python multi_image_inference.py