---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

FP8-Dynamic quantization created with llm-compressor; it can run on cards with 16 GB of VRAM. Update vLLM and Transformers (the version specifiers are quoted so the shell doesn't interpret `>=` as a redirection):

```
pip install "vllm>=0.7.2"
pip install "transformers>=4.49"
```

Then serve the model with:

```
vllm serve leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic --trust-remote-code
```
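Once the server is up, it exposes an OpenAI-compatible API. Below is a minimal sketch of a chat request for this vision-language model; it assumes the default endpoint `http://localhost:8000/v1` and uses a placeholder image URL, so adjust both for your setup:

```python
# Sketch: build an OpenAI-style chat payload for the served Qwen2.5-VL model.
# Assumes the vLLM server runs at the default http://localhost:8000 (change
# the URL if you passed --port). The image URL below is a placeholder.
import json

def build_chat_request(image_url: str, question: str) -> dict:
    """Build a multimodal chat-completions payload (image + text)."""
    return {
        "model": "leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 128,
    }

payload = build_chat_request("https://example.com/cat.jpg", "What is in this image?")
# Send it with any HTTP client, e.g.:
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(json.dumps(payload, indent=2))
```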