---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

FP8-Dynamic quantization created with llm-compressor; it can run on cards with 16 GB of VRAM. Update vLLM and Transformers (the version specifiers are quoted so the shell doesn't interpret `>=` as a redirection):

```
pip install "vllm>=0.7.2"
pip install "transformers>=4.49"
```

Then serve the model with:

```
vllm serve leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic --trust-remote-code
```
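Once the server is up, it exposes an OpenAI-compatible API. Below is a minimal sketch of a chat request for this vision-language model; it assumes the default endpoint `http://localhost:8000/v1` and uses a placeholder image URL, so adjust both for your setup:

```python
# Sketch: build an OpenAI-style chat payload for the served Qwen2.5-VL model.
# Assumes the vLLM server runs at the default http://localhost:8000 (change
# the URL if you passed --port). The image URL below is a placeholder.
import json

def build_chat_request(image_url: str, question: str) -> dict:
    """Build a multimodal chat-completions payload (image + text)."""
    return {
        "model": "leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 128,
    }

payload = build_chat_request("https://example.com/cat.jpg", "What is in this image?")
# Send it with any HTTP client, e.g.:
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(json.dumps(payload, indent=2))
```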