How can I run this model with the vLLM backend?
When will vLLM support your work? Or would you mind telling me how to modify vLLM to support this model?
Hi @kentyuan123 , thank you for your interest in our work!
While I'd love to integrate DFloat11 into vLLM, I currently don't have the bandwidth to tackle this project. If you'd like to implement this integration yourself, I recommend taking a look at this specific function in our codebase: https://github.com/LeanModels/DFloat11/blob/75f7181dc1c7341920c50bd349bbe6949074675b/dfloat11/dfloat11.py#L173
This function handles replacing the original BFloat16 weights with DFloat11 weights and adds pre-forward hooks that perform on-the-fly decompression from DFloat11 to BFloat16.
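To illustrate the general pattern (not the actual DFloat11 implementation), here is a minimal PyTorch sketch of that idea: the dense weight is swapped out for a compressed blob, and a `register_forward_pre_hook` restores it just before the forward pass. The `compress`/`decompress` helpers below are hypothetical identity-style placeholders standing in for DFloat11's lossless entropy coding.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical placeholders for DFloat11's lossless codec:
# here they just round-trip the tensor through raw bytes.
def compress(w: torch.Tensor) -> bytes:
    return w.detach().contiguous().numpy().tobytes()

def decompress(blob: bytes, shape) -> torch.Tensor:
    arr = np.frombuffer(blob, dtype=np.float32).copy()
    return torch.from_numpy(arr).view(shape)

def attach_decompression_hook(linear: nn.Linear) -> None:
    # Mirror the idea (not the implementation) of the linked function:
    # keep only the compressed representation, then decompress on the fly.
    shape = linear.weight.shape
    linear._compressed = compress(linear.weight)
    linear.weight.data = torch.zeros_like(linear.weight)  # drop dense copy

    def pre_hook(module, inputs):
        # Runs right before forward: restore the BFloat16/FP32 weight.
        module.weight.data = decompress(module._compressed, shape)

    linear.register_forward_pre_hook(pre_hook)

layer = nn.Linear(4, 2, bias=False)
x = torch.ones(1, 4)
expected = layer(x)          # output with the original dense weight
attach_decompression_hook(layer)
out = layer(x)               # hook decompresses before this call
assert torch.allclose(out, expected)
```

A vLLM integration would need to apply the same replace-and-hook step to the model's linear layers after vLLM loads the weights, which is what makes the linked function the right starting point.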
Our models currently support inference with the `transformers` library. You can install the latest version via `pip install -U dfloat11[cuda12]` and follow the guide at https://github.com/LeanModels/DFloat11.