W4A16 quantization using llmcompressor. Run with:

```bash
vllm serve leon-se/gemma-3-27b-it-qat-W4A16-G128 --max-model-len 4096 --max-num-seqs 1
```
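The card does not include the quantization recipe itself. As a minimal sketch of how llmcompressor's documented GPTQ one-shot flow produces a W4A16, group-size-128 checkpoint like this one, assuming the instruct model as the source and an arbitrary calibration setup (the source model id, dataset, sequence length, and sample count below are placeholders, not this repo's actual settings):

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# GPTQ with the W4A16 preset: 4-bit grouped weight quantization
# (group size 128, hence "G128") with activations left in 16-bit.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="google/gemma-3-27b-it",        # assumed source checkpoint
    dataset="open_platypus",              # assumed calibration dataset
    recipe=recipe,
    max_seq_length=2048,                  # assumed calibration settings
    num_calibration_samples=512,
    output_dir="gemma-3-27b-it-qat-W4A16-G128",
)
```

Once the server is running, vLLM exposes an OpenAI-compatible API (on port 8000 by default). A minimal query, with an arbitrary prompt and token limit:

```python
import requests

# Chat completion against the locally served quantized model.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "leon-se/gemma-3-27b-it-qat-W4A16-G128",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```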
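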
Model tree for leon-se/gemma-3-27b-it-qat-W4A16-G128:
- Base model: google/gemma-3-27b-pt
- Fine-tuned: google/gemma-3-27b-it