GPTQ or AWQ Quants
#12
by guialfaro - opened
Are there any GPTQ or AWQ quants available?
You can use llama.cpp to run the GGUF quantization of the model.
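For example, a minimal sketch using the llama-cpp-python bindings; the model path and filename below are placeholders, not actual files from this repo:

```python
# Minimal sketch: running a GGUF quant via llama-cpp-python.
# The model_path is hypothetical; point it at the GGUF file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./model-Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=-1,                   # offload all layers to GPU if available
    n_ctx=4096,                        # context window size
)

out = llm("Explain AWQ vs GPTQ in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```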
@xldistance AWQ delivers better accuracy at equivalent quantization levels, even against higher bit-width alternatives. vLLM and SGLang (which support AWQ) are significantly faster than llama.cpp or Ollama (which serve GGUF) deployments.
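A minimal sketch of serving an AWQ quant with vLLM; the model ID is a placeholder, so substitute the actual AWQ repo if one is published:

```python
# Minimal sketch: offline inference of an AWQ quant with vLLM.
from vllm import LLM, SamplingParams

# Hypothetical repo ID; replace with a real AWQ checkpoint.
llm = LLM(model="some-org/some-model-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain AWQ vs GPTQ in one sentence."], params)
print(outputs[0].outputs[0].text)
```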