GPTQ or AWQ Quants

#12
by guialfaro

Are there any GPTQ or AWQ quants available?

You can use llama.cpp to run the GGUF quantization of the model.
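
For reference, here is a minimal sketch using the llama-cpp-python bindings; the model path is hypothetical, so point it at whichever GGUF file you actually download:

```python
# Minimal sketch: run a GGUF quant via llama-cpp-python.
# The model_path below is hypothetical -- replace it with your downloaded file.
from llama_cpp import Llama

llm = Llama(
    model_path="./model-Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=4096,                        # context window size
    n_gpu_layers=-1,                   # offload all layers to GPU if available
)

out = llm("Explain AWQ vs GPTQ in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```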

@xldistance AWQ delivers better accuracy at equivalent quantization levels, even when compared against higher bit-width alternatives. vLLM and SGLang (which support AWQ) are significantly faster than llama.cpp or Ollama (which support GGUF).
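
If an AWQ checkpoint does get published, loading it in vLLM is a one-liner; a minimal sketch, where the repo id below is a hypothetical placeholder:

```python
# Minimal sketch: load an AWQ quant with vLLM.
# The repo id is hypothetical -- substitute the real AWQ checkpoint once one exists.
from vllm import LLM, SamplingParams

llm = LLM(model="someuser/model-AWQ", quantization="awq")  # hypothetical repo id

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain AWQ vs GPTQ in one sentence."], params)
print(outputs[0].outputs[0].text)
```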
