GPTQ or AWQ Quants

#12
by guialfaro

Are there any GPTQ or AWQ quants available?

You can use llama.cpp to run the GGUF quantization of the model.
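
For reference, here is a minimal sketch using the llama-cpp-python bindings; the model path is hypothetical, so point it at whichever GGUF file you actually download:

```python
# Minimal sketch: run a GGUF quant via llama-cpp-python.
# The model_path below is hypothetical -- replace it with your downloaded file.
from llama_cpp import Llama

llm = Llama(
    model_path="./model-Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=4096,                        # context window size
    n_gpu_layers=-1,                   # offload all layers to GPU if available
)

out = llm("Explain AWQ vs GPTQ in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```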

@xldistance AWQ delivers better accuracy at equivalent quantization levels, even when compared against higher bit-width alternatives. vLLM and SGLang (which support AWQ) are significantly faster than llama.cpp or Ollama (which support GGUF).
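
If an AWQ checkpoint does get published, loading it in vLLM is a one-liner; a minimal sketch, where the repo id below is a hypothetical placeholder:

```python
# Minimal sketch: load an AWQ quant with vLLM.
# The repo id is hypothetical -- substitute the real AWQ checkpoint once one exists.
from vllm import LLM, SamplingParams

llm = LLM(model="someuser/model-AWQ", quantization="awq")  # hypothetical repo id

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain AWQ vs GPTQ in one sentence."], params)
print(outputs[0].outputs[0].text)
```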
