Why is the Q6 model even slower at inference than the Q8 model?

by wanxiashiwanxia - opened 1 day ago

Discussion

wanxiashiwanxia

1 day ago

Q6 averages 50 seconds per step, while Q8 averages 47 seconds per step... that doesn't make sense.

wsbagnsv1

about 17 hours ago

GGUFs are weird. They're basically like zip folders with some data loss, which means there's also a decompression (dequantization) step running during inference. But idk why Q6 would be slower than Q8 specifically.
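To make the "decompression at inference" point a bit more concrete, here is a rough Python sketch. It is not the actual GGUF block layout (the real Q6 and Q8 formats use their own block structures and scales); the function names and the four-codes-in-three-bytes packing are assumptions picked for illustration. It only shows that 8-bit codes can be read straight out of bytes, while 6-bit codes need extra shifting and masking before they can be scaled back to floats, which is one plausible place for extra work to come from.

```python
def quantize(values, bits):
    """Map floats to signed integer codes of the given width with one shared scale."""
    qmax = (1 << (bits - 1)) - 1            # 127 for 8 bits, 31 for 6 bits
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax else 1.0
    return [round(v / scale) for v in values], scale


def pack8(codes):
    """One code per byte: no bit juggling needed."""
    return bytes(q & 0xFF for q in codes)


def unpack8(packed):
    """Reinterpret each byte as a signed 8-bit value."""
    return [b - 256 if b >= 128 else b for b in packed]


def pack6(codes):
    """Pack four 6-bit codes into three bytes (hypothetical layout, length must be a multiple of 4)."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        a, b, c, d = (q & 0x3F for q in codes[i:i + 4])   # keep the low 6 bits
        word = (a << 18) | (b << 12) | (c << 6) | d       # 24 bits in total
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack6(packed):
    """Undo pack6: shift, mask, and sign-extend each 6-bit code."""
    codes = []
    for i in range(0, len(packed), 3):
        word = int.from_bytes(packed[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            q = (word >> shift) & 0x3F
            codes.append(q - 64 if q >= 32 else q)
    return codes


def dequantize(codes, scale):
    """Turn integer codes back into approximate floats."""
    return [q * scale for q in codes]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.75, -0.125, 0.0, 1.0, -0.6]
    for bits, pack, unpack in ((8, pack8, unpack8), (6, pack6, unpack6)):
        codes, scale = quantize(weights, bits)
        packed = pack(codes)
        restored = dequantize(unpack(packed), scale)
        print(bits, "bits:", len(packed), "bytes ->", restored)
```

The 6-bit version stores the same eight weights in fewer bytes, but has to do more bit manipulation per weight to get them back. Whether that actually explains a 50 s vs 47 s difference depends on the backend and hardware, so treat it as a possible factor rather than a diagnosis.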
