Why is the Q6 model even slower at inference than the Q8 model?
#8 opened by wanxiashiwanxia
Q6 averages 50 seconds per step, while Q8 averages 47 seconds per step... that doesn't make sense.
GGUFs are weird: they're basically like zip archives with some data loss (the weights are quantized), which means a decompression step (dequantization) also has to run during inference. But idk why Q6 would be slower than Q8 specifically.
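For anyone wondering what "decompression during inference" looks like, here is a rough pure-Python sketch. It is not the actual GGUF block layout (the real Q6 and Q8 formats use per-block scales and their own bit arrangement), and every function name below is made up for illustration. All it shows is that 8-bit values sit one per byte, while 6-bit values have to be shifted and masked out of packed bytes, which is extra work per weight. Whether that accounts for the gap reported here is only a guess.

```python
def quantize(values, bits):
    """Map floats to signed ints of the given width using one shared scale."""
    qmax = (1 << (bits - 1)) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    return [round(v / scale) for v in values], scale


def pack8(qs):
    """8-bit case: each value occupies exactly one byte."""
    return bytes(q & 0xFF for q in qs)


def dequantize8(packed, scale):
    """One byte in, one weight out (undo two's complement, then rescale)."""
    return [(b - 256 if b > 127 else b) * scale for b in packed]


def pack6(qs):
    """6-bit case (toy layout, an assumption): four values share three bytes."""
    assert len(qs) % 4 == 0
    out = bytearray()
    for i in range(0, len(qs), 4):
        a, b, c, d = (q + 32 for q in qs[i:i + 4])  # shift into 0..63
        word = a | (b << 6) | (c << 12) | (d << 18)
        out += word.to_bytes(3, "little")
    return bytes(out)


def dequantize6(packed, scale):
    """Values straddle byte boundaries, so each one needs a shift and a mask."""
    out = []
    for i in range(0, len(packed), 3):
        word = int.from_bytes(packed[i:i + 3], "little")
        for shift in (0, 6, 12, 18):
            out.append((((word >> shift) & 0x3F) - 32) * scale)
    return out


weights = [0.5, -1.0, 0.25, 0.75]
q8, s8 = quantize(weights, 8)
q6, s6 = quantize(weights, 6)
print(len(pack8(q8)), "bytes at 8 bits,", len(pack6(q6)), "bytes at 6 bits")
print(dequantize8(pack8(q8), s8))
print(dequantize6(pack6(q6), s6))
```

So the 6-bit version is smaller on disk and in memory, but it does more bit twiddling per weight to get back to usable numbers. If a run is limited by compute rather than by memory bandwidth, that could plausibly make a lower-bit quant a bit slower than Q8.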