nisten/qwenv2-7b-inst-imatrix-gguf
Tags: GGUF, imatrix, conversational
License: apache-2.0
1 contributor, History: 23 commits
Latest commit 9869461 (verified) by nisten, 11 months ago: "best speed/perplexity for mobile devices with int8 acceleration"
Files (all last updated 11 months ago):

File                                              Size      Last commit message
.gitattributes                                    3.32 kB   best speed/perplexity for mobile devices with int8 acceleration
8bitimatrix.dat                                   4.54 MB   calculated imatrix in 8-bit; it was just as good as the f16 imatrix
README.md                                         1.55 kB   Update README.md
qwen7bv2inst_iq4xs_embedding4xs_output6k.gguf     4.22 GB   standard iq4xs imatrix quant from the bf16 GGUF, so it has better perplexity
qwen7bv2inst_iq4xs_embedding4xs_output8bit.gguf   4.35 GB   best speed/perplexity for mobile devices with int8 acceleration
qwen7bv2inst_iq4xs_embedding8_outputq8.gguf       4.64 GB   great quant if your chip has 8-bit acceleration; slightly better than the 4k embedding
qwen7bv2inst_q4km_embedding4k_output8bit.gguf     4.82 GB   very good quant for speed/perplexity; embedding is at q4k
qwen7bv2inst_q4km_embeddingf16_outputf16.gguf     6.11 GB   good speed reference quant for older CPUs, though the f16 embedding brings little improvement
qwen7bv2instruct_bf16.gguf                        15.2 GB   Rename qwen7bf16.gguf to qwen7bv2instruct_bf16.gguf
qwen7bv2instruct_q5km.gguf                        5.58 GB   standard q5km conversion with 8-bit output, for reference
qwen7bv2instruct_q8.gguf                          8.1 GB    best q8 conversion down from bf16, with slightly better perplexity than f16-based quants
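
For reference, a minimal sketch of how one of the listed quants could be downloaded and run with llama-cpp-python. This is not from the original page: it assumes llama-cpp-python and huggingface_hub are installed, and the chosen file, context size, and prompt are only illustrative.

# Minimal sketch (not from the model card): fetch one of the quants listed above
# and run it with llama-cpp-python. Assumes `pip install llama-cpp-python huggingface_hub`.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the iq4xs quant with the 8-bit output tensor (4.35 GB in the listing above).
model_path = hf_hub_download(
    repo_id="nisten/qwenv2-7b-inst-imatrix-gguf",
    filename="qwen7bv2inst_iq4xs_embedding4xs_output8bit.gguf",
)

# Load the GGUF file; n_ctx=4096 is an illustrative context-window choice.
llm = Llama(model_path=model_path, n_ctx=4096)

# Qwen2-7B-Instruct is a chat model, so use the chat-completion API.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what an importance matrix (imatrix) is used for."}]
)
print(out["choices"][0]["message"]["content"])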