nisten/qwenv2-7b-inst-imatrix-gguf
Tags: GGUF, imatrix, conversational
License: apache-2.0
1 contributor, History: 23 commits
Latest commit 9869461 (verified) by nisten, 11 months ago: "best speed/perplexity for mobile devices with int8 acceleration"
Files (all last updated 11 months ago):

File                                              Size      Last commit message
.gitattributes                                    3.32 kB   best speed/perplexity for mobile devices with int8 acceleration
8bitimatrix.dat                                   4.54 MB   calculated imatrix in 8-bit; it was just as good as the f16 imatrix
README.md                                         1.55 kB   Update README.md
qwen7bv2inst_iq4xs_embedding4xs_output6k.gguf     4.22 GB   standard iq4xs imatrix quant from the bf16 GGUF, so it has better perplexity
qwen7bv2inst_iq4xs_embedding4xs_output8bit.gguf   4.35 GB   best speed/perplexity for mobile devices with int8 acceleration
qwen7bv2inst_iq4xs_embedding8_outputq8.gguf       4.64 GB   great quant if your chip has 8-bit acceleration; slightly better than the 4k embedding
qwen7bv2inst_q4km_embedding4k_output8bit.gguf     4.82 GB   very good quant for speed/perplexity; embedding is at q4k
qwen7bv2inst_q4km_embeddingf16_outputf16.gguf     6.11 GB   good speed reference quant for older CPUs, though the f16 embedding brings little improvement
qwen7bv2instruct_bf16.gguf                        15.2 GB   Rename qwen7bf16.gguf to qwen7bv2instruct_bf16.gguf
qwen7bv2instruct_q5km.gguf                        5.58 GB   standard q5km conversion with 8-bit output, for reference
qwen7bv2instruct_q8.gguf                          8.1 GB    best q8 conversion down from bf16, with slightly better perplexity than f16-based quants
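
For reference, a minimal sketch of how one of the listed quants could be downloaded and run with llama-cpp-python. This is not from the original page: it assumes llama-cpp-python and huggingface_hub are installed, and the chosen file, context size, and prompt are only illustrative.

# Minimal sketch (not from the model card): fetch one of the quants listed above
# and run it with llama-cpp-python. Assumes `pip install llama-cpp-python huggingface_hub`.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the iq4xs quant with the 8-bit output tensor (4.35 GB in the listing above).
model_path = hf_hub_download(
    repo_id="nisten/qwenv2-7b-inst-imatrix-gguf",
    filename="qwen7bv2inst_iq4xs_embedding4xs_output8bit.gguf",
)

# Load the GGUF file; n_ctx=4096 is an illustrative context-window choice.
llm = Llama(model_path=model_path, n_ctx=4096)

# Qwen2-7B-Instruct is a chat model, so use the chat-completion API.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what an importance matrix (imatrix) is used for."}]
)
print(out["choices"][0]["message"]["content"])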