Model creator: Qwen
Original model: Qwen2.5-7B-Instruct-1M
Prompt format:

```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
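For reference, here is a minimal Python sketch that assembles this ChatML-style prompt by hand; the `build_prompt` helper and its default system prompt are illustrative, not part of this card or any official API.

```python
# Minimal sketch: assemble the ChatML-style prompt shown above.
# build_prompt and its default system prompt are illustrative assumptions.
def build_prompt(prompt: str, system_prompt: str = "You are a helpful assistant.") -> str:
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_prompt("Give me a one-line summary of GGUF."))
```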
Filename | Quant type | File Size | Split | Description |
---|---|---|---|---|
Qwen2.5-7B-Instruct-1M-F32.gguf | f32 | 30.5 GB | false | Full F32 weights. |
Qwen2.5-7B-Instruct-1M-F16.gguf | f16 | 15.24 GB | false | Full F16 weights. |
Qwen2.5-7B-Instruct-1M-Q8_0.gguf | Q8_0 | 8.10 GB | false | Extremely high quality, generally unneeded but max available quant. |
Qwen2.5-7B-Instruct-1M-Q6_K.gguf | Q6_K | 6.25 GB | false | Very high quality, near perfect, recommended. |
Qwen2.5-7B-Instruct-1M-Q5_K_M.gguf | Q5_K_M | 5.44 GB | false | High quality, recommended. |
Qwen2.5-7B-Instruct-1M-Q5_K_S.gguf | Q5_K_S | 5.32 GB | false | High quality, recommended. |
Qwen2.5-7B-Instruct-1M-Q4_1.gguf | Q4_1 | 4.87 GB | false | Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon. |
Qwen2.5-7B-Instruct-1M-Q4_K_M.gguf | Q4_K_M | 4.68 GB | false | Good quality, default size for most use cases, recommended. |
Qwen2.5-7B-Instruct-1M-Q4_K_S.gguf | Q4_K_S | 4.46 GB | false | Slightly lower quality with more space savings, recommended. |
Qwen2.5-7B-Instruct-1M-Q4_0.gguf | Q4_0 | 4.43 GB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
Qwen2.5-7B-Instruct-1M-Q3_K_L.gguf | Q3_K_L | 4.09 GB | false | Lower quality but usable, good for low RAM availability. |
Qwen2.5-7B-Instruct-1M-Q3_K_M.gguf | Q3_K_M | 3.81 GB | false | Low quality. |
Qwen2.5-7B-Instruct-1M-Q3_K_S.gguf | Q3_K_S | 3.49 GB | false | Low quality, not recommended. |
Qwen2.5-7B-Instruct-1M-Q2_K.gguf | Q2_K | 3.02 GB | false | Very low quality but surprisingly usable. |
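As a rule of thumb, pick the largest quant whose file size leaves some headroom below your available RAM or VRAM for the KV cache and runtime buffers. A minimal sketch using the sizes from the table above; the helper name and the 1.5 GB headroom figure are illustrative assumptions, not guidance from the card.

```python
# Sketch: pick the largest quant from the table above that fits in memory.
# The 1.5 GB headroom figure and the helper name are illustrative assumptions.
QUANTS = [  # (quant type, file size in GB), sorted largest-first, from the table
    ("Q8_0", 8.10), ("Q6_K", 6.25), ("Q5_K_M", 5.44), ("Q5_K_S", 5.32),
    ("Q4_1", 4.87), ("Q4_K_M", 4.68), ("Q4_K_S", 4.46), ("Q4_0", 4.43),
    ("Q3_K_L", 4.09), ("Q3_K_M", 3.81), ("Q3_K_S", 3.49), ("Q2_K", 3.02),
]

def pick_quant(available_gb: float, headroom_gb: float = 1.5) -> str | None:
    """Return the largest quant that fits with headroom left over."""
    for name, size_gb in QUANTS:
        if size_gb + headroom_gb <= available_gb:
            return name
    return None

print(pick_quant(8.0))  # -> "Q6_K" on a machine with 8 GB free
```

The headroom you actually need grows with context length, since a longer context means a larger KV cache.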
Notes on the original model:

- Supports a context length of up to 1M tokens.
- Significantly improved performance on long-context tasks while maintaining capability on short tasks.
- Accuracy degradation may occur for sequences exceeding 262,144 tokens until improved support is added.

For more information, see the Qwen team's blog post.
First, make sure you have huggingface-cli installed:

```
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:

```
huggingface-cli download BabaK07/Qwen2.5-7b-Instruct-1M-Q4_K_M-gguf --include "Qwen2.5-7b-Instruct-1M-Q4_K_M.gguf" --local-dir ./
```
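Alternatively, a minimal Python sketch that does the same download via huggingface_hub and then loads the file with llama-cpp-python; using llama-cpp-python is an assumption on my part, as the card itself only documents the CLI download.

```python
# Sketch: download one quant with huggingface_hub, then run it with
# llama-cpp-python (an assumption; not part of this card).
# pip install -U "huggingface_hub[cli]" llama-cpp-python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="BabaK07/Qwen2.5-7b-Instruct-1M-Q4_K_M-gguf",
    filename="Qwen2.5-7b-Instruct-1M-Q4_K_M.gguf",
    local_dir="./",
)

llm = Llama(model_path=model_path, n_ctx=8192)  # raise n_ctx for long-context work
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what GGUF quantization is."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```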
Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.