---
license: apache-2.0
base_model:
- Qwen/Qwen3-32B
---
|
# Qwen3-32B Quantized Model |
|
|
|
A 4-bit (GPTQ) quantized version of [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B), produced with [GPTQModel](https://github.com/ModelCloud/GPTQModel).
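A minimal sketch of loading the quantized checkpoint with GPTQModel and generating text. The local path and prompt below are illustrative assumptions; you can also pass this repo's Hub id to `GPTQModel.load`:

```python
from gptqmodel import GPTQModel

# Path to the saved quantized model, or this repo id on the Hub (illustrative)
quant_path = "quantized_model"

model = GPTQModel.load(quant_path)

# `generate` accepts a prompt string and returns token ids
result = model.generate("The capital of France is")[0]
print(model.tokenizer.decode(result))
```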
|
|
|
## Quantization |
|
|
|
```python
import sys

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = sys.argv[1]  # e.g. "Qwen/Qwen3-32B"
quant_path = "quantized_model"

# Load calibration data (1024 samples from C4)
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# Configure and run 4-bit quantization with group size 128
quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)
model.save(quant_path)
```
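To make `bits=4, group_size=128` concrete, here is a self-contained NumPy sketch of plain round-to-nearest 4-bit quantization applied to a single 128-weight group. This is only an illustration of the storage format: GPTQ itself additionally applies Hessian-based error correction derived from the calibration data, which is omitted here.

```python
import numpy as np

def quantize_group(w, bits=4):
    """Asymmetric round-to-nearest quantization of one weight group.
    Illustrative sketch only; real GPTQ corrects rounding error
    column-by-column using calibration statistics."""
    levels = 2 ** bits - 1                      # 15 for 4-bit
    scale = (w.max() - w.min()) / levels        # one scale per group
    zero = np.round(-w.min() / scale)           # integer zero point
    q = np.clip(np.round(w / scale + zero), 0, levels)
    return q.astype(np.uint8), scale, zero

def dequantize_group(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
group = rng.normal(size=128).astype(np.float32)  # one group_size=128 slice

q, scale, zero = quantize_group(group)
recon = dequantize_group(q, scale, zero)
print("max abs error:", np.abs(group - recon).max())  # bounded by ~scale/2
```

With `group_size=128`, each block of 128 weights shares one `scale`/`zero` pair, so the per-weight storage cost is 4 bits plus a small amortized overhead.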
|
|
|
## License |
|
|
|
Apache-2.0; see LICENSE.txt.