Qwen3-32B Quantized Model

A 4-bit GPTQ quantization of Qwen/Qwen3-32B, produced with the gptqmodel library.

Quantization

import sys

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# The model to quantize is passed on the command line,
# e.g. `python quantize.py Qwen/Qwen3-32B`.
model_id = sys.argv[1]
quant_path = "quantized_model"

# Load calibration data: 1024 text samples from one shard of C4.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# Configure 4-bit quantization with a group size of 128, then run
# GPTQ over the calibration set and save the packed checkpoint.
quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)
model.save(quant_path)
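Once saved, the checkpoint can be reloaded for a quick smoke test. Below is a minimal sketch using gptqmodel's own load and generate helpers; the prompt is an arbitrary example, and "quantized_model" is the path written by the script above.

from gptqmodel import GPTQModel

# Reload the quantized checkpoint written by the quantization script.
model = GPTQModel.load("quantized_model")

# Generate from a throwaway prompt and decode with the bundled tokenizer.
tokens = model.generate("The capital of France is")[0]
print(model.tokenizer.decode(tokens))

The saved directory uses the standard GPTQ safetensors layout, so it should also load through Transformers or vLLM when GPTQ kernel support is installed.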

License

Apache-2.0. See LICENSE.txt.


Model tree

Base model: Qwen/Qwen3-32B