sandman4/Qwen3-32B-GPTQ-4bit

4-bit precision


sandman4 committed 13 days ago

Commit 00dbfaf · 1 Parent(s): df47298

add README.md

Files changed (1)

README.md +30 -3


@@ -1,3 +1,30 @@
----
-license: apache-2.0
----

+# Qwen3-32B GPTQ 4bit
+Quantized with [GPTQModel](https://github.com/ModelCloud/GPTQModel) using the following script:
+```python3
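+import sys
+
+from datasets import load_dataset
+from gptqmodel import GPTQModel, QuantizeConfig
+
+# model ID or local path of the model to quantize, passed as the first CLI argument
+model_id = sys.argv[1]
+print(model_id)
+# output directory for the quantized model
+quant_path = "quantized_model"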
+calibration_dataset = load_dataset(
+    "allenai/c4",
+    data_files="en/c4-train.00001-of-01024.json.gz",
+    split="train"
+).select(range(1024))["text"]
+quant_config = QuantizeConfig(bits=4, group_size=128)
+model = GPTQModel.load(model_id, quant_config)
+# increase `batch_size` to match gpu/vram specs to speed up quantization
+model.quantize(calibration_dataset, batch_size=2)
+model.save(quant_path)
+```
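
To make the `bits=4, group_size=128` setting in the script more concrete, here is a minimal standard-library sketch of group-wise 4-bit quantization. It is not the GPTQ algorithm and not GPTQModel's implementation: it uses plain round-to-nearest with one scale and offset per group of 128 weights, which is an assumption chosen for illustration. GPTQ uses the calibration data to reduce the error this rounding introduces. All function names below are hypothetical.

```python
import random


def quantize_group(weights, bits=4):
    """Round-to-nearest asymmetric quantization of one group of floats (illustrative only)."""
    qmax = (1 << bits) - 1               # largest integer code, 15 for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0      # fall back to 1.0 if the group is constant
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def quantize_row(row, bits=4, group_size=128):
    """Split a weight row into consecutive groups and quantize each one independently."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(512)]  # synthetic weights, not from the model

groups = quantize_row(row)
restored = [w for g in groups for w in dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.6f}")
```

Each group stores its own scale and offset, so a smaller `group_size` tracks local weight ranges more closely at the cost of more per-group metadata.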