Could you share the code script to convert the original Qwen3-8B to Qwen3-8B-AWQ?
#1 · opened by wenmin-wu
Hi Orion, thanks for sharing this model! Could you please share the script you used to convert the original Qwen3-8B to Qwen3-8B-AWQ? Many thanks.
Of course!
# make-awq.py
from awq import AutoAWQForCausalLM
from datasets import load_dataset
from transformers import AutoTokenizer
from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument("--model", "-m", type=str, required=True)
args = parser.parse_args()
def load_wikitext():
    data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    return [
        text
        for text in data["text"]
        if text.strip() != "" and len(text.split(" ")) > 20
    ]
model_path = args.model
quant_path = model_path + "-awq"
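# AWQ settings: 4-bit weights, per-group quantization with group size 128,
# asymmetric quantization (zero_point=True), and the GEMM kernel layout.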
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}
# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Quantize
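# max_calib_samples caps how many wikitext passages are used for calibration;
# n_parallel_calib_samples controls how many are run through the model at once
# (lower it if you hit out-of-memory errors during calibration).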
model.quantize(
    tokenizer,  # type: ignore
    quant_config=quant_config,
    calib_data=load_wikitext(),  # type: ignore
    n_parallel_calib_samples=4,
    max_calib_samples=256,
)
# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')
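(The script needs autoawq, which provides the awq import, plus transformers and datasets; on a CUDA machine, pip install autoawq datasets should cover it.)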
Then run:
python make-awq.py -m /path/to/your/model
The AWQ model will be saved at /path/to/your/model-awq
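If you want to sanity-check the result, the quantized folder can be loaded with plain transformers. A minimal sketch, assuming a recent transformers with AWQ support and autoawq installed (the file name and prompt are just placeholders):
# check-awq.py -- quick sanity check of the quantized model
from transformers import AutoModelForCausalLM, AutoTokenizer
quant_path = "/path/to/your/model-awq"  # folder produced by make-awq.py
tokenizer = AutoTokenizer.from_pretrained(quant_path)
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
inputs = tokenizer("AWQ quantization in one sentence:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The same folder should also work with AWQ-aware runtimes such as vLLM.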