Could you share the script used to convert the original Qwen3-8B to Qwen3-8B-AWQ?

Opened by wenmin-wu

Hi Orion, thanks for sharing this model. Could you please share the script you used to convert the original Qwen3-8B to Qwen3-8B-AWQ? Many thanks.

Of course!

# make-awq.py
from awq import AutoAWQForCausalLM
from datasets import load_dataset
from transformers import AutoTokenizer
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--model", "-m", type=str, required=True)
args = parser.parse_args()


def load_wikitext():
    # Use non-empty wikitext-2 lines with more than 20 words as calibration texts
    data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    return [
        text
        for text in data["text"]
        if text.strip() != "" and len(text.split(" ")) > 20
    ]

model_path = args.model
quant_path = model_path + "-awq"
quant_config = {"zero_point": True,
                "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=load_wikitext(),      # calibration texts from wikitext-2
    n_parallel_calib_samples=4,      # samples processed in parallel (lower to save VRAM)
    max_calib_samples=256,           # total number of calibration samples to use
)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')

Then run:

python make-awq.py -m /path/to/your/model

The AWQ model will be saved at /path/to/your/model-awq
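If you want to sanity-check the result, here is a minimal sketch of loading the quantized folder for inference with transformers. It assumes autoawq is installed so transformers can read the AWQ weights; the path and prompt below are just placeholders.

# load-awq.py (quick sanity check; assumes autoawq is installed)
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "/path/to/your/model-awq"  # folder produced by make-awq.py

tokenizer = AutoTokenizer.from_pretrained(quant_path)
model = AutoModelForCausalLM.from_pretrained(
    quant_path, device_map="auto", torch_dtype="auto"
)

# Generate a few tokens to confirm the quantized weights load and run
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))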

Thank you so much, @Orion-zhen, for your prompt response!
