Qwen 3 Quantized Collection
AWQ-quantized Qwen3 models: 32B, 14B, 8B, and 4B
This repository provides AWQ (Activation-aware Weight Quantization) versions of Qwen3 models, optimized for efficient deployment on consumer hardware while maintaining strong performance.
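The exact quantization recipe used for this collection is not documented here; as a rough sketch, AWQ checkpoints like these can be produced with the AutoAWQ library. The settings below (4-bit weights, group size 128, GEMM kernels) are common AutoAWQ defaults, not confirmed settings for these models:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Hypothetical recipe: quantize the base model with typical AWQ settings
model_path = "Qwen/Qwen3-8B"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate and quantize, then save the AWQ checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("Qwen3-8B-AWQ")
tokenizer.save_pretrained("Qwen3-8B-AWQ")
```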
Usage with Transformers (loading AWQ checkpoints requires the autoawq package to be installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("abhishekchohan/Qwen3-8B-AWQ", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("abhishekchohan/Qwen3-8B-AWQ")

# Build a chat prompt and generate a response
messages = [{"role": "user", "content": "Explain quantum computing."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
Serving with vLLM (these are dense models, so the expert-parallelism flag for MoE models does not apply; set --tensor-parallel-size to match your GPU count):

```bash
vllm serve abhishekchohan/Qwen3-8B-AWQ \
    --chat-template templates/chat_template.jinja \
    --tensor-parallel-size 4
```
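Once the server is up, it exposes an OpenAI-compatible API. A minimal client sketch, assuming vLLM's default endpoint at http://localhost:8000/v1:

```python
# Query the running vLLM server through its OpenAI-compatible API
# (assumes the default host/port; no real API key is needed locally).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="abhishekchohan/Qwen3-8B-AWQ",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```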
If you use these models, please cite:
```bibtex
@misc{qwen3,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://github.com/QwenLM/Qwen3}
}
```