# Llama 3.2 3B Instruct - AWQ Quantized (4-bit)
This is a 4-bit AWQ-quantized version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).
Quantized with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) using the following configuration (a reproduction sketch follows the list):

- `w_bit`: 4
- `q_group_size`: 128
- `zero_point`: true
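
For reference, the snippet below is a minimal sketch of how a model can be quantized with AutoAWQ using this configuration. The local paths, the calibration defaults, and the `version: "GEMM"` kernel choice are assumptions, not taken from this card.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Base model and output path (output path is illustrative)
model_path = "meta-llama/Llama-3.2-3B-Instruct"
quant_path = "Llama-3.2-3B-Instruct-AWQ-4bit"

# The configuration listed above; "version" is an assumed kernel choice
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run AWQ calibration (default calibration dataset) and quantize the weights
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```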
## Usage
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Load the quantized weights and the matching tokenizer
model = AutoAWQForCausalLM.from_quantized("Sumo10/Llama-3.2-3B-Instruct-AWQ-4bit")
tokenizer = AutoTokenizer.from_pretrained("Sumo10/Llama-3.2-3B-Instruct-AWQ-4bit")

# Tokenize a prompt, move it to the GPU the model was loaded on, and generate
input_ids = tokenizer("What is quantum computing?", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
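
Since this is an instruct-tuned model, prompts are usually best formatted with the Llama 3.2 chat template rather than passed as raw text. A minimal sketch, reusing the `model` and `tokenizer` loaded above:

```python
# Wrap the question in the model's chat template before generating
messages = [{"role": "user", "content": "What is quantum computing?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```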