Llama 3.2 3B Instruct - AWQ Quantized (4-bit)

This is a 4-bit AWQ (Activation-aware Weight Quantization) quantized version of meta-llama/Llama-3.2-3B-Instruct.

Quantized with the following AutoAWQ settings (a reproduction sketch follows the list):

  • w_bit: 4
  • q_group_size: 128
  • zero_point: true
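
The exact quantization script and calibration data are not part of this card; the following is a minimal sketch of how a checkpoint with these settings can be produced with AutoAWQ. The kernel "version" and the calibration set are assumptions.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-3.2-3B-Instruct"
quant_path = "Llama-3.2-3B-Instruct-AWQ-4bit"

# Settings from the list above; "version" is an assumption (GEMM is AutoAWQ's common default)
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Runs activation-aware calibration and quantizes the weights
# (AutoAWQ falls back to a built-in calibration set if none is given)
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)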

Usage

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model = AutoAWQForCausalLM.from_quantized("Sumo10/Llama-3.2-3B-Instruct-AWQ-4bit")
tokenizer = AutoTokenizer.from_pretrained("Sumo10/Llama-3.2-3B-Instruct-AWQ-4bit")

# Llama 3.2 Instruct expects chat-formatted prompts, so use the chat template
messages = [{"role": "user", "content": "What is quantum computing?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")  # AWQ inference kernels run on GPU

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
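
AWQ checkpoints can also be served with vLLM, which supports AWQ-quantized models. A minimal sketch, assuming vLLM is installed with AWQ support:

from vllm import LLM, SamplingParams

# vLLM reads the quantization method from the model config;
# quantization="awq" makes it explicit
llm = LLM(model="Sumo10/Llama-3.2-3B-Instruct-AWQ-4bit", quantization="awq")

params = SamplingParams(max_tokens=256)
outputs = llm.generate(["What is quantum computing?"], params)
print(outputs[0].outputs[0].text)

For chat-formatted prompts, recent vLLM versions also expose llm.chat(messages).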
Model details

  • Format: safetensors
  • Model size: 771M params
  • Tensor types: I32, BF16, FP16