AWQ or GPTQ Quant

#2
by getfit - opened

huggingface/modules/transformers_modules/Llama3.1-253B/configuration_decilm.py", line 22, in
from .block_config import BlockConfig
ModuleNotFoundError: No module named 'transformers_modules.Llama3'

I tried the transformers version mentioned in the README, along with the latest; I cannot get either to proceed with quantization. Is there something I am missing? It would be awesome if you could point me in the right direction, or if someone could upload a quantized version.
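For what it's worth, the missing module name in the traceback lines up with the dots in the checkpoint folder name: the `trust_remote_code` loader mirrors the directory name under the `transformers_modules` package, and Python module paths split on dots, so "Llama3.1-253B" is read as a package "Llama3" with unreachable children. A minimal sketch of that derivation (the folder name is taken from the traceback; the dot-free rename at the end is only an illustration of the workaround people report, not a confirmed fix):

```python
# The dynamic-module loader mirrors the checkpoint directory name under
# the transformers_modules package, so dots in the folder name become
# package separators when Python imports it.
folder = "Llama3.1-253B"
module_path = f"transformers_modules.{folder}"

# Python resolves the import one dot-separated segment at a time, so the
# first package it looks for is exactly the one the traceback reports missing.
first_segment = ".".join(module_path.split(".")[:2])
print(first_segment)  # transformers_modules.Llama3

# A dot-free local folder name (hypothetical rename) avoids the split entirely.
renamed = folder.replace(".", "_")
print(f"transformers_modules.{renamed}")  # transformers_modules.Llama3_1-253B
```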

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "models/Llama3.1-253B"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

from datasets import load_dataset

NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048

Load dataset.

# "HuggingFaceFW/fineweb" has no "train_sft" split or "messages" column;
# a chat-formatted dataset such as ultrachat_200k matches the preprocessing below.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))

Preprocess the data into the format the model is trained with.

def preprocess(example):
    return {
        "text": tokenizer.apply_chat_template(
            example["messages"],
            tokenize=False,
        )
    }

ds = ds.map(preprocess)

Tokenize the data. Be careful with BOS tokens: we need add_special_tokens=False since the chat template has already added one.

def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize, remove_columns=ds.column_names)

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

Configure the quantization algorithm to run.

recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
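As a back-of-envelope check on why W4A16 matters at this scale, here is rough arithmetic only; it ignores group-scale/zero-point overhead and the fp16 lm_head kept out of quantization by ignore=["lm_head"]:

```python
# Rough weight-memory estimate for a 253B-parameter model.
# Assumptions: every quantized weight costs exactly 4 bits; scale and
# zero-point overhead and the excluded lm_head are ignored.
params = 253e9

fp16_gb = params * 2 / 1e9    # fp16: 2 bytes per weight
w4_gb = params * 0.5 / 1e9    # W4:   4 bits = 0.5 bytes per weight

print(f"fp16: ~{fp16_gb:.1f} GB, W4A16: ~{w4_gb:.1f} GB")
```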

Apply quantization.

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

Save to disk compressed.

SAVE_DIR = MODEL_ID.split("/")[1] + "-W4A16-G128"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
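In case it helps whoever picks this up: once save_pretrained(..., save_compressed=True) finishes, the output directory is a normal checkpoint folder named from the model ID. A sketch of where the artifacts land (the commented vLLM call is an assumption about the serving stack, not something confirmed in this thread):

```python
# The save directory is derived from the model ID, as in the script above.
MODEL_ID = "models/Llama3.1-253B"
SAVE_DIR = MODEL_ID.split("/")[1] + "-W4A16-G128"
print(SAVE_DIR)  # Llama3.1-253B-W4A16-G128

# Hypothetical serving step, assuming an engine with compressed-tensors
# support (e.g. vLLM) is installed:
# from vllm import LLM
# llm = LLM(model=SAVE_DIR)
```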

