GGUF model with architecture gemma3 is not supported yet

#2
by kieransmith - opened

I'm using the following code to try to get this working:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ZeroWw/gemma-3-4b-it-abliterated-GGUF"
filename = "gemma-3-4b-it-abliterated.q8q4.gguf"

torch_dtype = torch.float16
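# The tokenizer call below is where the "architecture not supported" error in the traceback is raised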
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename, torch_dtype=torch_dtype)

inputs = tokenizer.encode("Test message", return_tensors='pt')

outputs = model.generate(inputs, max_length=50, num_return_sequences=1)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(text)

But I get the following error:

Traceback (most recent call last):
  File "...", line 8, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
  File ".../Library/Python/3.9/lib/python/site-packages/transformers/models/auto/tokenization_auto.py", line 927, in from_pretrained
    config_dict = load_gguf_checkpoint(gguf_path, return_tensors=False)["config"]
  File ".../Library/Python/3.9/lib/python/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 401, in load_gguf_checkpoint
    raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma3 is not supported yet.

Are you able to help point me in the right direction with this, please?

Owner

I use the quants with llama.cpp / koboldcpp.
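
Since transformers does not recognize the gemma3 GGUF architecture yet, one workaround is to run the quant directly with llama.cpp, for example through the llama-cpp-python bindings. Below is a minimal sketch, assuming llama-cpp-python and huggingface_hub are installed; the context size, GPU offload setting, prompt, and token limit are placeholder choices, not values from this thread.

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF file from the Hub (cached locally after the first call)
gguf_path = hf_hub_download(
    repo_id="ZeroWw/gemma-3-4b-it-abliterated-GGUF",
    filename="gemma-3-4b-it-abliterated.q8q4.gguf",
)

# Load the quantized model with llama.cpp; n_gpu_layers=-1 offloads all
# layers to the GPU if one is available, n_ctx sets the context window
llm = Llama(model_path=gguf_path, n_ctx=4096, n_gpu_layers=-1)

# Simple text completion; max_tokens bounds the generated length
output = llm("Test message", max_tokens=50)
print(output["choices"][0]["text"])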

ZeroWw changed discussion status to closed

Same error for me.
What is the solution?
