GGUF model with architecture gemma3 is not supported yet
#2 opened by kieransmith
I'm using the following code to try and get this working:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ZeroWw/gemma-3-4b-it-abliterated-GGUF"
filename = "gemma-3-4b-it-abliterated.q8q4.gguf"
torch_dtype = torch.float16

# load the tokenizer and model directly from the GGUF file on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename, torch_dtype=torch_dtype)

# run a quick generation test
inputs = tokenizer.encode("Test message", return_tensors='pt')
outputs = model.generate(inputs, max_length=50, num_return_sequences=1)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
But I get the following error:
Traceback (most recent call last):
  File "...", line 8, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
  File ".../Library/Python/3.9/lib/python/site-packages/transformers/models/auto/tokenization_auto.py", line 927, in from_pretrained
    config_dict = load_gguf_checkpoint(gguf_path, return_tensors=False)["config"]
  File ".../Library/Python/3.9/lib/python/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 401, in load_gguf_checkpoint
    raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma3 is not supported yet.
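For what it's worth, the architecture string in the error matches what the file itself declares in its header. A minimal sketch of checking that with the gguf package (pip install gguf), assuming the .gguf file has already been downloaded locally; the parts/data access for string fields is an assumption about the reader's layout:

from gguf import GGUFReader

# open the local GGUF file and read its header metadata
reader = GGUFReader("gemma-3-4b-it-abliterated.q8q4.gguf")
field = reader.fields["general.architecture"]

# string values are stored as raw bytes in one of the field's parts;
# the last index in field.data points at the value bytes (assumed layout)
print(bytes(field.parts[field.data[-1]]).decode("utf-8"))  # expected: "gemma3"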
Are you able to point me in the right direction with this, please?
I use the quants with llama.cpp / koboldcpp.
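For example, a rough sketch of running the same file through llama-cpp-python instead of transformers (assumes huggingface_hub and llama-cpp-python are installed; the context size and token limit here are arbitrary choices, not values from the post above):

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# fetch the GGUF file from the Hub and hand the local path to llama.cpp
gguf_path = hf_hub_download(
    repo_id="ZeroWw/gemma-3-4b-it-abliterated-GGUF",
    filename="gemma-3-4b-it-abliterated.q8q4.gguf",
)

llm = Llama(model_path=gguf_path, n_ctx=2048)
out = llm("Test message", max_tokens=50)
print(out["choices"][0]["text"])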
ZeroWw changed discussion status to closed
Same error for me.
What is the solution?