Model Details

This is an int4 model (group_size 128, symmetric quantization) of google/gemma-3-27b-it, generated by the intel/auto-round algorithm.

Please follow the license of the original model.
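
For reference, the sketch below shows how a checkpoint with these settings (4-bit, group_size 128, symmetric) could be produced with auto-round's multimodal entry point. This is an illustrative assumption, not the exact recipe used for this model, and the output path is hypothetical.

# Illustrative quantization sketch (assumed recipe, not the one used for this model).
from transformers import AutoProcessor, AutoTokenizer, Gemma3ForConditionalGeneration
from auto_round import AutoRoundMLLM

base_model = "google/gemma-3-27b-it"
model = Gemma3ForConditionalGeneration.from_pretrained(base_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
processor = AutoProcessor.from_pretrained(base_model)

# Settings match the card: 4-bit weights, group_size 128, symmetric quantization.
autoround = AutoRoundMLLM(model, tokenizer, processor, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./gemma-3-27b-it-int4", format="auto_round")  # hypothetical output dir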

Inference on CPU

We found that the unquantized layers must run in BF16 or FP32, so CUDA inference is not available for now.

Requirements

pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch
from auto_round import AutoRoundConfig

model_id = "OPEA/gemma-3-27b-it-int4-AutoRound-cpu"

# Load the int4 weights on CPU via the AutoRound backend.
quantization_config = AutoRoundConfig(backend="cpu")
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cpu", quantization_config=quantization_config
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image",
             "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

# Build and tokenize the multimodal prompt; the dtype cast only affects floating-point tensors.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

# Greedy decoding; keep only the newly generated tokens.
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
"""
Here's a detailed description of the image:

**Overall Impression:**

The image is a close-up shot of a vibrant garden scene, focusing on a pink cosmos flower with a bumblebee actively collecting pollen. The composition is natural and slightly wild, with a mix of blooming and fading flowers.

**Detailed Description:**

*   **Main Subject:** A bright pink cosmos flower is the central focus. The petals are a delicate shade of pink with a slightly darker pink vein pattern. The
"""