Model Details

This is an int4 model (group_size 128, symmetric quantization) of google/gemma-3-27b-it, generated by the intel/auto-round algorithm.

Please follow the license of the original model.
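
For reference, the sketch below shows how a checkpoint with these settings (4-bit, group_size 128, symmetric) could be produced with auto-round's multimodal entry point. This is an illustrative assumption, not the exact recipe used for this model, and the output path is hypothetical.

# Illustrative quantization sketch (assumed recipe, not the one used for this model).
from transformers import AutoProcessor, AutoTokenizer, Gemma3ForConditionalGeneration
from auto_round import AutoRoundMLLM

base_model = "google/gemma-3-27b-it"
model = Gemma3ForConditionalGeneration.from_pretrained(base_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
processor = AutoProcessor.from_pretrained(base_model)

# Settings match the card: 4-bit weights, group_size 128, symmetric quantization.
autoround = AutoRoundMLLM(model, tokenizer, processor, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./gemma-3-27b-it-int4", format="auto_round")  # hypothetical output dir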

Inference on CPU

We found that the unquantized layers must run in BF16 or FP32, so CUDA inference is not available for now.

Requirements

pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch
from auto_round import AutoRoundConfig

model_id = "OPEA/gemma-3-27b-it-int4-AutoRound-cpu"

# Load the int4 weights on CPU via the AutoRound backend.
quantization_config = AutoRoundConfig(backend="cpu")
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cpu", quantization_config=quantization_config
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image",
             "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

# Build and tokenize the multimodal prompt; the dtype cast only affects floating-point tensors.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

# Greedy decoding; keep only the newly generated tokens.
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
"""
Here's a detailed description of the image:

**Overall Impression:**

The image is a close-up shot of a vibrant garden scene, focusing on a pink cosmos flower with a bumblebee actively collecting pollen. The composition is natural and slightly wild, with a mix of blooming and fading flowers.

**Detailed Description:**

*   **Main Subject:** A bright pink cosmos flower is the central focus. The petals are a delicate shade of pink with a slightly darker pink vein pattern. The
"""