---
datasets:
- NeelNanda/pile-10k
base_model:
- google/gemma-3-27b-it
---
## Model Details

This model is an int4 model with group_size 128 and symmetric quantization of [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it), generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.

Please follow the license of the original model.
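
For reference, the sketch below shows roughly how a checkpoint with the settings stated above (int4, group_size 128, symmetric, calibrated on NeelNanda/pile-10k) can be produced with the auto-round Python API. This is a minimal sketch, not the exact recipe used for this model; the output path is illustrative, and since the published checkpoint is the full multimodal model, the actual run likely went through auto-round's multimodal (MLLM) entry point instead.

~~~python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "google/gemma-3-27b-it"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Settings stated in this card: int4, group_size 128, symmetric,
# calibrated on NeelNanda/pile-10k.
autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    sym=True,
    dataset="NeelNanda/pile-10k",
)
autoround.quantize()
# "./gemma-3-27b-it-int4" is an illustrative output directory.
autoround.save_quantized("./gemma-3-27b-it-int4", format="auto_round")
~~~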

### Inference on CPU

We found that the unquantized layers must run in BF16 or FP32, so CUDA inference is not available at the moment.

Requirements

```bash
pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers
```

~~~python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch
from auto_round import AutoRoundConfig

model_id = "OPEA/gemma-3-27b-it-int4-AutoRound-cpu"

quantization_config = AutoRoundConfig(backend="cpu")
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cpu", quantization_config=quantization_config
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image",
             "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
"""
Here's a detailed description of the image:

**Overall Impression:**

The image is a close-up shot of a vibrant garden scene, focusing on a pink cosmos flower with a bumblebee actively collecting pollen. The composition is natural and slightly wild, with a mix of blooming and fading flowers.

**Detailed Description:**

*   **Main Subject:** A bright pink cosmos flower is the central focus. The petals are a delicate shade of pink with a slightly darker pink vein pattern. The
"""
~~~
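
The example above exercises the image path; text-only prompts go through the same pipeline. A minimal sketch reusing the `model` and `processor` objects from the example above (the prompt string is illustrative):

~~~python
# Text-only prompt through the same chat template; reuses `model`,
# `processor`, and `torch` from the example above.
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Give a one-sentence summary of what int4 quantization does."}]}
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

input_len = inputs["input_ids"].shape[-1]
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(processor.decode(generation[0][input_len:], skip_special_tokens=True))
~~~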