---
datasets:
- NeelNanda/pile-10k
base_model:
- google/gemma-3-27b-it
---
## Model Details

This is an int4 model of [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it), quantized with group_size 128 and symmetric quantization, generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.

Please follow the license of the original model.
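
For reference, checkpoints like this one are usually produced with auto-round's Python API. The snippet below is a minimal, hypothetical sketch, not the exact script used to build this model: it assumes the generic auto-round text-model flow (a multimodal model such as Gemma 3 may require auto-round's multimodal path), and the `bits`/`group_size`/`sym` values simply mirror the settings stated above.

~~~python
# Hypothetical quantization sketch -- not the exact recipe used for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "google/gemma-3-27b-it"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# int4, group_size 128, symmetric quantization, as described above.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./gemma-3-27b-it-int4", format="auto_round")
~~~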

### Inference on CPU

We found that the unquantized layers must run in BF16 or FP32, so CUDA inference is not available at this time.

Requirements:

```bash
pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers
```

~~~python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from auto_round import AutoRoundConfig

model_id = "OPEA/gemma-3-27b-it-int4-AutoRound-cpu"

# Load the int4 checkpoint on CPU via the auto-round backend.
quantization_config = AutoRoundConfig(backend="cpu")
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cpu",
    quantization_config=quantization_config
).eval()

processor = AutoProcessor.from_pretrained(model_id)

# Multimodal chat prompt: one image (fetched from the URL) plus a text instruction.
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image",
             "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    # Keep only the newly generated tokens, dropping the prompt.
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
"""
Here's a detailed description of the image:

**Overall Impression:**

The image is a close-up shot of a vibrant garden scene, focusing on a pink cosmos flower with a bumblebee actively collecting pollen. The composition is natural and slightly wild, with a mix of blooming and fading flowers.

**Detailed Description:**

* **Main Subject:** A bright pink cosmos flower is the central focus. The petals are a delicate shade of pink with a slightly darker pink vein pattern. The
"""
~~~
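
The same pipeline also handles text-only prompts. A minimal sketch, reusing the `model` and `processor` objects from the example above (the prompt text is illustrative):

~~~python
# Text-only prompt, reusing `model` and `processor` from the example above.
messages = [
    {"role": "user",
     "content": [{"type": "text", "text": "Explain int4 weight-only quantization in one paragraph."}]}
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
~~~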