hassenhamdi/granite-vision-3.1-2b-preview-4bit

hassenhamdi committed on Feb 20

Commit ca4c80f · verified · 1 Parent(s): a727116

Update README.md

Files changed (1)

README.md +53 -0


@@ -15,3 +15,56 @@ tags:
 - Original model: [ibm-granite/granite-vision-3.1-2b-preview](https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview)
 - precision: 4-bit

+## Setup
+- You can run the quantized model with these steps:
+- Check the requirements of the original model, in particular the Python, CUDA, and transformers versions.
+- Make sure the quantization-related packages are installed.
+```bash
+pip install "bitsandbytes>=0.39.0"
+pip install --upgrade accelerate transformers
+```
+- Load & run the model.
+```python
+from transformers import AutoProcessor, AutoModelForVision2Seq
+from huggingface_hub import hf_hub_download
+import torch
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = AutoModelForVision2Seq.from_pretrained('hassenhamdi/granite-vision-3.1-2b-preview-4bit', trust_remote_code=True).to(device)
+processor = AutoProcessor.from_pretrained('ibm-granite/granite-vision-3.1-2b-preview')
+# prepare image and text prompt, using the appropriate prompt template
+img_path = hf_hub_download(repo_id='ibm-granite/granite-vision-3.1-2b-preview', filename='example.png')
+conversation = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image", "url": img_path},
+            {"type": "text", "text": "What is the highest scoring model on ChartQA and what is its score?"},
+        ],
+    },
+]
+inputs = processor.apply_chat_template(
+    conversation,
+    add_generation_prompt=True,
+    tokenize=True,
+    return_dict=True,
+    return_tensors="pt"
+).to(device)
+# autoregressively complete prompt
+output = model.generate(**inputs, max_new_tokens=100)
+print(processor.decode(output[0], skip_special_tokens=True))
+```
+## Configurations
+- The configuration is in `config.json`.
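
Reviewer-side sketch (not part of the committed README): one way to see what `config.json` records about quantization is to read its `quantization_config` entry, which is where transformers stores bitsandbytes settings when a quantized model is saved. The helper name `summarize_quantization` and the sample values below are hypothetical; the real file in the repository may contain different fields or values.

```python
import json


def summarize_quantization(config: dict) -> dict:
    """Pick out the quantization-related fields of a parsed config.json.

    Missing fields come back as None (or False for load_in_4bit), so the
    helper also works on a config that carries no quantization settings.
    """
    qcfg = config.get("quantization_config") or {}
    return {
        "quant_method": qcfg.get("quant_method"),
        "load_in_4bit": qcfg.get("load_in_4bit", False),
        "quant_type": qcfg.get("bnb_4bit_quant_type"),
        "compute_dtype": qcfg.get("bnb_4bit_compute_dtype"),
    }


# Hypothetical sample standing in for the repository's config.json;
# these values are illustrative and were not read from the actual file.
sample_config = json.loads(
    """
    {
      "quantization_config": {
        "quant_method": "bitsandbytes",
        "load_in_4bit": true,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": "float16"
      }
    }
    """
)

summary = summarize_quantization(sample_config)
print(summary)
```

To apply this to the real file, load it with `json.load(open(path))` after downloading `config.json` from the repository and pass the resulting dict to the helper.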