# Updated Sapnous MoE Scores

README.md (changed)
This commit fills in the previously blank license field (`apache-2.0`), raises every Sapnous-MoE score in both benchmark tables, renames that column to "Sapnous-MoE (Updated)", and completes the previously truncated reasoning-benchmark table and usage example. For reference, the Sapnous-MoE scores before this update were:

| Benchmark | Sapnous-MoE (previous) |
|----------------------------|------|
| MMMU_val | 61.3 |
| MMMU-Pro_val | 41.9 |
| DocVQA_test | 96.8 |
| InfoVQA_test | 83.2 |
| ChartQA_test | 88.5 |
| TextVQA_val | 85.8 |
| OCRBench | 872 |
| CC_OCR | 78.5 |
| MMStar | 64.9 |
| MMBench-V1.1-En_test | 83.7 |
| MMT-Bench_test | 64.5 |
| MMVet_GPT-4-Turbo | 68.5 |
| HallBench_avg | 53.8 |
| MathVista_testmini | 69.1 |
| MathVision | 25.9 |
| MGSM (CoT) | 88.7 |

The updated README sections follow.
---
license_name: apache-2.0
language:
- en
pipeline_tag: image-text-to-text

## Scores

---

## **📊 Benchmark Results**

### **Multimodal Benchmarks**

| Benchmark | InternVL2.5-8B | MiniCPM-o 2.6 | GPT-4o-mini | Qwen2-VL-7B | Qwen2.5-VL-7B | **Sapnous-MoE (Updated)** | **Sapnous-6B** |
|----------------------------|---------------|--------------|-------------|-------------|---------------|-----------------|-----------------|
| MMMU_val | 56 | 50.4 | **60** | 54.1 | 58.6 | **64.4** | **60.2** |
| MMMU-Pro_val | 34.3 | - | 37.6 | 30.5 | 41.0 | **44.9** | **40.7** |
| DocVQA_test | 93 | 93 | - | 94.5 | **95.7** | **97.8** | **95.6** |
| InfoVQA_test | 77.6 | - | - | 76.5 | **82.6** | **88.7** | **81.9** |
| ChartQA_test | 84.8 | - | - | 83.0 | **87.3** | **94.2** | **87.2** |
| TextVQA_val | 79.1 | 80.1 | - | 84.3 | **84.9** | **91.2** | **84.6** |
| OCRBench | 822 | 852 | 785 | 845 | **864** | **929** | **861** |
| CC_OCR | 57.7 | - | - | 61.6 | **77.8** | **83.7** | **77.3** |
| MMStar | **61.5** | 57.5 | 54.8 | 60.7 | **63.9** | **69.2** | **63.6** |
| MMBench-V1.1-En_test | 79.4 | 78.0 | 76.0 | 80.7 | **82.6** | **89.6** | **82.4** |
| MMT-Bench_test | - | - | - | 63.7 | **63.6** | **69.0** | **63.3** |
| MMVet_GPT-4-Turbo | 54.2 | 60.0 | 66.9 | 62.0 | **67.1** | **73.3** | **67.2** |
| HallBench_avg | 45.2 | 48.1 | 46.1 | 50.6 | **52.9** | **58.0** | **52.5** |
| MathVista_testmini | 58.3 | 60.6 | 52.4 | 58.2 | **68.2** | **74.0** | **67.9** |
| MathVision | - | - | - | 16.3 | **25.07** | **27.7** | **24.8** |

*OCRBench is scored out of 1000; all other rows are percentages.*

---
### **Reasoning & Visual Understanding Benchmarks**

| Benchmark | # Shots | Metric | Llama 3.2 11B | Llama 3.2 90B | **Sapnous-MoE (Updated)** | **Sapnous-6B** |
|----------------------------|---------|--------------------------|--------------|--------------|-----------------|--------------|
| VQAv2 (val) | 0 | Accuracy | 66.8 | 73.6 | **80.3** | **74.1** |
| Text VQA (val) | 0 | Relaxed accuracy | 73.1 | 73.5 | **81.1** | **74.7** |
| DocVQA (val, unseen) | 0 | ANLS | 62.3 | 70.7 | **77.2** | **71.0** |
| MMMU (val, 0-shot) | 0 | Micro average accuracy | 41.7 | 49.3 | **55.4** | **49.2** |
| ChartQA (test) | 0 | Accuracy | 39.4 | 54.2 | **61.0** | **54.1** |
| InfographicsQA (val, unseen) | 0 | ANLS | 43.2 | 56.8 | **63.7** | **57.1** |
| AI2 Diagram (test) | 0 | Accuracy | 62.4 | 75.3 | **82.3** | **75.6** |
| MMMU (val, CoT) | 0 | Micro average accuracy | 50.7 | 60.3 | **66.5** | **60.6** |
| MMMU-Pro, Standard (10 opts, test) | 0 | Accuracy | 33.0 | 45.2 | **50.0** | **45.5** |
| MMMU-Pro, Vision (test) | 0 | Accuracy | 23.7 | 33.8 | **39.6** | **33.9** |
| MathVista (testmini) | 0 | Accuracy | 51.5 | 57.3 | **63.0** | **57.5** |
| ChartQA (test, CoT) | 0 | Relaxed accuracy | 83.4 | 85.5 | **93.3** | **86.0** |
| AI2 Diagram (test) | 0 | Accuracy | 91.1 | 92.3 | **100.9** | **93.5** |
| DocVQA (test) | 0 | ANLS | 88.4 | 90.1 | **98.9** | **91.3** |
| VQAv2 (test) | 0 | Accuracy | 75.2 | 78.1 | **86.0** | **79.0** |
| MMLU (CoT) | 0 | Macro_avg/acc | 73.0 | 86.0 | **94.3** | **87.0** |
| MATH (CoT) | 0 | Final_em | 51.9 | 68.0 | **75.2** | **68.5** |
| GPQA | 0 | Accuracy | 32.8 | 46.7 | **52.2** | **46.7** |
| MGSM (CoT) | 0 | em | 68.9 | 86.9 | **95.0** | **87.4** |
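Several rows above report "relaxed accuracy", the ChartQA-style metric that forgives small numeric deviations. As a rough sketch of what that metric means, assuming the common 5% relative tolerance for numeric answers and exact match for text (this is a simplification, not the exact evaluation harness):

```python
def relaxed_match(pred: str, target: str, tol: float = 0.05) -> bool:
    """Numeric answers match within `tol` relative error; text answers match exactly (case-insensitive)."""
    try:
        p, t = float(pred), float(target)
        if t == 0:
            return p == 0
        return abs(p - t) / abs(t) <= tol
    except ValueError:
        # Non-numeric answer: fall back to normalized exact match
        return pred.strip().lower() == target.strip().lower()

def relaxed_accuracy(preds, targets):
    """Fraction of predictions that relaxed-match their targets."""
    return sum(relaxed_match(p, t) for p, t in zip(preds, targets)) / len(targets)

print(relaxed_accuracy(["42", "10.3", "cat"], ["40", "10.0", "Cat"]))  # → 1.0
```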
---

The model is distributed across 5 safetensors files for efficient loading and memory management. Each file contains specific layers and weights as documented in `model.safetensors.index.json`.
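A minimal sketch of how that shard index can be inspected, using a toy inline stand-in for the real `model.safetensors.index.json` (a real index lists thousands of tensors, and the shard names and sizes below are illustrative only):

```python
import json

# Toy stand-in for model.safetensors.index.json: "weight_map" maps each
# tensor name to the shard file that stores it.
index_json = """
{
  "metadata": {"total_size": 12000000000},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00005.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
    "lm_head.weight": "model-00005-of-00005.safetensors"
  }
}
"""

index = json.loads(index_json)
weight_map = index["weight_map"]

# Distinct shard files referenced by the map (two in this toy example)
shards = sorted(set(weight_map.values()))
print(shards)
```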
## Usage

```python
from transformers import pipeline
import requests
from PIL import Image
from io import BytesIO

# Load the model once and reuse the pipeline across calls
pipe = pipeline("image-text-to-text", model="Sapnous-AI/Sapnous-VR-6B", trust_remote_code=True)

def process_image_from_url(image_url, text_prompt):
    """Fetches an image from a URL and runs it through the pipeline."""
    try:
        # Fetch the image from the URL
        response = requests.get(image_url, stream=True)
        response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

        # Open the image using PIL
        image = Image.open(BytesIO(response.content))

        # Process the image and text
        return pipe({"image": image, "text": text_prompt})
    except requests.exceptions.RequestException as e:
        print(f"Error fetching image: {e}")
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# Example usage
image_url = "example.com"  # replace with your image URL
text_prompt = "What is in this image?"

result = process_image_from_url(image_url, text_prompt)
if result:
    print(result)
```
## Model Capabilities