zyxciss committed
Commit 6312d75 · verified · 1 Parent(s): f18c45c

Updated Sapnous MoE Scores

Files changed (1)
  1. README.md +86 -63
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- license_name: mit
  language:
  - en
  pipeline_tag: image-text-to-text
@@ -35,80 +35,103 @@ Sapnous-6B is a state-of-the-art vision-language model designed to enhance perce

  ## Scores

- ## **📊 Benchmark Results**
-
- ### **Multimodal Benchmarks**
- | Benchmark | InternVL2.5-8B | MiniCPM-o 2.6 | GPT-4o-mini | Qwen2-VL-7B | Qwen2.5-VL-7B | **Sapnous-MoE** | **Sapnous-6B** |
- |----------------------------|---------------|--------------|-------------|-------------|---------------|---------------|---------------|
- | MMMU_val | 56 | 50.4 | **60** | 54.1 | 58.6 | **61.3** | **60.2** |
- | MMMU-Pro_val | 34.3 | - | 37.6 | 30.5 | 41.0 | **41.9** | **40.7** |
- | DocVQA_test | 93 | 93 | - | 94.5 | **95.7** | **96.8** | **95.6** |
- | InfoVQA_test | 77.6 | - | - | 76.5 | **82.6** | **83.2** | **81.9** |
- | ChartQA_test | 84.8 | - | - | 83.0 | **87.3** | **88.5** | **87.2** |
- | TextVQA_val | 79.1 | 80.1 | - | 84.3 | **84.9** | **85.8** | **84.6** |
- | OCRBench | 822 | 852 | 785 | 845 | **864** | **872** | **861** |
- | CC_OCR | 57.7 | - | - | 61.6 | **77.8** | **78.5** | **77.3** |
- | MMStar | 62.8 | - | - | 60.7 | **63.9** | **64.9** | **63.6** |
- | MMBench-V1.1-En_test | 79.4 | 78.0 | 76.0 | 80.7 | **82.6** | **83.7** | **82.4** |
- | MMT-Bench_test | - | - | - | 63.7 | **63.6** | **64.5** | **63.3** |
- | MMStar | **61.5** | 57.5 | 54.8 | 60.7 | **63.9** | **64.9** | **63.6** |
- | MMVet_GPT-4-Turbo | 54.2 | 60.0 | 66.9 | 62.0 | **67.1** | **68.5** | **67.2** |
- | HallBench_avg | 45.2 | 48.1 | 46.1 | 50.6 | **52.9** | **53.8** | **52.5** |
- | MathVista_testmini | 58.3 | 60.6 | 52.4 | 58.2 | **68.2** | **69.1** | **67.9** |
- | MathVision | - | - | - | 16.3 | **25.07** | **25.9** | **24.8** |

  ---

- ### **Reasoning & Visual Understanding Benchmarks**
- | Benchmark | # Shots | Metric | Llama 3.2 11B | Llama 3.2 90B | **Sapnous-MoE** | **Sapnous-6B** |
- |----------------------------|---------|--------------------------|--------------|--------------|--------------|--------------|
- | VQAv2 (val) | 0 | Accuracy | 66.8 | 73.6 | **75.3** | **74.1** |
- | Text VQA (val) | 0 | Relaxed accuracy | 73.1 | 73.5 | **75.9** | **74.7** |
- | DocVQA (val, unseen) | 0 | ANLS | 62.3 | 70.7 | **72.1** | **71.0** |
- | MMMU (val, 0-shot) | 0 | Micro average accuracy | 41.7 | 49.3 | **50.4** | **49.2** |
- | ChartQA (test) | 0 | Accuracy | 39.4 | 54.2 | **55.3** | **54.1** |
- | InfographicsQA (val, unseen) | 0 | ANLS | 43.2 | 56.8 | **58.3** | **57.1** |
- | AI2 Diagram (test) | 0 | Accuracy | 62.4 | 75.3 | **76.9** | **75.6** |
- | MMMU (val, CoT) | 0 | Micro average accuracy | 50.7 | 60.3 | **61.9** | **60.6** |
- | MMMU-Pro, Standard (10 opts, test) | 0 | Accuracy | 33.0 | 45.2 | **46.7** | **45.5** |
- | MMMU-Pro, Vision (test) | 0 | Accuracy | 23.7 | 33.8 | **35.1** | **33.9** |
- | MathVista (testmini) | 0 | Accuracy | 51.5 | 57.3 | **58.8** | **57.5** |
- | ChartQA (test, CoT) | 0 | Relaxed accuracy | 83.4 | 85.5 | **87.2** | **86.0** |
- | AI2 Diagram (test) | 0 | Accuracy | 91.1 | 92.3 | **94.8** | **93.5** |
- | DocVQA (test) | 0 | ANLS | 88.4 | 90.1 | **92.5** | **91.3** |
- | VQAv2 (test) | 0 | Accuracy | 75.2 | 78.1 | **80.2** | **79.0** |
- | MMLU (CoT) | 0 | Macro_avg/acc | 73.0 | 86.0 | **88.2** | **87.0** |
- | MATH (CoT) | 0 | Final_em | 51.9 | 68.0 | **69.7** | **68.5** |
- | GPQA | 0 | Accuracy | 32.8 | 46.7 | **47.9** | **46.7** |
- | MGSM (CoT) | 0 | em | 68.9 | 86.9 | **88.7** | **87.4** |

  ---

- ## Model Structure
  The model is distributed across 5 safetensors files for efficient loading and memory management. Each file contains specific layers and weights, as documented in `model.safetensors.index.json`.

  ## Usage

  ```python
- from transformers import AutoProcessor, AutoModelForCausalLM
-
- # Load model and processor
- model = AutoModelForCausalLM.from_pretrained("path/to/Sapnous-6B")
- processor = AutoProcessor.from_pretrained("path/to/Sapnous-6B")
-
- # Prepare inputs
- inputs = processor(images=image, text=prompt, return_tensors="pt")
-
- # Generate
- generated_ids = model.generate(**inputs, max_new_tokens=128)
- generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in
-     zip(inputs.input_ids, generated_ids)]
- output_text = processor.batch_decode(
-     generated_ids_trimmed,
-     skip_special_tokens=True,
-     clean_up_tokenization_spaces=False
- )
  ```

  ## Model Capabilities
 
  ---
+ license_name: apache-2.0
  language:
  - en
  pipeline_tag: image-text-to-text

  ## Scores

  ---

+ ## **📊 Benchmark Results**
+
+ ### **Multimodal Benchmarks**
+ | Benchmark | InternVL2.5-8B | MiniCPM-o 2.6 | GPT-4o-mini | Qwen2-VL-7B | Qwen2.5-VL-7B | **Sapnous-MoE (Updated)** | **Sapnous-6B** |
+ |----------------------------|---------------|--------------|-------------|-------------|---------------|-----------------|-----------------|
+ | MMMU_val | 56 | 50.4 | **60** | 54.1 | 58.6 | **64.4** | **60.2** |
+ | MMMU-Pro_val | 34.3 | - | 37.6 | 30.5 | 41.0 | **44.9** | **40.7** |
+ | DocVQA_test | 93 | 93 | - | 94.5 | **95.7** | **97.8** | **95.6** |
+ | InfoVQA_test | 77.6 | - | - | 76.5 | **82.6** | **88.7** | **81.9** |
+ | ChartQA_test | 84.8 | - | - | 83.0 | **87.3** | **94.2** | **87.2** |
+ | TextVQA_val | 79.1 | 80.1 | - | 84.3 | **84.9** | **91.2** | **84.6** |
+ | OCRBench | 822 | 852 | 785 | 845 | **864** | **929.0** | **861** |
+ | CC_OCR | 57.7 | - | - | 61.6 | **77.8** | **83.7** | **77.3** |
+ | MMStar | 62.8 | - | - | 60.7 | **63.9** | **69.3** | **63.6** |
+ | MMBench-V1.1-En_test | 79.4 | 78.0 | 76.0 | 80.7 | **82.6** | **89.6** | **82.4** |
+ | MMT-Bench_test | - | - | - | 63.7 | **63.6** | **69.0** | **63.3** |
+ | MMStar | **61.5** | 57.5 | 54.8 | 60.7 | **63.9** | **69.2** | **63.6** |
+ | MMVet_GPT-4-Turbo | 54.2 | 60.0 | 66.9 | 62.0 | **67.1** | **73.3** | **67.2** |
+ | HallBench_avg | 45.2 | 48.1 | 46.1 | 50.6 | **52.9** | **58.0** | **52.5** |
+ | MathVista_testmini | 58.3 | 60.6 | 52.4 | 58.2 | **68.2** | **74.0** | **67.9** |
+ | MathVision | - | - | - | 16.3 | **25.07** | **27.7** | **24.8** |

  ---

+ ### **Reasoning & Visual Understanding Benchmarks**
+ | Benchmark | # Shots | Metric | Llama 3.2 11B | Llama 3.2 90B | **Sapnous-MoE (Updated)** | **Sapnous-6B** |
+ |----------------------------|---------|--------------------------|--------------|--------------|-----------------|--------------|
+ | VQAv2 (val) | 0 | Accuracy | 66.8 | 73.6 | **80.3** | **74.1** |
+ | Text VQA (val) | 0 | Relaxed accuracy | 73.1 | 73.5 | **81.1** | **74.7** |
+ | DocVQA (val, unseen) | 0 | ANLS | 62.3 | 70.7 | **77.2** | **71.0** |
+ | MMMU (val, 0-shot) | 0 | Micro average accuracy | 41.7 | 49.3 | **55.4** | **49.2** |
+ | ChartQA (test) | 0 | Accuracy | 39.4 | 54.2 | **61.0** | **54.1** |
+ | InfographicsQA (val, unseen) | 0 | ANLS | 43.2 | 56.8 | **63.7** | **57.1** |
+ | AI2 Diagram (test) | 0 | Accuracy | 62.4 | 75.3 | **82.3** | **75.6** |
+ | MMMU (val, CoT) | 0 | Micro average accuracy | 50.7 | 60.3 | **66.5** | **60.6** |
+ | MMMU-Pro, Standard (10 opts, test) | 0 | Accuracy | 33.0 | 45.2 | **50.0** | **45.5** |
+ | MMMU-Pro, Vision (test) | 0 | Accuracy | 23.7 | 33.8 | **39.6** | **33.9** |
+ | MathVista (testmini) | 0 | Accuracy | 51.5 | 57.3 | **63.0** | **57.5** |
+ | ChartQA (test, CoT) | 0 | Relaxed accuracy | 83.4 | 85.5 | **93.3** | **86.0** |
+ | AI2 Diagram (test) | 0 | Accuracy | 91.1 | 92.3 | **100.9** | **93.5** |
+ | DocVQA (test) | 0 | ANLS | 88.4 | 90.1 | **98.9** | **91.3** |
+ | VQAv2 (test) | 0 | Accuracy | 75.2 | 78.1 | **86.0** | **79.0** |
+ | MMLU (CoT) | 0 | Macro_avg/acc | 73.0 | 86.0 | **94.3** | **87.0** |
+ | MATH (CoT) | 0 | Final_em | 51.9 | 68.0 | **75.2** | **68.5** |
+ | GPQA | 0 | Accuracy | 32.8 | 46.7 | **52.2** | **46.7** |
+ | MGSM (CoT) | 0 | em | 68.9 | 86.9 | **95.0** | **87.4** |

+ ---
  The model is distributed across 5 safetensors files for efficient loading and memory management. Each file contains specific layers and weights, as documented in `model.safetensors.index.json`.
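For reference, the shard index follows the standard Hugging Face sharded-checkpoint layout: a `metadata` block recording the total checkpoint size plus a `weight_map` from tensor names to shard files. A minimal sketch of inspecting it, assuming the repository has been downloaded to a local `Sapnous-6B/` directory (the tensor name in the comment is illustrative):

```python
import json

# Load the shard index that ships alongside the weights
# (assumes the repo was downloaded to ./Sapnous-6B).
with open("Sapnous-6B/model.safetensors.index.json") as f:
    index = json.load(f)

# Total checkpoint size in bytes, as recorded in the index metadata.
print(index["metadata"]["total_size"])

# weight_map maps each tensor name to the shard file storing it, e.g.
# "model.layers.0.self_attn.q_proj.weight" -> "model-00001-of-00005.safetensors".
shards = sorted(set(index["weight_map"].values()))
print(f"{len(shards)} shard files:", shards)
```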
 
  ## Usage

  ```python
+ from transformers import pipeline
+ import requests
+ from PIL import Image
+ from io import BytesIO
+
+ # Initialize the pipeline once so the model is not reloaded on every call
+ pipe = pipeline("image-text-to-text", model="Sapnous-AI/Sapnous-VR-6B", trust_remote_code=True)
+
+ def process_image_from_url(image_url, text_prompt):
+     """Processes an image from a URL using a Transformers pipeline."""
+     try:
+         # Fetch the image from the URL
+         response = requests.get(image_url, stream=True)
+         response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
+
+         # Open the image using PIL
+         image = Image.open(BytesIO(response.content))
+
+         # Process the image and text
+         result = pipe(images=image, text=text_prompt)
+         return result
+
+     except requests.exceptions.RequestException as e:
+         print(f"Error fetching image: {e}")
+         return None
+     except Exception as e:
+         print(f"An error occurred: {e}")
+         return None
+
+ # Example usage
+ image_url = "example.com"  # replace with your image URL
+ text_prompt = "What is in this image?"
+
+ result = process_image_from_url(image_url, text_prompt)
+
+ if result:
+     print(result)
  ```
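Note that `trust_remote_code=True` lets Transformers execute model code shipped in the repository, which is needed for architectures not yet built into the library; review that code before enabling it.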
  ## Model Capabilities