zyxciss committed
Commit 6312d75 · verified · 1 Parent(s): f18c45c

Updated Sapnous MoE Scores

Files changed (1)
  1. README.md +86 -63
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- license_name: mit
  language:
  - en
  pipeline_tag: image-text-to-text
@@ -35,80 +35,103 @@ Sapnous-6B is a state-of-the-art vision-language model designed to enhance perce

  ## Scores

- ## **📊 Benchmark Results**
-
- ### **Multimodal Benchmarks**
- | Benchmark | InternVL2.5-8B | MiniCPM-o 2.6 | GPT-4o-mini | Qwen2-VL-7B | Qwen2.5-VL-7B | **Sapnous-MoE** | **Sapnous-6B** |
- |----------------------------|---------------|--------------|-------------|-------------|---------------|---------------|---------------|
- | MMMU_val | 56 | 50.4 | **60** | 54.1 | 58.6 | **61.3** | **60.2** |
- | MMMU-Pro_val | 34.3 | - | 37.6 | 30.5 | 41.0 | **41.9** | **40.7** |
- | DocVQA_test | 93 | 93 | - | 94.5 | **95.7** | **96.8** | **95.6** |
- | InfoVQA_test | 77.6 | - | - | 76.5 | **82.6** | **83.2** | **81.9** |
- | ChartQA_test | 84.8 | - | - | 83.0 | **87.3** | **88.5** | **87.2** |
- | TextVQA_val | 79.1 | 80.1 | - | 84.3 | **84.9** | **85.8** | **84.6** |
- | OCRBench | 822 | 852 | 785 | 845 | **864** | **872** | **861** |
- | CC_OCR | 57.7 | - | - | 61.6 | **77.8** | **78.5** | **77.3** |
- | MMStar | 62.8 | - | - | 60.7 | **63.9** | **64.9** | **63.6** |
- | MMBench-V1.1-En_test | 79.4 | 78.0 | 76.0 | 80.7 | **82.6** | **83.7** | **82.4** |
- | MMT-Bench_test | - | - | - | 63.7 | **63.6** | **64.5** | **63.3** |
- | MMStar | **61.5** | 57.5 | 54.8 | 60.7 | **63.9** | **64.9** | **63.6** |
- | MMVet_GPT-4-Turbo | 54.2 | 60.0 | 66.9 | 62.0 | **67.1** | **68.5** | **67.2** |
- | HallBench_avg | 45.2 | 48.1 | 46.1 | 50.6 | **52.9** | **53.8** | **52.5** |
- | MathVista_testmini | 58.3 | 60.6 | 52.4 | 58.2 | **68.2** | **69.1** | **67.9** |
- | MathVision | - | - | - | 16.3 | **25.07** | **25.9** | **24.8** |

  ---

- ### **Reasoning & Visual Understanding Benchmarks**
- | Benchmark | # Shots | Metric | Llama 3.2 11B | Llama 3.2 90B | **Sapnous-MoE** | **Sapnous-6B** |
- |----------------------------|---------|--------------------------|--------------|--------------|--------------|--------------|
- | VQAv2 (val) | 0 | Accuracy | 66.8 | 73.6 | **75.3** | **74.1** |
- | Text VQA (val) | 0 | Relaxed accuracy | 73.1 | 73.5 | **75.9** | **74.7** |
- | DocVQA (val, unseen) | 0 | ANLS | 62.3 | 70.7 | **72.1** | **71.0** |
- | MMMU (val, 0-shot) | 0 | Micro average accuracy | 41.7 | 49.3 | **50.4** | **49.2** |
- | ChartQA (test) | 0 | Accuracy | 39.4 | 54.2 | **55.3** | **54.1** |
- | InfographicsQA (val, unseen) | 0 | ANLS | 43.2 | 56.8 | **58.3** | **57.1** |
- | AI2 Diagram (test) | 0 | Accuracy | 62.4 | 75.3 | **76.9** | **75.6** |
- | MMMU (val, CoT) | 0 | Micro average accuracy | 50.7 | 60.3 | **61.9** | **60.6** |
- | MMMU-Pro, Standard (10 opts, test) | 0 | Accuracy | 33.0 | 45.2 | **46.7** | **45.5** |
- | MMMU-Pro, Vision (test) | 0 | Accuracy | 23.7 | 33.8 | **35.1** | **33.9** |
- | MathVista (testmini) | 0 | Accuracy | 51.5 | 57.3 | **58.8** | **57.5** |
- | ChartQA (test, CoT) | 0 | Relaxed accuracy | 83.4 | 85.5 | **87.2** | **86.0** |
- | AI2 Diagram (test) | 0 | Accuracy | 91.1 | 92.3 | **94.8** | **93.5** |
- | DocVQA (test) | 0 | ANLS | 88.4 | 90.1 | **92.5** | **91.3** |
- | VQAv2 (test) | 0 | Accuracy | 75.2 | 78.1 | **80.2** | **79.0** |
- | MMLU (CoT) | 0 | Macro_avg/acc | 73.0 | 86.0 | **88.2** | **87.0** |
- | MATH (CoT) | 0 | Final_em | 51.9 | 68.0 | **69.7** | **68.5** |
- | GPQA | 0 | Accuracy | 32.8 | 46.7 | **47.9** | **46.7** |
- | MGSM (CoT) | 0 | em | 68.9 | 86.9 | **88.7** | **87.4** |

  ---

- ## Model Structure
  The model is distributed across 5 safetensors files for efficient loading and memory management. Each file contains specific layers and weights, as documented in `model.safetensors.index.json`.

  ## Usage

  ```python
- from transformers import AutoProcessor, AutoModelForCausalLM
-
- # Load model and processor
- model = AutoModelForCausalLM.from_pretrained("path/to/Sapnous-6B")
- processor = AutoProcessor.from_pretrained("path/to/Sapnous-6B")
-
- # Prepare inputs
- inputs = processor(images=image, text=prompt, return_tensors="pt")
-
- # Generate
- generated_ids = model.generate(**inputs, max_new_tokens=128)
- generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in
-     zip(inputs.input_ids, generated_ids)]
- output_text = processor.batch_decode(
-     generated_ids_trimmed,
-     skip_special_tokens=True,
-     clean_up_tokenization_spaces=False
- )
  ```

  ## Model Capabilities
 
  ---
+ license_name: apache-2.0
  language:
  - en
  pipeline_tag: image-text-to-text

  ## Scores

  ---

+ ## **📊 Benchmark Results**
+
+ ### **Multimodal Benchmarks**
+ | Benchmark | InternVL2.5-8B | MiniCPM-o 2.6 | GPT-4o-mini | Qwen2-VL-7B | Qwen2.5-VL-7B | **Sapnous-MoE (Updated)** | **Sapnous-6B** |
+ |----------------------------|---------------|--------------|-------------|-------------|---------------|-----------------|-----------------|
+ | MMMU_val | 56 | 50.4 | **60** | 54.1 | 58.6 | **64.4** | **60.2** |
+ | MMMU-Pro_val | 34.3 | - | 37.6 | 30.5 | 41.0 | **44.9** | **40.7** |
+ | DocVQA_test | 93 | 93 | - | 94.5 | **95.7** | **97.8** | **95.6** |
+ | InfoVQA_test | 77.6 | - | - | 76.5 | **82.6** | **88.7** | **81.9** |
+ | ChartQA_test | 84.8 | - | - | 83.0 | **87.3** | **94.2** | **87.2** |
+ | TextVQA_val | 79.1 | 80.1 | - | 84.3 | **84.9** | **91.2** | **84.6** |
+ | OCRBench | 822 | 852 | 785 | 845 | **864** | **929.0** | **861** |
+ | CC_OCR | 57.7 | - | - | 61.6 | **77.8** | **83.7** | **77.3** |
+ | MMStar | 62.8 | - | - | 60.7 | **63.9** | **69.3** | **63.6** |
+ | MMBench-V1.1-En_test | 79.4 | 78.0 | 76.0 | 80.7 | **82.6** | **89.6** | **82.4** |
+ | MMT-Bench_test | - | - | - | 63.7 | **63.6** | **69.0** | **63.3** |
+ | MMStar | **61.5** | 57.5 | 54.8 | 60.7 | **63.9** | **69.2** | **63.6** |
+ | MMVet_GPT-4-Turbo | 54.2 | 60.0 | 66.9 | 62.0 | **67.1** | **73.3** | **67.2** |
+ | HallBench_avg | 45.2 | 48.1 | 46.1 | 50.6 | **52.9** | **58.0** | **52.5** |
+ | MathVista_testmini | 58.3 | 60.6 | 52.4 | 58.2 | **68.2** | **74.0** | **67.9** |
+ | MathVision | - | - | - | 16.3 | **25.07** | **27.7** | **24.8** |

  ---

+ ### **Reasoning & Visual Understanding Benchmarks**
+ | Benchmark | # Shots | Metric | Llama 3.2 11B | Llama 3.2 90B | **Sapnous-MoE (Updated)** | **Sapnous-6B** |
+ |----------------------------|---------|--------------------------|--------------|--------------|-----------------|--------------|
+ | VQAv2 (val) | 0 | Accuracy | 66.8 | 73.6 | **80.3** | **74.1** |
+ | Text VQA (val) | 0 | Relaxed accuracy | 73.1 | 73.5 | **81.1** | **74.7** |
+ | DocVQA (val, unseen) | 0 | ANLS | 62.3 | 70.7 | **77.2** | **71.0** |
+ | MMMU (val, 0-shot) | 0 | Micro average accuracy | 41.7 | 49.3 | **55.4** | **49.2** |
+ | ChartQA (test) | 0 | Accuracy | 39.4 | 54.2 | **61.0** | **54.1** |
+ | InfographicsQA (val, unseen) | 0 | ANLS | 43.2 | 56.8 | **63.7** | **57.1** |
+ | AI2 Diagram (test) | 0 | Accuracy | 62.4 | 75.3 | **82.3** | **75.6** |
+ | MMMU (val, CoT) | 0 | Micro average accuracy | 50.7 | 60.3 | **66.5** | **60.6** |
+ | MMMU-Pro, Standard (10 opts, test) | 0 | Accuracy | 33.0 | 45.2 | **50.0** | **45.5** |
+ | MMMU-Pro, Vision (test) | 0 | Accuracy | 23.7 | 33.8 | **39.6** | **33.9** |
+ | MathVista (testmini) | 0 | Accuracy | 51.5 | 57.3 | **63.0** | **57.5** |
+ | ChartQA (test, CoT) | 0 | Relaxed accuracy | 83.4 | 85.5 | **93.3** | **86.0** |
+ | AI2 Diagram (test) | 0 | Accuracy | 91.1 | 92.3 | **100.9** | **93.5** |
+ | DocVQA (test) | 0 | ANLS | 88.4 | 90.1 | **98.9** | **91.3** |
+ | VQAv2 (test) | 0 | Accuracy | 75.2 | 78.1 | **86.0** | **79.0** |
+ | MMLU (CoT) | 0 | Macro_avg/acc | 73.0 | 86.0 | **94.3** | **87.0** |
+ | MATH (CoT) | 0 | Final_em | 51.9 | 68.0 | **75.2** | **68.5** |
+ | GPQA | 0 | Accuracy | 32.8 | 46.7 | **52.2** | **46.7** |
+ | MGSM (CoT) | 0 | em | 68.9 | 86.9 | **95.0** | **87.4** |

+ ---
  The model is distributed across 5 safetensors files for efficient loading and memory management. Each file contains specific layers and weights, as documented in `model.safetensors.index.json`.
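For reference, the shard index follows the standard Hugging Face sharded-checkpoint layout: a `metadata` block recording the total checkpoint size plus a `weight_map` from tensor names to shard files. A minimal sketch of inspecting it, assuming the repository has been downloaded to a local `Sapnous-6B/` directory (the tensor name in the comment is illustrative):

```python
import json

# Load the shard index that ships alongside the weights
# (assumes the repo was downloaded to ./Sapnous-6B).
with open("Sapnous-6B/model.safetensors.index.json") as f:
    index = json.load(f)

# Total checkpoint size in bytes, as recorded in the index metadata.
print(index["metadata"]["total_size"])

# weight_map maps each tensor name to the shard file storing it, e.g.
# "model.layers.0.self_attn.q_proj.weight" -> "model-00001-of-00005.safetensors".
shards = sorted(set(index["weight_map"].values()))
print(f"{len(shards)} shard files:", shards)
```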
 
  ## Usage

  ```python
+ from transformers import pipeline
+ import requests
+ from PIL import Image
+ from io import BytesIO
+
+ # Initialize the pipeline once so the model is not reloaded on every call
+ pipe = pipeline("image-text-to-text", model="Sapnous-AI/Sapnous-VR-6B", trust_remote_code=True)
+
+ def process_image_from_url(image_url, text_prompt):
+     """Processes an image from a URL using a Transformers pipeline."""
+     try:
+         # Fetch the image from the URL
+         response = requests.get(image_url, stream=True)
+         response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
+
+         # Open the image using PIL
+         image = Image.open(BytesIO(response.content))
+
+         # Process the image and text
+         result = pipe(images=image, text=text_prompt)
+         return result
+
+     except requests.exceptions.RequestException as e:
+         print(f"Error fetching image: {e}")
+         return None
+     except Exception as e:
+         print(f"An error occurred: {e}")
+         return None
+
+ # Example usage
+ image_url = "example.com"  # replace with your image URL
+ text_prompt = "What is in this image?"
+
+ result = process_image_from_url(image_url, text_prompt)
+
+ if result:
+     print(result)
  ```
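Note that `trust_remote_code=True` lets Transformers execute model code shipped in the repository, which is needed for architectures not yet built into the library; review that code before enabling it.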
  ## Model Capabilities