---
license: apache-2.0
---

# 🥯 BAGEL • Unified Model for Multimodal Understanding and Generation

<div align="left" style="line-height: 1;">
  <a href="https://bagel-ai.org/" target="_blank" style="margin: 2px;">
    <img alt="Homepage" src="https://img.shields.io/badge/BAGEL-Homepage-a468fe?color=a468fe&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://github.com/ByteDance-Seed/BAGEL/blob/main/BAGEL-Technical-Report.pdf" target="_blank" style="margin: 2px;">
    <img alt="Technical Report" src="https://img.shields.io/badge/(upcoming)-Technical%20Report-brightgreen?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://github.com/bytedance-seed/BAGEL" target="_blank" style="margin: 2px;">
    <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Repo-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

> We present **BAGEL**, an open‑source multimodal foundation model with 7B active parameters (14B total) trained on large‑scale interleaved multimodal data. BAGEL outperforms the current top‑tier open‑source VLMs like Qwen2.5-VL and InternVL-2.5 on standard multimodal understanding leaderboards, and delivers text‑to‑image quality that is competitive with strong specialist generators such as SD3.

Moreover, BAGEL demonstrates superior qualitative results in classical image‑editing scenarios compared with the leading open-source models. More importantly, it extends to free-form visual manipulation, multiview synthesis, and world navigation, capabilities that constitute "world-modeling" tasks beyond the scope of previous image-editing models.

Below is a showcase of BAGEL's qualitative performance.

This repository hosts the model weights for **BAGEL**.
For installation, usage instructions, and further documentation, please visit our [GitHub repository](https://github.com/bytedance-seed/BAGEL).
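
As a quick start, the checkpoint files can be fetched locally before following the GitHub instructions. The snippet below is a minimal sketch using the standard `huggingface_hub` download API; the `repo_id` shown is a placeholder and should be replaced with this repository's actual Hub id.

```python
# Minimal sketch: fetch the BAGEL checkpoint files from the Hugging Face Hub.
# The repo_id below is a placeholder; replace it with this repository's actual Hub id.
# Inference itself is handled by the code in the GitHub repository linked above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ByteDance-Seed/BAGEL",   # placeholder Hub id
    local_dir="./BAGEL-weights",      # local folder for the downloaded weights
)
print(f"Weights downloaded to: {local_dir}")
```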

## 📊 Benchmarks

### 3. Image Editing

| Benchmark | Step1X-Edit | Gemini-2-exp. | **BAGEL** | **BAGEL + CoT** |
| ------------------------ | ----------: | ------------: | --------: | --------------: |
| **GEdit-Bench-EN** (↑) | **6.70** | 6.32 | 6.52 | – |
| **IntelligentBench** (↑) | 14.9 | 57.6 | 44.0 | **55.3** |

## License

BAGEL is licensed under the Apache 2.0 license. It is finetuned from [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), and uses the [FLUX.1-schnell VAE model](https://huggingface.co/black-forest-labs/FLUX.1-schnell) and the [siglip-so400m-14-980-flash-attn2-navit](https://huggingface.co/HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit) model, all under Apache 2.0.

## ✍️ Citation