tsutikgiau committed (verified)
Commit fc0030d · Parent: 85d0fb4

Update README.md

Files changed (1)
  1. README.md (+29 −9)
README.md CHANGED
@@ -6,17 +6,35 @@ license: apache-2.0
 
 # 🥯 BAGEL • Unified Model for Multimodal Understanding and Generation
 
-<div align="left" style="line-height: 1;">
-  <a href="https://bagel-ai.org/" target="_blank" style="margin: 2px;">
-    <img alt="Homepage" src="https://img.shields.io/badge/BAGEL-Homepage-a468fe?color=a468fe&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+
+
+<p align="left">
+  <a href="https://bagel-ai.org/">
+    <img
+      src="https://img.shields.io/badge/BAGEL-Website-0A66C2?logo=safari&logoColor=white" style="display: inline-block; vertical-align: middle;"
+      alt="BAGEL Website"
+    />
   </a>
-  <a href="https://github.com/ByteDance-Seed/BAGEL/blob/main/BAGEL-Technical-Report.pdf" target="_blank" style="margin: 2px;">
-    <img alt="Technical Report" src="https://img.shields.io/badge/(upcoming)-Technical%20Report-brightgreen?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  <a href="https://github.com/ByteDance-Seed/BAGEL/blob/main/BAGEL-Technical-Report.pdf">
+    <img
+      src="https://img.shields.io/badge/BAGEL-Paper-red?logo=arxiv&logoColor=red" style="display: inline-block; vertical-align: middle;"
+      alt="BAGEL Paper on arXiv"
+    />
   </a>
   <a href="https://github.com/bytedance-seed/BAGEL" target="_blank" style="margin: 2px;">
-    <img alt="Github" src="https://img.shields.io/badge/GitGub-Repo-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"/>
+    <img
+      src="https://img.shields.io/badge/BAGEL-Codebase-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"
+      alt="BAGEL Codebase"
+    />
   </a>
-</div>
+  <a href="https://demo.bagel-ai.org/">
+    <img
+      src="https://img.shields.io/badge/BAGEL-Demo-blue?logo=googleplay&logoColor=white" style="display: inline-block; vertical-align: middle;"
+      alt="BAGEL Demo"
+    />
+  </a>
+</p>
+
 
 > We present **BAGEL**, an open‑source multimodal foundation model with 7B active parameters (14B total) trained on large‑scale interleaved multimodal data. BAGEL outperforms the current top‑tier open‑source VLMs such as Qwen2.5-VL and InternVL-2.5 on standard multimodal understanding leaderboards, and delivers text‑to‑image quality that is competitive with strong specialist generators such as SD3.
 Moreover, BAGEL demonstrates superior qualitative results in classical image‑editing scenarios compared with the leading open-source models. More importantly, it extends to free-form visual manipulation, multiview synthesis, and world navigation, capabilities that constitute "world-modeling" tasks beyond the scope of previous image-editing models.
@@ -44,8 +62,10 @@ This repository hosts the model weights for **BAGEL**. For installation, usage i
 ### 3. Image Editing
 | Benchmark | Step1X-Edit | Gemini-2-exp. | **BAGEL** | **BAGEL + CoT** |
 | ------------------------ | ----------: | ------------: | --------: | --------------: |
-| **GEdit-Bench-EN** (↑) | **6.70** | 6.32 | 6.52 | – |
-| **IntelligentBench** (↑) | 14.9 | 57.6 | 44.0 | **55.3** |
+| **GEdit-Bench-EN (SC)** (↑) | 7.09 | 6.73 | **7.36** | – |
+| **GEdit-Bench-EN (PQ)** (↑) | 6.76 | 6.61 | **6.83** | – |
+| **GEdit-Bench-EN (O)** (↑) | **6.70** | 6.32 | 6.52 | – |
+| **IntelligentBench** (↑) | 14.9 | **57.6** | 44.0 | 55.3 |
 
 ## License
 BAGEL is licensed under the Apache 2.0 license. It is finetuned from [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) and the [siglip-so400m-14-980-flash-attn2-navit](https://huggingface.co/HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit) model, and uses the [FLUX.1-schnell VAE model](https://huggingface.co/black-forest-labs/FLUX.1-schnell), all under Apache 2.0.
 
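The updated card states that this repository hosts only the BAGEL model weights, with installation and inference code living in the linked GitHub codebase. As a minimal, non-authoritative sketch of fetching the checkpoint locally with the `huggingface_hub` client (the repo id and target directory below are placeholder assumptions; use the id shown on this model card):

```python
# Minimal sketch: download the BAGEL checkpoint from the Hugging Face Hub.
# Assumptions: the repo_id below is a placeholder -- substitute this model
# card's actual id. Inference itself relies on the ByteDance-Seed/BAGEL
# GitHub codebase, which is not shown here.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ByteDance-Seed/BAGEL-7B-MoT",  # assumed id; verify on the Hub
    local_dir="./BAGEL-weights",            # where to place the checkpoint files
)
print(f"BAGEL weights downloaded to: {local_dir}")
```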