---
base_model:
- internlm/internlm2-chat-7b
- OpenGVLab/InternViT-300M-448px
---

<p align="center">
  <h1 align="center">🐳 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text</h1>
</p>

- Repository: https://github.com/OpenGVLab/OmniCorpus
- Paper (ICLR 2025 Spotlight): https://arxiv.org/abs/2406.08418

# Citation

```bibtex
@inproceedings{li2024omnicorpus,
  title={OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text},
  author={Li, Qingyun and Chen, Zhe and Wang, Weiyun and Wang, Wenhai and Ye, Shenglong and Jin, Zhenjiang and others},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```