---
base_model:
- internlm/internlm2-chat-7b
- OpenGVLab/InternViT-300M-448px
---

# 🐳 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

- Repository: https://github.com/OpenGVLab/OmniCorpus
- Paper (ICLR 2025 Spotlight): https://arxiv.org/abs/2406.08418

# Citation

```
@inproceedings{li2024omnicorpus,
  title={OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text},
  author={Li, Qingyun and Chen, Zhe and Wang, Weiyun and Wang, Wenhai and Ye, Shenglong and Jin, Zhenjiang and others},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```