MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024 • 55
WavePulse: Real-time Content Analytics of Radio Livestreams Paper • 2412.17998 • Published Dec 23, 2024 • 11
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published Dec 16, 2024 • 44
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 107
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published Jan 8 • 54
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published Dec 31, 2024 • 31
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Paper • 2501.09751 • Published Jan 16 • 49
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published Jan 30 • 20
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published Feb 11 • 29
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published Feb 7 • 44
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published Feb 11 • 45
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20 • 13
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11 • 66
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 98
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 77
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs Paper • 2504.00072 • Published Mar 31 • 7
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 284
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning Paper • 2504.09081 • Published Apr 12 • 17
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation Paper • 2504.14538 • Published Apr 20 • 29
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published 9 days ago • 73
PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models Paper • 2505.22523 • Published 6 days ago • 6
HardTests: Synthesizing High-Quality Test Cases for LLM Coding Paper • 2505.24098 • Published 5 days ago • 39