Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published 12 days ago • 65
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published 29 days ago • 13
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 94
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models Paper • 2501.14818 • Published Jan 20 • 5
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding Paper • 2412.12075 • Published Dec 16, 2024 • 1
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation Paper • 2111.02394 • Published Nov 3, 2021 • 2
Image Inpainting for Irregular Holes Using Partial Convolutions Paper • 1804.07723 • Published Apr 20, 2018
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models Paper • 2305.10474 • Published May 17, 2023 • 1
DiffiT: Diffusion Vision Transformers for Image Generation Paper • 2312.02139 • Published Dec 4, 2023 • 16
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Paper • 2503.14734 • Published Mar 18 • 1
Slow-Fast Architecture for Video Multi-Modal Large Language Models Paper • 2504.01328 • Published Apr 2 • 8
Post: Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning (2411.18203). Critic-V has been accepted by CVPR2025! Bonus! VRI-160K uploaded now! di-zhang-fdu/R1-Vision-Reasoning-Instructions