Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published about 1 month ago • 54
EgoLife Collection CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/ • 10 items • Updated Mar 7 • 17
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13 • 44
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23 • 26
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Paper • 2411.15296 • Published Nov 22, 2024 • 22
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 48
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 17
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 76
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 42
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 38
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published Oct 3, 2024 • 55
LLaVA-OneVision Collection a model good at arbitrary types of visual input • 15 items • Updated Oct 5, 2024 • 24
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 36