T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published 2 days ago • 25
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning Paper • 2504.14509 • Published 13 days ago • 49
I-Con: A Unifying Framework for Representation Learning Paper • 2504.16929 • Published 10 days ago • 29
FlowReasoner: Reinforcing Query-Level Meta-Agents Paper • 2504.15257 • Published 12 days ago • 46
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search Paper • 2504.09130 • Published 21 days ago • 12
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 19 days ago • 252
STEVE: AStep Verification Pipeline for Computer-use Agent Training Paper • 2503.12532 • Published Mar 16 • 15
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published Mar 16 • 65
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13 • 36
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published Mar 13 • 17
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 232
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models Paper • 2503.07605 • Published Mar 10 • 68
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published Feb 14 • 35