Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • Paper • 2506.01939 • Published about 23 hours ago • 82
Sherlock: Self-Correcting Reasoning in Vision-Language Models • Paper • 2505.22651 • Published 6 days ago • 50
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models • Paper • 2505.18536 • Published 10 days ago • 18
Optimizing Anytime Reasoning via Budget Relative Policy Optimization • Paper • 2505.13438 • Published 15 days ago • 35
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation • Paper • 2504.13055 • Published Apr 17 • 19
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback • Paper • 2505.06548 • Published 24 days ago • 30