Sherlock: Self-Correcting Reasoning in Vision-Language Models Paper • 2505.22651 • Published 6 days ago • 50
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining Paper • 2410.00564 • Published Oct 1, 2024 • 1
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning Paper • 2504.15275 • Published Apr 21 • 1
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published 6 days ago • 109
PURE Collection PRM and fine-tuned LLM used in our PURE GitHub repo: https://github.com/CJReinforce/PURE • 5 items • Updated 12 days ago • 2
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published Mar 31 • 62