Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers Paper • 2505.04842 • Published 6 days ago • 12
ZeroSearch: Incentivize the Search Capability of LLMs without Searching Paper • 2505.04588 • Published 6 days ago • 55
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published 13 days ago • 45
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Paper • 2505.01441 • Published 16 days ago • 35
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think Paper • 2504.20708 • Published 15 days ago • 22
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published 25 days ago • 122
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published 29 days ago • 84
Iterative Self-Training for Code Generation via Reinforced Re-Ranking Paper • 2504.09643 • Published about 1 month ago • 34
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models Paper • 2504.20157 • Published 15 days ago • 35
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published 28 days ago • 60
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models Paper • 2504.04718 • Published Apr 7 • 41
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models Paper • 2505.02847 • Published 12 days ago • 24
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published 13 days ago • 22