Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Abstract
Reinforcement Learning with turn-level credit assignment enhances Large Language Model reasoning capabilities in multi-turn tool-use scenarios.
This paper investigates approaches to enhancing the reasoning capabilities of Large Language Model (LLM) agents through Reinforcement Learning (RL). Specifically, we focus on multi-turn tool-use scenarios, which can be naturally modeled as Markov Decision Processes (MDPs). Existing approaches often train multi-turn LLM agents with trajectory-level advantage estimation in bandit settings; as a result, they struggle to assign credit across individual decision steps, limiting their performance on multi-turn reasoning tasks. To address this, we introduce a fine-grained turn-level advantage estimation strategy that enables more precise credit assignment in multi-turn agent interactions. The strategy is general and can be incorporated into various RL algorithms such as Group Relative Policy Optimization (GRPO). Our experimental evaluation on multi-turn reasoning and search-based tool-use tasks with GRPO implementations highlights the effectiveness of the MDP framework and turn-level credit assignment in advancing the multi-turn reasoning capabilities of LLM agents in complex decision-making settings. Our method achieves 100% success in tool execution and 50% accuracy in exact answer matching, significantly outperforming baselines, which fail to invoke tools and reach only 20-30% exact-match accuracy.
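To make the contrast the abstract draws concrete, the sketch below compares trajectory-level advantage estimation (one group-normalized value shared by every turn of a trajectory, as in standard GRPO) with a turn-level variant that normalizes a per-turn return-to-go across the group. This is a minimal illustration under assumptions: the return-to-go formulation, the discount `gamma`, and the function names are hypothetical and are not taken from the paper's exact formulas.

```python
from statistics import mean, pstdev

def trajectory_level_advantages(group_returns):
    """GRPO-style baseline: normalize each trajectory's total return within
    the group. Every turn of a trajectory receives this same advantage."""
    mu = mean(group_returns)
    sigma = pstdev(group_returns) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in group_returns]

def turn_level_advantages(group_turn_rewards, gamma=1.0):
    """Hypothetical turn-level variant: compute a discounted return-to-go for
    each turn, then normalize turn t's return across the group, so each turn
    gets its own credit. Assumes equal-length trajectories for simplicity.

    group_turn_rewards: one per-turn reward list per trajectory in the group.
    """
    num_turns = len(group_turn_rewards[0])
    # Discounted return-to-go per turn, per trajectory.
    returns = []
    for rewards in group_turn_rewards:
        rtg, acc = [0.0] * num_turns, 0.0
        for t in reversed(range(num_turns)):
            acc = rewards[t] + gamma * acc
            rtg[t] = acc
        returns.append(rtg)
    # Normalize each turn's column of returns across the group.
    advantages = [[0.0] * num_turns for _ in returns]
    for t in range(num_turns):
        col = [rtg[t] for rtg in returns]
        mu = mean(col)
        sigma = pstdev(col) or 1.0
        for i, value in enumerate(col):
            advantages[i][t] = (value - mu) / sigma
    return advantages
```

With two trajectories whose total returns are identical but distributed differently across turns (e.g. rewards `[1, 0]` vs `[0, 1]`), the trajectory-level estimator assigns zero advantage everywhere, while the turn-level estimator still distinguishes which trajectory earned reward at the final turn.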
Community
The following papers were recommended by the Semantic Scholar API:
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning (2025)
- ToolRL: Reward is All Tool Learning Needs (2025)
- Group-in-Group Policy Optimization for LLM Agent Training (2025)
- LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization (2025)
- Interleaved Reasoning for Large Language Models via Reinforcement Learning (2025)
- RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning (2025)
- An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents (2025)