Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Abstract
The Guided by Gut (GG) framework efficiently enhances LLM reasoning using intrinsic signals such as token-level confidence and step novelty, matching PRM-based methods in accuracy while offering faster inference and lower memory usage.
Test-Time Scaling (TTS) methods for enhancing Large Language Model (LLM) reasoning often incur substantial computational costs, primarily due to extensive reliance on external Process Reward Models (PRMs) or sampling methods such as Best-of-N (BoN). This paper introduces Guided by Gut (GG), an efficient self-guided TTS framework that achieves PRM-level performance without costly external verifier models. Our method employs a lightweight tree search guided solely by intrinsic LLM signals: token-level confidence and step novelty. A key innovation is improving the reliability of internal confidence estimates via a targeted reinforcement learning fine-tuning phase. Empirical evaluations on challenging mathematical reasoning benchmarks demonstrate that GG enables smaller models (e.g., 1.5B parameters) to match or surpass the accuracy of significantly larger models (e.g., 32B-70B parameters), while reducing GPU memory usage by up to 10x. Compared to PRM-based methods, GG achieves comparable accuracy with 8x faster inference and 4-5x lower memory usage. Additionally, GG reduces KV cache memory usage by approximately 50% compared to the BoN strategy, enabling more efficient and practical deployment of TTS techniques.
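To make the intrinsic guidance concrete, the sketch below is an illustrative approximation rather than the authors' implementation: it assumes a hypothetical caller that supplies candidate reasoning steps with their per-token log-probabilities, scores each candidate by mean token probability (confidence) plus a small, arbitrarily weighted novelty bonus, and greedily keeps the best step, roughly mirroring one expansion of a confidence-guided tree search.

```python
# Minimal, illustrative sketch of confidence- and novelty-guided step selection.
# Not the paper's code: the candidate format, novelty measure, and weighting are assumptions.
import math
from typing import List, Tuple


def step_confidence(token_logprobs: List[float]) -> float:
    """Mean token probability of a candidate step, used as an intrinsic confidence signal."""
    if not token_logprobs:
        return 0.0
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def step_novelty(step_text: str, previous_steps: List[str]) -> float:
    """Crude novelty signal: fraction of tokens in the step not seen in earlier steps."""
    seen = {tok for s in previous_steps for tok in s.split()}
    tokens = step_text.split()
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok not in seen) / len(tokens)


def select_next_step(
    candidates: List[Tuple[str, List[float]]],  # (step_text, per-token log-probs)
    previous_steps: List[str],
    novelty_weight: float = 0.2,  # hypothetical weighting, not from the paper
) -> str:
    """Pick the candidate step that maximizes confidence plus a novelty bonus."""
    scored = [
        (step_confidence(lps) + novelty_weight * step_novelty(text, previous_steps), text)
        for text, lps in candidates
    ]
    return max(scored)[1]
```

In a full tree search, this selection would be applied at every expansion and the surviving branches re-ranked, but the scoring idea is the same: no external verifier is queried, only the model's own token statistics.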
Community
TL;DR: "Guided by Gut (GG)" is an efficient, PRM-free search method that boosts small LLMs (1.5B) to outperform much larger models (32B–70B). Leveraging GRPO-based reinforcement learning to calibrate internal confidence, GG enables fast, efficient, and more accurate reasoning without costly external verifiers. 📄✨
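For readers unfamiliar with the GRPO part of the TL;DR, the toy sketch below shows one way a group-relative advantage could be computed with a hypothetical confidence-alignment bonus added to the correctness reward; the reward shaping and weights are assumptions for illustration, not the paper's recipe.

```python
# Toy GRPO-style advantage computation with a hypothetical confidence-alignment bonus.
# Illustrative only: the actual reward design used by the paper may differ.
from statistics import mean, pstdev
from typing import List


def reward(correct: bool, avg_confidence: float, align_weight: float = 0.1) -> float:
    """Correctness reward plus a small bonus when confidence agrees with correctness."""
    alignment = avg_confidence if correct else (1.0 - avg_confidence)
    return (1.0 if correct else 0.0) + align_weight * alignment


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages: rewards standardized within the sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0.0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


# Example: four sampled solutions to the same problem (correct?, mean step confidence).
samples = [(True, 0.9), (False, 0.8), (True, 0.6), (False, 0.3)]
advantages = group_relative_advantages([reward(c, p) for c, p in samples])
```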
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier (2025)
- Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets (2025)
- Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers (2025)
- GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models (2025)
- GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning (2025)
- Temporal Sampling for Forgotten Reasoning in LLMs (2025)
- PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search (2025)