LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
Abstract
PIR is a framework that refines reasoning chains in large language models by quantifying the importance of each reasoning step and pruning low-importance functional elements, yielding more concise chains with improved accuracy and reduced computational demands.
Large language models (LLMs) have demonstrated remarkable reasoning capabilities through test-time scaling approaches, particularly when fine-tuned with chain-of-thought (CoT) data distilled from more powerful large reasoning models (LRMs). However, these reasoning chains often contain verbose elements that mirror human problem-solving and can be categorized as progressive reasoning (the essential solution-development path) or functional elements (verification processes, alternative solution approaches, and error corrections). While progressive reasoning is crucial, the functional elements significantly increase computational demands during test-time inference. We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step based on its impact on answer-prediction confidence. PIR systematically identifies and selectively prunes only low-importance functional steps while preserving progressive reasoning components, creating optimized training data that maintains the integrity of the core solution path while reducing verbosity. Models fine-tuned on PIR-optimized data exhibit superior test-time scaling properties, generating more concise reasoning chains while achieving improved accuracy (+0.9% to +6.6%) with significantly reduced token usage (-3% to -41%) across challenging reasoning benchmarks (AIME, AMC, and GPQA Diamond). Our approach generalizes well across model sizes, data sources, and token budgets, offering a practical solution for deploying reasoning-capable LLMs where response time and computational efficiency constrain test-time scaling.
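As a rough formalization of this importance measure (an illustrative reading of the abstract; the paper's exact definition may differ), let $x$ be the question, $R = (s_1, \dots, s_n)$ the reasoning chain, and $y$ the final answer. A step's importance can then be scored by how much its removal degrades the model's confidence in the answer:

$$
\mathrm{PIR}(s_i) = \mathrm{PPL}\left(y \mid x,\, R \setminus \{s_i\}\right) - \mathrm{PPL}\left(y \mid x,\, R\right)
$$

Low or negative scores mark functional steps whose removal barely hurts (or even improves) answer confidence, making them candidates for pruning.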
Community
Large Language Models (LLMs) have demonstrated impressive reasoning abilities through chain-of-thought (CoT) approaches, particularly when fine-tuned on high-quality reasoning data from more powerful Large Reasoning Models (LRMs). However, reasoning chains distilled from LRMs often contain numerous functional elements that, while mimicking human problem-solving processes, result in unnecessarily verbose outputs.
LIMOPro introduces PIR (Perplexity-based Importance Refinement), a novel framework that systematically refines reasoning chains to optimize the balance between efficiency and effectiveness. Our approach:
- Classifies functional patterns in reasoning chains into four distinct modes: progressive reasoning and three types of functional steps (verification, multi-method validation, and error correction)
- Quantitatively measures each functional step's contribution using the PIR metric, which evaluates how the answer's perplexity changes when that step is removed (see the code sketch below)
- Selectively removes low-importance functional steps while preserving the essential progressive reasoning chain
Models fine-tuned on PIR-optimized datasets maintain or enhance accuracy while significantly reducing response length compared to models trained on unrefined data, achieving up to 55% efficiency improvement across challenging reasoning benchmarks.
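To make the PIR metric concrete, below is a minimal sketch of how such a perplexity-based importance score could be computed with an off-the-shelf causal LM. The model choice, prompt layout, and helper names are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of a perplexity-based step-importance score.
# Assumptions (not from the paper's code): the model name, prompt layout,
# and helper names below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # any causal LM can stand in here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()


def answer_perplexity(question: str, steps: list[str], answer: str) -> float:
    """Perplexity of the answer tokens, conditioned on the question and reasoning steps."""
    prompt = question + "\n" + "\n".join(steps) + "\n"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(answer, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens; score only the answer
    with torch.no_grad():
        loss = model(input_ids=input_ids, labels=labels).loss  # mean NLL over answer tokens
    return torch.exp(loss).item()


def pir_score(question: str, steps: list[str], answer: str, i: int) -> float:
    """Importance of step i: how much answer perplexity rises when the step is removed."""
    ppl_full = answer_perplexity(question, steps, answer)
    ppl_ablated = answer_perplexity(question, steps[:i] + steps[i + 1:], answer)
    return ppl_ablated - ppl_full  # low or negative => candidate for pruning
```

In the PIR pipeline, steps scoring below a chosen threshold would be the candidates pruned from the distilled training data before fine-tuning.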
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling (2025)
- Scalable Chain of Thoughts via Elastic Reasoning (2025)
- Dynamic Early Exit in Reasoning Models (2025)
- Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning (2025)
- Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier (2025)
- Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning (2025)
- Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability (2025)