LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
Abstract
PIR is a framework that refines reasoning chains in large language models by quantifying the importance of each reasoning step and pruning low-importance functional elements, yielding more concise chains with improved accuracy and reduced computational demands.
Large language models (LLMs) have demonstrated remarkable reasoning capabilities through test-time scaling approaches, particularly when fine-tuned with chain-of-thought (CoT) data distilled from more powerful large reasoning models (LRMs). However, these reasoning chains often contain verbose elements that mirror human problem-solving and can be categorized as progressive reasoning (the essential solution-development path) or functional elements (verification processes, alternative solution approaches, and error corrections). While progressive reasoning is crucial, the functional elements significantly increase computational demands during test-time inference. We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step based on its impact on answer-prediction confidence. PIR systematically identifies and selectively prunes only low-importance functional steps while preserving progressive reasoning components, creating optimized training data that maintains the integrity of the core solution path while reducing verbosity. Models fine-tuned on PIR-optimized data exhibit superior test-time scaling properties, generating more concise reasoning chains while achieving improved accuracy (+0.9% to +6.6%) with significantly reduced token usage (-3% to -41%) across challenging reasoning benchmarks (AIME, AMC, and GPQA Diamond). Our approach generalizes well across model sizes, data sources, and token budgets, offering a practical solution for deploying reasoning-capable LLMs where response time and computational efficiency constrain test-time scaling.
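As a rough formalization of this importance measure (an illustrative reading of the abstract; the paper's exact definition may differ), let $x$ be the question, $R = (s_1, \dots, s_n)$ the reasoning chain, and $y$ the final answer. A step's importance can then be scored by how much its removal degrades the model's confidence in the answer:

$$
\mathrm{PIR}(s_i) = \mathrm{PPL}\left(y \mid x,\, R \setminus \{s_i\}\right) - \mathrm{PPL}\left(y \mid x,\, R\right)
$$

Low or negative scores mark functional steps whose removal barely hurts (or even improves) answer confidence, making them candidates for pruning.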
Community
Large Language Models (LLMs) have demonstrated impressive reasoning abilities through chain-of-thought (CoT) approaches, particularly when fine-tuned on high-quality reasoning data from more powerful Large Reasoning Models (LRMs). However, reasoning chains distilled from LRMs often contain numerous functional elements that, while mimicking human problem-solving processes, result in unnecessarily verbose outputs.
LIMOPro introduces PIR (Perplexity-based Importance Refinement), a novel framework that systematically refines reasoning chains to optimize the balance between efficiency and effectiveness. Our approach:
- Classifies functional patterns in reasoning chains into four distinct modes: progressive reasoning and three types of functional steps (verification, multi-method validation, and error correction)
- Quantitatively measures each functional step's contribution using the PIR metric, which evaluates how the answer's perplexity changes when that step is removed (see the code sketch below)
- Selectively removes low-importance functional steps while preserving the essential progressive reasoning chain
Models fine-tuned on PIR-optimized datasets maintain or enhance accuracy while significantly reducing response length compared to models trained on unrefined data, achieving up to 55% efficiency improvement across challenging reasoning benchmarks.
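To make the PIR metric concrete, below is a minimal sketch of how such a perplexity-based importance score could be computed with an off-the-shelf causal LM. The model choice, prompt layout, and helper names are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of a perplexity-based step-importance score.
# Assumptions (not from the paper's code): the model name, prompt layout,
# and helper names below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # any causal LM can stand in here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()


def answer_perplexity(question: str, steps: list[str], answer: str) -> float:
    """Perplexity of the answer tokens, conditioned on the question and reasoning steps."""
    prompt = question + "\n" + "\n".join(steps) + "\n"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(answer, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens; score only the answer
    with torch.no_grad():
        loss = model(input_ids=input_ids, labels=labels).loss  # mean NLL over answer tokens
    return torch.exp(loss).item()


def pir_score(question: str, steps: list[str], answer: str, i: int) -> float:
    """Importance of step i: how much answer perplexity rises when the step is removed."""
    ppl_full = answer_perplexity(question, steps, answer)
    ppl_ablated = answer_perplexity(question, steps[:i] + steps[i + 1:], answer)
    return ppl_ablated - ppl_full  # low or negative => candidate for pruning
```

In the PIR pipeline, steps scoring below a chosen threshold would be the candidates pruned from the distilled training data before fine-tuning.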
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling (2025)
- Scalable Chain of Thoughts via Elastic Reasoning (2025)
- Dynamic Early Exit in Reasoning Models (2025)
- Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning (2025)
- Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier (2025)
- Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning (2025)
- Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability (2025)