arxiv:2505.19187

LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling

Published on May 25 · Submitted by YangXiao-nlp on May 29

Abstract

AI-generated summary

PIR (Perplexity-based Importance Refinement) is a framework that scores the importance of each reasoning step in distilled chain-of-thought data and prunes low-importance functional elements, yielding more concise reasoning chains with improved accuracy and reduced computational cost.

Large language models (LLMs) have demonstrated remarkable reasoning capabilities through test-time scaling approaches, particularly when fine-tuned with chain-of-thought (CoT) data distilled from more powerful large reasoning models (LRMs). However, these reasoning chains often contain verbose elements that mirror human problem-solving, categorized as progressive reasoning (the essential solution development path) and functional elements (verification processes, alternative solution approaches, and error corrections). While progressive reasoning is crucial, the functional elements significantly increase computational demands during test-time inference. We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step based on its impact on answer prediction confidence. PIR systematically identifies and selectively prunes only low-importance functional steps while preserving progressive reasoning components, creating optimized training data that maintains the integrity of the core solution path while reducing verbosity. Models fine-tuned on PIR-optimized data exhibit superior test-time scaling properties, generating more concise reasoning chains while achieving improved accuracy (+0.9% to +6.6%) with significantly reduced token usage (-3% to -41%) across challenging reasoning benchmarks (AIME, AMC, and GPQA Diamond). Our approach generalizes across different model sizes, data sources, and token budgets, offering a practical solution for deploying reasoning-capable LLMs in scenarios where response time and computational efficiency are important constraints.
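
A rough formalization of the PIR score, based on the abstract's description of measuring each step's impact on answer prediction confidence (the paper's exact definition may differ): for a question $q$, reasoning steps $s_1, \dots, s_n$, and final answer $a$,

$$\mathrm{PIR}(s_i) \;=\; \mathrm{PPL}\!\left(a \mid q,\, s_1, \dots, s_{i-1}, s_{i+1}, \dots, s_n\right) \;-\; \mathrm{PPL}\!\left(a \mid q,\, s_1, \dots, s_n\right),$$

where $\mathrm{PPL}$ is the model's perplexity over the answer tokens. Functional steps whose removal barely raises (or even lowers) the answer perplexity receive low scores and become pruning candidates, while progressive-reasoning steps are always kept.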

Community

Paper author · Paper submitter

Large Language Models (LLMs) have demonstrated impressive reasoning abilities through chain-of-thought (CoT) approaches, particularly when fine-tuned on high-quality reasoning data from more powerful Large Reasoning Models (LRMs). However, reasoning chains distilled from LRMs often contain numerous functional elements that, while mimicking human problem-solving processes, result in unnecessarily verbose outputs.

LIMOPro introduces PIR (Perplexity-based Importance Refinement), a novel framework that systematically refines reasoning chains to optimize the balance between efficiency and effectiveness. Our approach:

  1. Classifies functional patterns in reasoning chains into four distinct modes: progressive reasoning and three types of functional steps (verification, multi-method validation, and error correction)
  2. Quantitatively measures each functional step's contribution using the PIR metric, which evaluates how the answer's perplexity changes when a specific step is removed (see the sketch after this list)
  3. Selectively removes low-importance functional steps while preserving the essential progressive reasoning chain
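
A minimal sketch of how steps 2 and 3 could be implemented with Hugging Face `transformers`. The function names, the prompt formatting, the pruning heuristic, and the `keep_ratio` parameter are illustrative assumptions, not the paper's exact procedure:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def answer_perplexity(model, tokenizer, prompt, answer):
    """Perplexity of the answer tokens conditioned on the prompt (question + reasoning steps)."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(answer, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens in the loss
    with torch.no_grad():
        loss = model(input_ids=input_ids, labels=labels).loss  # mean NLL over answer tokens
    return math.exp(loss.item())


def pir_scores(model, tokenizer, question, steps, answer, functional_idx):
    """Importance of each functional step = increase in answer perplexity when that step is removed."""
    base = answer_perplexity(model, tokenizer, question + "\n" + "\n".join(steps) + "\n", answer)
    scores = {}
    for i in functional_idx:
        ablated = steps[:i] + steps[i + 1:]
        ppl = answer_perplexity(model, tokenizer, question + "\n" + "\n".join(ablated) + "\n", answer)
        scores[i] = ppl - base  # small (or negative) change => the step adds little to answer confidence
    return scores


def prune_low_importance(steps, scores, keep_ratio=0.5):
    """Drop the lowest-scoring functional steps; progressive-reasoning steps are never pruned."""
    ranked = sorted(scores, key=scores.get)  # ascending importance
    drop = set(ranked[: int(len(ranked) * (1 - keep_ratio))])
    return [s for i, s in enumerate(steps) if i not in drop]


# Usage sketch (model name and step indices are illustrative):
# tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
# scores = pir_scores(model, tokenizer, question, steps, answer, functional_idx=[2, 4, 5])
# refined_steps = prune_low_importance(steps, scores)
```

The refined chains (question, pruned steps, answer) would then serve as the optimized supervised fine-tuning data described above.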

Models fine-tuned on PIR-optimized datasets maintain or enhance accuracy while significantly reducing response length compared to models trained on unrefined data, achieving up to 55% efficiency improvement across challenging reasoning benchmarks.
