arXiv:2505.18149

First Finish Search: Efficient Test-Time Scaling in Large Language Models

Published on May 23 · Submitted by aradhye on May 29
Authors: Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty

Abstract

AI-generated summary: First Finish Search improves reasoning accuracy in large language models by launching several samples in parallel and stopping inference at the first one to complete, outperforming other test-time decoding strategies on reasoning tasks.

Test-time scaling (TTS), which involves dynamic allocation of compute during inference, offers a promising way to improve reasoning in large language models. While existing TTS methods work well, they often rely on long decoding paths or require a large number of samples to be generated, increasing token usage and inference latency. We observe the surprising fact that for reasoning tasks, shorter traces are much more likely to be correct than longer ones. Motivated by this, we introduce First Finish Search (FFS), a training-free parallel decoding strategy that launches n independent samples and returns as soon as any one completes. We evaluate FFS alongside simple decoding, beam search, majority voting, and budget forcing on four reasoning models (DeepSeek-R1, R1-Distill-Qwen-32B, QwQ-32B and Phi-4-Reasoning-Plus) and across four datasets (AIME24, AIME25-I, AIME25-II and GPQA Diamond). With DeepSeek-R1, FFS achieves 82.23% accuracy on the AIME datasets, a 15% improvement over DeepSeek-R1's standalone accuracy, nearly matching OpenAI's o4-mini performance. Our theoretical analysis explains why stopping at the shortest trace is likely to yield a correct answer and identifies the conditions under which early stopping may be suboptimal. The elegance and simplicity of FFS demonstrate that straightforward TTS strategies can perform remarkably well, revealing the untapped potential of simple approaches at inference time.

Community


📢 New Paper Alert: First Finish Search – Efficient Test-Time Scaling in LLMs

We introduce First Finish Search (FFS), a simple yet surprisingly effective test-time decoding strategy for improving reasoning in large language models (LLMs). FFS launches multiple decoding paths in parallel and stops as soon as any one of them finishes, requiring no beam search or reranking.
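
Below is a minimal Python sketch of the idea: launch n independent decodes in parallel and return the first one to finish. This is our illustration, not the authors' implementation; `generate` is a hypothetical stand-in for one sampled reasoning trace from your model.

```python
# First Finish Search (FFS), sketched with a thread pool. Hedged
# illustration only; `generate` below is a placeholder, not a real API.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def generate(prompt: str, seed: int) -> str:
    """Placeholder: sample one complete reasoning trace from the model."""
    raise NotImplementedError

def first_finish_search(prompt: str, n: int = 8) -> str:
    pool = ThreadPoolExecutor(max_workers=n)
    # Launch n independent samples of the same prompt.
    futures = [pool.submit(generate, prompt, seed) for seed in range(n)]
    # Return as soon as any one sample completes: since shorter traces
    # tend to be correct, the first finisher is a good bet.
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False, cancel_futures=True)  # don't wait for the rest
    return next(iter(done)).result()
```

In a real serving stack you would also abort the still-running decodes at the inference server to reclaim compute; thread cancellation alone cannot stop a decode that has already started.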

🔍 Key Insights:

  • Shorter reasoning traces are often more accurate than longer ones.
  • FFS is training-free and parallelizable, and it drastically reduces latency and token usage.
  • Achieves 82.23% accuracy on AIME datasets using DeepSeek-R1 – a 15% gain over the base model, rivaling much larger models like o4-mini.

📊 We benchmark FFS against beam search, majority voting, and budget forcing across 4 reasoning models and 4 challenging datasets (AIME24, AIME25-I/II, GPQA-Diamond).

🧠 Our theoretical analysis explains why stopping early often works, and when it might not.
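
To make that intuition concrete, here is a toy Monte Carlo (our simplification, not the paper's analysis): if the probability that a trace is correct decays with its length, then returning the shortest of n samples beats returning a single sample.

```python
# Toy model: accuracy decays linearly with (normalized) trace length.
# Purely illustrative; the assumed decay rate is made up.
import random

def p_correct(length: float) -> float:
    return 0.9 - 0.4 * length  # toy assumption, length in [0, 1]

def trial(n: int = 8) -> tuple[bool, bool]:
    lengths = [random.random() for _ in range(n)]
    correct = [random.random() < p_correct(l) for l in lengths]
    shortest = min(range(n), key=lengths.__getitem__)  # first to finish
    return correct[shortest], correct[0]  # FFS pick vs. a single sample

trials = [trial() for _ in range(100_000)]
print("shortest-of-8 accuracy:", sum(t[0] for t in trials) / len(trials))
print("single-sample accuracy:", sum(t[1] for t in trials) / len(trials))
```

Under these made-up numbers the shortest-of-8 pick scores around 0.86 versus roughly 0.70 for a single sample; the paper's analysis also characterizes when such early stopping is suboptimal.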

🔗 Read the paper: https://arxiv.org/abs/2505.18149

👥 Authors: Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty
📬 Happy to discuss or collaborate! Feel free to reach out or ask questions.
