arxiv:2505.20286

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

Published on May 26 · Submitted by ChrisJuan on May 28
Abstract

Alita, a simplicity-driven generalist agent, achieves high performance across multiple benchmarks through minimal predefinition and self-evolution using task-related model context protocols.

AI-generated summary

Recent advances in large language models (LLMs) have enabled agents to autonomously perform complex, open-ended tasks. However, many existing frameworks depend heavily on manually predefined tools and workflows, which hinder their adaptability, scalability, and generalization across domains. In this work, we introduce Alita--a generalist agent designed with the principle of "Simplicity is the ultimate sophistication," enabling scalable agentic reasoning through minimal predefinition and maximal self-evolution. For minimal predefinition, Alita is equipped with only one component for direct problem-solving, making it much simpler and neater than previous approaches that relied heavily on hand-crafted, elaborate tools and workflows. This clean design enhances its potential to generalize to challenging questions, without being limited by tools. For maximal self-evolution, we enable the creativity of Alita by providing a suite of general-purpose components to autonomously construct, refine, and reuse external capabilities by generating task-related model context protocols (MCPs) from open source, which contributes to scalable agentic reasoning. Notably, Alita achieves 75.15% pass@1 and 87.27% pass@3 accuracy, which is top-ranking among general-purpose agents, on the GAIA benchmark validation dataset, and 74.00% and 52.00% pass@1 on MathVista and PathVQA, respectively, outperforming many agent systems with far greater complexity. More details will be updated at https://github.com/CharlesQ9/Alita.

Community

Paper author · Paper submitter

Alita achieves 75.15% pass@1 and 87.27% pass@3 accuracy, which is top-ranking among general-purpose agents, on the GAIA benchmark validation dataset.


The reliance on large-scale manually predefined tools and workflows introduces several critical limitations:

  1. Incomplete Coverage: It is impractical, if not impossible, to predefine all the tools required for the wide variety of real-world tasks an agent might encounter.
  2. Limited Creativity and Flexibility: Many complex tasks require agents to creatively compose new tools or leverage existing ones in novel ways, while pre-designed workflows and hardcoded components constrain this compositional flexibility and inhibit the development of adaptive behaviors.
  3. Mismatch: The interfaces or environments of different tools are not always compatible with the agent. For example, many useful tools are not written in Python, which makes it difficult, though not entirely impossible, to pre-connect them to mainstream agent frameworks, which are primarily written in Python.

Together, these challenges ultimately hinder the scalability, adaptability, and generalization of existing generalist agents.


In contrast to the prevailing trend of growing complexity, we propose a radically simple design philosophy built on two principles:

  1. Minimal Predefinition: Equip the agent with only a minimal set of core capabilities, avoiding manually engineered components for specific tasks or modalities.
  2. Maximal Self-Evolution: Empower the agent to autonomously create, refine, and reuse external capabilities as needed.

We instantiate this vision through Alita, a generalist agent built with a single core capability (i.e., the web agent) and a small set of general-purpose modules that enable self-directed capability expansion. Specifically, we take advantage of the Model Context Protocol (MCP), an open protocol that standardizes how different systems provide context to LLMs, and empower Alita to dynamically generate, adapt, and reuse MCPs based on the demands of each task rather than relying on static, predefined tools. This shift from manually designed capabilities to on-the-fly MCP construction unlocks a new path for building agents that are simple yet profoundly capable.
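The on-the-fly tool construction described above can be sketched in miniature. The following is an illustrative sketch, not the paper's implementation: an agent that lacks a tool for the current task asks a generator (standing in for LLM code generation) to produce one, validates it by execution, and caches it for reuse. The names `ToolBox`, `get_or_create`, and `llm_generate_word_counter` are hypothetical.

```python
# Illustrative sketch of on-demand tool construction and reuse,
# in the spirit of Alita's dynamic MCP generation. All names here
# are hypothetical, not from the paper's codebase.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToolBox:
    """Cache of auto-generated tools, loosely analogous to an MCP Box."""
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def get_or_create(self, name: str, generator: Callable[[], str]):
        """Return a cached tool, or generate, validate, and cache it."""
        if name not in self.tools:
            source = generator()        # code produced by an LLM in practice
            namespace: dict = {}
            exec(source, namespace)     # validate by executing the source
            self.tools[name] = namespace[name]
        return self.tools[name]


def llm_generate_word_counter() -> str:
    # Stand-in for LLM code generation: returns source for the tool.
    return "def word_count(text):\n    return len(text.split())\n"


box = ToolBox()
tool = box.get_or_create("word_count", llm_generate_word_counter)
print(tool("simplicity is the ultimate sophistication"))  # 5
```

A second call to `get_or_create("word_count", ...)` returns the cached function without regenerating it, which is the reuse property that makes the MCP Box cheap to share across agents.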


The Alita-generated MCP Box has two benefits:

  1. Agent Distillation: Reusing auto-generated MCPs can be viewed as a form of distillation that is much cheaper and easier than traditional distillation.
     - Stronger agent teaches weaker agent: These MCPs can be reused by weaker agents to improve their performance, since Alita, rather than human developers, designs a set of useful MCPs fitted to GAIA by trial and error.
     - Agent with a larger LLM teaches agent with a smaller LLM: These MCPs can also be reused by agents built on smaller LLMs, significantly improving their performance.
  2. Making pass@1 approach pass@N: The MCP Box can also be connected back to Alita, making its pass@1 performance approach pass@N.
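To make the pass@1 vs. pass@N distinction concrete, here is the standard unbiased pass@k estimator: the probability that at least one of k samples, drawn from n total attempts of which c are correct, solves the task. This formula is standard in code-generation evaluation and is shown here for illustration; the paper does not specify how its pass@k numbers are computed.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: P(at least one of k samples is correct),
    given n total samples of which c are correct."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 3 attempts and 1 correct, pass@1 is ~0.333 while pass@3 is 1.0,
# which is the gap that MCP reuse aims to close:
print(round(pass_at_k(3, 1, 1), 3))  # 0.333
print(pass_at_k(3, 1, 3))            # 1.0
```

Intuitively, reusing MCPs discovered in earlier runs lets a single attempt benefit from the exploration that would otherwise require N independent attempts.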



Hello Team,
Thank you for the great work. Did you submit the results to the GAIA leaderboard?
https://huggingface.co/spaces/gaia-benchmark/leaderboard

Paper author

Hi,
Please refer to comments 10 and 13 on my GitHub: https://github.com/CharlesQ9/Alita
