Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution
Abstract
Alita, a simplicity-driven generalist agent, achieves high performance across multiple benchmarks through minimal predefinition and self-evolution using task-related model context protocols.
Recent advances in large language models (LLMs) have enabled agents to autonomously perform complex, open-ended tasks. However, many existing frameworks depend heavily on manually predefined tools and workflows, which hinders their adaptability, scalability, and generalization across domains. In this work, we introduce Alita, a generalist agent designed on the principle that "simplicity is the ultimate sophistication," enabling scalable agentic reasoning through minimal predefinition and maximal self-evolution. For minimal predefinition, Alita is equipped with only one component for direct problem solving, making it far simpler and leaner than previous approaches that rely heavily on hand-crafted, elaborate tools and workflows. This clean design improves its ability to generalize to challenging questions without being limited by a fixed toolset. For maximal self-evolution, we unlock Alita's creativity by providing a suite of general-purpose components that autonomously construct, refine, and reuse external capabilities, generating task-related Model Context Protocols (MCPs) from open-source resources, which contributes to scalable agentic reasoning. Notably, Alita achieves 75.15% pass@1 and 87.27% pass@3 accuracy on the GAIA benchmark validation set, top-ranking among general-purpose agents, as well as 74.00% pass@1 on MathVista and 52.00% pass@1 on PathVQA, outperforming many agent systems of far greater complexity. More details will be updated at https://github.com/CharlesQ9/Alita.
Community
The reliance on large-scale manually predefined tools and workflows introduces several critical limitations:
- Incomplete Coverage: It is impractical, if not impossible, to predefine all the tools required for the wide variety of real-world tasks an agent might encounter.
- Limited Creativity and Flexibility: Many complex tasks require agents to creatively compose new tools or leverage existing ones in novel ways, while pre-designed workflows and hardcoded components constrain this compositional flexibility and inhibit the development of adaptive behaviors.
- Mismatch: The interfaces or environments of external tools are not always compatible with the agent. For example, many useful tools are not written in Python, which makes it difficult, though not entirely impossible, to pre-connect them to mainstream agent frameworks, which are primarily written in Python.
Together, these challenges ultimately hinder the scalability, adaptability, and generalization of existing generalist agents.
In contrast to the prevailing trend of growing complexity, we propose a radically simple design philosophy built on two principles:
- Minimal Predefinition: Equip the agent with only a minimal set of core capabilities, avoiding manually engineered components for specific tasks or modalities.
- Maximal Self-Evolution: Empower the agent to autonomously create, refine, and reuse external capabilities as needed.
We instantiate this vision through Alita, a generalist agent built with a single core capability (i.e., the web agent) and a small set of general-purpose modules that enable self-directed capability expansion. Specifically, we take advantage of the Model Context Protocol (MCP), an open protocol that standardizes how different systems provide context to LLMs, and empower Alita to dynamically generate, adapt, and reuse MCPs based on the demands of each task rather than relying on static, predefined tools. This shift from manually designed capabilities to on-the-fly MCP construction unlocks a new path for building agents that are simple yet profoundly capable.
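To make the idea of on-the-fly capability construction concrete, here is a minimal, hypothetical sketch (not Alita's actual implementation, and not the official MCP SDK): an agent turns LLM-generated source code into a named, reusable tool. The `ToolRegistry` class, `register_from_source` method, and the generated `word_count` function are all illustrative names invented for this example.

```python
# Hypothetical sketch of dynamic tool construction: an agent registers
# LLM-generated source code as a reusable, MCP-style tool.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToolRegistry:
    """Stores dynamically created tools by name so later tasks can reuse them."""
    tools: Dict[str, Callable] = field(default_factory=dict)

    def register_from_source(self, name: str, source: str) -> Callable:
        # Execute the generated source in an isolated namespace and pick
        # out the function matching `name`. A real system would run this
        # in a sandbox and validate the tool before registering it.
        namespace: dict = {}
        exec(source, namespace)
        fn = namespace[name]
        self.tools[name] = fn
        return fn

    def call(self, name: str, *args, **kwargs):
        return self.tools[name](*args, **kwargs)


# An LLM might emit source like this when a task needs a capability
# the agent does not yet have:
generated_source = '''
def word_count(text):
    return len(text.split())
'''

registry = ToolRegistry()
registry.register_from_source("word_count", generated_source)
print(registry.call("word_count", "simplicity is the ultimate sophistication"))  # → 5
```

Once registered, the tool persists in the registry, so subsequent tasks can invoke it without regenerating the code.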
The Alita-generated MCP Box offers two benefits:
- Agent Distillation: Reusing auto-generated MCPs can be viewed as a form of distillation that is much cheaper and easier than traditional distillation.
  - Stronger agent teaches weaker agent: These MCPs can be reused by weaker agents to improve their performance, since Alita, rather than human developers, discovers a set of MCPs well suited to GAIA by trial and error.
  - Agent with a larger LLM teaches an agent with a smaller LLM: These MCPs can also be reused by agents built on smaller LLMs, significantly improving their performance.
- Making Pass@1 Approach Pass@N: The MCP Box can also be connected back to Alita itself, making pass@1 approach pass@N.
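The reuse pattern behind the MCP Box can be sketched as simple persistence of validated tool sources: a stronger agent saves the tools it discovered, and a weaker agent loads them ready-made instead of writing its own. This is a hypothetical illustration, not Alita's actual storage format; the `save_box`/`load_box` helpers and the `to_celsius` tool are invented for the example.

```python
# Hypothetical sketch of MCP Box reuse: persist {tool_name: source_code}
# mappings produced by a stronger agent, then load them in a weaker agent.
import json
import os
import tempfile


def save_box(box: dict, path: str) -> None:
    """Write the tool sources discovered by the stronger agent to disk."""
    with open(path, "w") as f:
        json.dump(box, f)


def load_box(path: str) -> dict:
    """A weaker agent loads ready-made tools instead of regenerating them."""
    with open(path) as f:
        sources = json.load(f)
    tools = {}
    for name, source in sources.items():
        namespace: dict = {}
        exec(source, namespace)  # a real system would sandbox this step
        tools[name] = namespace[name]
    return tools


box = {"to_celsius": "def to_celsius(f):\n    return (f - 32) * 5 / 9\n"}
path = os.path.join(tempfile.gettempdir(), "mcp_box.json")
save_box(box, path)
tools = load_box(path)
print(tools["to_celsius"](212))  # → 100.0
```

Because the box stores plain tool definitions rather than model weights, this "distillation" costs only the one-time trial-and-error of the stronger agent.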
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Agent Context Protocols Enhance Collective Inference (2025)
- Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution (2025)
- Multi-Agent Collaboration via Evolving Orchestration (2025)
- MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision (2025)
- Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution (2025)
- MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems (2025)
- PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration (2025)
Hello Team,
Thank you for the great work.
Did you submit the results to the GAIA Leaderboard?
https://huggingface.co/spaces/gaia-benchmark/leaderboard