MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Abstract
We introduce MLE-Dojo, a Gym-style framework for systematically training (via reinforcement learning), evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing benchmarks that primarily rely on static datasets or single-attempt evaluations, MLE-Dojo provides an interactive environment enabling agents to iteratively experiment, debug, and refine solutions through structured feedback loops. Built upon 200+ real-world Kaggle challenges, MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios such as data processing, architecture search, hyperparameter tuning, and code debugging. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning, facilitating iterative experimentation, realistic data sampling, and real-time outcome verification. Extensive evaluations of eight frontier LLMs reveal that while current models achieve meaningful iterative improvements, they still exhibit significant limitations in autonomously generating long-horizon solutions and efficiently resolving complex errors. Furthermore, MLE-Dojo's flexible and extensible architecture seamlessly integrates diverse data sources, tools, and evaluation protocols, uniquely enabling model-based agent tuning and promoting interoperability, scalability, and reproducibility. We open-source our framework and benchmarks to foster community-driven innovation towards next-generation MLE agents.
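To make the "Gym-style" feedback loop described above concrete, here is a minimal toy sketch in Python. It is not the MLE-Dojo API: `ToyMLEEnv`, `run_episode`, the learning-rate task, and the scoring rule are all hypothetical names and assumptions invented for illustration. It only shows the general pattern of an agent submitting a candidate, receiving structured feedback (a score or an error message), and refining its next attempt.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMLEEnv:
    """Hypothetical stand-in for an interactive MLE task (NOT the MLE-Dojo API).

    The toy task is to find a learning rate close to a hidden optimum. Each
    step scores one candidate and returns structured feedback.
    """
    best_lr: float = 0.1   # hidden optimum, unknown to the agent
    max_steps: int = 10    # interaction budget per episode
    steps_taken: int = field(default=0, init=False)

    def reset(self) -> dict:
        """Start a new episode and return the initial observation."""
        self.steps_taken = 0
        return {"message": "task started", "score": None}

    def step(self, lr: float):
        """Evaluate one candidate and return (observation, reward, done)."""
        self.steps_taken += 1
        done = self.steps_taken >= self.max_steps
        if lr <= 0:
            # Invalid submissions produce an error message instead of a score.
            return {"message": "error: lr must be positive", "score": None}, 0.0, done
        score = 1.0 / (1.0 + abs(lr - self.best_lr))  # in (0, 1]; 1.0 is best
        return {"message": "ok", "score": score}, score, done


def run_episode(env: ToyMLEEnv, first_guess: float) -> list:
    """Naive hill-climbing agent that refines its guess from the feedback.

    Returns the best score seen so far after each step.
    """
    env.reset()
    obs, best, done = env.step(first_guess)
    current, delta = first_guess, first_guess / 2
    history = [best]
    while not done:
        candidate = current + delta
        obs, reward, done = env.step(candidate)
        if obs["score"] is not None and reward > best:
            current, best = candidate, reward  # keep the improvement
        else:
            delta = -delta / 2  # error or no gain: reverse and shrink the step
        history.append(best)
    return history


history = run_episode(ToyMLEEnv(), first_guess=1.0)
```

In this sketch, the `history` list records how the best score evolves over the episode, which mirrors the iterative improvement the abstract describes; a real agent would replace the hill-climbing rule with an LLM that reads the feedback and rewrites its solution.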
Community
Introducing MLE-Dojo!
Paper: https://arxiv.org/abs/2505.07782
Code: https://github.com/MLE-Dojo/MLE-Dojo
MLE-Dojo is a Gym-style framework that lays the groundwork for systematically training (via reinforcement learning), evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Built upon 200+ real-world Kaggle challenges, MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic machine learning engineering scenarios.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning (2025)
- MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? (2025)
- ActionStudio: A Lightweight Framework for Data and Training of Large Action Models (2025)
- Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors (2025)
- SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science (2025)
- APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay (2025)
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend