InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13 • 149
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens Paper • 2502.18890 • Published Feb 26 • 30
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published 8 days ago • 82
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published 9 days ago • 24
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs Paper • 2505.19075 • Published 9 days ago • 21
Text2Grad: Reinforcement Learning from Natural Language Feedback Paper • 2505.22338 • Published 6 days ago • 6
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published 8 days ago • 99
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published 7 days ago • 90
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms Paper • 2505.20322 • Published 11 days ago • 14
VideoGameBench: Can Vision-Language Models complete popular video games? Paper • 2505.18134 • Published 11 days ago • 6
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution Paper • 2505.20286 • Published 8 days ago • 6
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI Paper • 2505.19443 • Published 9 days ago • 14
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction Paper • 2505.10887 • Published 18 days ago • 10