Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making Paper • 2410.07166 • Published Oct 9, 2024 • 2
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents Paper • 2504.13203 • Published 25 days ago • 31
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Paper • 2504.01005 • Published Apr 1 • 15
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Paper • 2504.01005 • Published Apr 1 • 15
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 23
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 23
Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence Paper • 2503.05037 • Published Mar 6 • 4
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published Dec 10, 2024 • 75