GraphicBench: A Planning Benchmark for Graphic Design with Language Agents Paper • 2504.11571 • Published 26 days ago • 1
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published 20 days ago • 19
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published 20 days ago • 19
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published 20 days ago • 19 • 4
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published 20 days ago • 19 • 4
GraphicBench: A Planning Benchmark for Graphic Design with Language Agents Paper • 2504.11571 • Published 26 days ago • 1
Exploring Expert Failures Improves LLM Agent Tuning Paper • 2504.13145 • Published 24 days ago • 12 • 4
Exploring Expert Failures Improves LLM Agent Tuning Paper • 2504.13145 • Published 24 days ago • 12 • 4
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 46
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? Paper • 2410.21259 • Published Oct 28, 2024 • 1
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10 • 45
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10 • 45
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10 • 45 • 4
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10 • 45 • 4
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10 • 45 • 4
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning Paper • 2504.05520 • Published Apr 7 • 10
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published 27 days ago • 40