Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published 9 days ago • 61
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models Paper • 2504.16074 • Published 15 days ago • 35