Generative AI Act II: Test Time Scaling Drives Cognition Engineering Paper • 2504.13828 • Published 19 days ago
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18
Grounded Persuasive Language Generation for Automated Marketing Paper • 2502.16810 • Published Feb 24
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published Feb 11
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World Paper • 2412.17589 • Published Dec 23, 2024
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Paper • 2412.15204 • Published Dec 19, 2024
Measuring Mathematical Problem Solving With the MATH Dataset Paper • 2103.03874 • Published Mar 5, 2021
What Matters in Transformers? Not All Attention is Needed Paper • 2406.15786 • Published Jun 22, 2024
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25, 2024
Data Contamination Report from the 2024 CONDA Shared Task Paper • 2407.21530 • Published Jul 31, 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published Jun 18, 2024
Benchmarking Benchmark Leakage in Large Language Models Paper • 2404.18824 • Published Apr 29, 2024