MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark • Paper • arXiv:2406.01574 • Published Jun 3, 2024
HellaSwag: Can a Machine Really Finish Your Sentence? • Paper • arXiv:1905.07830 • Published May 19, 2019