ydnysh's Collections

Benchmarks and Evals

updated Apr 4

Awesome Collection of Benchmarks and Evaluation Papers

  • Measuring Massive Multitask Language Understanding

    Paper • 2009.03300 • Published Sep 7, 2020 • 3

  • MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

    Paper • 2406.01574 • Published Jun 3, 2024 • 47

  • GPQA: A Graduate-Level Google-Proof Q&A Benchmark

    Paper • 2311.12022 • Published Nov 20, 2023 • 31

  • HellaSwag: Can a Machine Really Finish Your Sentence?

    Paper • 1905.07830 • Published May 19, 2019 • 4