vincentkoc's Collections

Evaluation for Generative AI

updated 14 days ago

Papers and resources on the evaluation of large language models and generative AI.

  • Humanity's Last Exam

    Paper • 2501.14249 • Published Jan 24 • 75

  • RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

    Paper • 2501.14492 • Published Jan 24 • 34

  • vincentkoc/tiny_qa_benchmark

    Viewer • Updated 14 days ago • 52 • 110 • 1

  • vincentkoc/tiny_qa_benchmark_pp

    Viewer • Updated 14 days ago • 662 • 324 • 1

  • Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation

    Paper • 2505.12058 • Published 17 days ago • 6

  • tinyBenchmarks: evaluating LLMs with fewer examples

    Paper • 2402.14992 • Published Feb 22, 2024 • 14