Evaluation for Generative AI - a vincentkoc Collection

Evaluation for Generative AI

updated 14 days ago

Papers and resources on evaluating large language models and generative AI.

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 75

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Paper • 2501.14492 • Published Jan 24 • 34

vincentkoc/tiny_qa_benchmark

Viewer • Updated 14 days ago • 52 • 110 • 1

vincentkoc/tiny_qa_benchmark_pp

Viewer • Updated 14 days ago • 662 • 324 • 1

Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation

Paper • 2505.12058 • Published 17 days ago • 6

tinyBenchmarks: evaluating LLMs with fewer examples

Paper • 2402.14992 • Published Feb 22, 2024 • 14