Linguini: A benchmark for language-agnostic linguistic reasoning Paper • 2409.12126 • Published Sep 18, 2024
LCFO: Long Context and Long Form Output Dataset and Benchmarking Paper • 2412.08268 • Published Dec 11, 2024
Large Concept Models: Language Modeling in a Sentence Representation Space Paper • 2412.08821 • Published Dec 11, 2024 • 14
BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation Paper • 2502.04314 • Published Feb 6
Cluster and Predict Latents Patches for Improved Masked Image Modeling Paper • 2502.08769 • Published Feb 12 • 4
CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text Paper • 1908.06177 • Published Aug 16, 2019
Learning an Unreferenced Metric for Online Dialogue Evaluation Paper • 2005.00583 • Published May 1, 2020
How sensitive are translation systems to extra contexts? Mitigating gender bias in Neural Machine Translation models through relevant contexts Paper • 2205.10762 • Published May 22, 2022