Evaluations - a delmaksym Collection

updated Feb 19

True Detective: A Deep Abductive Reasoning Benchmark Undoable for GPT-3 and Challenging for GPT-4

Paper • 2212.10114 • Published Dec 20, 2022