Do Large Language Models Latently Perform Multi-Hop Reasoning? Paper • 2402.16837 • Published Feb 26, 2024 • 30
Enhancing Automated Interpretability with Output-Centric Feature Descriptions Paper • 2501.08319 • Published Jan 14 • 11
QE4PE: Word-level Quality Estimation for Human Post-Editing Paper • 2503.03044 • Published Mar 4 • 6
A Primer on the Inner Workings of Transformer-based Language Models Paper • 2405.00208 • Published Apr 30, 2024 • 10
Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 109 items • Updated 3 days ago • 99