Inseq: An Interpretability Toolkit for Sequence Generation Models Paper • 2302.13942 • Published Feb 27, 2023 • 1
LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools Paper • 2401.12576 • Published Jan 23, 2024 • 2
Free-text Rationale Generation under Readability Level Control Paper • 2407.01384 • Published Jul 1, 2024
Do Large Language Models Latently Perform Multi-Hop Reasoning? Paper • 2402.16837 • Published Feb 26, 2024 • 30
Enhancing Automated Interpretability with Output-Centric Feature Descriptions Paper • 2501.08319 • Published Jan 14, 2025 • 11
Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools Paper • 2108.13961 • Published Aug 31, 2021
Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods Paper • 2210.07222 • Published Oct 13, 2022
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations Paper • 2310.05592 • Published Oct 9, 2023
A Primer on the Inner Workings of Transformer-based Language Models Paper • 2405.00208 • Published Apr 30, 2024 • 10