Towards eliciting latent knowledge from LLMs with mechanistic interpretability Paper • 2505.14352 • Published 14 days ago • 9
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models Paper • 2411.14257 • Published Nov 21, 2024 • 13
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9, 2024 • 40
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders Paper • 2407.14435 • Published Jul 19, 2024 • 7