Towards eliciting latent knowledge from LLMs with mechanistic interpretability • Paper • 2505.14352 • Published 14 days ago • 9
Precise Parameter Localization for Textual Generation in Diffusion Models • Paper • 2502.09935 • Published Feb 14 • 12
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces • Paper • 2502.04959 • Published Feb 7 • 11
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders • Paper • 2501.18052 • Published Jan 29 • 8