Official PyTorch implementation of "AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features" (ICLR 2026). Trains ReLU, JumpReLU, TopK, Gated, and AbsTopK sparse autoencoders (SAEs) for LLM interpretability.
pytorch dictionary-learning sparse-autoencoders sae mechanistic-interpretability llm-interpretability iclr-2026 abstopk
Updated Jun 16, 2026 - Python
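The description contrasts AbsTopK with the standard TopK SAE. The sketch below is only an illustration and makes two assumptions. First, TopK is taken in its common form: keep the k largest pre-activations, then clamp negatives to zero. Second, AbsTopK is assumed to keep the k pre-activations with the largest magnitude and to preserve their sign. That second reading is inferred from the name and the "bidirectional features" framing, not confirmed from the repository. The functions `topk_activation` and `abs_topk_activation` are hypothetical names, not the repository's API, and they operate on plain Python lists rather than PyTorch tensors.

```python
from typing import List


def topk_activation(pre: List[float], k: int) -> List[float]:
    """Common TopK SAE activation (assumed form): keep the k largest
    pre-activations by value, zero the rest, and clamp negatives to zero."""
    keep = set(sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k])
    return [max(v, 0.0) if i in keep else 0.0 for i, v in enumerate(pre)]


def abs_topk_activation(pre: List[float], k: int) -> List[float]:
    """Hypothetical AbsTopK sketch: keep the k pre-activations with the
    largest absolute value and preserve their sign (assumption, see lead-in)."""
    keep = set(sorted(range(len(pre)), key=lambda i: abs(pre[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre)]


pre = [0.5, -2.0, 1.5, -0.1, 0.3]
print(topk_activation(pre, 2))      # [0.5, 0.0, 1.5, 0.0, 0.0]
print(abs_topk_activation(pre, 2))  # [0.0, -2.0, 1.5, 0.0, 0.0]
```

In this toy input, TopK discards the strongly negative pre-activation (-2.0). Under the assumed AbsTopK rule, that value is kept with its sign, so a single latent could encode both directions of a feature.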