Automates attribution-graph analysis via probe prompting: circuit-trace a prompt, auto-generate concept probes, profile feature activations, cluster supernodes.
Updated Jun 5, 2026 - Jupyter Notebook
Reproducible case study of pitfalls in contrastive SAE discovery and steering for "consciousness" features (GemmaScope SAEs, Gemma 3 4B/12B): reconstruction confound, delta-steering fix, matched controls, and false-positive scaling law vs dataset size.
Explore limitations of contrastive SAE steering in identifying causal consciousness features and introduce delta-steering to improve experiment validity.
Finding SAE features in the Gemma 3 vision-language model: a comparison of autonomous AI vs human-guided AI using GemmaScope 2
Use GemmaScope and Neuronpedia features to steer Gemma 3 4B IT locally