neuronpedia

Here are 5 public repositories matching this topic...

peppinob-ol / attribution-graph-probing

Automates attribution-graph analysis via probe prompting: circuit-trace a prompt, auto-generate concept probes, profile feature activations, and cluster supernodes.

graph-analysis sparse-autoencoders mechanistic-interpretability llm-interpretability research-tooling circuit-tracing attribution-graphs probe-prompting prompt-probing neuronpedia feature-activation supernodes cross-layer-transcoder

Updated Jun 5, 2026
Jupyter Notebook

nulone / sae-consciousness-steering-pitfalls

Reproducible case study of pitfalls in contrastive SAE discovery and steering for "consciousness" features (GemmaScope SAEs, Gemma 3 4B/12B): reconstruction confound, delta-steering fix, matched controls, and false-positive scaling law vs dataset size.

gemma sae sparse-autoencoder contrastive-learning mechanistic-interpretability feature-steering neuronpedia null-result gemmascope delta-steering

Updated Feb 26, 2026
Python

myregistercd / sae-consciousness-steering-pitfalls

Explore limitations of contrastive SAE steering in identifying causal consciousness features and introduce delta-steering to improve experiment validity.

gemma sae sparse-autoencoder contrastive-learning mechanistic-interpretability feature-steering neuronpedia null-result gemmascope delta-steering