[DRAFT] Evo 2 SAE feature explorer — visualization mockup #1582

Draft
polinabinder1 wants to merge 13 commits into NVIDIA-BioNeMo:main from polinabinder1:evo2-sae-dashboard

@polinabinder1 polinabinder1 commented May 26, 2026

How to use

Prerequisites — install Node.js + npm

Check if you already have it:

node --version && npm --version

If either prints "command not found", install via one of:

Ubuntu / Debian / dev containers (most Lepton-style pods):

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash -
sudo apt-get install -y nodejs        # ships npm alongside

macOS (Homebrew):

brew install node

Any OS, version-manager route (nvm — recommended if you juggle Node versions):

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc       # or restart your shell
nvm install --lts

You need Node 18+; this project is tested on Node 20.

Run the dashboard

cd bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup
npm install            # one-time, ~30 s
npm run dev            # starts Vite on http://localhost:5176

Then open http://localhost:5176/#preview — the #preview hash route surfaces the tabbed mockup; the bare / URL still renders the unchanged main dashboard.

⚠️ Mockup — synthetic data, not a real SAE result

This PR ships a demo-only visualization shell at bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/evo2_dashboard_mockup/. There is no real SAE inference involved. Everything you see in the dashboard is generated by scripts/make_mockup_features.py from a fixed seed.

A yellow banner reading "MOCKUP — synthetic data, not from a real SAE run" is rendered at the top of every page so nobody mistakes it for real model output.

Summary

This branch contains a multi-tab visualization mockup for the evo2 SAE feature explorer. The latest commit adds:

  • Steering mode toggle (Position-restricted / Global all positions). In Global mode the per-position bar chart smears toward a low-confidence distribution scaled by |clamp|, and the "FLIPPED" badge becomes "no clean flip — degraded".
  • SequenceStrip that renders only in Global mode: baseline vs steered argmax across the entire seed sequence, with flipped positions highlighted on a red background. Makes the "global clamp degrades everywhere" point visible at a glance, without needing per-position simulation data.
  • Scrubbed external author / paper / lab names from UI copy, the MOCKUP banner, the tab label, and stale code comments.
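
One way to picture the Global-mode "smear" is a blend between the baseline per-base distribution and a uniform one, with the blend weight growing with |clamp|. The sketch below is only an illustration of that idea: the function name, the `max_clamp` scale, and the choice of a uniform target are assumptions, not the dashboard's actual logic.

```python
def smear_toward_uniform(probs, clamp, max_clamp=10.0):
    """Blend a per-base distribution toward uniform, weighted by |clamp|.

    probs:     dict mapping base -> probability (assumed to sum to 1)
    clamp:     signed clamp value; only its magnitude matters here
    max_clamp: hypothetical slider maximum used to normalise the weight
    """
    weight = min(abs(clamp) / max_clamp, 1.0)
    uniform = 1.0 / len(probs)
    return {base: (1.0 - weight) * p + weight * uniform for base, p in probs.items()}


baseline = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
half = smear_toward_uniform(baseline, clamp=5.0)
print(half)  # the A bar shrinks, the others grow, and the total stays 1
```

Blending toward uniform lowers confidence without ever changing the argmax, which is consistent with the badge wording "no clean flip" in this mode.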

Earlier commits ship the SAE summary table, feature catalog, UMAP atlas, gene-feature G.bin matrix, and the first version of the steering tab.

Test plan

  • npm run dev starts cleanly, no console errors at #preview
  • Main tab still renders the feature catalog + atlas + WebLogos
  • Toggle Position-restricted ↔ Global; verify the Neighbors control disables in Global mode
  • Drag clamp slider: bar chart smears progressively in Global mode; SequenceStrip flip-fraction increases with |clamp|
  • Switch seeds and features; SequenceStrip is deterministic per (seed, feature, clamp)
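
The last two checks (flip fraction grows with |clamp|, output is deterministic per (seed, feature, clamp)) can be reproduced with a small seeded sketch. Everything here is hypothetical: the hashing scheme, the `max_clamp` scale, and the flip rule are assumptions chosen to make both properties hold, not a description of SequenceStrip's code.

```python
import hashlib
import random

BASES = "ACGT"


def _rng_for(seed_name, feature_name):
    # Derive a stable integer seed from the (seed, feature) pair.
    digest = hashlib.sha256(f"{seed_name}|{feature_name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def steered_argmax(baseline, seed_name, feature_name, clamp, max_clamp=10.0):
    """Return a mock steered argmax string for an A/C/G/T baseline string."""
    rng = _rng_for(seed_name, feature_name)
    threshold = min(abs(clamp) / max_clamp, 1.0)
    steered = []
    for base in baseline:
        # Draw both numbers at every position so the random stream does not
        # depend on clamp; a larger |clamp| can then only add flips.
        u = rng.random()
        offset = rng.randrange(1, 4)  # 1..3, so a flip always changes the base
        if u < threshold:
            steered.append(BASES[(BASES.index(base) + offset) % 4])
        else:
            steered.append(base)
    return "".join(steered)


def flip_fraction(baseline, steered):
    """Fraction of positions whose argmax base differs from the baseline."""
    flips = sum(1 for b, s in zip(baseline, steered) if b != s)
    return flips / len(baseline)
```

Because the per-position draws depend only on (seed, feature), the set of flipped positions at a small |clamp| is a subset of the set at a larger one.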

🤖 Generated with Claude Code

polinabinder1 and others added 6 commits May 21, 2026 00:42
torch 2.6 changed the default of `weights_only` to True. The Savanna
checkpoint pickle includes numpy globals (`numpy.core.multiarray._reconstruct`),
which the safer loader rejects. The converter then exits 0 with no output
written and the error gets buried in stderr — silent failure.

The Savanna repos under arcinstitute/* are trusted sources, so load with
weights_only=False.
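
A minimal sketch of the two ideas in this message: load a trusted checkpoint with `weights_only=False`, and fail loudly when the converter writes nothing. The helper names are hypothetical and are not the converter's actual functions; `torch.load` and its `weights_only` and `map_location` arguments are real.

```python
from pathlib import Path


def load_trusted_checkpoint(path):
    """Load a pickle-based checkpoint from a source you trust.

    weights_only=False re-enables full unpickling, which can execute
    arbitrary code, so this is only appropriate for trusted files.
    """
    import torch  # imported lazily so require_output() works without torch

    return torch.load(path, map_location="cpu", weights_only=False)


def require_output(path):
    """Raise instead of returning quietly when no output was produced."""
    out = Path(path)
    if not out.exists() or (out.is_file() and out.stat().st_size == 0):
        raise RuntimeError(f"conversion produced no output at {out}")
    return out
```

Calling something like `require_output(...)` at the end of a conversion step turns the "exit 0 with nothing written" case into a visible error.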

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the existing esm2 / codonfm SAE recipes. Pipeline:

  chunk -> convert (Savanna->MBridge) -> predict_evo2 -> pt_to_parquet -> train

Differences from esm2/codonfm are forced by Evo2 specifics:
  - Hyena/Megatron-Core model, no HF AutoModel path => reuses the
    existing `predict_evo2` CLI for inference instead of writing
    a custom extract.py
  - `pt_to_parquet.py` shim bridges predict_evo2's .pt output to
    the universal `sae.activation_store` parquet contract
  - `chunk_fasta.py` preprocessor keeps inputs within the model's
    trained context length (8192 bp for 1B); Hyena fftconv OOMs
    on long sequences even at micro-batch=1
  - `train.py` is the same as codonfm's, copied verbatim per
    bionemo-recipes' KISS-over-DRY convention
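
For illustration, the chunking step can be sketched as fixed-size, non-overlapping windows over each FASTA record. This is an assumption about the approach, not the contents of `chunk_fasta.py`; the chunk naming scheme and the lack of overlap are choices made for the example.

```python
def read_fasta(lines):
    """Minimal FASTA reader: yields (sequence_id, sequence) pairs."""
    seq_id, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(parts)
            header = line[1:].split()
            seq_id, parts = (header[0] if header else "unnamed"), []
        elif seq_id is not None:
            parts.append(line)
    if seq_id is not None:
        yield seq_id, "".join(parts)


def chunk_sequence(seq_id, sequence, max_len=8192):
    """Yield (chunk_id, chunk) pairs, each chunk at most max_len bases long."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    for start in range(0, len(sequence), max_len):
        yield f"{seq_id}_chunk{start // max_len}", sequence[start:start + max_len]


records = read_fasta([">chrM example", "ACGT" * 3000])
for seq_id, seq in records:
    for chunk_id, chunk in chunk_sequence(seq_id, seq):
        print(chunk_id, len(chunk))
```

The default of 8192 matches the context length quoted above for the 1B model; a different checkpoint would need a different `max_len`.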

Validated end-to-end on 100 organelle sequences (Evo2 1B layer 12):
loss 0.67 -> 0.045, FVU 0.90 -> 0.10, var_exp 0.10 -> 0.90, 2m14s wall.
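
For readers unfamiliar with the metrics: FVU (fraction of variance unexplained) and var_exp are complements, which is why 0.10 and 0.90 appear together above. The sketch below uses one common definition (squared reconstruction error divided by squared deviation from the per-dimension mean); whether `train.py` normalises in exactly this way is an assumption.

```python
def fraction_variance_unexplained(inputs, reconstructions):
    """FVU = sum ||x - x_hat||^2 / sum ||x - mean(x)||^2 over all rows."""
    n, dim = len(inputs), len(inputs[0])
    mean = [sum(row[j] for row in inputs) / n for j in range(dim)]
    residual = sum(
        (x - r) ** 2
        for row, rec in zip(inputs, reconstructions)
        for x, r in zip(row, rec)
    )
    total = sum((x - m) ** 2 for row in inputs for x, m in zip(row, mean))
    if total == 0:
        raise ValueError("inputs have zero variance; FVU is undefined")
    return residual / total


def variance_explained(inputs, reconstructions):
    """var_exp is simply 1 - FVU."""
    return 1.0 - fraction_variance_unexplained(inputs, reconstructions)
```

Under this definition a reconstruction that always predicts the mean scores FVU = 1, and a perfect reconstruction scores FVU = 0.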

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The recipe currently has no model-specific Python module — the extractor
is upstream (`predict_evo2`) and the two scripts are simple CLIs in
scripts/. Drop the empty package and adjust pyproject.toml so setuptools
doesn't try to discover anything. Will reintroduce when there's actual
library code to put there (eval, dashboard, dataloaders).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fork of recipes/codonfm/codon_dashboard adapted for DNA + Evo 2,
populated with synthetic data. Demo-able artifact, not a real result.

What's here:
  - scripts/make_mockup_features.py: deterministic synthetic data generator
    (seed 42). Writes features_atlas.parquet, feature_metadata.parquet,
    feature_examples.parquet to evo2_dashboard_mockup/public/. Fixtures
    are committed for one-step npm-only setup.
  - evo2_dashboard_mockup/: Vite/React SPA forked from codon_dashboard
    with these swaps:
      * Removed molstar dep + MolstarThumbnail.jsx
      * Renamed ProteinSequence.jsx -> SequenceView.jsx; per-base
        rendering (no codon framing, no AA translation)
      * Renamed ProteinDetailModal.jsx -> RegionDetailModal.jsx;
        UniProt content swapped for genomic-region content
      * utils.js: getRegionLabel + parseBases (replacing
        getAccession/uniprotUrl/parseCodons/codonToAA)
      * MOCKUP banner at top of App
      * "Evo 2 SAE Feature Explorer (Mockup)" title
  - v2 roadmap placeholders (greyed em-dashes with hover tooltips):
      * FeatureCard: Annotation, Sensitivity, Recon Δ stats
      * FeatureDetailPage: Annotations, Conservation sections

Quick start: cd evo2_dashboard_mockup && npm install && npm run dev

The synthetic data schema is the contract the future real eval pipeline
will need to target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed features

Three changes on top of the initial mockup commit:

1. Drop codonfm-specific scaffolding from forked components.
   - .gitignore the auto-generated package-lock.json (regenerates on `npm install`)
   - FeatureCard.jsx: 793 -> 508 lines. Removed dead stat tiles (Hi-Score,
     Variant/Site/Local deltas, ClinVar, PhyloP, GC, Trinuc/Gene entropy),
     codonfm vocab-logits chart, codonfm GSEA tags, codonfm CSV export
     sections — all conditional on fields our synthetic data doesn't provide.
   - FeatureDetailPage.jsx: 522 -> 187 lines. Replaced codonfm-specific
     VocabLogitChart / CodonAnnotations / FeatureMetrics components with a
     simpler DNA-friendly detail view.

2. Refine the synthetic feature set.
   - 11 labeled DNA-native features in 3 thematic UMAP clusters:
     * eukaryotic regulatory (TATA box, polyA signal, CpG island,
       splice donor, splice acceptor)
     * bacterial regulatory (-10 box, -35 box, Shine-Dalgarno)
     * codon context (start ATG, stop TAA, stop TAG)
   - 9 unlabeled features in a 4th diffuse cluster (label=NULL,
     db_source=NULL) — mimics the realistic case where most SAE
     features are uninterpreted.
   - New `db_source` column on each feature (RefSeq / JASPAR-ENCODE /
     bacterial annotation / RefSeq UTR / ENCODE-RefSeq / NULL).

3. Bug fixes for cross-pod port-forward demo:
   - App.jsx defaults: `selectedCategory` and `histMetric3` were
     hardcoded to codonfm's `mean_variant_1bcdwt` column, which doesn't
     exist in our atlas and threw Binder errors. Switched to `cluster_id`.
   - Atlas column rename: `cluster` -> `cluster_id` to match what
     App.jsx queries.
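
As a rough picture of the feature set described in item 2, a deterministic generator could look like the sketch below. It is not `make_mockup_features.py`: the `feature_id`, `x`, and `y` fields, the coordinate ranges, and the `"placeholder"` source string are hypothetical, and the real per-label `db_source` mapping is not stated in this message.

```python
import random

LABELED = {
    "eukaryotic regulatory": ["TATA box", "polyA signal", "CpG island",
                              "splice donor", "splice acceptor"],
    "bacterial regulatory": ["-10 box", "-35 box", "Shine-Dalgarno"],
    "codon context": ["start ATG", "stop TAA", "stop TAG"],
}
N_UNLABELED = 9


def make_features(seed=42):
    """Return 11 labeled + 9 unlabeled feature rows, reproducible per seed."""
    rng = random.Random(seed)
    rows = []
    for cluster_id, labels in enumerate(LABELED.values()):
        # Each thematic cluster gets a centre; members sit tightly around it.
        cx, cy = rng.uniform(-10, 10), rng.uniform(-10, 10)
        for label in labels:
            rows.append({
                "feature_id": len(rows),
                "label": label,
                "db_source": "placeholder",  # real mapping not given here
                "cluster_id": cluster_id,
                "x": cx + rng.gauss(0, 0.5),
                "y": cy + rng.gauss(0, 0.5),
            })
    diffuse_cluster = len(LABELED)
    for _ in range(N_UNLABELED):
        # Unlabeled features: NULL label and source, scattered widely.
        rows.append({
            "feature_id": len(rows),
            "label": None,
            "db_source": None,
            "cluster_id": diffuse_cluster,
            "x": rng.uniform(-10, 10),
            "y": rng.uniform(-10, 10),
        })
    return rows
```

Using a private `random.Random(seed)` instance rather than the module-level functions keeps the output stable even if other code touches the global RNG.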

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai Bot commented May 26, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

polinabinder1 and others added 7 commits May 27, 2026 18:02
Each of the 11 labeled features now ships with a PWM-driven sequence
logo rendered into public/logos/feature_{id}.png by logomaker. Central
signatures are spec'd per label (Kozak ATG, TATA, polyA, CpG, Shine-
Dalgarno, bacterial -10/-35, splice donor/acceptor, stop TAA/TAG);
flanks are uniform 0-bit so the logos read as clean motif summaries
rather than noisy speckle. Unlabeled features get no logo — their
cards skip the section entirely.

make_mockup_features.py grows _build_pwm() and _render_logo(); the
metadata/atlas parquets carry a logo_path column; App.jsx detects it
optionally and excludes it from category detection; FeatureCard's
expanded view and FeatureDetailPage display the logo above the top-
activating-sequences list.
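
The "uniform 0-bit flanks" idea follows from how logo heights are computed: a column's information content is 2 bits minus its entropy, so a uniform column contributes nothing. The sketch below illustrates that with a hypothetical `build_pwm`; it is not the `_build_pwm()` in the script, and the `core_prob` and `flank` values are made up for the example. Rendering (done with logomaker in this commit) is left out.

```python
import math

BASES = "ACGT"


def build_pwm(motif, flank=4, core_prob=0.91):
    """Return a list of per-position {base: probability} columns.

    The motif occupies the centre; flanks on both sides are uniform.
    """
    if any(base not in BASES for base in motif):
        raise ValueError("motif must contain only A, C, G, T")
    rest = (1.0 - core_prob) / 3.0
    uniform = {base: 0.25 for base in BASES}
    core = [{b: (core_prob if b == m else rest) for b in BASES} for m in motif]
    left = [dict(uniform) for _ in range(flank)]
    right = [dict(uniform) for _ in range(flank)]
    return left + core + right


def information_bits(column):
    """Information content of one column: 2 bits minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


pwm = build_pwm("TATAAA", flank=3)
print([round(information_bits(col), 2) for col in pwm])
```

The printed list shows zeros at the flanks and equal positive values across the six core positions, which is what makes the logo read as a clean motif.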

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new visualizations for the SAE interpretability dashboard, plus the
offline pipeline that produces the gene-UMAP precompute bundle.

scripts/generate_fake_genes.py
  500-row genes.tsv stand-in (gene_symbol, species, sequence) until a real
  curated catalog lands. Realistic-ish distributions across 7 species.

scripts/gene_umap_precompute.py
  End-to-end offline pipeline: genes.tsv -> Evo2 1B layer-20 -> TopK SAE
  encode -> mean per gene -> UMAP (cosine) -> HDBSCAN clusters -> per-feature
  firing stats. Writes G.npz, genes_umap.parquet, feature_stats.parquet,
  manifest.json. Reuses predict_evo2 via torchrun subprocess; aggregates
  .pt files by seq_idx + pad_mask. Idempotent (skips predict if .pt
  files exist).
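
The "mean per gene" aggregation over `seq_idx` and `pad_mask` can be sketched in plain Python as below. The record layout and the mask polarity (True means a real token) are assumptions for the example; the actual script works on the `.pt` tensors written by `predict_evo2`.

```python
from collections import defaultdict


def mean_per_gene(records, n_features):
    """Average feature activations over non-padded positions, per gene.

    records: iterable of (seq_idx, activations, pad_mask) where
             activations is a list of per-position feature vectors and
             pad_mask[t] is True for real tokens (assumed polarity).
    Returns {seq_idx: mean activation vector}.
    """
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(int)
    for seq_idx, activations, pad_mask in records:
        for vector, keep in zip(activations, pad_mask):
            if not keep:
                continue  # skip padded positions entirely
            row = sums[seq_idx]
            for j, value in enumerate(vector):
                row[j] += value
            counts[seq_idx] += 1
    return {gene: [s / counts[gene] for s in sums[gene]] for gene in counts}
```

Accumulating sums and counts, rather than averaging per file, keeps the result correct when one gene is split across several batches.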

src/ColoredSequence.jsx
  React component: paste a DNA sequence -> each base background-colored
  by its top-firing SAE feature, opacity scaled by activation strength.
  Two modes: top-feature (default), single-feature lookup. Builds mock
  activations internally when no `analysis` prop is supplied so the
  component works standalone before the /analyze backend is wired.
  Tableau-10 colorblind palette, hover tooltip with top-5 features,
  legend sorted by per-color position count.
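
The top-feature colouring rule can be sketched independently of React. The component itself is JSX; the Python below only illustrates the mapping from activations to (colour, opacity), and the data layout, the global-peak normalisation, and the palette indexing are assumptions.

```python
def color_bases(sequence, activations, palette):
    """Return one (base, color, opacity) tuple per position.

    activations[t] maps integer feature ids to activation strength at
    position t; an empty dict means nothing fired there.
    """
    peak = max((a for pos in activations for a in pos.values()), default=0.0)
    colored = []
    for base, pos in zip(sequence, activations):
        if not pos or peak <= 0:
            colored.append((base, None, 0.0))
            continue
        top = max(pos, key=pos.get)  # id of the strongest feature here
        if pos[top] <= 0:
            colored.append((base, None, 0.0))
            continue
        color = palette[top % len(palette)]
        colored.append((base, color, pos[top] / peak))
    return colored


palette = ["#111111", "#222222", "#333333"]  # placeholder colours
print(color_bases("ACG", [{3: 0.5, 7: 1.0}, {}, {3: 2.0}], palette))
```

Normalising by the global peak keeps opacities comparable across the whole sequence; a per-position normalisation would make every base fully opaque.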

src/GeneUMAPView.jsx
  Renders the 500-gene UMAP via canvas. Loads G.bin (raw float32),
  genes_meta.json, feature_stats.json from public/gene_umap/. Click a
  feature in the sidebar -> instant recolor by activation strength
  (no recompute). Click Reorganize -> re-runs UMAP client-side with
  feature-weighted vectors (umap-js, ~2-5s at N=500), animates the
  transition with ease-in-out cubic. Hover shows gene metadata + top 5
  firing features.
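
The animated transition mentioned here uses ease-in-out cubic interpolation between the old and new layouts. A sketch of the standard easing curve follows, written in Python for consistency with the scripts; the component itself is JSX and the function names below are hypothetical.

```python
def ease_in_out_cubic(t):
    """Standard ease-in-out cubic: slow start, fast middle, slow end."""
    t = min(max(t, 0.0), 1.0)
    if t < 0.5:
        return 4.0 * t ** 3
    return 1.0 - ((-2.0 * t + 2.0) ** 3) / 2.0


def interpolate_layout(start, end, t):
    """Blend two lists of (x, y) points at animation progress t in [0, 1]."""
    k = ease_in_out_cubic(t)
    return [
        (x0 + (x1 - x0) * k, y0 + (y1 - y0) * k)
        for (x0, y0), (x1, y1) in zip(start, end)
    ]
```

At each animation frame, `t` would be elapsed time divided by the transition duration, and the canvas redraws the points returned by `interpolate_layout`.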

src/Preview.jsx + src/index.jsx
  Tabbed entry at /#preview: "Main" (the existing dashboard, untouched),
  "ColoredSequence", "Gene UMAP". Hash-gated so / still goes to the
  unchanged production layout. The ColoredSequence tab includes a paste
  textarea so users can drop their own sequences in.

public/gene_umap/
  Precomputed bundle for the GeneUMAPView (G.bin 30 MB, plus small JSON
  metadata + per-feature stats filtered to n_firing >= 10).

Dep change: umap-js for client-side reorganize. Generated genes are
synthetic; replace fake_genes.tsv with a real curated 500-gene list and
re-run the precompute when one is available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-added past the *.bin gitignore so coworkers can pull and run the
dashboard end-to-end without re-running the GPU precompute. Without
this file GeneUMAPView fails to load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three-column comparison view for steering a chosen SAE feature at a
masked position. All synthetic — 14 hand-rolled (seed, feature) pairs
in public/steering_examples.json, including 6 deliberately marked as
null results so the demo shows honestly that not every steering attempt
works.

- Instant-apply controls (no cosmetic Run button)
- A/C/G/T probability bars (DNA tokenization, matches Evo2)
- Sticky diff summary above columns with effect-size badge
- 16S × kanamycin_resistance pair illustrates the A1408G mutation
- Disabled feature options for pairs without data; graceful fallback
  message when an unsupported combination is selected
- 4th tab in Preview.jsx, reuses existing tab pattern (no router added)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pares the preview tabs down to the two that matter for now:
  /#preview tab 1: Main (existing feature catalog + atlas + WebLogos)
  /#preview tab 2: Steering explorer (slider + per-position P(ACGT) heatmap)

Removes the ColoredSequence, Gene UMAP, and SAE Summary tabs along with
their data, scripts, and components. The full 5-tab version is preserved
on the evo2-sae-dashboard-full-mockup branch if we want to revive any of
those views later.

Removed:
- src/ColoredSequence.jsx, src/GeneUMAPView.jsx, src/SteeringComparison.jsx, src/SAESummary.jsx
- public/gene_umap/ (G.bin, genes_meta.json, feature_stats.json)
- public/steering_examples.json (replaced by steering_data.json)
- public/sae_qc_summary.json
- scripts/gene_umap_precompute.py, scripts/generate_fake_genes.py

Kept / added:
- src/SteeringExplorer.jsx (slider + per-position heatmap)
- public/steering_data.json (14 pairs × 200 positions × 4 clamps mock)
- scripts/generate_steering_data.py (regenerates the JSON)
- src/Preview.jsx trimmed to 2 tabs, no more state-heavy local logic

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final UX pass on the SteeringDemo:
- Feature catalog trimmed to the two AMR features
  (kanamycin_resistance, streptomycin_resistance) — non-AMR features
  removed since this demo specifically reproduces the Hutchinson 2025
  A1408G headline
- "Feature to steer" is a dropdown picking the primary feature
- "Also clamp" checkboxes let users co-clamp the other AMR feature
  alongside the primary; clamp slider applies to all selected
- Neighbors-clamped buttons extended from 0/1/2 to 0/1/2/3/4
- Selectivity table + narrative callout were removed earlier in the same
  iteration; only the dropdown, co-clamp checkboxes, and bar comparison remain

JSON updated: comparisons now cover all 4 seeds × 2 AMR features
(8 pairs total). Non-AMR seeds (promoter / brca1_exon / random) show
null-result distributions — demonstrating that AMR features don't
shift predictions where they have no biological purchase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- add Steering mode toggle (Position-restricted / Global all positions)
- global mode smears the per-position bar chart toward a low-confidence
  distribution scaled by |clamp|, swaps the FLIPPED badge for
  "no clean flip - degraded"
- add SequenceStrip showing baseline vs steered argmax across the whole
  seed sequence with flipped positions highlighted; only renders in
  global mode
- remove author names / paper titles / external model labels from UI
  copy, banner, tab label, and code comments

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>