Skip to content

fix(#176): splice + var_filter parity between open() and with_settings()#179

Merged
d-laub merged 11 commits into
mainfrom
worktree-fix-176-splice-exonic-filter
May 22, 2026
Merged

fix(#176): splice + var_filter parity between open() and with_settings()#179
d-laub merged 11 commits into
mainfrom
worktree-fix-176-splice-exonic-filter

Conversation

@d-laub
Copy link
Copy Markdown
Collaborator

@d-laub d-laub commented May 22, 2026

Summary

Closes #176. Three concrete bugs, one refactor:

  • Bug 1with_settings(var_filter=...) silently failed to propagate to _recon. After .with_seqs("haplotypes"), _recon is a separate Haps instance with the user-chosen kind. __getitem__ uses _recon, not _seqs, so updating only _seqs meant the filter was a no-op for the code path that actually ran. (This is why the reporter's "Path B" appeared to work — it was producing un-filtered haplotypes.)
  • Bug 2choose_exonic_variants mis-indexed 2-D SVAR offsets: geno_offsets[o_idx] returned a length-n_slices row, not a 2-tuple, producing garbage o_s/o_e and MemoryError / negative-dim ValueError. Mirror the sibling filter_af kernel: geno_offsets[:, o_idx].
  • Bug 3 — the regression test from fix: ndim guard on geno_offsets in choose_exonic_variants second loop #170 used the wrong 2-D layout ((total_variants, 2)), which coincidentally made the wrong indexing return 2 elements per row. Rebuilt on the real (2, n_slices) SVAR layout with n_slices > 2 so wrong indexing fails loudly.
  • RefactorDataset.open now delegates splice_info / var_filter configuration to self.with_settings(...), eliminating the two-code-path divergence that masked Dataset.open(splice_info, var_filter) breaks choose_exonic_variants; with_settings(splice_info, var_filter) works #176 in the first place.

One in-flight scope expansion

After the refactor, Dataset.open(splice_info=...) now reaches _check_valid_state via with_settings, which rejects splicing with the default sequence_type == "variants". Dataset.open auto-promotes to with_seqs("haplotypes") in that case (narrow guard: only when splice_info is set, _seqs is Haps, and sequence_type is the default "variants"). This eliminates a pre-existing silent footgun where the OLD code returned a Dataset in an invalid state until the user called .with_seqs("haplotypes") themselves.

Files changed

File Change
python/genvarloader/_dataset/_genotypes.py 2-D indexing fix in choose_exonic_variants
python/genvarloader/_dataset/_impl.py with_settings propagates var_filter to _recon; Dataset.open delegates splice/filter to with_settings; auto-promote to haplotypes when splice_info is supplied
tests/dataset/genotypes/test_choose_exonic_variants.py Rebuilt 2-D regression on real (2, n_slices) SVAR layout
tests/dataset/test_with_settings_var_filter.py New unit tests for var_filter_recon propagation
tests/dataset/test_open_vs_settings_parity.py New end-to-end parity tests

Test plan

  • pixi run -e dev pytest tests/dataset/genotypes/test_choose_exonic_variants.py — both pass on the real (2, n_slices) layout
  • pixi run -e dev pytest tests/dataset/test_with_settings_var_filter.py — direct _recon.filter propagation probes pass
  • pixi run -e dev pytest tests/dataset/test_open_vs_settings_parity.py — both paths produce identical state and byte-for-byte identical [0,:] output on SVAR-backed data
  • pixi run -e dev test — full pytest + cargo, 459 passed, 5 skipped, 2 xfailed
  • pixi run -e dev ruff check python/ tests/ — clean

Out of scope

  • The min_af/max_af/var_fields recon-rebuild block at _impl.py:427-434 has a separate latent bug (clobbers _recon.kind). This fix deliberately does not touch it.

🤖 Generated with Claude Code

d-laub and others added 11 commits May 21, 2026 23:39
Three bugs identified:
1. with_settings(var_filter=...) doesn't propagate to _recon
2. choose_exonic_variants mis-indexes 2-D SVAR offsets
3. Existing regression test used wrong 2-D layout

Plan: unify open/with_settings, fix kernel indexing, rebuild tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
6 tasks: rebuild 2-D regression test, fix choose_exonic_variants
indexing, fix with_settings var_filter propagation, unify
Dataset.open through with_settings, add end-to-end parity test,
verify.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After with_seqs(...), _recon is a separate Haps instance with the
user-chosen kind. __getitem__ uses _recon, not _seqs, so updating
only _seqs silently dropped the filter (issue #176 path B).

Mirror the propagation for the HapsTracks recon shape.

Fixes #176 (with_settings half).
One configuration path. Eliminates the open()/with_settings divergence
that masked #176 — open() previously reached into Haps(filter=...) and
constructed SpliceIndexer inline; with_settings had its own copy
(broken for var_filter). Both now flow through with_settings.
SVAR offsets are constructed as offsets.reshape(2, -1) in
Haps.from_path. geno_offsets[o_idx] returns a length-n_slices row,
not a 2-tuple, producing garbage o_s/o_e -> MemoryError or
negative-dim ValueError. Mirror filter_af's geno_offsets[:, o_idx].

Fixes #176 (kernel half).
…info)

Dataset.open(splice_info=...) now calls with_seqs("haplotypes") internally
before delegating to with_settings, so the caller does not need to
interleave with_seqs between open and with_settings. Prevents
_check_valid_state raising "Splicing is not supported with variants" when
splice_info is passed to open on a genotype+reference-backed dataset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…utput

End-to-end regression: state probe (filter on _recon, splice indexer
present) and output equality on a small SVAR-backed dataset with
per-region single-exon transcripts. Catches both halves of #176 —
the kernel mis-indexing and the with_settings propagation gap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s) SVAR layout

The previous fixture used (total_variants, 2) which accidentally made
geno_offsets[o_idx] return a 2-element row. Real SVAR offsets are
(2, n_slices) per Haps.from_path; with n_slices > 2 the existing
indexing returns a row that can't unpack into (o_s, o_e), exposing #176.
The next assertion (`is not None`) subsumes the `(None) == (None)`
check. Review nit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ce-exonic-filter

# Conflicts:
#	python/genvarloader/_dataset/_impl.py
@d-laub d-laub merged commit e43de06 into main May 22, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dataset.open(splice_info, var_filter) breaks choose_exonic_variants; with_settings(splice_info, var_filter) works

1 participant