
Study-area mapping_code parity (tunnel-free) + cheap post-consolidate recompute (#175, #205)#206

Merged
NewGraphEnvironment merged 18 commits into main from 175-promote-with-mapping-code-flag-to-stand
May 26, 2026

@NewGraphEnvironment

Owner

Summary

  • Tunnel-free per-segment mapping_code parity for the 3 FWCP study areas (Peace / Fraser / Skeena, 50 drainage-closed WSGs). Authoritative parity: median 99.66%, mean 99.11%, 130/148 rows ≥99%. Authoritative CSV: data-raw/logs/study_area_run/20260526_055645_compare.csv. Numbers + methodology in research/provincial_parity_2026_05_25.md + research/study_area_run.md.
  • Lean tunnel-free M1-dispatch runner (data-raw/study_area_run.sh + study_area_wsgs.R / wsg_run_one.R / wsg_recompute_one.R / study_area_compare.R): spin cyphers → DS-first per-host runs → consolidate → burn → cheap recompute-ALL → tunnel-free compare → CSV. No old M4-centric orchestrator, no :63333 tunnel.
  • lnk_access — new export, the missing twin of lnk_mapping_code (@family compare, table_<role> params). Portable, schema-aware access builder; merge = TRUE is the surgical UPDATE that powers the cheap post-consolidate recompute (#205, "Cheap access-only post-consolidate recompute (bulletproof cross-WSG mapping_code parity, efficiently)"). It is ~8× faster than the full-pipeline recompute (FINA: 11.9 s wall vs ~90 s) with identical bcfp parity.
  • Bug-class fixes surfaced + landed:
  • Operational gotchas documented for future sessions:
    • RUNBOOK.md §6: orphaned frs_network_features backends + statement_timeout / lock_timeout; pkill <R> ≠ cancel query; view-vs-table planner direction; #203 persist cartesian.
    • soul/conventions/code-check.md Docker/Postgres: 6 cross-repo Postgres + R-client lessons (pkill ≠ cancel; set statement_timeout + lock_timeout; function-as-join-predicate inlineability; per-tenant key joins are cartesian; view vs real-table planner; two-statement DELETE/INSERT atomicity).

Related Issues

Test plan

  • Stage A (dispatcher-only, PARS, $0): driver + tunnel-free pre-flight + DS-first run + tunnel-free compare validated; PARS emits ACCESS;DAM;INTERMITTENT (Bennett dams in PCEA/UPCE, DS-first, no recompute needed for the closure); BT match 99.0%.
  • Full 3-area run on M1 + 2 cyphers (20260526_055645): 50/50 WSGs, zero errors / [WARN]s, cyphers BURNED clean (✓ doctl: no cypher droplets). Median match 99.66%.
  • Cheap recompute (lnk_access(merge=TRUE)) reproduces full-pipeline recompute exactly on FINA (0/26,094 mismatches, identical bcfp parity 99.8% / 57 diffs / ACCESS;DAM top).
  • devtools::document() regenerates NAMESPACE + man/; devtools::test() 1216 PASS / 1 FAIL (the known M1-env test-lnk_db_conn, unrelated).
  • Reviewer pass on research/study_area_run.md procedure + RUNBOOK.md §6 new gotchas.

Notes

  • The two recompute-stable divergences are taxonomy candidates, not regressions: SETN salmon (CH/CM/CO/PK/SK/ST) ~94% (SK-geography class), UNRS BT 61.8% (Kenney reservoir / dam-override). Next session: lnk_parity_annotate against bcfp_divergence_taxonomy.yml.
  • The lean runner intentionally bypasses the old M4-centric wsgs_run_pipeline.sh (kept untouched for the existing rollup / full-province flow). study_area_run.sh is the new path for per-segment mapping_code parity, validated end-to-end on 50 WSGs.
  • Methodology = correctness regardless of bucketing. Distribute (any bucketing) → consolidate → recompute → compare. The recompute is the correctness guarantee; bucketing is a speed knob. Recompute-ALL is bulletproof now that the access-only path is ~10 s/WSG.
  • Pinning: link branch 175-promote-with-mapping-code-flag-to-stand HEAD d74d992. bcfp reference v0.7.15-14-ge12c1a5 (snapshot loaded via snapshot_bcfp.sh --with-bcfp-views).

🤖 Generated with Claude Code

NewGraphEnvironment and others added 18 commits May 24, 2026 01:49
… cartesian

New export lnk_compare_mapping_code() — segment-level mapping_code parity that
reads the bcfp reference from the LOCAL snapshot fresh.streams_vw_bcfp (no
:63333 tunnel, no conn_ref by default). Diffs <persist>.streams_mapping_code
vs the snapshot on (blue_line_key, downstream_route_measure) per WSG-active
species. .lnk_compare_wsg_mapping_code_diff now delegates; shared merge/match
in .lnk_mc_diff. Tunnel path kept (pass conn_ref) for back-compat.
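The commit above describes diffing the persisted mapping_code against the snapshot on the segment key (blue_line_key, downstream_route_measure). Below is a minimal Python sketch of that match calculation for one species. It is not the R `.lnk_mc_diff` helper. It assumes an inner merge, meaning only segments present in both tables are scored, and the name `mapping_code_match` is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Segments are aligned on (blue_line_key, downstream_route_measure).
SegKey = Tuple[int, float]


def mapping_code_match(
    link: Dict[SegKey, Optional[str]],
    ref: Dict[SegKey, Optional[str]],
) -> Tuple[float, int]:
    """Percent of shared segments whose mapping_code agrees, plus the diff count.

    Hypothetical helper: only segments present in both inputs are compared
    (an inner merge); the real R implementation may treat orphans differently.
    """
    shared = link.keys() & ref.keys()
    if not shared:
        return float("nan"), 0  # nothing to compare for this species
    n_diff = sum(1 for key in shared if link[key] != ref[key])
    return 100.0 * (len(shared) - n_diff) / len(shared), n_diff
```

In this sketch you would call it once per WSG-active species, passing that species' mapping_code column from each side.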

Caught + fixed a real id_segment-collision bug: id_segment is per-WSG (80,555
distinct / 1.5M persist rows), unique only on the PK (id_segment,
watershed_group_code). Joining persist tables on id_segment alone is a ~22x
cartesian. Fixed lnk_compare_rollup's 3 habitat joins to the full PK (PARS BT
spawning_km 36,820 -> 1,681). Added WSG-active species resolution so absent
species (link "" vs bcfp NULL) don't register a spurious 0% match.
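The id_segment collision is easy to reproduce in a toy schema. The sketch below uses Python's stdlib `sqlite3` in place of Postgres. The table layouts and the WSG codes `AAAA`/`BBBB` are hypothetical. Two WSGs reuse the same per-WSG id_segment values, so joining on id_segment alone multiplies rows by the number of WSGs sharing each id. Joining on the full (id_segment, watershed_group_code) key does not.

```python
import sqlite3
from typing import Tuple


def join_counts() -> Tuple[int, int]:
    """Row counts for an id_segment-only join vs a full-PK join (toy schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE streams (id_segment INT, watershed_group_code TEXT,
                              PRIMARY KEY (id_segment, watershed_group_code));
        CREATE TABLE habitat (id_segment INT, watershed_group_code TEXT, spawning INT,
                              PRIMARY KEY (id_segment, watershed_group_code));
    """)
    # id_segment is only unique within a WSG: both WSGs use ids 1 and 2.
    rows = [(i, wsg) for wsg in ("AAAA", "BBBB") for i in (1, 2)]
    con.executemany("INSERT INTO streams VALUES (?, ?)", rows)
    con.executemany("INSERT INTO habitat VALUES (?, ?, 1)", rows)
    # Wrong: each stream row matches the same id_segment in every WSG.
    bad = con.execute(
        "SELECT count(*) FROM streams JOIN habitat USING (id_segment)"
    ).fetchone()[0]
    # Right: join on the full primary key.
    good = con.execute(
        "SELECT count(*) FROM streams "
        "JOIN habitat USING (id_segment, watershed_group_code)"
    ).fetchone()[0]
    con.close()
    return bad, good
```

With 2 WSGs the wrong join doubles the row count (8 vs 4). Against the 50-WSG persist tables, the same mistake produces the much larger blow-up reported above.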

Root fix (globally-unique position-derived id_segment, bcfp-style) filed as
#203. Live PARS BT 98.95% reproduced tunnel-free; 1216 tests pass (lone fail
is the env-only db_conn tunnel test). /code-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… family

lnk_compare_wsg(mapping_code = TRUE) now diffs against the LOCAL bcfp
snapshot (fresh.streams_vw_bcfp) via lnk_compare_mapping_code with no
conn_ref — the mapping_code lens is tunnel-free. conn_ref is still required
for the rollup (lnk_compare_rollup needs bcfp habitat_linear, not in the
snapshot). Species now auto-resolve to the WSG-active set rather than a
hardcoded 8. Removed the now-dead .lnk_compare_wsg_mapping_code_diff helper
(merge/match lives in .lnk_mc_diff); fixed the lnk_mapping_code doc ref.

data-raw/wsg_compare.R: added wsg_compare_mapping_code() — tunnel-free
(local conn only, no PG_PASS_SHARE/:63333). This is the per-segment
mapping_code compare the orchestrator will run on the dispatcher after
consolidate (cyphers just run + persist). Verified live: PARS BT 98.95%
with PG_PASS_SHARE unset.

Composition test repointed to mock lnk_compare_mapping_code. 93 compare /
1216 total tests pass (lone fail = env-only db_conn). /code-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3a: make persist/consolidate host- and species-count-agnostic so the
3-WSG smoke's cross-host wide-table drift can't recur.

- schema_consolidate.R: COPY by shared column name (runtime intersection,
  dest ordinal order) instead of positional SELECT-star / FROM STDIN.
  Handles hosts whose streams_access/streams_mapping_code carry different
  species column sets. Nothing hardcoded (cols/species/host discovered at
  runtime).
- cypher_prep.sh: seed lnk_persist_init from cfg$species (mirrors
  lnk_pipeline_run.R:157), not parameters_fresh (11 sp incl CT/DV/RB) -
  removes the drift at source.

Surfaced by the 3-WSG smoke (CRKD/LCHL/ZYMO): cyphers' wide tables had
30 cols vs M1's 24 -> positional COPY failed. Filed #204 for the deeper
class (persist_init blind to species-column-set drift). /code-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…patch (#175)

Phase 3 REVISED: instead of refactoring the 1594-line M4-centric orchestrator
(per user "are these already dealt with in our start-to-finish scripts?"),
productionize the proven smoke flow into 4 small reusable scripts that reuse
every existing piece (cypher_up/prep/down, schema_consolidate, lnk_pipeline_run,
wsg_compare_mapping_code).

Cross-WSG ;DAM solved WITHOUT a post-consolidate recompute: each host gets a
drainage-CLOSED bucket (focal + downstream closure via public.wsg_outlet) run
DOWNSTREAM-FIRST, so a WSG's downstream dam barriers are persisted before its
access/mapping_code is computed. Validated on PARS (depth 3) -> its Bennett-dam
WSGs PCEA/UPCE/LPCE (depth 2) come first. One study area per host; areas are
drainage-independent (roots 100/200/400).
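The drainage-closed, downstream-first bucket can be sketched in Python. The sketch assumes `public.wsg_outlet` can be reduced to a mapping from each WSG to the WSG it drains into (None at a root). That representation, the helper names and the toy topology are all hypothetical. Later commits in this PR show this ordering alone is not sufficient; the final method adds a post-consolidate recompute.

```python
from typing import Dict, List, Optional, Set

Outlet = Dict[str, Optional[str]]  # WSG -> downstream WSG (None at a root)


def drainage_closure(focal: Set[str], outlet: Outlet) -> Set[str]:
    """Focal WSGs plus every WSG downstream of them."""
    closed: Set[str] = set()
    for wsg in focal:
        cur: Optional[str] = wsg
        # Stop at a root, or once we reach a WSG whose path is already closed.
        while cur is not None and cur not in closed:
            closed.add(cur)
            cur = outlet.get(cur)
    return closed


def depth(wsg: str, outlet: Outlet) -> int:
    """Number of downstream hops from wsg to its root."""
    d, cur = 0, outlet.get(wsg)
    while cur is not None:
        d += 1
        cur = outlet.get(cur)
    return d


def ds_first(focal: Set[str], outlet: Outlet) -> List[str]:
    """Closure ordered downstream-first (shallowest depth first, then by name)."""
    return sorted(drainage_closure(focal, outlet), key=lambda w: (depth(w, outlet), w))
```

For example, with `HEAD -> MID -> MOUTH`, `ds_first({"HEAD"}, outlet)` yields `["MOUTH", "MID", "HEAD"]`, so any dam in MID is persisted before HEAD is modelled.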

New:
- study_area_wsgs.R   closure + DS-first list (public.wsg_outlet)
- wsg_run_one.R       lnk_pipeline_run(mapping_code=TRUE) for one WSG, local,
                      host-agnostic (LNK_LOAD=loadall dispatcher / library cyphers)
- study_area_compare.R tunnel-free wsg_compare_mapping_code loop -> CSV
- study_area_run.sh   driver: tunnel-free pre-flight -> spin -> prep -> run
                      DS-first buckets (dispatcher + cyphers) -> consolidate ->
                      BURN (minimise idle) -> compare -> CSV; trap-EXIT burn

No M4, no ssh m1, no :63333/PG_PASS_SHARE. /code-check clean (fixed burn-verify
pipefail, added bucket-overlap warning).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aunched (#175)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ea-on-dispatcher note (#175)

Fresh cyphers race: cypher_up returns when the IP is assigned, before sshd is
up, so scp of cypher_prep.sh hit "Connection closed". Poll ssh (up to ~150s,
accept-new host key) before scp. Also document: put the largest study area on
the dispatcher (fast/free M1); cyphers are slow+paid, give them smaller areas.
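The sshd race fix is a poll-with-deadline loop. Below is a Python sketch of the pattern. It is an illustration, not the shell code in study_area_run.sh. It checks that TCP port 22 accepts connections, which is weaker than the script's real `ssh` probe (an open port does not prove sshd will complete a login). The 150 s default mirrors the commit, and `wait_until`/`tcp_port_open` are hypothetical names.

```python
import socket
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 150.0, interval: float = 5.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage: call `wait_until(lambda: tcp_port_open(ip))` right after the droplet IP is assigned, and only `scp` once it returns True.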

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…CH (#175)

Cyphers git checkout main by default, but the driver scripts (wsg_run_one.R
etc.) + branch link live on the feature branch. Pass the dispatcher's current
branch to cypher_prep so cyphers carry the same ref. Branch must be pushed
first (cypher_prep does git fetch origin + reset --hard origin/$BRANCH).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…157)

Root-causes the 2026-05-25 data loss: the drainage closure pulled in a
species-less WSG (LEUT) -> lnk_pipeline_run errored "No species resolved for
AOI" -> dispatcher loop `|| exit 1` -> driver FATAL -> trap burned the cyphers
WITH their un-consolidated Peace+Skeena data. One bad WSG lost a whole run.

Per the records (research/provincial_run_runbook.md, data-raw/wsgs_run_host.R
:88 #157) the proven runner already solved both:
- study_area_wsgs.R: filter the closure to bundle-species presence
  (cfg$species in wsg_species_presence) — drops species-less closure WSGs
  (Fraser: LEUT, LNRS). Matches wsgs_run_host.R exactly.
- study_area_run.sh: per-WSG SOFT-FAIL — a WSG error logs WARN and the loop
  continues; a non-zero host exit is logged, not fatal. Always reaches
  consolidate so a late failure can't burn cyphers with unconsolidated data.
  Mirrors wsgs_run_host.R resume-safe behaviour.
- wsg_run_one.R: defensive exit-0 skip when lnk_pipeline_species() is empty.
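The soft-fail behaviour has two parts: one WSG's error is logged and skipped, and consolidate always runs. Below is a minimal Python sketch of both. The callables stand in for the per-WSG pipeline and `schema_consolidate`, and their names are hypothetical. The `finally` guarantees consolidate runs even if the loop is interrupted, which is the property that stops a late failure from burning cyphers with unconsolidated data.

```python
import logging
from typing import Callable, Iterable, List

log = logging.getLogger("study_area")


def run_bucket(wsgs: Iterable[str],
               run_one: Callable[[str], None],
               consolidate: Callable[[], None]) -> List[str]:
    """Run each WSG, log and skip failures, and always consolidate afterwards."""
    failed: List[str] = []
    try:
        for wsg in wsgs:
            try:
                run_one(wsg)
            except Exception as exc:  # soft-fail: one bad WSG must not abort the run
                log.warning("[WARN] %s failed: %s", wsg, exc)
                failed.append(wsg)
    finally:
        consolidate()  # reached on success, soft failures, or an interrupted loop
    return failed
```

A species-less WSG such as LEUT would then appear in the returned `failed` list instead of aborting the whole driver.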

/code-check clean (1 round, 0 findings).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
research/study_area_run.md — lean tunnel-free M1-dispatch study-area parity
procedure + the 2026-05-25 gotchas (trap-burn data loss, species filter #157,
soft-fail, sshd race, wide-table drift, cypher $0.06/hr). data-raw/README.md
drivers table gains study_area_run.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…arantee (#175, #205)

The deliverable is a methodology correct regardless of machine count + WSG
bucketing. Drainage-closed + DS-first per-host is NOT sufficient: a WSG's
downstream barriers can be cross-bucket or arrive late in DS-first order, so
its access (token1/token2) is computed against an incomplete barrier set.
Caught 2026-05-25: FINA 75.5% / PARA 68.6% per-host -> 99%+ after re-modelling
on the full consolidated barrier set.

So: distribute (any bucketing) -> consolidate -> POST-CONSOLIDATE RECOMPUTE the
diverged WSGs (any species <99%) on the dispatcher with the complete barrier
set -> re-compare. Bucketing becomes a speed knob, not a correctness lever.
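A toy Python model shows why the recompute, not the bucketing, is the correctness guarantee. The access rule here is a deliberate simplification and is not the real token1/token2 logic: a WSG's upstream reaches count as blocked if any strictly-downstream WSG holds a dam. The WSG names and helpers are hypothetical. Per host, each run only sees dams persisted in its own bucket. After consolidate, every WSG sees the full barrier set.

```python
from typing import Dict, List, Sequence, Set


def upstream_accessible(wsg: str, downstream: Dict[str, List[str]],
                        known_dams: Set[str]) -> bool:
    """Toy rule: blocked if any strictly-downstream WSG holds a known dam."""
    return not any(w in known_dams for w in downstream[wsg])


def per_host_access(buckets: Sequence[Sequence[str]],
                    downstream: Dict[str, List[str]],
                    dam_wsgs: Set[str]) -> Dict[str, bool]:
    """Each host only knows the dams persisted by WSGs in its own bucket."""
    out: Dict[str, bool] = {}
    for bucket in buckets:
        known = {w for w in bucket if w in dam_wsgs}
        for w in bucket:
            out[w] = upstream_accessible(w, downstream, known)
    return out


def recompute_all(buckets: Sequence[Sequence[str]],
                  downstream: Dict[str, List[str]],
                  dam_wsgs: Set[str]) -> Dict[str, bool]:
    """After consolidate: re-evaluate every WSG against the complete barrier set."""
    all_wsgs = [w for bucket in buckets for w in bucket]
    known = {w for w in all_wsgs if w in dam_wsgs}
    return {w: upstream_accessible(w, downstream, known) for w in all_wsgs}
```

With buckets `[["UP"], ["MOUTH", "DAMW"]]` and a dam in DAMW, the per-host result marks UP accessible. Recompute-ALL corrects it to blocked, whatever the bucketing was.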

Recompute uses the full pipeline today (slow); filed #205 for the cheap
access-only recompute (reuse persisted streams/habitat) that makes recompute-ALL
bulletproof + fast. Docs: research/study_area_run.md (procedure corrected).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Correct the task_plan claim "cross-WSG ;DAM solved without recompute" — full run
disproved it (FINA 75.5%/PARA 68.6% per-host -> 99%+ after recompute). Record
authoritative post-recompute parity (median 99.66%) + the methodology finding +
genuine divergences (UNRS reservoir, SETN salmon) in
research/provincial_parity_2026_05_25.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nsolidate recompute, #205)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-consolidate recompute (#175) was running the full pipeline on diverged
WSGs — ~2× cost on those WSGs because it re-derived streams/habitat (already
correct) just to redo the cheap access step. #205 implements the access-only
recompute that reuses persisted streams/habitat/barriers; FINA validated:
11.86s wall (vs ~90s full pipeline = ~8× faster), bcfp parity 99.8% / 57 diffs
/ ACCESS;DAM top — IDENTICAL to the full-pipeline recompute.

Five things had to be right; four are real gotchas worth knowing (RUNBOOK §6):

- R/lnk_access.R (new export, @family compare): portable access builder, twin
  of lnk_mapping_code (table_<role> params). Builds per-species _access +
  source `_unified` views internally via lnk_barriers_views over the persist
  barriers; merge=FALSE overwrites, merge=TRUE surgically UPDATEs cross-WSG
  cols (has_barriers_*, dam_dnstr_ind) on the target while PRESERVING
  remediated_dnstr_ind and observed access_<sp>=2 from the prior compute.
- lnk_access materialises the AOI streams as a REAL TABLE (CREATE TABLE +
  index id_segment, wscode_ltree GiST, localcode_ltree GiST, blue_line_key,
  ANALYZE), NOT a view. A view didn't carry small-table stats so the planner
  picked the ~800k-row barriers as the nested-loop outer driver, blowing
  cost ~1000× (>10min). With the real table, planner picks the 26k AOI
  streams as outer, walk takes seconds.
- R/lnk_persist_init.R: persist streams + barriers now get the same
  wscode_ltree / localcode_ltree GiST + btree indexes that fresh::utils.R
  builds on its working network table (frs_network_features needs them).
- R/lnk_mapping_code.R: #203 cross-WSG cartesian fix. The access read was
  `SELECT * FROM <access> WHERE id_segment IN (SELECT id_segment FROM streams
  WHERE wsg=aoi)`. Against persist (where id_segment is per-WSG, not globally
  unique) this also pulled in every other WSG's rows that reuse the same
  id_segment values → 50× rows in mc_scratch → PK violation on the persist
  write. Fix: also filter by watershed_group_code when the table carries it.
- data-raw/wsg_recompute_one.R (new): sibling of wsg_run_one.R. Sets
  statement_timeout (600s) + lock_timeout (60s) so a runaway/locked query
  cancels server-side instead of orphaning a backend. data-raw/study_area_run.sh
  wired to call it + switched to recompute-ALL (cheap → bulletproof; bucketing
  is now a speed knob, not a correctness lever).
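The merge = TRUE behaviour of lnk_access can be illustrated with a surgical UPDATE in stdlib `sqlite3`. The sketch makes several assumptions, none of them the package's actual SQL. The tables are called `target` (prior compute) and `fresh` (access recomputed on the consolidated barriers). Rows match on the full (id_segment, watershed_group_code) key. access_<sp> is refreshed except where it already holds the observed value 2. remediated_dnstr_ind is simply never written.

```python
import sqlite3

# Full-PK match between the target row and its freshly recomputed row.
KEY = ("f.id_segment = target.id_segment "
       "AND f.watershed_group_code = target.watershed_group_code")


def merge_access(con: sqlite3.Connection, sp: str) -> None:
    """Refresh cross-WSG columns from `fresh`, preserving prior-compute values."""
    acc, hb = f"access_{sp}", f"has_barriers_{sp}"  # identifiers built in code, not user input

    def pick(col: str) -> str:
        return f"(SELECT f.{col} FROM fresh f WHERE {KEY})"

    with con:  # single transaction: all columns update together or not at all
        con.execute(f"""
            UPDATE target SET
              {hb} = {pick(hb)},
              dam_dnstr_ind = {pick('dam_dnstr_ind')},
              -- keep observed access (2); otherwise take the recomputed value
              {acc} = CASE WHEN {acc} = 2 THEN 2 ELSE {pick(acc)} END
            WHERE EXISTS (SELECT 1 FROM fresh f WHERE {KEY})
        """)
```

remediated_dnstr_ind does not appear in the SET list, so it keeps whatever the prior compute wrote.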

Docs: research/study_area_run.md procedure updated; RUNBOOK.md §6 gotchas
(orphaned backend / statement_timeout + pkill ≠ cancel query; view-vs-table
planner gotcha; #203 cartesian-on-persist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ata-loss followup)

Without the transaction, a failed INSERT (e.g. the #203 PK-violation that
caused FINA mc data loss 2026-05-25) leaves the WSG's rows deleted but not
re-inserted. dbWithTransaction makes the pair atomic.
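The same DELETE/INSERT atomicity can be shown in Python's stdlib `sqlite3`, where `with con:` plays the role of R's `dbWithTransaction`: it commits on success and rolls back if any statement raises. The `mapping_code` table layout and the `replace_wsg_rows` name are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def replace_wsg_rows(con: sqlite3.Connection, wsg: str,
                     rows: Iterable[Tuple[int, str, str]]) -> None:
    """Delete one WSG's rows and re-insert them as a single atomic unit."""
    with con:  # commit on success; roll back (un-delete) if any INSERT fails
        con.execute("DELETE FROM mapping_code WHERE watershed_group_code = ?", (wsg,))
        con.executemany("INSERT INTO mapping_code VALUES (?, ?, ?)", rows)
```

If the new rows contain a duplicate primary key, as in the #203 failure, `sqlite3.IntegrityError` propagates and the WSG's original rows are still present.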

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NewGraphEnvironment merged commit 0a332cc into main on May 26, 2026
1 check passed
NewGraphEnvironment deleted the 175-promote-with-mapping-code-flag-to-stand branch May 26, 2026 09:44

Successfully merging this pull request may close these issues.

Promote with_mapping_code flag to stand-alone lnk_compare_mapping_code() export
