Study-area mapping_code parity (tunnel-free) + cheap post-consolidate recompute (#175, #205) by NewGraphEnvironment · Pull Request #206 · NewGraphEnvironment/link

NewGraphEnvironment · 2026-05-26T09:33:24Z

Summary

Tunnel-free per-segment mapping_code parity for the 3 FWCP study areas (Peace / Fraser / Skeena, 50 drainage-closed WSGs). Authoritative parity: median 99.66%, mean 99.11%, 130/148 rows ≥99%. Authoritative CSV: data-raw/logs/study_area_run/20260526_055645_compare.csv. Numbers + methodology in research/provincial_parity_2026_05_25.md + research/study_area_run.md.
Lean tunnel-free M1-dispatch runner (data-raw/study_area_run.sh + study_area_wsgs.R / wsg_run_one.R / wsg_recompute_one.R / study_area_compare.R): spin cyphers → DS-first per-host runs → consolidate → burn → cheap recompute-ALL → tunnel-free compare → CSV. No old M4-centric orchestrator, no :63333 tunnel.
lnk_access: new export, the missing twin of lnk_mapping_code (@family compare, table_<role> params). Portable, schema-aware access builder; merge = TRUE is the surgical UPDATE that powers the cheap post-consolidate recompute (#205): ~8× faster than the full-pipeline recompute (FINA 11.9 s wall vs ~90 s, bcfp parity identical).
Bug-class fixes surfaced + landed:
- #204 persist/consolidate must be host- and species-count-agnostic: cypher_prep aligned to cfg$species; schema_consolidate COPYs shared columns by name (shape-tolerant).
- #203 cross-WSG cartesian: lnk_mapping_code now filters access by watershed_group_code when present (the id_segment IN (…) query was 50×-duplicating against persist).
- ltree GIST + btree indexes on persist streams + barriers (lnk_persist_init) — frs_network_features traversal needs them; matches fresh::utils.R pattern.
Operational gotchas documented for future sessions:
- RUNBOOK.md §6: orphaned frs_network_features backends + statement_timeout / lock_timeout; pkill <R> ≠ cancel query; view-vs-table planner direction; #203 persist cartesian.
- soul/conventions/code-check.md Docker/Postgres: 6 cross-repo Postgres + R-client lessons (pkill ≠ cancel; set statement_timeout + lock_timeout; function-as-join-predicate inlineability; per-tenant key joins are cartesian; view vs real-table planner; two-statement DELETE/INSERT atomicity).

Related Issues

Closes #175 (Promote with_mapping_code flag to stand-alone lnk_compare_mapping_code() export)
Open follow-ups filed during this work: #204 (Persist+consolidate must be host-/species-count-agnostic: persist_init blind to species-column-set drift; persist shape-drift methodology), #205 (Cheap access-only post-consolidate recompute; implemented here)
Relates to #203 (Make id_segment globally unique, position-derived, bcfp-style; id_segment-alone persist joins are cartesian across WSGs). This PR narrows the lnk_mapping_code symptom; the root issue is still open.
Relates to NewGraphEnvironment/sred-2025-2026#24

Test plan

Stage A (dispatcher-only, PARS, $0): driver + tunnel-free pre-flight + DS-first run + tunnel-free compare validated; PARS emits ACCESS;DAM;INTERMITTENT (Bennett dams in PCEA/UPCE, DS-first, no recompute needed for the closure); BT match 99.0%.
Full 3-area run on M1 + 2 cyphers (20260526_055645): 50/50 WSGs, zero errors / [WARN]s, cyphers BURNED clean (✓ doctl: no cypher droplets). Median match 99.66%.
Cheap recompute (lnk_access(merge=TRUE)) reproduces full-pipeline recompute exactly on FINA (0/26,094 mismatches, identical bcfp parity 99.8% / 57 diffs / ACCESS;DAM top).
devtools::document() regenerates NAMESPACE + man/; devtools::test() 1216 PASS / 1 FAIL (the known M1-env test-lnk_db_conn, unrelated).
Reviewer pass on research/study_area_run.md procedure + RUNBOOK.md §6 new gotchas.

Notes

The two recompute-stable divergences are taxonomy candidates, not regressions: SETN salmon (CH/CM/CO/PK/SK/ST) ~94% (SK-geography class), UNRS BT 61.8% (Kenney reservoir / dam-override). Next session: lnk_parity_annotate against bcfp_divergence_taxonomy.yml.
The lean runner intentionally bypasses the old M4-centric wsgs_run_pipeline.sh (kept untouched for the existing rollup / full-province flow). study_area_run.sh is the new path for per-segment mapping_code parity, validated end-to-end on 50 WSGs.
Methodology = correctness regardless of bucketing. Distribute (any bucketing) → consolidate → recompute → compare. The recompute is the correctness guarantee; bucketing is a speed knob. Recompute-ALL is bulletproof now that the access-only path is ~10 s/WSG.
Pinning: link branch 175-promote-with-mapping-code-flag-to-stand HEAD d74d992. bcfp reference v0.7.15-14-ge12c1a5 (snapshot loaded via snapshot_bcfp.sh --with-bcfp-views).

🤖 Generated with Claude Code

… cartesian New export lnk_compare_mapping_code() — segment-level mapping_code parity that reads the bcfp reference from the LOCAL snapshot fresh.streams_vw_bcfp (no :63333 tunnel, no conn_ref by default). Diffs <persist>.streams_mapping_code vs the snapshot on (blue_line_key, downstream_route_measure) per WSG-active species. .lnk_compare_wsg_mapping_code_diff now delegates; shared merge/match in .lnk_mc_diff. Tunnel path kept (pass conn_ref) for back-compat. Caught + fixed a real id_segment-collision bug: id_segment is per-WSG (80,555 distinct / 1.5M persist rows), unique only on the PK (id_segment, watershed_group_code). Joining persist tables on id_segment alone is a ~22x cartesian. Fixed lnk_compare_rollup's 3 habitat joins to the full PK (PARS BT spawning_km 36,820 -> 1,681). Added WSG-active species resolution so absent species (link "" vs bcfp NULL) don't register a spurious 0% match. Root fix (globally-unique position-derived id_segment, bcfp-style) filed as #203. Live PARS BT 98.95% reproduced tunnel-free; 1216 tests pass (lone fail is the env-only db_conn tunnel test). /code-check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
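The id_segment collision described above can be reproduced on toy data. This is a minimal sketch, assuming a simplified two-table layout in SQLite: the table shapes, the three-WSG data, and the spawning_km values are illustrative, not the persist schema. Joining on id_segment alone multiplies rows by the number of WSGs that reuse the id; joining on the full key (id_segment, watershed_group_code) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE streams (
    id_segment INTEGER,
    watershed_group_code TEXT,
    PRIMARY KEY (id_segment, watershed_group_code)
);
CREATE TABLE habitat (
    id_segment INTEGER,
    watershed_group_code TEXT,
    spawning_km REAL,
    PRIMARY KEY (id_segment, watershed_group_code)
);
""")

# Illustrative data: id_segment restarts at 1 in every WSG (per-WSG ids).
for wsg in ("PARS", "FINA", "PCEA"):
    for seg in (1, 2):
        conn.execute("INSERT INTO streams VALUES (?, ?)", (seg, wsg))
        conn.execute("INSERT INTO habitat VALUES (?, ?, 1.0)", (seg, wsg))

# Join on id_segment alone: each PARS segment also matches FINA and PCEA rows.
bad_km = conn.execute("""
    SELECT SUM(h.spawning_km)
    FROM streams s
    JOIN habitat h ON h.id_segment = s.id_segment
    WHERE s.watershed_group_code = 'PARS'
""").fetchone()[0]

# Join on the full primary key: one habitat row per PARS segment.
good_km = conn.execute("""
    SELECT SUM(h.spawning_km)
    FROM streams s
    JOIN habitat h
      ON h.id_segment = s.id_segment
     AND h.watershed_group_code = s.watershed_group_code
    WHERE s.watershed_group_code = 'PARS'
""").fetchone()[0]

print(bad_km, good_km)
```

With three WSGs sharing ids, the id-only join inflates the PARS total threefold. The same mechanism, at the scale of the real persist tables, would explain the inflated spawning_km reported above.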

… family lnk_compare_wsg(mapping_code = TRUE) now diffs against the LOCAL bcfp snapshot (fresh.streams_vw_bcfp) via lnk_compare_mapping_code with no conn_ref — the mapping_code lens is tunnel-free. conn_ref is still required for the rollup (lnk_compare_rollup needs bcfp habitat_linear, not in the snapshot). Species now auto-resolve to the WSG-active set rather than a hardcoded 8. Removed the now-dead .lnk_compare_wsg_mapping_code_diff helper (merge/match lives in .lnk_mc_diff); fixed the lnk_mapping_code doc ref. data-raw/wsg_compare.R: added wsg_compare_mapping_code() — tunnel-free (local conn only, no PG_PASS_SHARE/:63333). This is the per-segment mapping_code compare the orchestrator will run on the dispatcher after consolidate (cyphers just run + persist). Verified live: PARS BT 98.95% with PG_PASS_SHARE unset. Composition test repointed to mock lnk_compare_mapping_code. 93 compare / 1216 total tests pass (lone fail = env-only db_conn). /code-check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
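The idea of resolving species to the WSG-active set before computing a match rate can be sketched as follows. This is an illustration only: active_species and match_pct are hypothetical helper names, the species presence set is invented, and the dictionaries stand in for rows keyed on (blue_line_key, downstream_route_measure). The package's actual implementation may differ.

```python
def active_species(bundle_species, present_in_wsg):
    """Keep only bundle species that are present in the WSG, in bundle order."""
    present = set(present_in_wsg)
    return [sp for sp in bundle_species if sp in present]


def match_pct(link_codes, ref_codes):
    """Percent of shared segment keys whose mapping_code agrees.

    Both arguments map (blue_line_key, downstream_route_measure) -> code.
    Returns None when there is nothing to compare, rather than 0.0.
    """
    shared = link_codes.keys() & ref_codes.keys()
    if not shared:
        return None
    hits = sum(1 for key in shared if link_codes[key] == ref_codes[key])
    return 100.0 * hits / len(shared)


# Illustrative inputs (not real data).
bundle = ["BT", "CH", "CO", "SK"]
present = {"BT", "CO"}
species = active_species(bundle, present)

link_bt = {(1001, 0.0): "ACCESS", (1001, 250.5): "ACCESS;DAM", (1002, 0.0): "ACCESS"}
ref_bt = {(1001, 0.0): "ACCESS", (1001, 250.5): "ACCESS", (1002, 0.0): "ACCESS"}
bt_pct = match_pct(link_bt, ref_bt)
print(species, bt_pct)
```

Skipping absent species entirely, instead of comparing an empty string against NULL, is what keeps a species that does not occur in the WSG from showing up as a 0% match.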

Phase 3a: make persist/consolidate host- and species-count-agnostic so the 3-WSG smoke's cross-host wide-table drift can't recur.
- schema_consolidate.R: COPY by shared column name (runtime intersection, dest ordinal order) instead of positional SELECT-star / FROM STDIN. Handles hosts whose streams_access/streams_mapping_code carry different species column sets. Nothing hardcoded (cols/species/host discovered at runtime).
- cypher_prep.sh: seed lnk_persist_init from cfg$species (mirrors lnk_pipeline_run.R:157), not parameters_fresh (11 sp incl CT/DV/RB); this removes the drift at source.
Surfaced by the 3-WSG smoke (CRKD/LCHL/ZYMO): cyphers' wide tables had 30 cols vs M1's 24 -> positional COPY failed. Filed #204 for the deeper class (persist_init blind to species-column-set drift). /code-check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
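The "COPY by shared column name" idea can be sketched independently of R. The snippet below is an assumption-laden illustration, not the contents of schema_consolidate.R: shared_columns and copy_statements are hypothetical helpers, the schema names and column names are invented, and the generated strings only show the general shape of PostgreSQL COPY statements with explicit column lists.

```python
def shared_columns(src_cols, dest_cols):
    """Columns present on both sides, in the destination's ordinal order."""
    src = set(src_cols)
    return [col for col in dest_cols if col in src]


def copy_statements(schema_src, schema_dest, table, cols):
    """Build a COPY-out / COPY-in pair that names its columns explicitly."""
    col_list = ", ".join(f'"{col}"' for col in cols)
    copy_out = f'COPY (SELECT {col_list} FROM "{schema_src}"."{table}") TO STDOUT'
    copy_in = f'COPY "{schema_dest}"."{table}" ({col_list}) FROM STDIN'
    return copy_out, copy_in


# Illustrative column sets: the source host carries extra species columns.
src_cols = ["id_segment", "watershed_group_code", "access_bt", "access_ct", "access_ch"]
dest_cols = ["id_segment", "watershed_group_code", "access_ch", "access_bt"]

cols = shared_columns(src_cols, dest_cols)
copy_out, copy_in = copy_statements("host_a", "persist", "streams_access", cols)
print(copy_out)
print(copy_in)
```

Because both statements name the same columns in the same order, a source table with extra or reordered columns no longer breaks the transfer the way a positional SELECT-star would.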

…patch (#175) Phase 3 REVISED: instead of refactoring the 1594-line M4-centric orchestrator (per user "are these already dealt with in our start-to-finish scripts?"), productionize the proven smoke flow into 4 small reusable scripts that reuse every existing piece (cypher_up/prep/down, schema_consolidate, lnk_pipeline_run, wsg_compare_mapping_code).
Cross-WSG ;DAM solved WITHOUT a post-consolidate recompute: each host gets a drainage-CLOSED bucket (focal + downstream closure via public.wsg_outlet) run DOWNSTREAM-FIRST, so a WSG's downstream dam barriers are persisted before its access/mapping_code is computed. Validated on PARS (depth 3) -> its Bennett-dam WSGs PCEA/UPCE/LPCE (depth 2) come first. One study area per host; areas are drainage-independent (roots 100/200/400).
New:
- study_area_wsgs.R: closure + DS-first list (public.wsg_outlet)
- wsg_run_one.R: lnk_pipeline_run(mapping_code=TRUE) for one WSG, local, host-agnostic (LNK_LOAD=loadall dispatcher / library cyphers)
- study_area_compare.R: tunnel-free wsg_compare_mapping_code loop -> CSV
- study_area_run.sh: driver: tunnel-free pre-flight -> spin -> prep -> run DS-first buckets (dispatcher + cyphers) -> consolidate -> BURN (minimise idle) -> compare -> CSV; trap-EXIT burn
No M4, no ssh m1, no :63333/PG_PASS_SHARE. /code-check clean (fixed burn-verify pipefail, added bucket-overlap warning).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
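The drainage closure and downstream-first ordering can be sketched on a toy network. This is a sketch of the ordering idea only, assuming each WSG drains to at most one downstream WSG. The downstream_of mapping and the WSG names are invented; the real script reads public.wsg_outlet, whose structure is not shown here.

```python
def downstream_closure(focal, downstream_of):
    """Focal WSGs plus every WSG reached by walking downstream from them."""
    closure = set()
    for wsg in focal:
        while wsg is not None and wsg not in closure:
            closure.add(wsg)
            wsg = downstream_of.get(wsg)
    return closure


def depth(wsg, downstream_of):
    """1 for a WSG with nothing downstream, +1 for each step upstream."""
    steps = 1
    while downstream_of.get(wsg) is not None:
        wsg = downstream_of[wsg]
        steps += 1
    return steps


def ds_first(focal, downstream_of):
    """Closure ordered so downstream WSGs run before the WSGs above them."""
    closure = downstream_closure(focal, downstream_of)
    return sorted(closure, key=lambda w: (depth(w, downstream_of), w))


# Hypothetical network: HEAD_A and HEAD_B drain to MID, MID drains to OUTLET.
downstream_of = {
    "OUTLET": None,
    "MID": "OUTLET",
    "HEAD_A": "MID",
    "HEAD_B": "MID",
    "OTHER": None,
}

order = ds_first(["HEAD_A"], downstream_of)
print(order)
```

The ordering only guarantees that barriers inside the same bucket are persisted first. Barriers that live in another host's bucket are not covered, which is the gap a post-consolidate recompute addresses.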

…aunched (#175) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ea-on-dispatcher note (#175) Fresh cyphers race: cypher_up returns when the IP is assigned, before sshd is up, so scp of cypher_prep.sh hit "Connection closed". Poll ssh (up to ~150s, accept-new host key) before scp. Also document: put the largest study area on the dispatcher (fast/free M1); cyphers are slow+paid, give them smaller areas. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
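The readiness poll can be approximated by waiting for the TCP port to accept connections. This is a swapped-in technique for illustration: the commit polls ssh itself, while this sketch only checks that something is listening on the port. wait_for_port is a hypothetical helper, and the 150 s / 5 s defaults are assumptions loosely based on the "~150s" above.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=150.0, interval_s=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a listener is up on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A driver would call something like wait_for_port(droplet_ip, 22) before the first scp and abort if it returns False. An open port is a weaker signal than a completed ssh handshake, so polling ssh directly, as the commit does, is the stricter check.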

…CH (#175) Cyphers git checkout main by default, but the driver scripts (wsg_run_one.R etc.) + branch link live on the feature branch. Pass the dispatcher's current branch to cypher_prep so cyphers carry the same ref. Branch must be pushed first (cypher_prep does git fetch origin + reset --hard origin/$BRANCH). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…157) Root-causes the 2026-05-25 data loss: the drainage closure pulled in a species-less WSG (LEUT) -> lnk_pipeline_run errored "No species resolved for AOI" -> dispatcher loop `|| exit 1` -> driver FATAL -> trap burned the cyphers WITH their un-consolidated Peace+Skeena data. One bad WSG lost a whole run.
Per the records (research/provincial_run_runbook.md, data-raw/wsgs_run_host.R :88 #157) the proven runner already solved both:
- study_area_wsgs.R: filter the closure to bundle-species presence (cfg$species in wsg_species_presence); drops species-less closure WSGs (Fraser: LEUT, LNRS). Matches wsgs_run_host.R exactly.
- study_area_run.sh: per-WSG SOFT-FAIL: a WSG error logs WARN and the loop continues; a non-zero host exit is logged, not fatal. Always reaches consolidate so a late failure can't burn cyphers with unconsolidated data. Mirrors wsgs_run_host.R resume-safe behaviour.
- wsg_run_one.R: defensive exit-0 skip when lnk_pipeline_species() is empty.
/code-check clean (1 round, 0 findings).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
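The soft-fail behaviour can be sketched as a loop that records failures instead of aborting, so consolidation is always reached before any teardown. This is an illustration under assumptions: run_bucket, run_study_area, and the callables passed in are hypothetical stand-ins for the shell driver's steps, and the WSG list is a toy.

```python
def run_bucket(wsgs, run_one, log=print):
    """Run each WSG; log a warning and continue when one fails."""
    failed = []
    for wsg in wsgs:
        try:
            run_one(wsg)
        except Exception as exc:  # soft-fail: one bad WSG must not end the run
            log(f"[WARN] {wsg}: {exc}")
            failed.append(wsg)
    return failed


def run_study_area(wsgs, run_one, consolidate, burn, log=print):
    """Run the bucket, then consolidate, and only then tear down hosts."""
    failed = run_bucket(wsgs, run_one, log)
    consolidate()  # reached even when some WSGs failed
    burn()         # teardown happens after data is consolidated
    return failed


# Toy demonstration: LEUT fails the way a species-less WSG would.
events = []


def fake_run_one(wsg):
    if wsg == "LEUT":
        raise RuntimeError("No species resolved for AOI")
    events.append(f"ran {wsg}")


failed = run_study_area(
    ["PARS", "LEUT", "FINA"],
    fake_run_one,
    consolidate=lambda: events.append("consolidate"),
    burn=lambda: events.append("burn"),
    log=events.append,
)
print(failed, events)
```

The property worth keeping is the order of the last two events: consolidate always precedes burn, whatever happened inside the loop.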

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

research/study_area_run.md — lean tunnel-free M1-dispatch study-area parity procedure + the 2026-05-25 gotchas (trap-burn data loss, species filter #157, soft-fail, sshd race, wide-table drift, cypher $0.06/hr). data-raw/README.md drivers table gains study_area_run.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…arantee (#175, #205) The deliverable is a methodology correct regardless of machine count + WSG bucketing. Drainage-closed + DS-first per-host is NOT sufficient: a WSG's downstream barriers can be cross-bucket or arrive late in DS-first order, so its access (token1/token2) is computed against an incomplete barrier set. Caught 2026-05-25: FINA 75.5% / PARA 68.6% per-host -> 99%+ after re-modelling on the full consolidated barrier set. So: distribute (any bucketing) -> consolidate -> POST-CONSOLIDATE RECOMPUTE the diverged WSGs (any species <99%) on the dispatcher with the complete barrier set -> re-compare. Bucketing becomes a speed knob, not a correctness lever. Recompute uses the full pipeline today (slow); filed #205 for the cheap access-only recompute (reuse persisted streams/habitat) that makes recompute-ALL bulletproof + fast. Docs: research/study_area_run.md (procedure corrected). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
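Selecting which WSGs to recompute, and summarising parity afterwards, can be sketched from per-row compare results. The row layout (wsg, species, match_pct), the placeholder WSG names, and the numbers below are invented for illustration; the real compare CSV's columns are not shown in this PR text.

```python
from statistics import mean, median


def diverged_wsgs(rows, threshold=99.0):
    """WSGs where any species falls below the parity threshold."""
    return sorted({wsg for wsg, _species, pct in rows if pct < threshold})


def summarize(rows, threshold=99.0):
    """Median, mean, and count of rows at or above the threshold."""
    pcts = [pct for _wsg, _species, pct in rows]
    return {
        "median": median(pcts),
        "mean": mean(pcts),
        "n_ok": sum(1 for pct in pcts if pct >= threshold),
        "n": len(pcts),
    }


# Illustrative rows only: (wsg, species, match_pct).
rows = [
    ("WSG_A", "BT", 99.8),
    ("WSG_A", "CH", 97.2),
    ("WSG_B", "BT", 99.9),
    ("WSG_C", "BT", 61.8),
]

to_recompute = diverged_wsgs(rows)
summary = summarize(rows)
print(to_recompute, summary)
```

Once the recompute is cheap enough to run on every WSG, the selection step becomes optional and only the summary is needed.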

Correct the task_plan claim "cross-WSG ;DAM solved without recompute" — full run disproved it (FINA 75.5%/PARA 68.6% per-host -> 99%+ after recompute). Record authoritative post-recompute parity (median 99.66%) + the methodology finding + genuine divergences (UNRS reservoir, SETN salmon) in research/provincial_parity_2026_05_25.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nsolidate recompute, #205) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The post-consolidate recompute (#175) was running the full pipeline on diverged WSGs: ~2× cost on those WSGs because it re-derived streams/habitat (already correct) just to redo the cheap access step. #205 implements the access-only recompute that reuses persisted streams/habitat/barriers; FINA validated: 11.86s wall (vs ~90s full pipeline = ~8× faster), bcfp parity 99.8% / 57 diffs / ACCESS;DAM top, IDENTICAL to the full-pipeline recompute.
Five things had to be right; four are real gotchas worth knowing (RUNBOOK §6):
- R/lnk_access.R (new export, @family compare): portable access builder, twin of lnk_mapping_code (table_<role> params). Builds per-species _access + source `_unified` views internally via lnk_barriers_views over the persist barriers; merge=FALSE overwrites, merge=TRUE surgically UPDATEs cross-WSG cols (has_barriers_*, dam_dnstr_ind) on the target while PRESERVING remediated_dnstr_ind and observed access_<sp>=2 from the prior compute.
- lnk_access materialises the AOI streams as a REAL TABLE (CREATE TABLE + index id_segment, wscode_ltree GiST, localcode_ltree GiST, blue_line_key, ANALYZE), NOT a view. A view didn't carry small-table stats so the planner picked the ~800k-row barriers as the nested-loop outer driver, blowing cost ~1000× (>10min). With the real table, planner picks the 26k AOI streams as outer, walk takes seconds.
- R/lnk_persist_init.R: persist streams + barriers now get the same wscode_ltree / localcode_ltree GIST + btree indexes that fresh::utils.R builds on its working network table (frs_network_features needs them).
- R/lnk_mapping_code.R: #203 cross-WSG cartesian fix. The access read was `SELECT * FROM <access> WHERE id_segment IN (SELECT id_segment FROM streams WHERE wsg=aoi)`. Against persist (where id_segment is per-WSG, not globally unique) this matched N×WSGs of duplicates → 50× rows in mc_scratch → PK violation on persist write. Filter by watershed_group_code when the table carries it.
- data-raw/wsg_recompute_one.R (new): sibling of wsg_run_one.R. Sets statement_timeout (600s) + lock_timeout (60s) so a runaway/locked query cancels server-side instead of orphaning a backend. data-raw/study_area_run.sh wired to call it + switched to recompute-ALL (cheap → bulletproof; bucketing is now a speed knob, not a correctness lever).
Docs: research/study_area_run.md procedure updated; RUNBOOK.md §6 gotchas (orphaned backend / statement_timeout + pkill ≠ cancel query; view-vs-table planner gotcha; #203 cartesian-on-persist).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
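The merge=TRUE behaviour described above, updating only the cross-WSG columns and leaving the rest of each row alone, can be sketched in SQLite. This is not lnk_access's SQL: the table layout is invented, has_barriers_bt_dnstr is a hypothetical stand-in for the has_barriers_* family, and the sketch simply never writes to remediated_dnstr_ind or access_bt, which is one assumed way of preserving them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE access (
    id_segment INTEGER,
    watershed_group_code TEXT,
    has_barriers_bt_dnstr INTEGER,  -- hypothetical stand-in for has_barriers_*
    dam_dnstr_ind INTEGER,
    remediated_dnstr_ind INTEGER,
    access_bt INTEGER,
    PRIMARY KEY (id_segment, watershed_group_code)
);
CREATE TABLE recomputed (
    id_segment INTEGER,
    watershed_group_code TEXT,
    has_barriers_bt_dnstr INTEGER,
    dam_dnstr_ind INTEGER,
    PRIMARY KEY (id_segment, watershed_group_code)
);
-- Prior compute: FINA row has remediation and observed access (access_bt = 2).
INSERT INTO access VALUES (1, 'FINA', 0, 0, 1, 2);
INSERT INTO access VALUES (1, 'PARS', 0, 0, 0, 1);
-- Recompute on the full barrier set finds a downstream dam for FINA only.
INSERT INTO recomputed VALUES (1, 'FINA', 1, 1);
""")

# Update only the cross-WSG columns, matched on the full key.
conn.execute("""
UPDATE access
SET has_barriers_bt_dnstr = (
        SELECT r.has_barriers_bt_dnstr FROM recomputed r
        WHERE r.id_segment = access.id_segment
          AND r.watershed_group_code = access.watershed_group_code),
    dam_dnstr_ind = (
        SELECT r.dam_dnstr_ind FROM recomputed r
        WHERE r.id_segment = access.id_segment
          AND r.watershed_group_code = access.watershed_group_code)
WHERE EXISTS (
        SELECT 1 FROM recomputed r
        WHERE r.id_segment = access.id_segment
          AND r.watershed_group_code = access.watershed_group_code)
""")

query = """
    SELECT has_barriers_bt_dnstr, dam_dnstr_ind, remediated_dnstr_ind, access_bt
    FROM access WHERE watershed_group_code = ?
"""
fina = conn.execute(query, ("FINA",)).fetchone()
pars = conn.execute(query, ("PARS",)).fetchone()
print(fina, pars)
```

The FINA row picks up the recomputed barrier flags while its remediation and observed-access values survive, and the PARS row, which shares the same id_segment, is untouched because the match uses both key columns.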

…ata-loss followup) Without the transaction, a failed INSERT (e.g. the #203 PK-violation that caused FINA mc data loss 2026-05-25) leaves the WSG's rows deleted but not re-inserted. dbWithTransaction makes the pair atomic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
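The atomic DELETE + INSERT pattern can be demonstrated with SQLite standing in for Postgres. This is a sketch of the general technique, not the package code: the mc table, its columns, and replace_wsg_rows are hypothetical, and Python's connection context manager plays the role that dbWithTransaction plays in R.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mc (
        id_segment INTEGER,
        watershed_group_code TEXT,
        mapping_code TEXT,
        PRIMARY KEY (id_segment, watershed_group_code)
    )
""")
conn.executemany(
    "INSERT INTO mc VALUES (?, ?, ?)",
    [(1, "FINA", "ACCESS"), (2, "FINA", "ACCESS;DAM")],
)
conn.commit()


def replace_wsg_rows(conn, wsg, new_rows):
    """Delete a WSG's rows and insert replacements as one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM mc WHERE watershed_group_code = ?", (wsg,))
        conn.executemany("INSERT INTO mc VALUES (?, ?, ?)", new_rows)


# Duplicate keys simulate the primary-key violation on the INSERT step.
dup_rows = [(1, "FINA", "ACCESS"), (1, "FINA", "ACCESS")]
try:
    replace_wsg_rows(conn, "FINA", dup_rows)
except sqlite3.IntegrityError:
    pass

remaining = conn.execute(
    "SELECT COUNT(*) FROM mc WHERE watershed_group_code = 'FINA'"
).fetchone()[0]
print(remaining)
```

Because the DELETE is rolled back together with the failed INSERT, the original rows are still present after the error.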

NewGraphEnvironment and others added 18 commits May 24, 2026 01:49

Archive #200 PWF (v0.40.4 shipped via PR #202)

e31fb0a

Initialize PWF baseline for #175 (tunnel-free compare + orchestrator)

6cc34e3

PWF: Stage A pass (PARS ;DAM confirmed, 99% match); full 3-area run launched (#175)

e84b4a2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PWF: data-loss incident root-caused + records-based fixes (#175)

b45d398

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CLAUDE.md: refresh Status handoff to #175 (study-area parity, post-consolidate recompute, #205)

a8e2928

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PWF: #205 Phase 7 baseline — lnk_access cheap recompute (plan approved)

2095310

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

NewGraphEnvironment merged commit 0a332cc into main May 26, 2026
1 check passed

NewGraphEnvironment deleted the 175-promote-with-mapping-code-flag-to-stand branch May 26, 2026 09:44
