eospost coords #140
oshaughnessy-junior wants to merge 19 commits into
…t target
Bug fix: create_event_parameter_pipeline_BasicIteration imports dag_utils_generic
(aliased dag_utils), so --calmarg-pilot crashed with AttributeError -- write_calpilot_sub
was only added to dag_utils.py. Add it to dag_utils_generic.py too (the module the
builder actually uses). Caught by the new local DAG build test below.
demo/rift/calmarg/Makefile: new `make dag-build` / `dag-validate` / `dag-run` targets
that build a real RIFT DAG with --calmarg-pilot (3-IFO + GPU + AV, mirroring the
known-good ILE-GPU-Paper batch_gpu target) and validate the cal-pilot topology without
needing condor. Verified the produced DAG:
- CALPILOT jobs for it=0 (macroiterationprev=-1) and it=1 (=0), running util_CalPilotStage.py
- PARENT unify_0 CHILD CALPILOT_0 (pilot after iteration 0 composite, ∥ CIP_0)
- PARENT CALPILOT_0 CHILD <wide ILE it=1> (the seed barrier: wide_{N+1} waits on pilot_N)
- ILE.sub seeded via --calibration-proposal-breadcrumb cal_consolidated_$(macroiterationprev).npz
`make dag-run` submits it to local condor (GPU); select the card with CUDA_VISIBLE_DEVICES.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
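Below is a minimal sketch of the kind of offline check `make dag-validate` performs on the seed barrier. It assumes the generated DAG uses standard DAGMan `PARENT ... CHILD ...` lines. The node names unify_<N> and CALPILOT_<N> follow the text above. The wide-ILE prefix passed in is a hypothetical placeholder, and this is not the Makefile's actual implementation.

```python
from collections import defaultdict


def parse_dag_edges(dag_text):
    """Map each parent node to the set of its children from DAGMan PARENT/CHILD lines."""
    edges = defaultdict(set)
    for line in dag_text.splitlines():
        tokens = line.split()
        # DAGMan dependency syntax: PARENT p1 [p2 ...] CHILD c1 [c2 ...]
        if not tokens or tokens[0].upper() != "PARENT":
            continue
        upper = [t.upper() for t in tokens]
        if "CHILD" not in upper:
            continue
        cut = upper.index("CHILD")
        for parent in tokens[1:cut]:
            edges[parent].update(tokens[cut + 1:])
    return edges


def calpilot_topology_problems(dag_text, it, wide_ile_prefix):
    """List violated cal-pilot edges for iteration `it`; an empty list means the topology looks right.

    wide_ile_prefix is a hypothetical name prefix for the iteration-(it+1) wide ILE nodes.
    """
    edges = parse_dag_edges(dag_text)
    pilot = f"CALPILOT_{it}"
    problems = []
    if pilot not in edges.get(f"unify_{it}", set()):
        problems.append(f"missing PARENT unify_{it} CHILD {pilot}")
    if not any(child.startswith(wide_ile_prefix) for child in edges.get(pilot, set())):
        problems.append(f"{pilot} does not gate any {wide_ile_prefix}* node (seed barrier)")
    return problems
```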
…err) + robust harvest
Local condor smoke test of the cal-pilot DAG failed at the CIP_0 and CALPILOT_0 nodes,
both with parse errors on the iteration-0 composite ("could not convert 'Import' to
float" / "number of columns changed from 13 to 10"). Root cause: lalsimutils.py printed
the missing-`precession` warning to STDOUT, and since many RIFT tools build .dat/.composite
files from stdout, that line corrupted the data (one ragged 10-col row among 13-col rows).
Not a calmarg bug, but it breaks any pipeline run on a precession-less environment.
- RIFT/lalsimutils.py: print the precession ImportError warning to sys.stderr, not stdout.
- bin/util_CalHarvestGrid.py: read defensively (genfromtxt usecols=indx..lnL,
invalid_raise=False, drop NaN rows) so a ragged/dirty composite degrades gracefully
instead of failing the pilot. Verified on the actual corrupted composite (6/27 pts).
Caught by `make dag-run` in demo/rift/calmarg (local condor + GPU on cardassia).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
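The defensive read in util_CalHarvestGrid.py can be sketched with numpy.genfromtxt as described: usecols, invalid_raise=False, then drop NaN rows. The function name and the column indices in the usage are illustrative assumptions, not the script's actual code. Rows too short for usecols are skipped. Non-numeric tokens, such as those on a stray warning line, convert to NaN and are then dropped.

```python
import io
import warnings

import numpy as np


def read_composite_defensively(source, usecols):
    """Read the requested numeric columns, dropping ragged rows and rows with non-numeric junk."""
    usecols = tuple(usecols)
    with warnings.catch_warnings():
        # invalid_raise=False warns (instead of raising) about each skipped ragged row
        warnings.simplefilter("ignore")
        data = np.genfromtxt(source, usecols=usecols, invalid_raise=False, dtype=float)
    # genfromtxt returns 1-D for a single row or single column; normalize to (n_rows, n_cols)
    data = np.asarray(data, dtype=float).reshape(-1, len(usecols))
    # Tokens that fail float conversion (e.g. a stray "Import..." warning) become NaN
    return data[~np.any(np.isnan(data), axis=1)]


if __name__ == "__main__":
    dirty = "1 2 3 4\n5 6 7 8\nImport warning\n9 10 bad 12\n13 14 15 16\n"
    print(read_composite_defensively(io.StringIO(dirty), usecols=range(4)))
```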
… start
Pipeline blocker found by the local cal-pilot DAG run: iteration 0 draws cal realizations
from the broad PRIOR (no pilot proposal learned yet), so its per-point Monte-Carlo error
is large (~0.7-0.9 here). CIP's default --sigma-cut 0.6 then strips EVERY point
("Stripped size (0,)"), and the DAG dies. This is generic to any calmarg run, not the demo.
Long-term fix (as suggested -- at the helper level, using the iteration structure):
- helper_LDG_Events.py: new --calmarg-first-cip-sigma-cut; when set, relax ONLY the FIRST
CIP stage's --sigma-cut (that stage runs the cold-start/prior-cal iterations). Later
stages run on pilot-seeded iterations with normal errors and keep the default cut.
- util_RIFT_pseudo_pipe.py: --calmarg-first-cip-sigma-cut (default 100); passed to the
helper automatically when --calmarg-pilot is enabled.
Short-term (demo): demo/rift/calmarg Makefile DAG_CIP_ARGS gets --sigma-cut 5 so the local
smoke test flows past CIP_0 into the seeded iteration 1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
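Below is a sketch of the stage-dependent cut. It assumes CIP keeps points whose lnL Monte Carlo error is strictly below --sigma-cut. The helper names are hypothetical. With the cold-start errors quoted above (~0.7-0.9), the default 0.6 strips every point, while the relaxed first-stage value keeps them all.

```python
import numpy as np


def sigma_cut_for_stage(stage_index, default_cut=0.6, first_stage_cut=None):
    """Relax the cut only for the first CIP stage (the cold-start, prior-cal iterations)."""
    if stage_index == 0 and first_stage_cut is not None:
        return first_stage_cut
    return default_cut


def apply_sigma_cut(lnL, sigma_lnL, cut):
    """Keep grid points whose Monte Carlo lnL error is below `cut` (assumed CIP semantics)."""
    lnL = np.asarray(lnL, dtype=float)
    sigma_lnL = np.asarray(sigma_lnL, dtype=float)
    keep = sigma_lnL < cut
    return lnL[keep], sigma_lnL[keep]


if __name__ == "__main__":
    sigma = np.array([0.7, 0.8, 0.9])  # cold-start-sized errors
    lnL = np.array([100.0, 101.0, 99.0])
    for stage in (0, 1):
        cut = sigma_cut_for_stage(stage, first_stage_cut=100.0)
        kept, _ = apply_sigma_cut(lnL, sigma, cut)
        print(f"stage {stage}: cut={cut} keeps {kept.size} of {lnL.size} points")
```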
…almarg path
The cal-pilot DAG run exposed pilot starvation (a 60-dim cal proposal fit from only ~20
realizations -> near-degenerate covariance -> pathological seeded iteration-1 lnL). Per
review priority, add the cleaner production path first: VANILLA in-loop calmarg, no pilots.
Makefile dag-build/dag-validate now take PILOT={0,1} and FUSED={0,1} (parse-time ifeq):
- PILOT=0 omits --calmarg-pilot and the proposal-breadcrumb seed (every iteration uses
prior cal draws -- correct for the no-pilot production path).
- FUSED=1 adds --calibration-fused-kernel (Option C fused GPU path).
- dag-validate branches: PILOT=1 asserts CALPILOT+breadcrumb; PILOT=0 asserts envelope +
(optional) fused flag and NO CALPILOT.
Priority demo: make dag-build PILOT=0 FUSED=1 && make dag-run PILOT=0 FUSED=1
(verified the build: ILE.sub carries --calibration-fused-kernel + envelope, no breadcrumb,
no CALPILOT.) Pilot starvation backstop (larger pilot n_cal + prior-shrinkage covariance)
is tracked separately.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…arved-cov collapse)
Corroborated the seeded-iteration pathology: the pilot fit a 60-D cal Gaussian from only ~20 realizations, so the covariance was rank-deficient and its ~40 uninformed directions collapsed to the cov_floor (1e-8) -- a near-delta proposal -> seeded log(prior/proposal) blew up -> iteration-1 lnL collapsed (+109 -> -131).
Fix (robustness, the real backstop): adaptive.fit_proposal gains prior_sigma + shrink. When prior_sigma is given it shrinks the fitted covariance toward diag(prior_sigma**2) with weight rho = (dim+1)/(dim+1+neff): ~all-prior when starved (neff << dim), -> all-data when neff >> dim. Uninformed directions now keep ~prior width (log_w ~ 0) instead of collapsing.
- Verified on a 60-D/neff=1 starved fit: min cov eigenvalue 1e-8 -> 0.98 (prior var).
- Threaded prior_sigma through util_CalPilotFit and adaptive.adaptive_cal.
- Existing tests unchanged: pilot brute-vs-seeded |dlogZ|=0.01 x254; adaptive_cal neff 140/300.
Also (demo, per review): N_COPIES_DAG default 1 (drop builder's default-2 redundancy, ~2x turnaround on a single-card test); threaded --n-copies into the demo DAG build.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
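Below is a sketch of the prior-shrinkage backstop. It assumes the shrinkage is a convex combination of the fitted covariance and diag(prior_sigma**2), weighted by the rho above. The commit gives the weight, not the exact combination rule. shrink_covariance is a hypothetical name, not adaptive.fit_proposal's signature. The prior term adds at least rho * min(prior_sigma**2) to every eigenvalue. So a 60-D fit with neff=1 gets rho = 61/62, consistent with the ~0.98 minimum eigenvalue quoted for unit prior variance.

```python
import numpy as np


def shrink_covariance(cov_fit, prior_sigma, neff):
    """Convex shrinkage of a fitted covariance toward diag(prior_sigma**2).

    rho = (dim+1)/(dim+1+neff): ~1 (all prior) when neff << dim, -> 0 (all data) when neff >> dim.
    """
    cov_fit = np.asarray(cov_fit, dtype=float)
    dim = cov_fit.shape[0]
    rho = (dim + 1.0) / (dim + 1.0 + neff)
    prior_cov = np.diag(np.broadcast_to(np.asarray(prior_sigma, dtype=float), (dim,)) ** 2)
    return (1.0 - rho) * cov_fit + rho * prior_cov, rho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_real = 60, 20
    samples = rng.normal(size=(n_real, dim))
    # Rank-deficient sample covariance plus a tiny floor, mimicking the starved pilot fit
    cov_fit = np.cov(samples, rowvar=False) + 1e-8 * np.eye(dim)
    shrunk, rho = shrink_covariance(cov_fit, prior_sigma=1.0, neff=1.0)
    print("rho =", rho)
    print("min eig before:", np.linalg.eigvalsh(cov_fit).min())
    print("min eig after: ", np.linalg.eigvalsh(shrunk).min())
```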
Design framing (per review) for an ILE-level burn-in in analyze_event: cal marginalization -- especially cold-start PRIOR cal on iteration 0 -- makes the extrinsic integral hard to converge (low n_eff), so we fail to seed the grid. The extrinsic posterior is ~cal-independent, so burn the AV sampler in on the cheap ZERO-CAL (n_cal=1) likelihood to a target n_eff first, then switch to the full cal-marg likelihood reusing the adapted proposal.
Two mechanisms documented (two-phase integrate; robust warm-start via the existing update_sampling_prior/oracle path) + proposed --calibration-burn-in-neff flag. Composes with the cal pilots (this seeds the EXTRINSIC proposal in-job; pilots seed the CAL proposal across iterations). Implementation tracked separately (needs a GPU test).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… tuning)
Per review, tuned settings on a single-event ILE at the injection (new `tune-single` target) before trusting a full DAG. Found and fixed three issues in the DAG wide-ILE args (all inherited from the ILE-GPU-Paper STANDARD_ILE_OPTS, wrong for calmarg):
1. --no-adapt-distance WITH no distance marginalization -> distance neither adapted nor marginalized -> catastrophic n_eff. Removed the freeze flags; added a DMARG_DAG toggle (default ON: --distance-marginalization + lookup table).
2. --adapt-weight-exponent 0.1 over-tempers the adaptive proposal so it cannot concentrate on the high-dynamic-range cal-marginalized peak. Dropped it -> ILE default 1.0.
3. (separately) the single-event tune must run on the real GPU: cardassia has ONE card (device 0, NVS 510 sm_30); CUDA_VISIBLE_DEVICES must be 0 (3 -> no device -> cupy off -> --gpu silently disabled -> non-GPU path ignores distmarg -> 6-arg likelihood crash).
New `make tune-single` runs one ILE at the injection with the exact wide args (lnL col -4, n_eff col -1) -- the fast way to validate settings before an hour-long DAG. Also added --d-min 1 to match the working demo COMMON. N_COPIES default 1, DMARG_DAG default 1.
Remaining: even fixed, this SNR~17.5 source + cal gives low single-event n_eff (~3 at 100k) -- high-SNR+cal is intrinsically hard, motivating the zero-cal burn-in (task oshaughn#24).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
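The snippet below reads the single-event result. It relies only on the column convention stated here: lnL in column -4 and n_eff in column -1. The helper is hypothetical, and picking the highest-lnL row is just one reasonable summary.

```python
import io

import numpy as np


def summarize_tune_output(source):
    """Return (lnL, n_eff) for the highest-lnL row of an ILE output table.

    Only the lnL (-4) and n_eff (-1) column positions are assumed; other columns are ignored.
    """
    data = np.atleast_2d(np.loadtxt(source))
    lnL, neff = data[:, -4], data[:, -1]
    best = int(np.argmax(lnL))
    return float(lnL[best]), float(neff[best])


if __name__ == "__main__":
    table = "1 2 3 10.0 0.1 0.2 5.0\n1 2 3 12.0 0.1 0.2 7.0\n"
    print(summarize_tune_output(io.StringIO(table)))
```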
Correcting mischaracterizations from the previous commit (per review), to avoid misleading future readers:
- --no-adapt-distance is NOT a freeze: distance is then UNIFORMLY sampled (fine for low SNR with sensible d-min/d-max), and is moot once distance marginalization is on (the default here). It did not "cripple n_eff".
- --adapt-weight-exponent is a NO-OP for the AV sampler (it matters for GMM/portfolio -- not dropped globally, just not set in the demo). It did not change convergence here.
- SNR ~17.5 is moderate, not loud (loud = 40+); 100k samples is SHORT for RIFT (production runs use millions and let AV creep n_eff up). n_eff ~3 at 100k is early, not pathology.
The valid finding stands: baseline (no cal) and calmarg give ~equal n_eff, so cal is not the bottleneck. DMARG_DAG=1 (analytic distance) remains the clean default.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n-in-neff)
Implements task oshaughn#24 (the generally-useful extrinsic-seeding idea). Cal marginalization -- especially cold-start prior cal -- makes the extrinsic integral slow to converge; the extrinsic posterior is ~cal-independent, so burn the sampler in on the cheap ZERO-CAL (n_cal=1) likelihood first, then run the full cal-marginalized integral reusing the adapted proposal.
ILE (analyze_event): --calibration-burn-in-neff (+ --calibration-burn-in-nmax cap). When set and calmarg is active, before the production sampler.integrate it toggles the analyze_event-local n_cal_for_likelihood to 1 (the likelihood closures read it, so the SAME like_to_integrate now evaluates the fast baseline), runs a burn-in integrate to the target n_eff with a capped nmax, then restores n_cal_for_likelihood and runs the full cal-marg integral. Correctness is preserved regardless of whether the AV sampler retains adaptation across the two integrate() calls -- worst case the burn-in is wasted; the production integral is always the full cal one. Default off.
Threaded --calmarg-burn-in-neff through util_RIFT_pseudo_pipe.py (-> args_ile.txt) and a BURN_IN_NEFF toggle in the demo Makefile (tune-single/DAG).
Compiles; flags register; demo emits the flag. NEEDS a GPU smoke test of the burn-in->production handoff: does AV reuse/seed adaptation across the two integrates? If not, add the update_sampling_prior warm-start documented in DESIGN_adaptive_driver.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
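The n_cal toggle relies on Python closures binding enclosing variables at call time. Below is a toy sketch of that pattern. analyze_event_sketch, toy_integrate and the integrate(f, neff=..., nmax=...) call shape are hypothetical stand-ins for analyze_event and sampler.integrate. The try/finally guarantees the full n_cal is restored before the production integral, even if the burn-in raises.

```python
def analyze_event_sketch(integrate, n_cal, burn_in_neff=None, burn_in_nmax=10_000):
    """Two-phase integrate: zero-cal burn-in, then the full cal-marginalized integral."""
    n_cal_for_likelihood = n_cal
    used_n_cal = []  # records which n_cal each likelihood call saw

    def like_to_integrate(x):
        # Python closures read enclosing variables at CALL time, so reassigning
        # n_cal_for_likelihood below switches which likelihood this evaluates.
        used_n_cal.append(n_cal_for_likelihood)
        return -0.5 * x * x  # toy stand-in for the real (cal-marginalized) lnL

    if burn_in_neff is not None and n_cal > 1:
        n_cal_for_likelihood = 1  # cheap zero-cal baseline
        try:
            integrate(like_to_integrate, neff=burn_in_neff, nmax=burn_in_nmax)
        finally:
            n_cal_for_likelihood = n_cal  # always restore before production
    result = integrate(like_to_integrate, neff=None, nmax=None)
    return result, used_n_cal


def toy_integrate(f, neff, nmax):
    """Hypothetical sampler stand-in: evaluates f on a fixed grid and sums."""
    return sum(f(x) for x in (0.0, 1.0, 2.0))


if __name__ == "__main__":
    result, used = analyze_event_sketch(toy_integrate, n_cal=50, burn_in_neff=10)
    print(result, used)
```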
…xible-AV work
Per review: AV (the default, most-efficient extrinsic sampler) COMPLETELY RESETS between integrate() calls -- no seedable AV exists -- so the zero-cal burn-in gives AV no speedup (correctness-safe overhead only). Re-seeding AV is also dangerous: AV can only CONTRACT its volume, never EXPAND or SHIFT boundaries, so an off/too-tight warm start would trap the production phase. GMM/portfolio can reuse sampling models (update_sampling_prior/gmm_dict) but are less efficient.
Breadcrumbed in DESIGN_adaptive_driver.md + the --calibration-burn-in-neff help: the burn-in is parked (gated, harmless, ready) pending future work that is broadly useful beyond calmarg:
(1) a seedable AV;
(2) a boundary-shifting AV that can expand/translate its volume, not only contract.
Until then the cal PILOT (across-iteration proposal learning) + the prior-shrinkage backstop are the load-bearing cal pieces.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n_eff creep
Per review: a full DAG is overkill for single-event settings tuning, and an interactive background launch buffers stdout (can't watch n_eff creep) and is subject to the NVS 510 display watchdog. tune-condor writes ONE submit file running `python -u <ILE> <wide args> --sim-xml <injection>` with request_GPUs=1 + getenv (cupy), persistent output/error/log, and initialdir rundir_tune. Unbuffered -> live progress; condor -> watchdog-isolated + restartable.
Verified: submits, runs on the GPU, streams the iteration/Neff table live.
  conda run -n rift_gpu2 make tune-condor FUSED=1 DMARG_DAG=1 NMAX_DAG=4000000
  tail -f rundir_tune/tune_condor.out
  cat rundir_tune/tune_out*.dat    (lnL col -4, neff -1)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
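Below is a sketch of the single submit description tune-condor is described as writing, using standard HTCondor submit commands. The error/log file names, the universe line, the placeholder executable/argument values and the helper itself are assumptions. Relative output/error/log paths resolve against initialdir, so the .out file lands at rundir_tune/tune_condor.out.

```python
def tune_condor_submit(python_exe, ile_exe, wide_args, sim_xml, rundir="rundir_tune"):
    """Build a one-job HTCondor submit description for an unbuffered single-event ILE run."""
    # python -u disables stdout buffering so the iteration/Neff table streams to the .out file
    arguments = " ".join(["-u", ile_exe, *wide_args, "--sim-xml", sim_xml])
    lines = [
        "universe = vanilla",
        f"executable = {python_exe}",
        f"arguments = {arguments}",
        "request_GPUs = 1",
        "getenv = True",           # inherit the submitting environment (e.g. the cupy-enabled env)
        f"initialdir = {rundir}",  # output/error/log below are relative to this directory
        "output = tune_condor.out",
        "error = tune_condor.err",
        "log = tune_condor.log",
        "queue 1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # ILE_EXECUTABLE and --example-wide-arg are hypothetical placeholders
    print(tune_condor_submit("python3", "ILE_EXECUTABLE", ["--example-wide-arg", "1"], "inj.xml"))
```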
Thread RiftFloat through the ILE executables, related waveform and posterior utilities, and likelihood sampler tests so platforms without numpy.float128 use the centralized float64 fallback.
…tion math)
Per review: RIFT's reported n_eff is a deliberately CONSERVATIVE lower bound; the true ESS is meaningfully larger, so n_eff(us)=100 yields appreciably more usable fair-draw points.
Implications recorded in DESIGN_adaptive_driver.md:
- the earlier "low n_eff" worry was over-pessimistic (tune-condor reached n_eff>200 on the moderate-SNR injection with the fixed settings, and ESS is larger still);
- pilot harvesting is LESS starved than the conservative count implied -- a real run can likely pull out enough high-quality points to inform the cal proposal.
The d(d+1)/2 full-covariance requirement still holds, but the prior-shrinkage backstop covers the residual unconstrained directions.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
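The snippet below shows why a max-weight count is conservative. It assumes RIFT's reported n_eff takes the form sum(w)/max(w). Since sum(w**2) <= max(w) * sum(w), the Kish effective sample size (sum w)**2 / sum(w**2) is never smaller. Both helpers are illustrative, not RIFT functions.

```python
import numpy as np


def n_eff_max_weight(weights):
    """sum(w)/max(w): a conservative effective-sample count (assumed form of RIFT's n_eff)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() / w.max()


def kish_ess(weights):
    """Kish effective sample size (sum w)^2 / sum(w^2); always >= sum(w)/max(w)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


if __name__ == "__main__":
    w = np.random.default_rng(0).lognormal(sigma=2.0, size=10_000)
    print(f"max-weight n_eff = {n_eff_max_weight(w):.1f}, Kish ESS = {kish_ess(w):.1f}")
```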
…ac/no-float128)
Brings the single commit 702569f (Use portable RIFT float dtype in likelihood tools): threads RiftFloat (RIFT.precision, a float64 fallback where numpy.float128 is unavailable, e.g. macOS) through the ILE executables + posterior/waveform utilities + sampler tests.
Clean merge (git merge-tree: 0 conflicts): the float128->RiftFloat swaps are all in the likelihood_function lnL allocations, none of which overlap the calmarg edits. On this Linux box RiftFloat==float128 so numerics are unchanged; macOS now uses float64. The in-flight tune job (cluster 82623) already loaded its code and is unaffected.
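Below is the portable-dtype pattern, as an illustrative sketch rather than the actual RIFT.precision source. It picks numpy.float128 when the NumPy build provides it, otherwise float64, and allocates lnL accumulators with that dtype instead of hardcoding float128.

```python
import numpy as np

# Illustrative fallback (not the actual RIFT.precision source): use numpy.float128 when
# this NumPy build provides it, else float64 (e.g. the macOS builds mentioned above).
RiftFloat = getattr(np, "float128", np.float64)


def allocate_lnL(n):
    """Allocate an lnL accumulator in the portable dtype instead of hardcoding float128."""
    return np.zeros(n, dtype=RiftFloat)


if __name__ == "__main__":
    print(np.dtype(RiftFloat).name, allocate_lnL(3))
```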
…rough (incl. time samples)
Adds `make pp` (pp-build + pp-validate): exercises the FULL top-level builder util_RIFT_pseudo_pipe.py -> helper_LDG_Events.py -> args generation -> create_event_*, not just the lower-level builder that dag-build calls directly. Offline build-validate (the established pattern: .travis/test-build.sh + demo/pipeline/zero_spin_phenomD), reusing the zero-spin IMRPhenomD ini + ref coinc + a placeholder fake-data cache, so no GPU/data run is needed for the threading check. Zero spin (--assume-nospin); iterations forced small (--internal-force-iterations).
Confirms EVERYTHING threads through the generated pipeline (validated, not just emitted):
- calmarg --calibration-envelope-directory / --calibration-n-realizations / --calibration-fused-kernel land in args_ile.txt, ILE.sub AND ILE_extr.sub;
- TIME SAMPLES: --add-extrinsic-time-resampling + --internal-ile-srate-time-resampling -> --srate-resample-time-marginalization 4096 in the wide AND extrinsic (ILE_extr) stages, alongside --time-marginalization;
- zero-spin IMRPhenomD; small iteration count; full top-level DAG produced.
(Note: --last-iteration-extrinsic-time-resampling is a transient builder arg consumed at build time -- its persistent effect is the --srate-resample-time-marginalization in ILE_extr.sub, which is what we assert.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
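Below is a sketch of the kind of assertion pp-validate makes. It assumes a plain text search of args_ile.txt / ILE.sub / ILE_extr.sub is enough. missing_flags is hypothetical. It accepts either a bare flag or a "flag value" pair, and guards both ends so a flag does not match inside a longer one.

```python
import re


def missing_flags(sub_text, required):
    """Return the required flags (optionally 'flag value' pairs) absent from a submit/args file."""
    text = " ".join(sub_text.split())  # normalize whitespace and newlines
    missing = []
    for item in required:
        # Guard both ends so '--foo' does not match inside '--foo-bar' or 'x--foo'
        pattern = r"(?<![\w-])" + re.escape(" ".join(item.split())) + r"(?![\w-])"
        if not re.search(pattern, text):
            missing.append(item)
    return missing
```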
…a (breadcrumb)
Primes and launches a full util_RIFT_pseudo_pipe.py pipeline that ACTUALLY RUNS on the zero-noise CI fake data (not just the offline pp-build threading check). Pieces:
- util_SimInspiralToCoinc.py makes a real coinc from the injection (H1/L1/V1, t=1000000014, m1=35/m2=30, SNR 17.5) -- `make pp-coinc`/ci_coinc.xml (regenerated, not committed).
- calmarg_ci.ini: CI-matched (FAKE-STRAIN channels x3, srate 4096, seglen 8 -> the segment [1000000008.236,1000000016.236] sits inside the cache's 328s frame, fmin 10, zero spin, mc [23,35]).
- pp-run-build runs pseudo_pipe with the real zero_noise.cache + CI PSD + calmarg (--calmarg-* --calmarg-fused-kernel) + time-resampling (--add-extrinsic-time-resampling + --internal-ile-srate-time-resampling 4096) + --assume-nospin + small forced iterations, and asserts the RUNNABLE bits threaded (real cache, FAKE-STRAIN, event time, and that ILE_extr.sub carries calmarg + --srate-resample-time-marginalization).
- pp-run builds + condor_submit_dag.
Verified: builds clean, submits (the extrinsic stage produces TIME SAMPLES with calmarg on). Single GPU on cardassia (CUDA_VISIBLE_DEVICES=0).
This is the runnable counterpart to pp-build; leaves a working end-to-end breadcrumb for exercising the full pipeline + calmarg + time samples later.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Background
----------
util_ConstructIntrinsicPosterior_GenericCoordinates.py has long used
three CLI flags for declaring how a parameter is treated:
--parameter X both fit dim AND MC sampling dim
--parameter-implied X fit dim only (the converter produces X from the
data file's columns; the MC integrator never
sees it)
--parameter-nofit X MC sampling dim only (the integrator integrates
over it; the fit never sees it)
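Below is a sketch of the fit/sampling split these flags define, mirroring the semantics listed above: fit basis = --parameter + --parameter-implied, MC basis = --parameter + --parameter-nofit. The function is hypothetical. Falling back to the data-file columns for an empty basis is an assumption about the legacy behaviour, not a transcription of either script.

```python
def resolve_coordinate_bases(dat_orig_names, parameter=(), parameter_implied=(), parameter_nofit=()):
    """Return (coord_names, low_level_coord_names): the fit basis and the MC sampling basis."""
    parameter, implied, nofit = list(parameter), list(parameter_implied), list(parameter_nofit)
    coord_names = parameter + implied          # what the fit sees
    low_level_coord_names = parameter + nofit  # what the MC integrator samples
    # Legacy bare invocation (and, by assumption, any basis left empty): use the data-file columns
    if not coord_names:
        coord_names = list(dat_orig_names)
    if not low_level_coord_names:
        low_level_coord_names = list(dat_orig_names)
    return coord_names, low_level_coord_names
```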
util_ConstructEOSPosterior.py declared the same three flags but never
honoured them: the integrator at line 487 hardcoded
`low_level_coord_names=dat_orig_names` in its convert_coords closure,
which only worked when the sampling basis equalled the data-file basis;
sampler.add_parameter iterated over coord_names (the fit basis); the
arity dispatch for likelihood_function keyed on len(coord_names); and
sampler.integrate was passed *coord_names rather than
*low_level_coord_names. The net effect: any user who tried to fit in a
transformed basis (e.g. via the new --supplementary-coordinate-code
plugin) silently got a wrong likelihood evaluation -- the rotation was
applied an extra time inside convert_coords every Monte Carlo step.
What this commit changes
------------------------
bin/util_ConstructEOSPosterior.py
* Parameter-resolution block rewritten to mirror IntrinsicPosterior's
semantics, plus a clean fallback to dat_orig_names when none of the
three flags are supplied (legacy bare-invocation unchanged). Seven
CLI permutations now map to documented (coord_names,
low_level_coord_names) pairs.
* The convert_coords closure used by the integrator captures
low_level_coord_names as its input basis (was dat_orig_names). The
initial dat->X conversion still uses dat_orig_names, since that's
the basis of the file columns.
* Sampler add_parameter loop now iterates over low_level_coord_names
(the MC basis), and sampler.integrate is passed *low_level_coord_names.
* The arity-dispatched likelihood_function definitions key on
len(low_level_coord_names) and route every input -- including the
scalar branches -- through convert_coords so a non-trivial converter
is never silently bypassed.
* Output-writer iterates samples by low_level_coord_names (the keys
sampler._rvs actually carries) and applies the "constant fill"
check in the sampling basis, not the fit basis. Implied (fit-only)
coords correctly skip the output file.
* Added a guard: if low_level_coord_names != coord_names but no
coordinate plugin is supplied, raise a clear error instead of
silently feeding samples through an identity convert_coords into a
fit built in a different basis.
* Help text for --parameter / --parameter-implied / --parameter-nofit
rewritten to describe what each flag actually does now.
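The convert_coords and guard changes are sketched below with the linear (x, y, z) -> (u, v, w) rotation from the demo. The names and the converter(x, input_names) calling convention are hypothetical. The point is that the closure's input basis is low_level_coord_names, and that a basis mismatch without a plugin fails loudly instead of passing samples through an identity.

```python
import numpy as np


def linear_uvw(x, input_names):
    """Hypothetical plugin chart: (x, y, z) -> (u, v, w), the rotation used by the linear demo."""
    if list(input_names) != ["x", "y", "z"]:
        raise ValueError(f"linear_uvw expects (x, y, z) input, got {list(input_names)}")
    x = np.atleast_2d(np.asarray(x, dtype=float))
    u = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)
    v = (x[:, 1] - x[:, 0]) / np.sqrt(2.0)
    return np.column_stack([u, v, x[:, 2]])


def build_convert_coords(coord_names, low_level_coord_names, converter=None):
    """Return the integrator-side converter from the sampling basis to the fit basis."""
    if converter is None:
        if list(low_level_coord_names) != list(coord_names):
            raise ValueError(
                f"sampling basis {list(low_level_coord_names)} differs from fit basis "
                f"{list(coord_names)} but no coordinate plugin was supplied")
        return lambda x: np.atleast_2d(np.asarray(x, dtype=float))  # identity is only safe here

    def convert_coords(x):
        # Input basis is the MC sampling basis, NOT the data-file columns, so each
        # Monte Carlo sample is converted exactly once.
        return converter(x, low_level_coord_names)

    return convert_coords
```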
RIFT/hyperpipe/coords.py
* HyperCoordSpec.from_strings accepts integration ranges for names in
coords-nofit (the MC sampling basis is coords-fit + coords-nofit);
unknown range names are still rejected.
* HyperCoordSpec.validate accepts empty coords-fit so long as
coords-implied covers the fit basis and coords-nofit covers the
sampling basis; emits distinct errors for empty-fit vs empty-sample.
* to_parameter_args emits --integration-parameter-range for the
sampling basis (parameters + nofit), not just parameters.
* to_puff_args and to_test_args emit --parameter for the sampling
basis -- the puff lane and convergence-test driver operate on the
data-file columns, which is the sampling basis after decoupling.
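Below is a hypothetical stand-in for the HyperCoordSpec bookkeeping described above; it is not the class's real API. The fit basis is coords-fit + coords-implied, the sampling basis is coords-fit + coords-nofit, ranges may name any sampling coordinate, and puff/test arguments use the sampling basis. The "name:[lo,hi]" value format for --integration-parameter-range is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CoordSpecSketch:
    """Hypothetical stand-in for HyperCoordSpec's fit/implied/nofit bookkeeping."""
    fit: list = field(default_factory=list)      # coords-fit: fit AND sampling
    implied: list = field(default_factory=list)  # coords-implied: fit only
    nofit: list = field(default_factory=list)    # coords-nofit: sampling only
    ranges: dict = field(default_factory=dict)   # name -> (lo, hi) integration range

    @property
    def fit_basis(self):
        return self.fit + self.implied

    @property
    def sampling_basis(self):
        return self.fit + self.nofit

    def validate(self):
        if not self.fit_basis:
            raise ValueError("empty fit basis: set coords-fit and/or coords-implied")
        if not self.sampling_basis:
            raise ValueError("empty sampling basis: set coords-fit and/or coords-nofit")
        unknown = sorted(set(self.ranges) - set(self.sampling_basis))
        if unknown:
            raise ValueError(f"integration range given for unknown coordinate(s): {unknown}")

    def range_args(self):
        # Ranges are emitted for the whole sampling basis (fit + nofit); value format assumed
        args = []
        for name in self.sampling_basis:
            if name in self.ranges:
                lo, hi = self.ranges[name]
                args += ["--integration-parameter-range", f"{name}:[{lo},{hi}]"]
        return args

    def puff_args(self):
        # Puff / convergence-test lanes work on data-file columns, i.e. the sampling basis
        return [arg for name in self.sampling_basis for arg in ("--parameter", name)]
```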
RIFT/hyperpipe/config.py
* validate_config accepts empty coords-fit when coords-implied
(fit-side) and coords-nofit (sample-side) are non-empty.
demo/hyperpipe/hyperpipe_conf_linear_uvw.yaml
* Rewritten to actually exercise the decoupled path: coords-implied
"u v w" (fit), coords-nofit "x y z" (sample), coords-sample ranges
in (x, y, z), coord-module pointing at the linear plugin with the
uvw_rotated chart. Iteration / puff / marg stay in (x, y, z); the
EOS posterior fits in (u, v, w) and writes its posterior in
(x, y, z).
Verified
--------
* Parameter-resolution unit test (in this commit's worktree) covers 7
CLI permutations -- legacy no-flags, legacy --parameter, IntrinsicPosterior
--parameter+implied and --parameter+nofit, the new --implied-only,
--implied+nofit, and full --parameter+implied+nofit -- all map to
the documented (coord_names, low_level_coord_names) pairs.
* HyperCoordSpec unit test covers the new decoupled emit (post sees
implied/nofit and ranges; puff/test see the sampling basis only),
a legacy-regression case (unchanged output), the two new validation
errors (empty fit, empty sample), the new "range for nofit name"
permission, and the still-rejected "unknown range name" case.
* Every edited file parses cleanly (Python AST / YAML).
* validate_config passes on hyperpipe_conf_linear_uvw.yaml plus the
demo's baseline and tracer yamls.
Adds --supplementary-coordinate-{code,function,ini,chart} plus the two
input/output parameter list flags to plot_posterior_corner.py, mirroring
the surface already in util_ConstructEOSPosterior.py. When the plugin
flag is set, _materialize_plugin_columns runs once per loaded posterior
and once per loaded composite file *after* the existing RIFT
postprocessing -- it computes the plugin's output columns from existing
record-array fields and splices them in via add_field.
Critically, the hook is strictly ADDITIVE. Any output name already
present in samples.dtype.names is skipped, so the hardcoded
extract_combination_from_LI and the per-file postprocess loops
(mc / eta / chi_eff / LambdaTilde / chi1_perp / ...) always win. Legacy
invocations with no --supplementary-coordinate-code flag are byte-
identical to the pre-plugin tool -- the helper returns input unchanged
when the converter is None.
CLI surface
-----------
--supplementary-coordinate-code SPEC
'rift_default' | filesystem path to a .py | dotted module name.
--supplementary-coordinate-function NAME
Entry-point callable. Defaults to 'convert_coordinates'.
--supplementary-coordinate-ini PATH
Optional; parsed and handed to prepare().
--supplementary-coordinate-chart NAME
Required only when the plugin defines multiple charts.
--supplementary-coordinate-input-parameter NAME (action='append')
Override the plugin-declared INPUT_PARAMETERS / chart's
input_parameters list.
--supplementary-coordinate-output-parameter NAME (action='append')
Override the plugin-declared OUTPUT_PARAMETERS / chart's
parameters list.
When the input / output name lists aren't given on the CLI they're
resolved from CHARTS[chart] (input_parameters, parameters) and then from
the module-level INPUT_PARAMETERS / OUTPUT_PARAMETERS attributes.
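Below is the name-resolution order, sketched under the assumption that CHARTS maps chart names to dicts with input_parameters / parameters keys; the real plugin may use objects instead. resolve_plugin_names is hypothetical.

```python
import types


def resolve_plugin_names(module, chart=None, cli_inputs=None, cli_outputs=None):
    """Resolve plugin input/output names: CLI first, then CHARTS[chart], then module attributes."""
    chart_spec = (getattr(module, "CHARTS", None) or {}).get(chart, {}) if chart else {}
    inputs = (cli_inputs or chart_spec.get("input_parameters")
              or getattr(module, "INPUT_PARAMETERS", None))
    outputs = (cli_outputs or chart_spec.get("parameters")
               or getattr(module, "OUTPUT_PARAMETERS", None))
    if not inputs or not outputs:
        raise ValueError("could not resolve plugin input/output parameter names")
    return list(inputs), list(outputs)


if __name__ == "__main__":
    plugin = types.SimpleNamespace(
        INPUT_PARAMETERS=["x", "y", "z"], OUTPUT_PARAMETERS=["u", "v", "w"],
        CHARTS={"uvw_rotated": {"input_parameters": ["x", "y", "z"], "parameters": ["u", "v", "w"]}},
    )
    print(resolve_plugin_names(plugin, chart="uvw_rotated", cli_outputs=["u"]))
```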
Verified
--------
Five synthetic cases on an (m1, x, y, z) record array with the linear
plugin requesting (u, v, w):
* happy path -- (x, y, z) -> (u, v, w) values match
u=(x+y)/sqrt(2), v=(y-x)/sqrt(2), w=z on every row; (m1, x, y, z)
untouched.
* output-reorder -- works regardless of the order u/v/w are listed.
* name-collision -- samples pre-seeded with u=99; the plugin leaves u
alone (RIFT path wins) and still adds v, w.
* missing-input -- samples without an x column; helper logs a skip,
returns input unchanged, no crash.
* no-plugin -- helper is identity (out is samples).
Odd that there are other commits coming in here?