Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
0b74e00
calmarg DAG: fix write_calpilot_sub module + add local-condor DAG tes…
oshaughn May 31, 2026
b390011
Fix stdout pollution corrupting composites (precession warning -> std…
oshaughn May 31, 2026
a5d8939
calmarg pilot: relax first CIP stage sigma-cut for the prior-cal cold…
oshaughn May 31, 2026
5ece22d
demo/calmarg: parameterize DAG target (PILOT/FUSED) + vanilla fused-c…
oshaughn May 31, 2026
ec2226d
calmarg pilot backstop: shrink fitted proposal toward prior (fixes st…
oshaughn May 31, 2026
785e545
calmarg: frame zero-cal burn-in of the extrinsic sampler (design)
oshaughn May 31, 2026
4296c55
demo/calmarg: fix wide-ILE settings that crippled n_eff (single-event…
oshaughn May 31, 2026
e333ff9
demo/calmarg: correct misleading tuning comments (record accurate facts)
oshaughn May 31, 2026
cf16b03
calmarg: zero-cal burn-in of the extrinsic sampler (--calibration-bur…
oshaughn May 31, 2026
b75ae0c
calmarg burn-in: breadcrumb AV reset limitation + future seedable/fle…
oshaughn May 31, 2026
d0497a8
demo/calmarg: tune-condor target (single condor job, unbuffered) for …
oshaughn May 31, 2026
702569f
Use portable RIFT float dtype in likelihood tools
oshaughnessy-junior May 31, 2026
f4272a8
calmarg: note n_eff is conservative vs true ESS (softens pilot-starva…
oshaughn May 31, 2026
d595fa4
Merge rift_O4d_junior_distance_RiftFloat: portable RiftFloat dtype (M…
oshaughn May 31, 2026
8fa9a8d
demo/calmarg: pp-build/pp-validate -- top-level pseudo_pipe thread-th…
oshaughn Jun 1, 2026
26f8d83
demo/calmarg: pp-run -- RUNNABLE top-level calmarg pipeline on CI dat…
oshaughn Jun 1, 2026
8aaba0c
util_ConstructEOSPosterior: decouple fit basis from MC sampling basis
oshaughnessy-junior Jun 2, 2026
d1995aa
demo/hyperpipe: README tour of the four yaml configs
oshaughnessy-junior Jun 2, 2026
5643386
plot_posterior_corner: additive coordinate-plugin hook
oshaughnessy-junior Jun 2, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,74 @@

Status: **planning** (do not implement the multi-stage loop until we pick a path).

## Zero-cal burn-in of the extrinsic sampler (proposed; ILE-level, analyze_event)

**Problem.** Calibration marginalization makes the extrinsic integral much harder to
converge: with cal drawn from the broad PRIOR the per-time lnL has a large dynamic range,
so the adaptive sampler (AV) struggles to reach a useful `n_eff` -- and on the FIRST
iteration there is no learned cal proposal yet, so every cold-start ILE job is in this
regime. Empirically (local DAG run) iteration-0 points come out with `sigma ~ 0.7-0.9`
and few effective samples; high-SNR sources are hard even WITHOUT cal, and cal makes it
worse. We are effectively failing to seed the *intrinsic* grid because the *extrinsic*
sampler never gets going.

**Idea (O'Shaughnessy).** Burn the sampler in on a *different, cheaper* likelihood first
-- the ZERO-CAL (n_cal=1) baseline -- until it reaches a minimal `n_eff`, so the
extrinsic sampling proposal is "roughly right", THEN switch to the full cal-marginalized
likelihood for the production estimate. The extrinsic posterior shape is nearly the same
with and without cal (cal mostly rescales / mildly shifts lnL), so the burned-in proposal
is an excellent warm start -- and the zero-cal evaluations are ~`n_cal`x cheaper.

**Where it lives.** `analyze_event` in `integrate_likelihood_extrinsic_batchmode`. The
likelihood closures already read the module-scope `n_cal_for_likelihood`; the sampler also
already supports a warm start via `sampler.update_sampling_prior(..., external_rvs=...)`
(the existing `oracleRS` path, ~line 2078). Two viable mechanisms:

1. **Two-phase integrate (simplest).** Set `n_cal_for_likelihood = 1`, call
`sampler.integrate(likelihood_function, ..., n_eff=burn_in_neff)` (the closures now
evaluate the fast zero-cal baseline), then restore `n_cal_for_likelihood` and call
the production `sampler.integrate(...)` WITHOUT resetting -- reusing the adapted
proposal. Risk: AV's reset semantics across two integrate() calls in one
analyze_event are unverified (it "always resets every iteration" between DAG
iterations; need to confirm it does NOT reset at the start of integrate()).
2. **Warm-start via update_sampling_prior (robust).** Run the zero-cal burn-in,
harvest its drawn extrinsic samples + lnL, and feed them to
`sampler.update_sampling_prior(external_rvs=...)` exactly like the oracle path, then
run the production integrate. Survives regardless of integrate()'s reset behavior.

**Proposed flag.** `--calibration-burn-in-neff <float>` (0/None = off): target n_eff for
the zero-cal burn-in (capped by a fraction of n-max). Default off; opt-in.

**Relation to the bigger seeding plan.** This is the in-job version of the same idea as
the *zero-cal pilots* that seed the intrinsic/extrinsic grids for high-SNR sources: burn
in cheaply, then pay for cal only once the sampling is on-target. The pilot (`pilot.py`,
util_CalPilotStage) seeds the CAL proposal across iterations; this burn-in seeds the
EXTRINSIC proposal within a single job. They compose.

Status: implemented behind `--calibration-burn-in-neff` (two-phase integrate, toggling
`n_cal_for_likelihood`), correctness-safe (the production integral is always the full cal
one). BUT see the sampler limitation below.

### Sampler limitation (measured / per review) -- BREADCRUMB for future work
The default and most efficient extrinsic sampler, **AV (mcsamplerAdaptiveVolume),
COMPLETELY RESETS between `integrate()` calls** -- there is currently no seedable AV. So
the two-phase burn-in gives AV **no speedup** (the adapted proposal is thrown away); it is
only ever correctness-safe overhead there. Moreover, **re-seeding AV is inherently
dangerous**: AV can only *contract* its volume, never *expand* or *shift* its boundaries,
so a burn-in proposal that is slightly off / too tight would trap the production phase in
the wrong region. The other integrators (**GMM**, **portfolio**) CAN reuse sampling
models (their `update_sampling_prior` / `gmm_dict` hooks) but are less efficient overall.

Therefore the zero-cal burn-in only pays off once we have:
1. a **seedable AV** (warm-start AV from external samples / a prior proposal), and/or
2. a **more flexible, boundary-shifting AV** that can EXPAND and TRANSLATE its sampling
volume (not only contract) -- so a warm start can't trap it.
Both would be broadly useful well beyond calmarg (every iteration, every warm start).
Until then: keep the burn-in flag (harmless, gated, ready) but do NOT rely on it for AV;
for GMM/portfolio it can warm-start via their model-reuse hooks (untested). The cal PILOT
(across-iteration proposal learning) and the prior-shrinkage backstop remain the load-
bearing pieces for cal; extrinsic seeding waits on seedable AV.

## Where we are

- In-loop calmarg works and is validated (loop == fused == reference ~1e-14; CPU+GPU;
Expand Down Expand Up @@ -135,6 +203,19 @@ prior + importance-weight metadata. Designed from the start to ALSO carry an ex
proposal (same harvest->fit->consolidate->seed structure). NF is a later drop-in behind
the same interface. Stub the schema + the consolidation/seed hooks now.

## n_eff is conservative vs the true ESS (refines the starvation math)
RIFT's reported `n_eff` is a deliberately CONSERVATIVE lower bound -- the true effective
sample size (ESS) is meaningfully larger. So `n_eff(us)=100` yields appreciably more
usable fair-draw points than 100. Consequences:
- The earlier "low n_eff" worry was over-pessimistic: with enough samples the integrator
creeps up fine (the tune-condor run reached n_eff>200 on the moderate-SNR injection), and
the usable ESS is larger still.
- Pilot harvesting is LESS starved than the conservative count implied: top-fraction of
the composite + the larger-than-n_eff ESS means a real run can pull out enough
high-quality points to inform the cal proposal after all. The d(d+1)/2 requirement for
a FULL covariance still holds, but the prior-shrinkage backstop covers the residual
unconstrained directions, so we do not need to fully resolve every cal dof to be safe.

## Build order (this branch)
1. **Timing data** -- done (`--scan-ncal`).
2. **A: brute-force reference** -- prior-only large-`n_cal`, converged; the ground truth.
Expand Down
30 changes: 27 additions & 3 deletions MonteCarloMarginalizeCode/Code/RIFT/calmarg/adaptive.py
Original file line number Diff line number Diff line change
Expand Up @@ -94,18 +94,41 @@ def nodes_to_cal_factors(amp_nodes, phase_nodes, log_f_nodes, T_segment, dT, fmi
# ---------------------------------------------------------------------------
# Tempered proposal fit + diagnostics
# ---------------------------------------------------------------------------
def fit_proposal(nodes, log_resp, beta, cov_floor=1e-8, cov_inflate=1.0):
def fit_proposal(nodes, log_resp, beta, cov_floor=1e-8, cov_inflate=1.0,
prior_sigma=None, shrink=None):
"""Tempered weighted-Gaussian fit. Weights = softmax(beta * log_resp).

beta in (0,1]: small -> broad (many samples), 1 -> full responsibility weighting.

prior_sigma : if given (length-dim 1-sigma of the diagonal prior), SHRINK the fitted
covariance toward diag(prior_sigma**2). This is essential when the fit is starved
-- a weighted sample covariance from ~neff effective points cannot constrain the
dim*(dim+1)/2 entries of a dim-dimensional covariance (cal node space is ~60-D),
so the UNINFORMED directions otherwise collapse to ~0 variance. A near-zero
proposal variance is a near-delta: seeded draws are pinned and the importance
weights log(prior/proposal) blow up, producing the pathological seeded likelihoods
we saw. Shrinking keeps uninformed directions at ~prior width (log_w ~ 0 there).
shrink : explicit shrinkage weight rho in [0,1] toward the prior; default auto =
(dim+1)/(dim+1+neff), i.e. ~1 (all prior) when starved, ->0 (all data) when
neff >> dim.

Returns (mean, cov)."""
lw = beta * log_resp
lw = lw - logsumexp(lw)
w = np.exp(lw)
mean = w @ nodes
d = nodes - mean
cov = (w[:, None] * d).T @ d
cov = cov_inflate * cov + cov_floor * np.eye(mean.shape[0])
dim = mean.shape[0]
if prior_sigma is not None:
prior_sigma = np.asarray(prior_sigma, dtype=float)
neff = neff_from_logweights(beta * log_resp)
rho = shrink if shrink is not None else (dim + 1.0) / (dim + 1.0 + neff)
rho = float(min(max(rho, 0.0), 1.0))
cov = (1.0 - rho) * cov_inflate * cov + rho * np.diag(prior_sigma ** 2)
else:
cov = cov_inflate * cov
cov = cov + cov_floor * np.eye(dim)
return mean, cov


Expand Down Expand Up @@ -156,7 +179,8 @@ def adaptive_cal(evaluate, prior_mean, prior_sigma, n_nodes_amp,
# next proposal from tempered posterior responsibilities; inflate the covariance
# early (while tempering is on) to keep exploring, relax as beta -> 1.
beta = float(betas[min(it, len(betas) - 1)])
mean, cov = fit_proposal(nodes, log_resp, beta, cov_inflate=1.0 + (1.0 - beta))
mean, cov = fit_proposal(nodes, log_resp, beta, cov_inflate=1.0 + (1.0 - beta),
prior_sigma=prior_sigma)
neff_resp = neff_from_logweights(log_resp)
neff_w = neff_from_logweights(log_w)
history.append(dict(iter=it, beta=beta, neff_resp=neff_resp, neff_w=neff_w))
Expand Down
20 changes: 16 additions & 4 deletions MonteCarloMarginalizeCode/Code/RIFT/hyperpipe/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -208,12 +208,24 @@ def validate_config(cfg) -> None:
if not isinstance(_get(arch, "n-samples-per-job"), int) or _get(arch, "n-samples-per-job") <= 0:
raise ValueError("arch.n-samples-per-job must be a positive integer.")

# post: at least one of (coords-fit) must be set
# post: must have at least one fit dim AND at least one MC sampling dim.
# Fit basis = coords-fit + coords-implied; MC basis = coords-fit + coords-nofit.
# Pre-decoupling this only required coords-fit, because the fit basis was
# forced to equal the MC basis -- now an EOS-style "fit in a transformed
# basis" config can legally have empty coords-fit (everything routed via
# coords-implied + coords-nofit through the coordinate plugin).
post = _get(cfg, "post")
if not _get(post, "coords-fit"):
has_fit = bool(_get(post, "coords-fit")) or bool(_get(post, "coords-implied"))
has_samp = bool(_get(post, "coords-fit")) or bool(_get(post, "coords-nofit"))
if not has_fit:
raise ValueError(
"post.coords-fit must list at least one parameter "
"(e.g. 'x y z')."
"post: must list at least one fit dimension "
"(coords-fit or coords-implied; e.g. 'x y z' or 'u v w')."
)
if not has_samp:
raise ValueError(
"post: must list at least one MC sampling dimension "
"(coords-fit or coords-nofit; e.g. 'x y z')."
)

# init: must have either file or generation set
Expand Down
71 changes: 54 additions & 17 deletions MonteCarloMarginalizeCode/Code/RIFT/hyperpipe/coords.py
Original file line number Diff line number Diff line change
Expand Up @@ -177,12 +177,22 @@ def from_strings(
``coords_nofit`` : "delta_mc s1z s2z" (optional)
``likelihood_factor``: (module, function, ini) (any element may be None)
"""
params = parse_parameter_list(coords_fit)
ranges = parse_range_string(coords_sample)
unknown = set(ranges) - set(params)
params = parse_parameter_list(coords_fit)
implied = parse_parameter_list(coords_implied)
nofit = parse_parameter_list(coords_nofit)
ranges = parse_range_string(coords_sample)
# coords-sample provides INTEGRATION ranges, so it has to cover every
# name in the MC SAMPLING basis -- which is coords-fit + coords-nofit.
# (implied names are fit-only and don't need a sample range.) Pre-
# decoupling this was just coords-fit because nofit/implied were
# rarely used and the sampling basis was forced to equal the fit
# basis; now we have to allow the nofit names too.
sampling_basis = set(params) | set(nofit)
unknown = set(ranges) - sampling_basis
if unknown:
raise ValueError(
f"coords-sample names a parameter not in coords-fit: {sorted(unknown)!r}"
f"coords-sample names a parameter not in coords-fit or "
f"coords-nofit: {sorted(unknown)!r}"
)
lf: Optional[Tuple[str, Optional[str], Optional[str]]] = None
if likelihood_factor:
Expand All @@ -195,8 +205,8 @@ def from_strings(
name=name or None,
parameters=params,
parameter_ranges=ranges,
implied=parse_parameter_list(coords_implied),
nofit=parse_parameter_list(coords_nofit),
implied=implied,
nofit=nofit,
likelihood_factor=lf,
)

Expand All @@ -210,13 +220,30 @@ def validate(self, strict_import: bool = False) -> None:
environment (singularity image, OSG worker) and not necessarily
on the submit host.
"""
if not self.parameters:
raise ValueError("HyperCoordSpec requires at least one fitting parameter.")
missing = [p for p in self.parameters if p not in self.parameter_ranges]
# The fit basis is coords-fit + coords-implied; the sampling basis is
# coords-fit + coords-nofit. Both must be non-empty for the run to
# make sense. Pre-decoupling this only required coords-fit -- now
# an "EOS-style fit in a transformed basis" config can legally have
# empty coords-fit (everything goes through implied + nofit).
if not self.parameters and not self.implied:
raise ValueError(
"HyperCoordSpec requires at least one fit dimension "
"(coords-fit or coords-implied)."
)
if not self.parameters and not self.nofit:
raise ValueError(
"HyperCoordSpec requires at least one MC sampling dimension "
"(coords-fit or coords-nofit)."
)
# Every name in the SAMPLING basis must have an integration range
# (the integrator reads prior_range_map[p] for p in low_level_coord_names).
sampling_names = list(self.parameters) + list(self.nofit)
missing = [p for p in sampling_names if p not in self.parameter_ranges]
if missing:
raise ValueError(
f"No integration range supplied for parameter(s): {missing!r}. "
"Every entry in coords-fit must appear in coords-sample."
f"No integration range supplied for sampling parameter(s): "
f"{missing!r}. Every entry in coords-fit and coords-nofit must "
"appear in coords-sample."
)
for p, (lo, hi) in self.parameter_ranges.items():
if not lo < hi:
Expand Down Expand Up @@ -266,7 +293,9 @@ def to_parameter_args(self) -> str:
bits.append(f"--parameter-implied {p}")
for p in self.nofit:
bits.append(f"--parameter-nofit {p}")
for p in self.parameters:
# Integration ranges cover the MC SAMPLING basis (parameters + nofit).
# Implied coordinates are fit-only and don't have a sampling range.
for p in list(self.parameters) + list(self.nofit):
lo, hi = self.parameter_ranges[p]
bits.append(
f"--integration-parameter-range {p}:[{self._fmt_num(lo)},{self._fmt_num(hi)}]"
Expand All @@ -290,12 +319,16 @@ def to_post_args(self) -> str:
def to_puff_args(self, force_away: float = 0.03, puff_factor: float = 0.5) -> str:
"""Emit the puff-stage arg block.

By default we puff in every fitting parameter; this is what every
existing hyperpipe example does. Extra flags can be appended by the
caller.
The puff lane reads / writes grid files in the data-file column
basis, which is the MC sampling basis (coords-fit + coords-nofit).
Pre-decoupling this only emitted --parameter for coords-fit because
the sampling basis was forced to equal the fit basis; once those
diverge (EOSPosterior with --parameter-implied for a transformed
fit basis), the puff lane must continue to operate on the data-
file columns -- i.e. coords-fit + coords-nofit.
"""
bits = [f"--force-away {force_away}", f"--puff-factor {puff_factor}"]
for p in self.parameters:
for p in list(self.parameters) + list(self.nofit):
bits.append(f"--parameter {p}")
return " ".join(bits)

Expand All @@ -304,8 +337,12 @@ def to_test_args(self, method: str = "JS", threshold: float = 0.05) -> str:

Mirrors the args_test.txt pattern from the Gaussian demo:
``--parameter x --parameter y --parameter z --method JS --threshold 0.05``

Like the puff lane, this operates on the SAMPLING basis (coords-fit
+ coords-nofit) -- the convergence-test driver reads grid / posterior
files whose columns are in the sampling basis.
"""
bits = [f"--parameter {p}" for p in self.parameters]
bits = [f"--parameter {p}" for p in list(self.parameters) + list(self.nofit)]
bits.append(f"--method {method}")
bits.append(f"--threshold {threshold}")
return " ".join(bits)
Expand Down
5 changes: 4 additions & 1 deletion MonteCarloMarginalizeCode/Code/RIFT/lalsimutils.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,10 @@
try:
import precession
except ImportError:
print('Import Error - module missing. Please install the module "precession."')
# MUST go to stderr: many RIFT tools write data to stdout (composite/.dat files are
# built from it), and a stray stdout line here corrupts those files -> downstream
# parsers (CIP, util_CalHarvestGrid) choke on ragged/non-numeric rows.
print('Import Error - module missing. Please install the module "precession."', file=sys.stderr)
def safe_int(mystr):
try:
return int(mystr)
Expand Down
Loading