Skip to content

Authoritative-oxygen mode#20

Merged
timlichtenberg merged 94 commits into
mainfrom
tl/fo2-source-framework
Jun 1, 2026
Merged

Authoritative-oxygen mode#20
timlichtenberg merged 94 commits into
mainfrom
tl/fo2-source-framework

Conversation

@timlichtenberg
Copy link
Copy Markdown
Member

@timlichtenberg timlichtenberg commented May 14, 2026

@nichollsh, this is ready to review. It is the chemistry half of the coupled oxygen-accounting change and the dependency that has to merge first, so it would be great to merge it as soon as you can, ideally by Wednesday 3 June, so we can move on to the PROTEUS PR (FormingWorlds/PROTEUS#678). It is a focused, self-contained change: a new authoritative-oxygen entry point, the default iron-wustite buffer, and the surrounding test and docs infrastructure.

Paired PR and merge order. PROTEUS imports equilibrium_atmosphere_authoritative_O from this branch. Merge order is this PR (CALLIOPE#20) first, then PROTEUS#678. The full reviewer set and the whole-planet oxygen accounting context live on the paired PROTEUS PR. General comments are welcome here too.


Description

Three coupled changes plus the test and docs infrastructure that surrounds them.

(1) Authoritative-oxygen entry point. A new equilibrium_atmosphere_authoritative_O takes total O mass alongside H/C/N/S and back-solves for the iron-wustite buffer offset that produces that O budget at equilibrium. The legacy equilibrium_atmosphere (fO2 in, O derived) is unchanged. Chemistry half of the whole-planet oxygen accounting framework on the PROTEUS side (PROTEUS PR #678); legacy mode stays the default for ordinary buffered-mode work.

(2) Default IW buffer flipped to Fischer 2011. With the legacy O'Neill default, raw cross-backend disagreement (CALLIOPE vs atmodeller) at the canonical Earth fiducial is around 0.42 dex; with Fischer it drops to around 0.16 dex, residual after analytical buffer correction at or below 0.1 dex below 2000 K and around 0.28 dex at 3000 K. Fischer is within about 0.2 dex of the Hirschmann composite (atmodeller's default) across the magma-ocean range; O'Neill diverges to around 1 dex at the hot end. OxygenFugacity('oneill') remains available.

(3) Diataxis docs pass for the dual-mode chemistry, the cross-backend comparison, and primary-source verification of the surrounding solubility and oxygen-fugacity reference tables.

Also rolled out: the same test and docs infrastructure other PROTEUS-ecosystem submodules use (markers, anti-happy-path linter, validation anchor pages, coverage ratchet, fast PR gate, nightly full gate). Each of the five physics sources now carries at least one reference-pinned test against a published benchmark or analytical limit, with a per-source anchor page recording the citation.

No physics laws change. Both entry points share the same private helpers, Dasgupta N2 solubility, and Gaillard S2 solubility.

What this changes

Solver (src/calliope/solve.py, oxygen_fugacity.py, chemistry.py). The legacy 4-element residual becomes a 5-element (H2O, CO2, N2, S2, fO2_shift) system in the new entry point, the fifth equation enforcing total atmospheric + dissolved O against the user target. Hardening: NaN- and complex-aware clipping, per-element tolerances, input validation, deterministic RNG, typed RuntimeError on non-convergence. IW-buffer default flipped to Fischer; hardcoded test references updated.

Docs. New dual-mode pages (docs/Explanations/authoritative_oxygen.md, docs/How-to/authoritative_oxygen.md) plus refs threaded through 11 sibling pages and mkdocs.yml. New docs/Explanations/cross_backend_comparison.md compares CALLIOPE vs atmodeller at the Earth fiducial (Krijt 2023 BSE, $T = 2000$ K, $\Phi = 1$), enumerates six sources of disagreement, isolates the buffer systematic analytically, quantifies the chemistry residual across $T \in [1500, 3000]$ K, with five reproducible figures. New tutorials: earth_fiducial.md, mars_fiducial.md, coupled_loop.md, phase_diagram.md. Per-source validation anchor pages under docs/Validation/ wired into the nav with footnote-link citations. ADS links converted to SciX; 0.325 Earth core mass fraction re-attributed Zeng 2016 → Wang, Lineweaver & Ireland 2018 (arxiv:1708.08718).

Primary-source verification (per-page diffs in solubility.md + oxygen_fugacity.md). Solubility envelope rebuilt one row per composition (Sossi 2023, Dixon 1995, Hamilton 1964, Wilson & Head 1981, Newcombe 2017, Armstrong 2015, Ardia 2013, Libourel 2003, Dasgupta 2022, Gaillard 2022). fO2 table: Mercury split Fe/S (Cartier & Wood 2019), Mars to $\approx$ IW (Wadhwa 2001), Earth by depth bin (Frost & McCammon 2008), asteroidal row dropped (Doyle 2019). Spot fixes: Fischer 2011 description + DOI; Ardia 2013 → Eq. (8) / Fig. 11; Nicholls 2024 H2 threshold $-2 \to -1$; Nicholls 2026 L 98-59 d → IW$-4$ to IW$-1$; Sossi 2020 Earth anchor $+3.5 \pm 0.5 \to \approx +3.5$.

Cross-backend harness (scripts/cross_backend/). Five figure modules + shared infrastructure (runners.py provides _with_calliope_buffer for swapping the class default Fischer/O'Neill). Deterministic; CSVs ship with PDFs.

Tests and infrastructure. Authoritative-O coverage in test_authoritative_O_validation.py (typed-exception contract), test_authoritative_O.py (result-dict over $\Delta\mathrm{IW} \in {-4,-2,0,+2,+4,+6}$, round-trip within 0.05 dex, random_seed reproducibility, RuntimeError on unreachable targets), test_authoritative_O_monotonicity.py (slow tier). Invariants suite (test_invariants.py, test_invariants_unit.py, hypothesis fuzz in test_invariants_hypothesis.py) covers conservation, positivity, VMR closure, Keq identity, solubility monotonicity, calibration edges. New markers @pytest.mark.physics_invariant and @pytest.mark.reference_pinned, one each per physics source. AST linter tools/check_test_quality.py blocks regression past a one-way baseline. Fast PR gate (unit + smoke) and nightly full gate, ratcheting toward 90% ecosystem ceiling. tests.yaml runs Ubuntu + macOS / Python 3.12 + 3.13; nightly.yml adds Codecov upload.

Why

PROTEUS issue #677 (M_atm > M_planet at high H_ppmw) reduces to a structural asymmetry: every M_atm aggregation includes O (H2O/CO2/SO2 molecular masses) but every M_planet aggregation explicitly skipped O on the assumption the IW buffer made it derivable. That breaks when the user wants O as a primary budget input; the solver needs to support both directions. The buffer flip lands at the same time because the cross-backend comparison driving it depends on the new entry point for its round-trip closure check. The docs verification corrected entries across several pages where the previous text did not match the cited papers. The test infrastructure brings CALLIOPE in line with the rest of the ecosystem and codifies the invariants the dual-mode chemistry must satisfy.

Validation of changes

  • Full unit + smoke suite green locally (macOS, Python 3.12); linter, structure validator, ratchet guard pass.
  • Round-trip recovery within 0.05 dex at $\Delta\mathrm{IW} \in {-2,0,2,4,6}$; result-dict contract covers $\Delta\mathrm{IW} \in {-4,-2,0,+2,+4,+6}$. Input-validation surfaces typed exceptions on every contract; unreachable targets raise RuntimeError; random_seed produces bit-identical output across calls.
  • Property-based fuzz confirms monotonicity, finiteness, and non-negativity of the Dasgupta + Gaillard laws, both IW buffers, and the six modified Keq fits across the full input space.
  • Cross-backend harness produces deterministic CSV + PDF outputs from python -m scripts.cross_backend.figN_*.
  • Reference-pinned anchors pass: Fischer 2011 IW at 2000 K with O'Neill 2002 discrimination guard; JANAF (Chase 1998) closed-form $K_{\mathrm{eq}}$ for H2O at 2000 K with Schaefer cross-check; Sossi 2023 peridotite H2O and Gaillard 2022 S2 against the published parameterisations; Wang, Lineweaver & Ireland 2018 0.325 core mass fraction via the structure module; forward / authoritative-O round-trip at the Earth fiducial.
  • Zensical docs build clean (no unresolved-link warnings, all Validation/* pages reachable from the nav). Pre-commit, ruff format, ruff check pass.

Notes for review

Chemistry half of a coupled change. The PROTEUS-side consumer (the wrappers in src/proteus/outgas/calliope.py and src/proteus/outgas/atmodeller.py that route planet.fO2_source = "from_O_budget" to the new entry point) lives in PROTEUS PR #678 and depends on this PR landing first. PROTEUS pyproject.toml currently pins fwl-calliope to this branch via git URL; once this PR merges and a release ships, the pin reverts to a normal version. hypothesis >= 6.0 is added to [project.optional-dependencies] develop; no new runtime dependency.

Checklist

  • I have followed the contributing guidelines
  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • My changes generate no new warnings or errors
  • I have checked that the tests still pass on my computer
  • I have updated the docs, as appropriate
  • I have added tests for these changes, as appropriate
  • I have checked that all dependencies have been updated, as required

Bit-for-bit equivalent refactor that prepares for an authoritative-O
solver mode. Three internal helpers added: _get_partial_pressures,
_atmosphere_mass, _dissolved_mass. Each takes fO2_shift as an
explicit positional argument rather than reading
ddict['fO2_shift_IW']. The corresponding public functions
get_partial_pressures, atmosphere_mass, dissolved_mass become
one-line wrappers that pass ddict['fO2_shift_IW'] through.

No behaviour change. All 126 existing tests pass unchanged.

Motivation: the upcoming `equilibrium_atmosphere_authoritative_O`
entry point treats fO2_shift as a 5th unknown in the chemistry
solver (alongside the four primary partial pressures), so the
underlying physics functions must accept fO2_shift as an explicit
argument rather than read it from a shared dict. Plumbing it
explicitly also makes the data flow easier to follow when chasing
chemistry bugs.
New solver mode for whole-planet oxygen accounting (PROTEUS Path C
following the issue #677 bookkeeping fix). The legacy mode takes
fO2_shift_IW as a config input and derives the atmospheric+dissolved
O mass that follows from chemistry; the new mode inverts the
relationship and takes the total O mass as the budget input, returning
the fO2_shift_derived that is self-consistent with that budget.

Mathematical model. Four primary partial pressures (H2O, CO2, N2, S2)
become five unknowns with fO2_shift_IW joining as the fifth. The four
existing mass-balance residuals for H, C, N, S are joined by a fifth
for O. Both ``func_authoritative_O`` and the main entry point reuse
the existing physics helpers _get_partial_pressures, _atmosphere_mass,
_dissolved_mass (added in the previous refactor commit), so all
chemistry is shared bit-for-bit with equilibrium_atmosphere.

Solver. scipy.optimize.fsolve with trust-constr fallback, Monte-Carlo
restarts on cold-start; mirrors the existing solver. Pressure bounds
[0, 1e7] bar. fO2_shift bounds [-12, +12] log10 units (physical range
~[-6, +8]; the wider bound gives trust-constr room without escaping
to non-physical territory). On restart the fO2 guess redraws from
Uniform(-6, +8) so the solver gets a chance from a different basin.

Output dict. Same fields as equilibrium_atmosphere, plus two new
keys: ``fO2_shift_derived`` (converged value) and ``O_res`` (5th
mass-balance residual, kg). PROTEUS-side wrappers consume the same
fields regardless of which mode ran.

Round-trip smoke. With the legacy mode's O_kg_total as the target,
the new mode converges to fO2_shift_derived = fO2_shift_IW to within
1e-4 and recovers the legacy M_atm to within 1e-6 relative. All 126
existing CALLIOPE tests still pass; formal monotonicity sweep and
round-trip tests follow in the next commit.
…oints

The new entry point's residual chain was crashing with ZeroDivisionError
on the majority of realistic inputs because every species pressure could
collapse to zero (via a NaN-clip that folded NaN into 0) and the mean
molar mass division then divided by zero. Once that path was unblocked,
several smaller robustness gaps in the same code became visible too.

Robustness in the chemistry chain (affects both solver modes):
- NaN/Inf clip in _get_partial_pressures: explicit isfinite check
  replaces the bare max(0.0, x), which preserved NaN whenever NaN was
  the second arg.
- Zero-pressure guard in atmosphere_mean_molar_mass: when ptot < 1e-30,
  return a sentinel of 1.0 g/mol; downstream mass tallies multiply by
  zero anyway so the residual gets a clean "empty atmosphere" signal.
- CO2 contribution to atmospheric O is now unconditional, mirroring the
  unconditional CO2 contribution to atmospheric C. The previous gate on
  is_included('CO2') made element bookkeeping asymmetric: in any state
  where CO2_included is 0 but the solver carries a non-zero CO2 partial
  pressure, the O tally lost atoms that the C tally still counted.

Hardening of equilibrium_atmosphere_authoritative_O entry point:
- target_d validation: all five element keys (H, C, N, S, O) must be
  present, finite, and non-negative. Missing keys raise KeyError;
  non-finite or negative values raise ValueError. Previously only
  missing 'O' was checked, and other bad inputs raised opaque KeyError
  or RuntimeError from deep inside fsolve.
- ddict validation: M_mantle, Phi_global, T_magma, gravity, radius
  must be present and physical. T_magma outside the Dasgupta/Gaillard
  solubility-law calibration range (1373-1973 K) raises a single
  UserWarning so users notice when the solver is extrapolating.
- fO2_hint validation: must be finite and within [-12, +12].
- nguess and nsolve must be >= 1 (previously nguess=0 raised
  UnboundLocalError on the post-loop success check).
- Each fsolve/trust-constr call is wrapped in try/except for
  ZeroDivisionError, FloatingPointError, ValueError; a crashed
  attempt is treated as a failed Monte-Carlo restart so the loop
  continues instead of propagating the crash. The post-loop
  RuntimeError contract is now honoured.
- Per-element residual gate replaces the legacy max-of-targets
  scalar tolerance: each residual must satisfy
  |res_i| <= max(target_i * rtol, atol / 5). The legacy gate was
  dominated by the largest target (O budgets are ~5 orders of
  magnitude larger than N) and accepted up to ~10% N error.
- random_seed kwarg threads a np.random.Generator through the
  Monte-Carlo restart so two calls with the same seed produce
  bit-identical output. None preserves the previous global-state
  behaviour for callers that have not opted in.

Verification:
- 121 existing unit + smoke tests pass unchanged.
- 40 random Earth-like seeds: 40 successes, 0 ZeroDivisionError
  (previously 36 crashed).
- 35-case adversarial input runner: 0 crashes (previously 7 crashed
  with ZeroDivisionError); every adversarial input either raises a
  typed exception (ValueError / KeyError / TypeError / RuntimeError)
  or returns a valid result.
- Same-seed reproducibility check: two calls with random_seed=42
  return bit-identical fO2_shift_derived and all four primary
  partial pressures.
Cover every typed-exception path the entry point promises in its
docstring: missing target_d keys raise KeyError naming the missing
element; non-finite or negative target_d values raise ValueError;
fO2_hint outside [-12, +12] raises ValueError; missing or unphysical
M_mantle / Phi_global / T_magma raise the appropriate exception;
nguess and nsolve below 1 raise ValueError (the legacy nguess=0
UnboundLocalError path is explicitly checked).

61 parametrised cases run in 0.33 s; all unit-tier.
The core correctness property is that any fO2_shift_IW value the
legacy solver accepts must be recovered by the new mode when fed the
same problem's O budget back as the authoritative O target. If the
round-trip fails by more than the per-element tolerance, the extended
5-equation system has a different fixed point than the legacy 4-equation
one, which would be a real bug. The parametrised round-trip covers
five fO2_shift values across the reducing-to-oxidising range.

Also covers the happy-path output schema, the per-element residual
tolerance gate, same-seed bit-identical reproducibility, and the
RuntimeError-on-impossible-target contract documented in the entry
point's docstring.

12 tests, smoke-tier, ~2 minutes wall time.
The 5-unknown system the authoritative-O solver inverts is well-posed
only if O_kg_total(fO2_shift) is monotonic on the working range at
fixed (H, C, N, S, T_magma, M_mantle, Phi_global). If the curve has
a turning point, the inverse solver can find multiple roots from
different cold starts and lose determinism even when seeded.

The pre-fix adversarial review confirmed monotonicity empirically
on [-5, +5] at 1800 K and 3000 K. Lock that finding in as a property
test so a future chemistry refactor that breaks it fires the test
instead of corrupting downstream simulations.

Earth-like H/C/N/S budget at 1500 K and 1800 K, sweep 11 fO2_shift
points across [-4, +6], confirm strictly increasing O_kg_total.
Empirical dynamic range is ~3x over 10 dex; the test asserts > 1.5x
so a catastrophic regression that flattens the curve catches.

Marked slow because each sweep runs 11 legacy solves.
I added documentation for the second CALLIOPE solver mode, which takes
oxygen as a fifth elemental budget and treats the IW-buffer offset as
an unknown. The two modes share the same physics functions; only the
unknown set and the residual equation count differ.

What is new:

- Explanations/authoritative_oxygen.md: the augmented 5-residual mass
  balance, how fO2_shift enters the chemistry as an unknown, the
  per-element residual gate that replaces the legacy scalar tolerance
  bound, the Monte-Carlo restart with fO2 redraw window, the
  cross-backend IW-buffer divergence with atmodeller (Hirschmann vs
  O'Neill & Eggins), and the calibration-range and solid-FeO
  limitations.
- How-to/authoritative_oxygen.md: a recipe page with four runnable
  patterns (5-element solve, round-trip from buffered mode, warm-start
  via p_guess including the fO2 seed, deterministic mode via
  random_seed) plus the common error modes.

Cross-references and clarifications on existing pages:

- mass_balance.md, oxygen_fugacity.md: reframed as the buffered-mode
  page and pointed to the dual mode.
- Explanations/proteus_coupling.md: showed how the PROTEUS wrapper
  dispatches between the two CALLIOPE entry points based on
  planet.fO2_source.
- How-to/configuration.md: documented the 5-key target dict variant
  for the authoritative-O entry point.
- Reference/api/index.md: listed the new public entry point.
- index.md: reworded the feature bullet so the reader sees both modes
  side by side from the landing page.

mkdocs.yml gets the two new nav entries. Zensical build is clean (no
broken anchors, 3.8 MB site).
…ings

I went back over the docs against the actual source and against the
voice rules and fixed several places where the prose was either
slightly wrong or read as a change-narrative rather than a current-
state statement.

What changed:

- Tests cited by exact pytest node-ID. The earlier doc named tests
  that did not exist; the right names are
  TestRoundTrip::test_round_trip_recovers_fO2_within_tolerance and
  TestMonotonicity::test_O_kg_total_strictly_increasing_with_fO2.
- Monotonicity range narrowed to what the tests actually verify
  ([-4, +6] at two T_magma values, with a Phi_global = 0.3 sibling
  check over [-2, +4]). The prior text claimed [-6, +8] over five
  (T_magma, Phi_global) pairs, which neither test covers.
- TRUNC_MASS source file corrected (it lives in solve.py, not
  constants.py).
- Per-element tolerance motivation now makes explicit that the
  buffered mode never has this problem because O is not in its
  residual vector. The 5-order-of-magnitude N vs O target imbalance
  only matters once O enters the residual.
- "fO2 bounds are looser than the redraw window" was reversed. They
  are wider, not looser; doc now says wider.
- "is now" / "the only change is" / "is now a variable" phrasings on
  the concept page rewritten as current-state statements (each mode
  is described in its own right, comparisons live in the explicit
  comparison table).
- Dasgupta N2 redox dependence is "substantially less dissolved N",
  not "slightly less"; the law carries an exponent of -1.6 * dIW, so
  the swing per dex is order-of-magnitude.
- Cross-backend Hirschmann vs O'Neill & Eggins note drops the
  "growing with temperature" qualifier, since the trend at lower T
  is not numerically established from the CALLIOPE side.
- Buffered-mode contrast says "fewer unknowns", not "simpler
  contract"; the restart-count estimate is hedged as empirical.
- The PROTEUS dispatch snippet in the proteus_coupling Explanation
  now passes config.outgas.calliope.nguess / nsolve rather than
  hardcoded literals, matching the real wrapper. An adjacent
  paragraph documents the schema defaults and the wrapper's
  authoritative-O restore of hf_row['O_kg_total'].
- RuntimeError "max attempts: 7500" message in the how-to is now
  "max attempts: N", with a sentence on the PROTEUS override.
- Recipe 4 (reproducibility) replaces float == with math.isclose, and
  the prose hedges "bit-reproducible" to "reproducible to solver
  tolerance".
- oxygen_fugacity.md said "two channels" while listing four
  subsections; fixed to "four channels". The authoritative-oxygen
  page deep-links into this section and the count was already
  inconsistent.
- index.md chemistry-couples bullet uses ASCII hyphens, not
  en-dashes. The en-dash carve-out is only for page/volume ranges in
  citations; species pairs do not qualify.

Zensical build remains clean, 3.8 MB site, no broken anchors.
The earlier docs commit added the new concept and how-to pages for the
authoritative-oxygen mode but left the model overview, the code
architecture diagram, the solubility page, and the two PROTEUS-coupling
how-to / theory pages describing CALLIOPE as a buffered-mode-only
package. This commit makes the dual mode visible from the broadest
entry points a reader hits.

Explanations/model.md:

- The "what is in the model" bullets now describe H/C/N/S as always
  solved and O as either derived (buffered) or budgeted (authoritative-
  O), with a link to the dedicated page.
- The IW-buffer bullet says fO2_shift is an input under the buffered
  mode and a solver unknown under the authoritative-O mode.
- The "Mathematical statement" section presents one mass-balance
  equation per solved element, then names the two modes explicitly as
  4x4 vs 5x5 systems sharing the same physics.
- The "Why four primaries" admonition is reworded so it remains true
  in both modes: O is either not solved (buffered) or carried as a
  separate scalar unknown (fO2_shift), never a new primary partial
  pressure.

Explanations/code_architecture.md:

- The solve.py inventory now lists both solver entry points first,
  then the shared physics functions, then the cold-start generators
  for the 4-unknown and 5-unknown problems, then the residual /
  objective pairs and the private versions that thread fO2_shift
  positionally.
- The call graph shows both entry points diverging at the cold-start
  generator and converging at the residual, with the private versions
  visible.

Explanations/solubility.md:

- Added See-also entries for oxygen_fugacity.md (the IW-buffer
  parameterisations behind the Gaillard and Dasgupta fO2 dependence)
  and authoritative_oxygen.md (how those fO2-dependent terms become
  part of the 5-residual mass balance once fO2 is solved for).

How-to/proteus_coupling.md:

- New "Selecting the fO2 dispatch" section documents the
  planet.fO2_source TOML field with both values, explains what each
  one does, and links to the dispatch sequence diagram and the
  authoritative-O concept page.
- The Redox-state section now clarifies that fO2_shift_IW is the
  buffer offset under user_constant and the initial-guess seed under
  from_O_budget.

How-to/usage.md:

- Next-step block points readers with an oxygen budget at the
  authoritative-O recipe.

Zensical build remains clean, no broken anchors.
…nges

The authoritative-O entry point used to emit a UserWarning when T_magma
fell outside [1373, 1973] K, with a message naming "Dasgupta N2
(1373-1873 K) and Gaillard S2 (1473-1973 K)" calibration ranges. Both
sets of numbers were inventions: I went back to the primary sources
and neither paper supports them.

What Dasgupta et al. (2022, GCA 336, 291) Equation 10 actually reports
as its calibration footprint (n=137 compiled data, from their paper
just above the equation):

  T range 1050-2327 °C  (1323-2600 K)
  P range 1 bar to 8.2 GPa
  fO2 range IW-8.3 to IW+8.7

What Gaillard et al. (2022, EPSL 577, 117255) Equation 10 reports about
itself: a refit of the O'Neill & Mavrogenes (2002) model combined with
8 other experimental sources (n=369), all calibrated at 1 atm with
fO2 in the range IW-1 to FMQ+0.1. The paper does not state a
temperature calibration range; its modelling section works at a fixed
T=1500 °C.

The earlier CALLIOPE warning was therefore wrong in three ways: it
under-stated Dasgupta's high-T span by ~700 K, it asserted a Gaillard
T range that the source does not state at all, and it ignored fO2
constraints entirely. Since the two laws have incompatible calibration
footprints (Dasgupta covers a wide T x P x fO2 box; Gaillard covers a
narrow fO2 strip at 1 atm), a single scalar T-window cannot
meaningfully represent both. The cleanest fix is to remove the
in-process warning and surface the actual ranges in the docs.

Removed:
- The 12-line warnings.warn block in equilibrium_atmosphere_authoritative_O.
- The test_T_magma_outside_calibration_warns test that asserted the
  warning fired at T=3000 K.

Documented (in solubility.md validity-envelope table):
- N2 (Dasgupta): T 1323-2600 K, P 1 bar to 8.2 GPa, fO2 IW-8.3 to IW+8.7.
- S2 (Gaillard): T not stated by the paper; P 1 atm (calibration data);
  fO2 IW-1 to FMQ+0.1 (~ IW+3.5).
- A new fO2 column on the table so the redox-dependence of N and S is
  visible without burying it in prose.
- An adjacent paragraph naming the primary sources with ADS links,
  the dataset sizes, and the 1-atm-fit-applied-at-MO-pressures gap
  for Gaillard.

Touched the dual-mode docs that previously echoed the warning text:
- Explanations/authoritative_oxygen.md Limitations bullet now points
  at the validity envelope rather than quoting numbers.
- How-to/authoritative_oxygen.md Pitfalls bullet replaces the
  UserWarning entry with a note that the entry point does not flag
  extrapolation and a link to the envelope.

Tests now stand at 202 (was 203). Full suite green; docs build clean.
I rebuilt the solubility validity-envelope table and the oxygen-fugacity reference-values table from primary sources (Sossi 2023, Dixon 1995, Armstrong L.S. 2015, Ardia 2013, Newcombe 2017, Libourel 2003, Wilson & Head 1981, Hamilton 1964, Cartier & Wood 2019, Wadhwa 2001, Frost & McCammon 2008, Nicholls 2024 JGRE, Fischer 2011) and corrected the claims that did not survive the check.

Solubility validity envelope: rebuilt from 7 partially-fabricated rows into 11 primary-source-verified rows, one per implemented composition. The H2O peridotite row previously said "1 bar to several kbar" but Sossi 2023 was at 1 atm total pressure. The H2O basalt Dixon row previously said "176-2021 bar" but the upper bound came from Berndt 2002 (a separate calibration with a different prefactor); Dixon 1995 is actually 176-717 bar pH2O. The CO row previously said "<=3 GPa" but Armstrong's experiments are at 1.2 GPa. The CH4 row previously said "1573-1873 K" but Ardia 2013 was at 1400-1450 C (1673-1723 K). Added the four implemented compositions that were missing from the table (basalt_wilson, anorthite_diopside, lunar_glass, libourel) with their actual calibration footprints. Reframed the fO2 column from "not constrained" (which conflated functional dependence with calibration footprint) to explicit per-row calibration ranges, with a disambiguation paragraph below the table.

Ardia CH4 attribution: the paper's Figure 6 is Raman spectra, not the solubility best-fit. Corrected the doc to cite Eq. (8) with ln K0 = 4.93 and DeltaV = 26.85 cm^3/mol, plotted as Fig. 11.

Oxygen-fugacity buffer attribution: the Fischer 2011 description previously claimed "calibrated against high-pressure (~25 GPa) experimental data" but the 6.94059 - 28180.8/T fit reproduces the 1-bar buffer curve, which Fischer 2011 takes from Chase 1998 NIST-JANAF tabulation. Rewrote the description accordingly.

Oxygen-fugacity reference values: Mercury split into Fe-based (IW-2.8 to IW-4.5) and sulphur-based (IW-5.4 via Namur 2016) estimates per Cartier & Wood 2019. Mars mantle corrected from "-3 to 0" to "approximately IW" with shergottite parent melts at IW-1.0 to IW+1.9 per Wadhwa 2001. Earth mantle expanded into three depth-binned rows (8 GPa, transition zone, lower mantle) per Frost & McCammon 2008. Dropped the asteroidal-material row because Doyle 2019 reports a range across chondrite types, not a single representative value. Nicholls 2024 H2-dominated threshold corrected from DIW <= -2 to DIW <= -1 (the paper says "fO2 <= IW - 1; see also Figure 6"). Nicholls 2026 L 98-59 d range corrected from DIW ~ -1 to "between IW-4 and IW-1" (the paper's stated retrieval range). Sossi 2020 DIW = +3.5 +- 0.5 corrected to "approximately +3.5"; the +-0.5 uncertainty was not in the paper.

Cross-page propagation: model.md validity-range bullet updated to recommend Hamilton 1964 (not Dixon) for surface pressures above 1 kbar, and to quote the actual extrapolation factor for Sossi/Newcombe applied at kbar pressures. How-to/proteus_coupling.md reference-values table updated to match the new oxygen_fugacity.md framing. Tutorials/firstrun.md H2-dominated threshold updated to match Nicholls 2024.

All CALLIOPE tests pass (198/198 unit). Zensical docs build clean.
The four-marker breakdown in docs/Explanations/testing.md was last set
when the suite had 96 unit + 25 smoke + 5 integration + 0 slow tests.
The suite has grown since: pytest --collect-only against the four
markers now reports 156 unit + 37 smoke + 5 integration + 4 slow
(the four slow tests are the authoritative-O monotonicity-regime
property tests). Updated the doc table to match the current state.
This new test file pins eleven invariants that must hold for any valid
CALLIOPE result, plus three additional checks at the edges of the
solubility-law calibration footprints.

The eleven invariants:

1.  Per-element mass conservation: atm + liquid == total for each of H,
    C, N, S, O.
2.  Pressure positivity: every species partial pressure is >= 0.
3.  VMR closure: sum of volume mixing ratios is unity over all eleven
    species.
4.  Total-pressure consistency: P_surf == sum of species partial
    pressures.
5.  Atmospheric mass consistency: M_atm == sum of species column masses.
6.  fO2 reconstruction: in authoritative-O mode, the derived shift
    reproduces log10(p_O2) - log10(fO2_IW_buffer(T_magma)).
7.  Modified equilibrium constant identity: p_B / p_A matches
    ModifiedKeq.<method>(T, fO2_shift) at the converged state for both
    H2O-H2 and CO2-CO couples.
8.  Dissolved-mass non-negativity: every <species>_kg_liquid is >= 0.
9.  CO2 stoichiometric atom counting: the C contribution from CO2
    equals the molar-mass ratio C/CO2 times CO2_kg_atm.
10. Monotonicity of Gaillard S2 solubility as fO2 decreases at fixed
    p_S2 and T (tested at the solubility-function level to isolate the
    law from multi-species solver feedback).
11. Monotonicity of Dasgupta N2 solubility as fO2 decreases at fixed
    p_N2, p_tot, T (same isolation strategy).

The three calibration-edge additions:

-   Dasgupta plateau at the reducing edge (dIW <= -3) where the paper's
    Fig. 7 notes plateau-like N solubility. Verifies both the buffered
    solver converges and the Dasgupta law stays numerically well-behaved
    down to dIW = -8.
-   Gaillard sulfide-saturated regime at the oxidising edge (dIW >= +4)
    where the sulfate regime begins and CALLIOPE's sulfide-only chemistry
    extrapolates. Verifies SO2 dominance over S2 at dIW >= +5 and that
    the dissolved-S fraction drops at oxidising conditions.
-   p_guess warm-start: a warm start with the cold-solve result lands in
    the same basin and recovers the same partial pressures. Includes a
    sad-path with an off-by-100x bad guess that the Monte-Carlo restart
    catches.

Every test class is anti-happy-path: each parametric sweep covers an
edge of the physical regime, and each class includes at least one
sad-path companion (zero mantle mass, extreme reducing, negative S2
pressure, off-by-100x warm start, etc.).

Marker discipline: classes are explicitly marked with exactly one of
unit (for stoichiometric / direct-solubility-function tests that do
not invoke the multi-species solver) or smoke (for tests that exercise
equilibrium_atmosphere or equilibrium_atmosphere_authoritative_O).
29 unit + 57 smoke = 86 new tests, no overlap with existing markers.

Total CALLIOPE test count rises from 202 to 288, all passing.
Wall-time for the new file alone: 1.2 s.
Property-based fuzzing complements the fixed parametric sweeps in
tests/test_invariants.py by exercising the Dasgupta N2 / Gaillard S2
/ O'Neill IW / Fischer IW / ModifiedKeq helpers across the full
physically-relevant input space, not just the discriminating points
the deterministic suite covers.

Seven hypothesis tests, each running 100-200 examples per call,
covering: Dasgupta monotonicity in fO2 at random (p_N2, p_tot, T);
Dasgupta finiteness; Gaillard monotonicity in fO2 at random (p_S2,
T); Gaillard finiteness; O'Neill IW finiteness over T and dIW;
O'Neill-vs-Fischer agreement within 1 dex over 1800-2200 K; and
ModifiedKeq finiteness + positivity across all six methods.

The tests are marked ``slow`` (per the existing 4-marker scheme, no
new marker introduced) so the PR gate is not slowed. Wall-time for
the new file alone is 1.15 s; the nightly runs them.

Adds ``hypothesis >= 6.0`` to the develop optional-dependencies in
pyproject.toml.

Test count: 288 -> 295 (7 added, all slow tier).
PR-gate count: 284 (unchanged).
The previous TestHappyPath class in tests/test_authoritative_O.py
exercised only the canonical Earth-like input (dIW=4.0) and was a
happy-path-only regression check: both methods asserted that the
result-dict contract held at one point in the physical regime.

The PROTEUS test rules require every new or modified test function
to cover at least one edge case and at least one physically
unreasonable input. The rewrite renames the class to
TestPhysicalOutputContract (more honest about what it tests) and:

-   Parametrises test_returns_physical_output over six dIW values
    spanning the physical magma-ocean regime (-4, -2, 0, +2, +4, +6).
    The result-dict contract now has to hold across reducing,
    neutral, and oxidising endpoints, not just at the fiducial.
-   Parametrises test_residuals_within_tolerance over three dIW
    values (-4, 0, +4). Per-element residual gates have to clear at
    the extremes too.
-   Adds test_negative_T_magma_raises, the unphysical-input sad path:
    feeding a negative T_magma to the solver must raise (since both
    IW buffer formulae diverge as 1/T and T*log(T)).

Net effect: one happy-path-only class (2 tests, 1 dIW) becomes one
anti-happy-path contract class (10 tests covering 6 dIW values + a
sad path).

The other three classes in this file (TestRoundTrip, TestReproducibility,
TestConvergenceFailure) already satisfy the anti-happy-path rule
(TestRoundTrip parametrises over 5 dIW values, TestConvergenceFailure
verifies an unreachable target raises) and need no changes.

Full suite count: 295 -> 303, all passing.
…badges

Two presentational fixes on the docs landing page:

-   Move the H1 ``# CALLIOPE`` heading above the badge block to match the
    layout of the sister repo (Aragog) and to give the page a clear title
    line before the CI status indicators.
-   Append ``&color=brightgreen`` to the codecov, Unit Tests, and
    Integration Tests badge URLs so they render in the exact same shade
    of green as the Docs badge regardless of the underlying status
    algorithm. The codecov badge's auto-gradient would otherwise drift
    by a few percent of saturation depending on the coverage value, and
    the test workflow status badges would drift if a transient flake
    flipped them yellow on a given main-branch run.
-   The Testing-suite page (docs/Explanations/testing.md) gets the same
    colour override on its codecov + test badges for visual consistency
    with the landing page.
The round-trip paragraph in docs/Explanations/authoritative_oxygen.md
wrapped the result-dict key inside a math-mode tuple with
``\texttt{O\_kg\_total}``. MathJax under Zensical does not render
``\texttt`` cleanly: the underscore-escape backslashes leak through to
the page as literal characters, so the prose ended up showing
``(H, C, N, S, O\_kg\_total)`` with visible backslashes between the
key and each underscore.

Reflow the sentence so the math tuple uses plain ``(H, C, N, S, O)``
and the ``O_kg_total`` dict key is referenced as inline code outside
the math span. The intent (round-trip from buffered to
authoritative-O via the recovered O budget) is unchanged; only the
typesetting moves.
The "Selecting the fO2 dispatch" section in docs/How-to/proteus_coupling.md
previously gave a three-line TOML snippet showing the fO2_source field
in isolation, with no example of what the surrounding planet.elements
block needs to look like for each mode. A first-time reader had to
piece the full config together from the Choosing-the-elemental-budget
and Selecting-the-fO2-dispatch sections, and the validator rule that
ic_chemistry is rejected under from_O_budget was not stated anywhere.

Rewrite the section so it gives:

-   A dispatch table mapping each fO2_source value to its input,
    its solved-for quantity, and the underlying CALLIOPE entry point.
-   A complete minimal-style TOML config for the buffered-fO2
    mode (fO2_source = "user_constant", O_mode = "ic_chemistry",
    fO2_shift_IW as the buffer offset).
-   A complete minimal-style TOML config for the authoritative-O
    mode (fO2_source = "from_O_budget", O_mode = "FeO_mantle_wt_pct"
    with a worked numeric example at 8 wt% FeO, fO2_shift_IW as the
    initial-guess hint).
-   A per-mode walk-through of what the wrapper does on the first
    call (which target it builds, which entry point it calls, which
    hf_row keys it writes back).
-   A side-by-side decision table for choosing between the two.

Highlights the validator rule (ic_chemistry is incompatible with
from_O_budget because the chemistry needs an O target to invert)
in the authoritative-O section so the user does not hit the
config-load failure first.

Cross-links to the round-trip equivalence test and the Authoritative
oxygen mode explanation page for the underlying physics.

Docs build clean.
The job name on .github/workflows/tests.yaml was

    name: CALLIOPE-check (${{ matrix.os }}, py${{ matrix.python-version }})

The template placeholders evaluate fine when the job actually runs,
but the job is gated by ``if: github.event.pull_request.draft == false``
so it never dispatches on a draft PR. When GitHub skips a job at the
top-level ``if:`` check the matrix context is never populated, and
the unexpanded literal ``CALLIOPE-check (${{ matrix.os }}, py${{ matrix.python-version }})``
appears as the check title in the PR's "Checks" sidebar. That has
been visible on PR #20 for the duration of the draft.

Reduce the job name to a plain ``CALLIOPE-check``. GitHub appends the
matrix values automatically when the job dispatches for real (one
display row per matrix cell), and a skipped draft job now shows a
clean ``CALLIOPE-check`` instead of the raw template string.
Add a new Explanations page "Backend comparison" that quantifies where
the two outgassing backends agree and where they disagree on the
authoritative-O closure. Five publication-quality figures, each one
backed by a reusable harness in scripts/cross_backend/:

  Fig 1: The iron-wustite buffer parameterisations (O'Neill & Eggins
         2002 the CALLIOPE default, Fischer 2011, Hirschmann composite
         the atmodeller default) and the Hirschmann minus O'Neill
         offset across magma-ocean temperatures.

  Fig 2: Per-backend internal round-trip. Each backend goes from
         buffered mode at known dIW through authoritative-O and back
         within solver tolerance at every (T, dIW) grid point.

  Fig 3: Cross-backend dIW at the Krijt et al. 2023 PPVII BSE H/C/N/S
         volatile inventory across T = 1800 K to 3000 K. The raw gap
         tracks the buffer offset; the buffer-corrected residual stays
         within 0.1 dex up to 2000 K and grows to ~0.3 dex by 3000 K.

  Fig 4: Attribution at T = 2000 K. Raw 0.42 dex gap, 0.07 dex after
         buffer correction, 0.21 dex after also forcing solubility
         alignment. The buffer is the dominant systematic.

  Fig 5: Earth anchor. Both backends fall inside the Frost & McCammon
         2008 empirical mantle-fO2 range, with CALLIOPE at the
         Sossi 2020 +3.5 reference and atmodeller a fraction of a dex
         more reducing.

The harness is reusable: any inventory, any temperature, any pinned
fO2_hint can be swept by parameterising the existing scripts. Raw CSV
data is committed alongside the PDF and PNG outputs to lock the
provenance of the embedded figures.

Volatile-O reference for Earth is derived self-consistently from a
CALLIOPE buffered-mode call at the Sossi 2020 IW+3.5 baseline rather
than read directly from Krijt+2023 Table 2 (the latter is the Evans
2006 redox-active oxygen, which is dominated by the mantle FeO redox
imbalance rather than volatile O and is not the variable the
authoritative-O entry point inverts).
Replace every inline [Author Year](URL) citation across the docs body
pages with the [^cite-author-year] footnote-reference syntax, with one
[^cite-...]: definition per source appended at the bottom of each page.
Numbered superscript links and a per-page bibliography section now
render the same way the aragog docs do.

Pages converted: index, code_architecture, equilibrium_chemistry,
mass_balance, model, oxygen_fugacity, proteus_coupling, solubility,
authoritative_oxygen, cross_backend_comparison; How-to/usage,
How-to/proteus_coupling; Tutorials/firstrun.

The dedicated Reference/publications page is left as-is. It is the
full bibliography for the project and intentionally uses a different
expanded layout.

Citation keys (cite-author-year, lowercase) are stable identifiers I
can reuse across pages. Per-page bibliography entries follow the
aragog format: author list, italicised paper title with the DOI link
where one exists, journal + volume + pages + year, SciX link. Page
ranges use the en-dash per the academic-citation carve-out.
Audited every cite-* footnote definition against my references.bib and
the atmodeller and CALLIOPE source code. Seven entries had titles,
authors, or journal information I had drafted from memory and got
wrong. Fixed:

- cite-hirschmann2008: now the Library of Experimental Phase Relations
  (LEPR) paper in Geochemistry, Geophysics, Geosystems 9(3), Q03011,
  which is the source atmodeller cites for the H08 branch of the IW
  buffer. Was previously a fabricated PEPI 170 entry.
- cite-hirschmann2021: now Geochimica et Cosmochimica Acta 313, 74-85.
  Was previously a fabricated American Mineralogist entry.
- cite-hirschmann2012: corrected the volume from 345 to 345-348 (this
  is the bound multi-issue numbering used by EPSL for this paper) and
  added the DOI link to the title.
- cite-li2015: corrected author list and title. The paper is
  Li, Dasgupta, Tsuno on carbon solubility in Fe-rich-alloy / silicate
  systems; I had drafted a different title with different authors.
- cite-yoshioka2019: corrected author list and title. The paper is
  Yoshioka, Nakashima, Nakamura, Shcheka, Keppler on carbon solubility
  in CO-CO2-graphite equilibrium; I had drafted a rheology-focused
  title with different authors.
- cite-boulliungwood2023: changed from the Geochimica et Cosmochimica
  Acta 343 corrigendum to the actual Boulliung & Wood 2023 paper in
  Contributions to Mineralogy and Petrology 178(8), 56, which is what
  atmodeller's BW23 cite key points to.
- cite-rogers2025: corrected the title and added the second and third
  author. The paper is Rogers, Young, Schlichting on hydrogen-silicate
  miscibility for sub-Neptune interiors; I had drafted a binodal-
  framing title that does not match the published article.

Six other entries had verified titles and authors but were missing the
DOI hyperlink on the title. Added the DOI link for Frost & McCammon
2008, Nikolaou 2019, Lebrun 2013, Salvador 2017, Wadhwa 2001, O'Neill
& Mavrogenes 2002, and Cartier & Wood 2019.

Verification sources (in order of precedence): my references.bib
Better-BibTeX export; atmodeller/docs/refs.bib for the atmodeller-
internal cite keys; targeted web search against the published article
for the entries not in either bib file.
"Path C" was an internal Claude-roadmap shorthand for the
authoritative-O dispatch under planet.fO2_source = "from_O_budget".
It is meaningless to anyone reading the docs or source cold, and
leaks the AI-assisted planning vocabulary into checked-in artefacts.

Rewrote the two occurrences in plain language:

- docs/Explanations/cross_backend_comparison.md: "For Path C runs" ->
  "When running in authoritative-O mode".
- src/calliope/solve.py: the docstring of
  equilibrium_atmosphere_authoritative_O dropped the PROTEUS issue
  #677 Path C reference and now describes the science motivation
  directly ("whole-planet O accounting where atmospheric escape and
  ingassing debit the same O reservoir").
The previous version used a viridis colour ramp across four temperatures
and plotted recovered vs input on an identity-line view. Two issues:
the four colours all rendered as essentially the same green at marker
size, and at every input dIW the four T markers stacked on top of each
other so only the topmost one was visible.

Two changes:

- Switch the colour palette to a discrete four-colour set (indigo,
  teal, orange, magenta) that maps cool to warm onto the four T
  values. Each colour is clearly distinguishable in the legend.

- Switch the y-axis to the residual (recovered minus input dIW) and
  add a small horizontal jitter per T so all four T markers fan out
  side-by-side at each input dIW. A grey band marks the +/- 0.01 dex
  solver tolerance.

The new view also surfaces two solver-edge cases that the old identity
view had visually hidden behind perfect-overlap markers: CALLIOPE at
T = 3000 K input dIW = +4 lands in a secondary basin and returns
-5.84; atmodeller at T = 1500 K input dIW = 0 fails to converge. Both
are now shown as downward triangles at the y-axis floor, in the
matching T colour, and acknowledged in the caption rather than masked.

The page text now also explains why T is swept (chemistry is
T-dependent: the equilibrium constants, the Dasgupta nitrogen
solubility, and the Gaillard sulfur solubility all carry explicit T
or fO2 dependences, so the round-trip must hold across the full
calibration range).
Both panels had legends parked on top of the curves: panel (a)'s
lower-right corner sat across the atmodeller and predicted-from-buffer
lines at the hot end, and panel (b)'s upper-right corner crossed the
raw-gap line near its zero-crossing.

Move both legends to lower-left, pad the y-axis on each panel so the
legend box sits below the lowest converged point, and shift the
tolerance-band annotation in panel (b) to the upper-right so it no
longer collides with the panel label.
Bundle Roboto-Regular / Bold / Italic alongside the cross-backend
plotting scripts and register them in plot_style.apply_style so every
figure on the backend-comparison page now renders in the same sans-serif
the Material theme uses for body text. Switch from font.family='serif'
to a sans-serif stack that prefers Roboto and falls back to Helvetica /
DejaVu Sans when Roboto is unavailable, and route mathtext through
Roboto too so equation labels match the surrounding prose. Replace one
H08-arrow-H21 unicode glyph in fig1 with the H08 / H21 form so Roboto
covers it natively.

Move the (b) panel label in fig3 out of the corrected-residual line by
giving the bottom panel a deeper top margin. Move the "0.07 dex" bar
label in fig4 below the 0.10 dex tolerance line so the two no longer
collide, and park the tolerance annotation above the dashed line where
no bar reaches. In fig5, lift the Sossi 2020 anchor line above the
backend lines in z-order so it stays visible even when a backend lands
at +3.5 exactly, add it as a fourth legend entry, and move the legend
below the plot so it stops crowding the data lines.

On the backend-comparison page, swap the words "harness" and "contract"
for plainer phrasing throughout (and in the three sister pages where
they appear), and add an explicit Krijt+2023 H / C / N / S table with
the kg values used by the figures plus a short paragraph explaining why
oxygen is not taken from Krijt directly and how the volatile-O
reference of 1.24e22 kg was derived.
In fig3, both panel labels were sitting close to the data at the cold
end of the sweep: (a) above the CALLIOPE point at T = 1800 K (3.37 dex)
and (b) right next to the corrected-residual diamond (+0.08 dex). Add
more top headroom in each panel so the labels clear the curves with
breathing room.

In fig4, move the per-element-solver-tolerance annotation from the
space between bars 1 and 2 to the space between bars 2 and 3. That
puts the label closer to where the line is actually doing work (the
0.07 dex bar is the only one below threshold) and well clear of every
bar-top label.
Previously parked between bar 2 and bar 3 at x = 1.5, which crowded
the right edge of the figure. At x = 1.0 the annotation sits above the
0.07 dex bar (which has already topped out below the tolerance line)
and the text extends well clear of bars 1 and 3 on either side.
The Krijt+2023 inventory table previously appeared as a level-3 heading
with no upstream context: a reader landing on the page would see a
table of element masses with no explanation of why those specific
numbers belong here. Promote the section to a top-level heading
("The Earth fiducial used throughout this page") and add an
introductory paragraph that states three things up front:

- A cross-backend comparison only carries meaning if both solvers see
  the same inputs, so the page picks one shared scenario and runs both
  backends against it.
- The scenario is Earth BSE at Phi = 1, with T_magma fixed at 2000 K
  for Figures 4 and 5 or swept across 1800 to 3000 K for Figure 3.
- Earth is the natural anchor because the modern upper-mantle Delta-IW
  is empirically constrained, which lets Figure 5 actually check the
  predictions against petrology.

The element-mass table and the oxygen-derivation paragraph follow as
before.
The first-run tutorial walks the reader through Earth-like inputs and
asks them to print partial pressures and plot a bar chart, but
previously had no canonical answer to check their output against. Add
a horizontal-bar reference figure (species ordered top-to-bottom by
partial pressure, summary-box of P_surf / M_atm / mean molar mass)
generated by the same code path the tutorial uses, so the reader can
visually confirm their numbers agree to within a few percent.

The figure is produced by scripts/tutorials/fig_firstrun_reference.py,
which shares the Roboto-based styling with the cross-backend
comparison figures via scripts/tutorials/_style.py. Reusable for
regeneration whenever the solver defaults change.
Both the project and patch Codecov gates used target: auto, which pins
the bar to the base commit's coverage (about 97.7%). That makes any PR
whose new lines land below the existing average fail the patch check even
when coverage is well above the 90% level we hold the repo to. Set both
gates to a fixed 90% target so the check reflects the actual standard and
does not ratchet itself upward toward 100%.
@timlichtenberg timlichtenberg requested a review from nichollsh May 31, 2026 10:35
@timlichtenberg timlichtenberg marked this pull request as ready for review May 31, 2026 10:35
@timlichtenberg timlichtenberg requested a review from a team as a code owner May 31, 2026 10:35
@timlichtenberg timlichtenberg changed the title Add an authoritative-oxygen mode to the equilibrium chemistry solver Authoritative-oxygen mode May 31, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ff50dec17

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/calliope/solve.py Outdated
The production path runs the unbounded fsolve (opt_solver=False), whose
acceptance gate rejected a negative partial pressure but not one above
the documented 1e7 bar upper bound. A mass budget that drives pH2O, pCO2,
pN2, or pS2 above that bound could therefore be returned as a physical
solution, where the trust-constr path would have enforced the same ub
array. The gate now also rejects partial pressures above the ceiling,
tested against the physical 1e7 bar bound captured before the per-guess
bound collapse so a tiny-guess slot is not falsely rejected.
GitHub is removing the Node 20 runner the deprecated actions ran on.
Move every action to its current Node-24 release: actions/checkout v6,
actions/setup-python v6, codecov/codecov-action v6. actions/cache stays
on v5, its latest major, which is already Node-24; there is no v6 yet.
Copy link
Copy Markdown
Member

@nichollsh nichollsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! Thank you for making these changes, which will seem to resolve the oxygen mass balance issue raised by @IKisvardai recently.

The docs are looking great and are well-referenced. They build fine on my laptop. I have also tested running the scripts in scripts/, and found that they do indeed reproduce the plots in the documentation. An important step for showing model robustness. The tests also pass on my laptop (Fedora 44, Python 3.12 and 3.13).

Retaining the existing fixed- $\Delta \rm IW$ functionality makes sense. The new authoritative oxygen mode is an important addition and will immediately enable new science.

It's not clear to me how exactly this mode will couple, in PROTEUS, to Mariana's evolving fO2 parameterisation. This would also seem to over-determine fO2 through Fe redox partitioning. We can think about this carefully after this PR, since the changes here give flexibility for both approaches.

I have tested that this CALLIOPE update works with the existing PROTEUS main branch, using the dummy and AGNI demo configs.

I have some comments below but cannot find any major issues here. I might have more comments after the existing ones have been resolved. Super exciting stuff!

Comment thread src/calliope/chemistry.py Outdated
"""Modified equilibrium constant (includes fO2)"""

def __init__(self, Keq_model, fO2_model='oneill'):
def __init__(self, Keq_model, fO2_model='fischer'):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good move to switch to the Fischer parameterisation. It is more recent and robust.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good move to switch to the Fischer parameterisation. It is more recent and robust.

Agreed, that was the motivation: Fischer is the more recent fit and, per the backend-comparison page, tracks atmodeller's Hirschmann composite to within ~0.2 dex across magma-ocean temperatures where O'Neill drifts up to ~1 dex. O'Neill stays available as OxygenFugacity('oneill').

Comment thread src/calliope/oxygen_fugacity.py Outdated
"""log10 oxygen fugacity as a function of temperature"""

def __init__(self, model='oneill'):
def __init__(self, model='fischer'):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default model be set somewhere more globally? What happens if someone changes one but not the other? Or maybe this flexibility is desirable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default model be set somewhere more globally? What happens if someone changes one but not the other? Or maybe this flexibility is desirable.

Good catch, the duplicated default was a trap. I pulled it into a single DEFAULT_FO2_MODEL = 'fischer' constant, and both OxygenFugacity and chemistry.ModifiedKeq now default to it, so they can't be changed independently. A test asserts both signatures carry the same constant and that the bare default really dispatches to Fischer.

Comment thread src/calliope/solve.py
ptot = get_total_pressure(p_d)

if ptot < 1e-30:
return 1.0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine... although it worries a bit that setting the MMW=1 here could lead to problems if the code changes elsewhere. Would it not be possible to return MMW=0? This is more clearly a non-physical 'dummy' return than MMW=1 would be.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine... although it worries a bit that setting the MMW=1 here could lead to problems if the code changes elsewhere. Would it not be possible to return MMW=0? This is more clearly a non-physical 'dummy' return than MMW=1 would be.

I'd keep it at 1.0. MMW is used as a divisor downstream in the mass-balance chain, so returning 0 would turn the empty-atmosphere case into a ZeroDivisionError, which is what the sentinel avoids. The 1.0 is only reached when total pressure is below 1e-30 bar, and the callers multiply it by p_d[k] = 0, so every element mass still comes out zero and the "atmosphere is empty here" signal propagates cleanly. I have kept the docstring note so the choice isn't mistaken for arbitrary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. Something to perhaps keep in mind for the future, but I suppose that there's no "best" answer for how to define the MMW for a non-existent atmosphere.

Comment thread src/calliope/solve.py Outdated
Comment on lines +457 to +460
pH2O = 10 ** rng.uniform(low=-12, high=5)
pCO2 = 10 ** rng.uniform(low=-12, high=5)
pN2 = 10 ** rng.uniform(low=-12, high=5)
pS2 = 10 ** rng.uniform(low=-12, high=5)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should put these pmin and pmax values somewhere more configurable. I can imagine wanting to set pmax > 1e5 bar for SN cases. Even just setting it via an optional function argument would be better.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should put these pmin and pmax values somewhere more configurable. I can imagine wanting to set pmax > 1e5 bar for SN cases. Even just setting it via an optional function argument would be better.

Done both ways. The cold-start range, the solver box, and the fO2 bounds now come from named module constants (P_GUESS_MIN_BAR, P_GUESS_MAX_BAR, P_CEILING_BAR, FO2_*) instead of inline literals, and the two cold-start helpers take an optional p_max (default P_GUESS_MAX_BAR) so a sub-Neptune case can widen the draw without touching the solver box. Wiring p_max to a PROTEUS config knob is the natural next step, which I have left for the PROTEUS side so it can be set per run.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should put these pmin and pmax values somewhere more configurable. I can imagine wanting to set pmax > 1e5 bar for SN cases. Even just setting it via an optional function argument would be better.

Now wired through to PROTEUS as well. PROTEUS PR #678 (FormingWorlds/PROTEUS) adds an [outgas.calliope] p_guess_max config field (default 1e5 bar, matching CALLIOPE's P_GUESS_MAX_BAR, and bounded to (0, 1e7], the solver box) and forwards it to both CALLIOPE entry points, so a sub-Neptune run can widen the cold-start draw from config.

On the CALLIOPE side I renamed the public parameter p_max to p_guess_max for clarity against the P_CEILING_BAR solver box, and the cold-start helpers now validate the ceiling (reject non-finite or below-floor values, which numpy's uniform would otherwise silently invert into a degenerate draw).

One caveat worth stating: the ceiling only sets where the Monte-Carlo cold start samples within the 1e7 bar box; it does not raise the maximum pressure the solver can accept, so a genuine surface pressure above 1e7 bar would still need the box itself raised, which I have left as a separate change.

Comment thread src/calliope/solve.py Outdated
P_surf = outdict['P_surf']
for s in volatile_species:
outdict[s + '_vmr'] = outdict[s + '_bar'] / outdict['P_surf']
outdict[s + '_vmr'] = (outdict[s + '_bar'] / P_surf) if P_surf > 0.0 else 0.0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check for numerical tolerance here. What if Psurf=1e-99 or some very small floating point error?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check for numerical tolerance here. What if Psurf=1e-99 or some very small floating point error?

Fixed. The guard is now P_surf > P_SURF_FLOOR_BAR (1e-30 bar) rather than > 0.0, so a denormal or floating-point-noise surface pressure reports a mixing ratio of zero instead of dividing into it.


The figure below is the same Earth-like inputs run through `equilibrium_atmosphere` on a clean CALLIOPE install. Your numbers should reproduce these to within a few percent (the residual reflects Monte-Carlo restart variability, not a calibration issue).

![First-run reference](../assets/figures/tutorials/firstrun_reference.png)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! The tutorials are looking great.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! The tutorials are looking great.

Thanks, glad they're useful!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to package the fonts? Seems like a lot of bloat for just making the plots look pretty. These are open source fonts, but still... the regular MPL font is fine.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to package the fonts? Seems like a lot of bloat for just making the plots look pretty. These are open source fonts, but still... the regular MPL font is fine.

I'd like to keep these. Bundling Roboto keeps the figures rendering identically across machines (otherwise matplotlib falls back to whatever font the local install happens to have) and visually consistent with the rest of the proteus-framework.org docs. They're Apache-2.0 and only a few hundred KB, which I think is an acceptable cost for reproducible, on-brand figures. Happy to revisit if repo size becomes a real concern.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool plot! This seems to be consistent with what we'd expect from the physics, and demonstrates that the model is working robustly.

That being said, why does O2 not appear in any of the panels? Is it included or just not abundant? Not even at IW+6?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool plot! This seems to be consistent with what we'd expect from the physics, and demonstrates that the model is working robustly.

That being said, why does O2 not appear in any of the panels? Is it included or just not abundant? Not even at IW+6?

Great question, and you were right to push on it. O2 was being computed but I'd left it out of the reported-species set, so the figure was hiding it. I ran the solver across the full grid and read O2 out alongside everything else: at low T or reducing ΔIW its partial pressure is tiny (1e-16 to 1e-10 bar), but the IW-buffer fugacity climbs about nine orders of magnitude from 1500 to 3000 K (Fischer gives log10 fO2 ~ -2.5 at 3000 K), so at the hot, oxidising corner O2 becomes major: a few hundred bar at 3000 K / ΔIW +5, about 15% of P_surf and second only to CO2, and top-4 across the whole T >= 2700 K, ΔIW >= +4 corner. I've added O2 to the reported species, regenerated the figure (the green O2 cells now show in that corner), and documented it in the tutorial text.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great - thanks! I think this also matches with our physical expectations from the fO2 buffer parameterisations as functions of $\Delta \rm IW$ and temperature.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comparisons with atmodeller are great. However, it should document which version/commit of atmodeller is used here, in case the other model is changed in the future.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comparisons with atmodeller are great. However, it should document which version/commit of atmodeller is used here, in case the other model is changed in the future.

Added. The backend-comparison page now records that the atmodeller results were produced with atmodeller v1.0.0, and notes a later release may shift the comparison.

Comment on lines 26 to 28
def fischer(self, T):
"""Fischer et al. (2011) IW (FeO equation of state, EPSL 304, 496)"""
return 6.94059 - 28.1808 * 1e3 / T
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now the default model, you should mention here that these coefficients are for the p=1bar isoline in their Figure 6 (derived from integration of their Equation 2).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now the default model, you should mention here that these coefficients are for the p=1bar isoline in their Figure 6 (derived from integration of their Equation 2).

Added to the fischer docstring: the coefficients are the p = 1 bar isoline of their Figure 6, obtained by integrating their Equation 2.

@nichollsh
Copy link
Copy Markdown
Member

Two other things (sorry):

  1. The version reported by calliope.__version__ in the Python REPL is strangely formatted? I get '26.5.13.post1.dev86+g56dbcbc39', which seems like it would break some of the checks in PROTEUS. Why not just return 26.5.13? Also, this is not consistent with 26.5.10 in the metadata files.

  2. The package author list only includes myself (with an old email address). Please feel free to add yourself to this, since you're also a maintainer on PyPI.

@timlichtenberg
Copy link
Copy Markdown
Member Author

Awesome, thanks for the rapid turnaround, as always. :)

The phase-diagram tutorial reported the top four volatile species but left O2
out of the candidate set. At high magma-ocean temperature and oxidising
delta-IW the IW-buffer fugacity climbs steeply, so O2 stops being a trace gas:
at 3000 K and delta-IW +5 it reaches second place, about 15% of the surface
pressure, behind only CO2. Add O2 to the reported species, regenerate the
figure, and describe the hot, oxidising behaviour in the tutorial text.
Lift the cold-start pressure range, the fO2-shift redraw range, the
trust-constr box, and a surface-pressure floor into named module constants in
solve.py so the cold-start guess range, the solver box, and the fO2-hint
validation can no longer drift apart. Add an optional p_max to the cold-start
helpers so a high-pressure (sub-Neptune) case can widen the pressure draw
without changing the box. Report a volatile mixing ratio of zero when the
surface pressure falls below the floor, rather than dividing into a denormal.

Default the fO2 buffer through a single DEFAULT_FO2_MODEL constant shared by
OxygenFugacity and ModifiedKeq, so the two cannot be changed independently.
Record that the Fischer coefficients are the 1 bar isoline of their Figure 6,
and note the atmodeller version used for the backend comparison.
@timlichtenberg
Copy link
Copy Markdown
Member Author

timlichtenberg commented May 31, 2026

Thanks for the thorough review, and for testing the scripts/ reproductions and the PROTEUS coupling. On the coupling to Mariana's evolving fO2 parameterisation and the possible over-determination through Fe redox partitioning: I agree it needs care and I'd like to keep it out of this PR. This PR provides both the fixed-ΔIW and authoritative-O entry points; deciding which one drives fO2 once Mariana's mantle-redox model is active is the interface we'll design on the PROTEUS side (the from_mantle_redox work, PROTEUS #653). I've addressed the inline comments above.

The authors entry packed the name and email into a single quoted string, so
there was no separate email field. Split it into proper name and email fields,
update the contact email, and add a second author.
The version and date-released sat at 26.05.10 while the latest tag is 26.05.13.
setuptools-scm only writes src/calliope/_version.py, not CITATION.cff, so the
citation metadata is bumped by hand to match the current tag.
@timlichtenberg
Copy link
Copy Markdown
Member Author

timlichtenberg commented May 31, 2026

Two other things (sorry):

  1. The version reported by calliope.__version__ in the Python REPL is strangely formatted? I get '26.5.13.post1.dev86+g56dbcbc39', which seems like it would break some of the checks in PROTEUS. Why not just return 26.5.13? Also, this is not consistent with 26.5.10 in the metadata files.

  2. The package author list only includes myself (with an old email address). Please feel free to add yourself to this, since you're also a maintainer on PyPI.

Thanks, both addressed.

  1. The .post1.dev86+g… suffix is expected setuptools-scm output rather than a bug: with version_scheme = "no-guess-dev", any checkout past the latest tag (currently 26.05.13; your editable build was 86 commits past it) reports 26.5.13.post1.devN+g<hash>, while a release install (PyPI 26.5.13, or a checkout at the tag) reports the clean 26.5.13. I kept that scheme deliberately: it's honest about a build being ahead of the tag and avoids stamping a future calendar date. It also doesn't break the PROTEUS checks, coupler.py parses only the first three numeric components (26.5.13.post1.dev86+g…(26, 5, 13)) and doctor.py uses packaging.version.Version (PEP 440 compliant); PROTEUS's own version is dev-formatted the same way. The 26.5.10 you saw was a stale hardcoded version: in CITATION.cff, which setuptools-scm doesn't touch and would have gone stale again at the next release. Rather than re-bump it by hand, I've removed the static version and date-released (and the placeholder doi) from CITATION.cff entirely, so the git tag and _version.py are the single source of version truth.
  2. Added myself to the pyproject.toml author list with tim.lichtenberg@rug.nl, and fixed the entry that had packed your name and email into one string (your email there is now h-nicholls@pm.me).

The version and date-released fields had to be bumped by hand at every release
and went stale in between. Both are optional in CFF 1.2.0, and the
authoritative version already lives in the git tag (and src/calliope/_version.py
via setuptools-scm), so the citation metadata no longer duplicates it. GitHub's
citation widget and CFF validation both work without these fields.
The doi field held a 10.xx/xx.xx placeholder rather than a real identifier, and
a non-resolving DOI is worse than none. Removed until a real Zenodo deposit
exists, at which point Zenodo fills in the per-release DOI and version.
equilibrium_atmosphere and equilibrium_atmosphere_authoritative_O now accept an
optional p_max (default P_GUESS_MAX_BAR) and forward it to the Monte-Carlo
cold-start helpers, so a high-pressure (sub-Neptune) run can raise the guess
ceiling without changing the solver box. Tests cover the forwarding on both
entry points and the default.
Rename the public cold-start ceiling parameter to p_guess_max across the draw
helpers and both solver entry points, so it reads unambiguously against the
separate P_CEILING_BAR solver box and matches the PROTEUS config field. Reject
a non-finite or below-floor ceiling at the draw helpers: an inverted
log-uniform range is not caught by numpy, and a non-finite ceiling feeds
inf/nan into the draw, so both now fail loudly. Clarify in the docstrings that
the ceiling sets where the cold start samples within the fixed solver box, not
the maximum pressure the solver can accept. Add tests for the rejection
contract and for forwarding on the restart redraw, not just the initial draw.
@timlichtenberg timlichtenberg requested a review from nichollsh May 31, 2026 22:54
Copy link
Copy Markdown
Member

@nichollsh nichollsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks for implementing my suggestions. I believe that the changes since my last PR address all of these, and the tests/docs still pass on my laptop.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great - thanks! I think this also matches with our physical expectations from the fO2 buffer parameterisations as functions of $\Delta \rm IW$ and temperature.

# out of step. Fischer is the more recent IW parameterisation and tracks the
# atmodeller Hirschmann composite to within ~0.2 dex across magma-ocean
# temperatures (see docs/Explanations/cross_backend_comparison.md).
DEFAULT_FO2_MODEL = 'fischer'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. I think it's better to have this single default global variable than multiple keyword parameters.

Comment thread src/calliope/solve.py
ptot = get_total_pressure(p_d)

if ptot < 1e-30:
return 1.0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. Something to perhaps keep in mind for the future, but I suppose that there's no "best" answer for how to define the MMW for a non-existent atmosphere.

Comment thread CITATION.cff
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have checked with cffconvert --validate that this still suits the cff schema.

@timlichtenberg timlichtenberg merged commit b71cb88 into main Jun 1, 2026
8 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in PROTEUS Development Roadmap Jun 1, 2026
@timlichtenberg timlichtenberg deleted the tl/fo2-source-framework branch June 1, 2026 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants