Authoritative-oxygen mode by timlichtenberg · Pull Request #20 · FormingWorlds/CALLIOPE

timlichtenberg · 2026-05-14T20:17:37Z

@nichollsh, this is ready to review. It is the chemistry half of the coupled oxygen-accounting change and the dependency that has to merge first, so it would be great to merge it as soon as you can, ideally by Wednesday 3 June, so we can move on to the PROTEUS PR (FormingWorlds/PROTEUS#678). It is a focused, self-contained change: a new authoritative-oxygen entry point, the default iron-wustite buffer, and the surrounding test and docs infrastructure.

Paired PR and merge order. PROTEUS imports equilibrium_atmosphere_authoritative_O from this branch. Merge order is this PR (CALLIOPE#20) first, then PROTEUS#678. The full reviewer set and the whole-planet oxygen accounting context live on the paired PROTEUS PR. General comments are welcome here too.

Description

Three coupled changes plus the test and docs infrastructure that surrounds them.

(1) Authoritative-oxygen entry point. A new equilibrium_atmosphere_authoritative_O takes total O mass alongside H/C/N/S and back-solves for the iron-wustite buffer offset that produces that O budget at equilibrium. The legacy equilibrium_atmosphere (fO2 in, O derived) is unchanged. Chemistry half of the whole-planet oxygen accounting framework on the PROTEUS side (PROTEUS PR #678); legacy mode stays the default for ordinary buffered-mode work.

(2) Default IW buffer flipped to Fischer 2011. With the legacy O'Neill default, raw cross-backend disagreement (CALLIOPE vs atmodeller) at the canonical Earth fiducial is around 0.42 dex; with Fischer it drops to around 0.16 dex, residual after analytical buffer correction at or below 0.1 dex below 2000 K and around 0.28 dex at 3000 K. Fischer is within about 0.2 dex of the Hirschmann composite (atmodeller's default) across the magma-ocean range; O'Neill diverges to around 1 dex at the hot end. OxygenFugacity('oneill') remains available.

(3) Diataxis docs pass for the dual-mode chemistry, the cross-backend comparison, and primary-source verification of the surrounding solubility and oxygen-fugacity reference tables.

Also rolled out: the same test and docs infrastructure other PROTEUS-ecosystem submodules use (markers, anti-happy-path linter, validation anchor pages, coverage ratchet, fast PR gate, nightly full gate). Each of the five physics sources now carries at least one reference-pinned test against a published benchmark or analytical limit, with a per-source anchor page recording the citation.

No physics laws change. Both entry points share the same private helpers, Dasgupta N2 solubility, and Gaillard S2 solubility.

What this changes

Solver (src/calliope/solve.py, oxygen_fugacity.py, chemistry.py). The legacy 4-element residual becomes a 5-element (H2O, CO2, N2, S2, fO2_shift) system in the new entry point, the fifth equation enforcing total atmospheric + dissolved O against the user target. Hardening: NaN- and complex-aware clipping, per-element tolerances, input validation, deterministic RNG, typed RuntimeError on non-convergence. IW-buffer default flipped to Fischer; hardcoded test references updated.

Docs. New dual-mode pages (docs/Explanations/authoritative_oxygen.md, docs/How-to/authoritative_oxygen.md) plus refs threaded through 11 sibling pages and mkdocs.yml. New docs/Explanations/cross_backend_comparison.md compares CALLIOPE vs atmodeller at the Earth fiducial (Krijt 2023 BSE, $T = 2000$ K, $\Phi = 1$), enumerates six sources of disagreement, isolates the buffer systematic analytically, quantifies the chemistry residual across $T \in [1500, 3000]$ K, with five reproducible figures. New tutorials: earth_fiducial.md, mars_fiducial.md, coupled_loop.md, phase_diagram.md. Per-source validation anchor pages under docs/Validation/ wired into the nav with footnote-link citations. ADS links converted to SciX; 0.325 Earth core mass fraction re-attributed Zeng 2016 → Wang, Lineweaver & Ireland 2018 (arxiv:1708.08718).

Primary-source verification (per-page diffs in solubility.md + oxygen_fugacity.md). Solubility envelope rebuilt one row per composition (Sossi 2023, Dixon 1995, Hamilton 1964, Wilson & Head 1981, Newcombe 2017, Armstrong 2015, Ardia 2013, Libourel 2003, Dasgupta 2022, Gaillard 2022). fO2 table: Mercury split Fe/S (Cartier & Wood 2019), Mars to $\approx$ IW (Wadhwa 2001), Earth by depth bin (Frost & McCammon 2008), asteroidal row dropped (Doyle 2019). Spot fixes: Fischer 2011 description + DOI; Ardia 2013 → Eq. (8) / Fig. 11; Nicholls 2024 H2 threshold $-2 \to -1$; Nicholls 2026 L 98-59 d → IW$-4$ to IW$-1$; Sossi 2020 Earth anchor $+3.5 \pm 0.5 \to \approx +3.5$.

Cross-backend harness (scripts/cross_backend/). Five figure modules + shared infrastructure (runners.py provides _with_calliope_buffer for swapping the class default Fischer/O'Neill). Deterministic; CSVs ship with PDFs.

Tests and infrastructure. Authoritative-O coverage in test_authoritative_O_validation.py (typed-exception contract), test_authoritative_O.py (result-dict over $\Delta\mathrm{IW} \in {-4,-2,0,+2,+4,+6}$, round-trip within 0.05 dex, random_seed reproducibility, RuntimeError on unreachable targets), test_authoritative_O_monotonicity.py (slow tier). Invariants suite (test_invariants.py, test_invariants_unit.py, hypothesis fuzz in test_invariants_hypothesis.py) covers conservation, positivity, VMR closure, Keq identity, solubility monotonicity, calibration edges. New markers @pytest.mark.physics_invariant and @pytest.mark.reference_pinned, one each per physics source. AST linter tools/check_test_quality.py blocks regression past a one-way baseline. Fast PR gate (unit + smoke) and nightly full gate, ratcheting toward 90% ecosystem ceiling. tests.yaml runs Ubuntu + macOS / Python 3.12 + 3.13; nightly.yml adds Codecov upload.

Why

PROTEUS issue #677 (M_atm > M_planet at high H_ppmw) reduces to a structural asymmetry: every M_atm aggregation includes O (H2O/CO2/SO2 molecular masses) but every M_planet aggregation explicitly skipped O on the assumption the IW buffer made it derivable. That breaks when the user wants O as a primary budget input; the solver needs to support both directions. The buffer flip lands at the same time because the cross-backend comparison driving it depends on the new entry point for its round-trip closure check. The docs verification corrected entries across several pages where the previous text did not match the cited papers. The test infrastructure brings CALLIOPE in line with the rest of the ecosystem and codifies the invariants the dual-mode chemistry must satisfy.

Validation of changes

Full unit + smoke suite green locally (macOS, Python 3.12); linter, structure validator, ratchet guard pass.
Round-trip recovery within 0.05 dex at $\Delta\mathrm{IW} \in {-2,0,2,4,6}$; result-dict contract covers $\Delta\mathrm{IW} \in {-4,-2,0,+2,+4,+6}$. Input-validation surfaces typed exceptions on every contract; unreachable targets raise RuntimeError; random_seed produces bit-identical output across calls.
Property-based fuzz confirms monotonicity, finiteness, and non-negativity of the Dasgupta + Gaillard laws, both IW buffers, and the six modified Keq fits across the full input space.
Cross-backend harness produces deterministic CSV + PDF outputs from python -m scripts.cross_backend.figN_*.
Reference-pinned anchors pass: Fischer 2011 IW at 2000 K with O'Neill 2002 discrimination guard; JANAF (Chase 1998) closed-form $K_{\mathrm{eq}}$ for H2O at 2000 K with Schaefer cross-check; Sossi 2023 peridotite H2O and Gaillard 2022 S2 against the published parameterisations; Wang, Lineweaver & Ireland 2018 0.325 core mass fraction via the structure module; forward / authoritative-O round-trip at the Earth fiducial.
Zensical docs build clean (no unresolved-link warnings, all Validation/* pages reachable from the nav). Pre-commit, ruff format, ruff check pass.

Notes for review

Chemistry half of a coupled change. The PROTEUS-side consumer (the wrappers in src/proteus/outgas/calliope.py and src/proteus/outgas/atmodeller.py that route planet.fO2_source = "from_O_budget" to the new entry point) lives in PROTEUS PR #678 and depends on this PR landing first. PROTEUS pyproject.toml currently pins fwl-calliope to this branch via git URL; once this PR merges and a release ships, the pin reverts to a normal version. hypothesis >= 6.0 is added to [project.optional-dependencies] develop; no new runtime dependency.

Checklist

I have followed the contributing guidelines
My code follows the style guidelines of this project
I have performed a self-review of my code
My changes generate no new warnings or errors
I have checked that the tests still pass on my computer
I have updated the docs, as appropriate
I have added tests for these changes, as appropriate
I have checked that all dependencies have been updated, as required

Bit-for-bit equivalent refactor that prepares for an authoritative-O solver mode. Three internal helpers added: _get_partial_pressures, _atmosphere_mass, _dissolved_mass. Each takes fO2_shift as an explicit positional argument rather than reading ddict['fO2_shift_IW']. The corresponding public functions get_partial_pressures, atmosphere_mass, dissolved_mass become one-line wrappers that pass ddict['fO2_shift_IW'] through. No behaviour change. All 126 existing tests pass unchanged. Motivation: the upcoming `equilibrium_atmosphere_authoritative_O` entry point treats fO2_shift as a 5th unknown in the chemistry solver (alongside the four primary partial pressures), so the underlying physics functions must accept fO2_shift as an explicit argument rather than read it from a shared dict. Plumbing it explicitly also makes the data flow easier to follow when chasing chemistry bugs.

New solver mode for whole-planet oxygen accounting (PROTEUS Path C following the issue #677 bookkeeping fix). The legacy mode takes fO2_shift_IW as a config input and derives the atmospheric+dissolved O mass that follows from chemistry; the new mode inverts the relationship and takes the total O mass as the budget input, returning the fO2_shift_derived that is self-consistent with that budget. Mathematical model. Four primary partial pressures (H2O, CO2, N2, S2) become five unknowns with fO2_shift_IW joining as the fifth. The four existing mass-balance residuals for H, C, N, S are joined by a fifth for O. Both ``func_authoritative_O`` and the main entry point reuse the existing physics helpers _get_partial_pressures, _atmosphere_mass, _dissolved_mass (added in the previous refactor commit), so all chemistry is shared bit-for-bit with equilibrium_atmosphere. Solver. scipy.optimize.fsolve with trust-constr fallback, Monte-Carlo restarts on cold-start; mirrors the existing solver. Pressure bounds [0, 1e7] bar. fO2_shift bounds [-12, +12] log10 units (physical range ~[-6, +8]; the wider bound gives trust-constr room without escaping to non-physical territory). On restart the fO2 guess redraws from Uniform(-6, +8) so the solver gets a chance from a different basin. Output dict. Same fields as equilibrium_atmosphere, plus two new keys: ``fO2_shift_derived`` (converged value) and ``O_res`` (5th mass-balance residual, kg). PROTEUS-side wrappers consume the same fields regardless of which mode ran. Round-trip smoke. With the legacy mode's O_kg_total as the target, the new mode converges to fO2_shift_derived = fO2_shift_IW to within 1e-4 and recovers the legacy M_atm to within 1e-6 relative. All 126 existing CALLIOPE tests still pass; formal monotonicity sweep and round-trip tests follow in the next commit.

…oints The new entry point's residual chain was crashing with ZeroDivisionError on the majority of realistic inputs because every species pressure could collapse to zero (via a NaN-clip that folded NaN into 0) and the mean molar mass division then divided by zero. Once that path was unblocked, several smaller robustness gaps in the same code became visible too. Robustness in the chemistry chain (affects both solver modes): - NaN/Inf clip in _get_partial_pressures: explicit isfinite check replaces the bare max(0.0, x), which preserved NaN whenever NaN was the second arg. - Zero-pressure guard in atmosphere_mean_molar_mass: when ptot < 1e-30, return a sentinel of 1.0 g/mol; downstream mass tallies multiply by zero anyway so the residual gets a clean "empty atmosphere" signal. - CO2 contribution to atmospheric O is now unconditional, mirroring the unconditional CO2 contribution to atmospheric C. The previous gate on is_included('CO2') made element bookkeeping asymmetric: in any state where CO2_included is 0 but the solver carries a non-zero CO2 partial pressure, the O tally lost atoms that the C tally still counted. Hardening of equilibrium_atmosphere_authoritative_O entry point: - target_d validation: all five element keys (H, C, N, S, O) must be present, finite, and non-negative. Missing keys raise KeyError; non-finite or negative values raise ValueError. Previously only missing 'O' was checked, and other bad inputs raised opaque KeyError or RuntimeError from deep inside fsolve. - ddict validation: M_mantle, Phi_global, T_magma, gravity, radius must be present and physical. T_magma outside the Dasgupta/Gaillard solubility-law calibration range (1373-1973 K) raises a single UserWarning so users notice when the solver is extrapolating. - fO2_hint validation: must be finite and within [-12, +12]. - nguess and nsolve must be >= 1 (previously nguess=0 raised UnboundLocalError on the post-loop success check). - Each fsolve/trust-constr call is wrapped in try/except for ZeroDivisionError, FloatingPointError, ValueError; a crashed attempt is treated as a failed Monte-Carlo restart so the loop continues instead of propagating the crash. The post-loop RuntimeError contract is now honoured. - Per-element residual gate replaces the legacy max-of-targets scalar tolerance: each residual must satisfy |res_i| <= max(target_i * rtol, atol / 5). The legacy gate was dominated by the largest target (O budgets are ~5 orders of magnitude larger than N) and accepted up to ~10% N error. - random_seed kwarg threads a np.random.Generator through the Monte-Carlo restart so two calls with the same seed produce bit-identical output. None preserves the previous global-state behaviour for callers that have not opted in. Verification: - 121 existing unit + smoke tests pass unchanged. - 40 random Earth-like seeds: 40 successes, 0 ZeroDivisionError (previously 36 crashed). - 35-case adversarial input runner: 0 crashes (previously 7 crashed with ZeroDivisionError); every adversarial input either raises a typed exception (ValueError / KeyError / TypeError / RuntimeError) or returns a valid result. - Same-seed reproducibility check: two calls with random_seed=42 return bit-identical fO2_shift_derived and all four primary partial pressures.

Cover every typed-exception path the entry point promises in its docstring: missing target_d keys raise KeyError naming the missing element; non-finite or negative target_d values raise ValueError; fO2_hint outside [-12, +12] raises ValueError; missing or unphysical M_mantle / Phi_global / T_magma raise the appropriate exception; nguess and nsolve below 1 raise ValueError (the legacy nguess=0 UnboundLocalError path is explicitly checked). 61 parametrised cases run in 0.33 s; all unit-tier.

The core correctness property is that any fO2_shift_IW value the legacy solver accepts must be recovered by the new mode when fed the same problem's O budget back as the authoritative O target. If the round-trip fails by more than the per-element tolerance, the extended 5-equation system has a different fixed point than the legacy 4-equation one, which would be a real bug. The parametrised round-trip covers five fO2_shift values across the reducing-to-oxidising range. Also covers the happy-path output schema, the per-element residual tolerance gate, same-seed bit-identical reproducibility, and the RuntimeError-on-impossible-target contract documented in the entry point's docstring. 12 tests, smoke-tier, ~2 minutes wall time.

The 5-unknown system the authoritative-O solver inverts is well-posed only if O_kg_total(fO2_shift) is monotonic on the working range at fixed (H, C, N, S, T_magma, M_mantle, Phi_global). If the curve has a turning point, the inverse solver can find multiple roots from different cold starts and lose determinism even when seeded. The pre-fix adversarial review confirmed monotonicity empirically on [-5, +5] at 1800 K and 3000 K. Lock that finding in as a property test so a future chemistry refactor that breaks it fires the test instead of corrupting downstream simulations. Earth-like H/C/N/S budget at 1500 K and 1800 K, sweep 11 fO2_shift points across [-4, +6], confirm strictly increasing O_kg_total. Empirical dynamic range is ~3x over 10 dex; the test asserts > 1.5x so a catastrophic regression that flattens the curve catches. Marked slow because each sweep runs 11 legacy solves.

I added documentation for the second CALLIOPE solver mode, which takes oxygen as a fifth elemental budget and treats the IW-buffer offset as an unknown. The two modes share the same physics functions; only the unknown set and the residual equation count differ. What is new: - Explanations/authoritative_oxygen.md: the augmented 5-residual mass balance, how fO2_shift enters the chemistry as an unknown, the per-element residual gate that replaces the legacy scalar tolerance bound, the Monte-Carlo restart with fO2 redraw window, the cross-backend IW-buffer divergence with atmodeller (Hirschmann vs O'Neill & Eggins), and the calibration-range and solid-FeO limitations. - How-to/authoritative_oxygen.md: a recipe page with four runnable patterns (5-element solve, round-trip from buffered mode, warm-start via p_guess including the fO2 seed, deterministic mode via random_seed) plus the common error modes. Cross-references and clarifications on existing pages: - mass_balance.md, oxygen_fugacity.md: reframed as the buffered-mode page and pointed to the dual mode. - Explanations/proteus_coupling.md: showed how the PROTEUS wrapper dispatches between the two CALLIOPE entry points based on planet.fO2_source. - How-to/configuration.md: documented the 5-key target dict variant for the authoritative-O entry point. - Reference/api/index.md: listed the new public entry point. - index.md: reworded the feature bullet so the reader sees both modes side by side from the landing page. mkdocs.yml gets the two new nav entries. Zensical build is clean (no broken anchors, 3.8 MB site).

…ings I went back over the docs against the actual source and against the voice rules and fixed several places where the prose was either slightly wrong or read as a change-narrative rather than a current- state statement. What changed: - Tests cited by exact pytest node-ID. The earlier doc named tests that did not exist; the right names are TestRoundTrip::test_round_trip_recovers_fO2_within_tolerance and TestMonotonicity::test_O_kg_total_strictly_increasing_with_fO2. - Monotonicity range narrowed to what the tests actually verify ([-4, +6] at two T_magma values, with a Phi_global = 0.3 sibling check over [-2, +4]). The prior text claimed [-6, +8] over five (T_magma, Phi_global) pairs, which neither test covers. - TRUNC_MASS source file corrected (it lives in solve.py, not constants.py). - Per-element tolerance motivation now makes explicit that the buffered mode never has this problem because O is not in its residual vector. The 5-order-of-magnitude N vs O target imbalance only matters once O enters the residual. - "fO2 bounds are looser than the redraw window" was reversed. They are wider, not looser; doc now says wider. - "is now" / "the only change is" / "is now a variable" phrasings on the concept page rewritten as current-state statements (each mode is described in its own right, comparisons live in the explicit comparison table). - Dasgupta N2 redox dependence is "substantially less dissolved N", not "slightly less"; the law carries an exponent of -1.6 * dIW, so the swing per dex is order-of-magnitude. - Cross-backend Hirschmann vs O'Neill & Eggins note drops the "growing with temperature" qualifier, since the trend at lower T is not numerically established from the CALLIOPE side. - Buffered-mode contrast says "fewer unknowns", not "simpler contract"; the restart-count estimate is hedged as empirical. - The PROTEUS dispatch snippet in the proteus_coupling Explanation now passes config.outgas.calliope.nguess / nsolve rather than hardcoded literals, matching the real wrapper. An adjacent paragraph documents the schema defaults and the wrapper's authoritative-O restore of hf_row['O_kg_total']. - RuntimeError "max attempts: 7500" message in the how-to is now "max attempts: N", with a sentence on the PROTEUS override. - Recipe 4 (reproducibility) replaces float == with math.isclose, and the prose hedges "bit-reproducible" to "reproducible to solver tolerance". - oxygen_fugacity.md said "two channels" while listing four subsections; fixed to "four channels". The authoritative-oxygen page deep-links into this section and the count was already inconsistent. - index.md chemistry-couples bullet uses ASCII hyphens, not en-dashes. The en-dash carve-out is only for page/volume ranges in citations; species pairs do not qualify. Zensical build remains clean, 3.8 MB site, no broken anchors.

The earlier docs commit added the new concept and how-to pages for the authoritative-oxygen mode but left the model overview, the code architecture diagram, the solubility page, and the two PROTEUS-coupling how-to / theory pages describing CALLIOPE as a buffered-mode-only package. This commit makes the dual mode visible from the broadest entry points a reader hits. Explanations/model.md: - The "what is in the model" bullets now describe H/C/N/S as always solved and O as either derived (buffered) or budgeted (authoritative- O), with a link to the dedicated page. - The IW-buffer bullet says fO2_shift is an input under the buffered mode and a solver unknown under the authoritative-O mode. - The "Mathematical statement" section presents one mass-balance equation per solved element, then names the two modes explicitly as 4x4 vs 5x5 systems sharing the same physics. - The "Why four primaries" admonition is reworded so it remains true in both modes: O is either not solved (buffered) or carried as a separate scalar unknown (fO2_shift), never a new primary partial pressure. Explanations/code_architecture.md: - The solve.py inventory now lists both solver entry points first, then the shared physics functions, then the cold-start generators for the 4-unknown and 5-unknown problems, then the residual / objective pairs and the private versions that thread fO2_shift positionally. - The call graph shows both entry points diverging at the cold-start generator and converging at the residual, with the private versions visible. Explanations/solubility.md: - Added See-also entries for oxygen_fugacity.md (the IW-buffer parameterisations behind the Gaillard and Dasgupta fO2 dependence) and authoritative_oxygen.md (how those fO2-dependent terms become part of the 5-residual mass balance once fO2 is solved for). How-to/proteus_coupling.md: - New "Selecting the fO2 dispatch" section documents the planet.fO2_source TOML field with both values, explains what each one does, and links to the dispatch sequence diagram and the authoritative-O concept page. - The Redox-state section now clarifies that fO2_shift_IW is the buffer offset under user_constant and the initial-guess seed under from_O_budget. How-to/usage.md: - Next-step block points readers with an oxygen budget at the authoritative-O recipe. Zensical build remains clean, no broken anchors.

…nges The authoritative-O entry point used to emit a UserWarning when T_magma fell outside [1373, 1973] K, with a message naming "Dasgupta N2 (1373-1873 K) and Gaillard S2 (1473-1973 K)" calibration ranges. Both sets of numbers were inventions: I went back to the primary sources and neither paper supports them. What Dasgupta et al. (2022, GCA 336, 291) Equation 10 actually reports as its calibration footprint (n=137 compiled data, from their paper just above the equation): T range 1050-2327 °C (1323-2600 K) P range 1 bar to 8.2 GPa fO2 range IW-8.3 to IW+8.7 What Gaillard et al. (2022, EPSL 577, 117255) Equation 10 reports about itself: a refit of the O'Neill & Mavrogenes (2002) model combined with 8 other experimental sources (n=369), all calibrated at 1 atm with fO2 in the range IW-1 to FMQ+0.1. The paper does not state a temperature calibration range; its modelling section works at a fixed T=1500 °C. The earlier CALLIOPE warning was therefore wrong in three ways: it under-stated Dasgupta's high-T span by ~700 K, it asserted a Gaillard T range that the source does not state at all, and it ignored fO2 constraints entirely. Since the two laws have incompatible calibration footprints (Dasgupta covers a wide T x P x fO2 box; Gaillard covers a narrow fO2 strip at 1 atm), a single scalar T-window cannot meaningfully represent both. The cleanest fix is to remove the in-process warning and surface the actual ranges in the docs. Removed: - The 12-line warnings.warn block in equilibrium_atmosphere_authoritative_O. - The test_T_magma_outside_calibration_warns test that asserted the warning fired at T=3000 K. Documented (in solubility.md validity-envelope table): - N2 (Dasgupta): T 1323-2600 K, P 1 bar to 8.2 GPa, fO2 IW-8.3 to IW+8.7. - S2 (Gaillard): T not stated by the paper; P 1 atm (calibration data); fO2 IW-1 to FMQ+0.1 (~ IW+3.5). - A new fO2 column on the table so the redox-dependence of N and S is visible without burying it in prose. - An adjacent paragraph naming the primary sources with ADS links, the dataset sizes, and the 1-atm-fit-applied-at-MO-pressures gap for Gaillard. Touched the dual-mode docs that previously echoed the warning text: - Explanations/authoritative_oxygen.md Limitations bullet now points at the validity envelope rather than quoting numbers. - How-to/authoritative_oxygen.md Pitfalls bullet replaces the UserWarning entry with a note that the entry point does not flag extrapolation and a link to the envelope. Tests now stand at 202 (was 203). Full suite green; docs build clean.

I rebuilt the solubility validity-envelope table and the oxygen-fugacity reference-values table from primary sources (Sossi 2023, Dixon 1995, Armstrong L.S. 2015, Ardia 2013, Newcombe 2017, Libourel 2003, Wilson & Head 1981, Hamilton 1964, Cartier & Wood 2019, Wadhwa 2001, Frost & McCammon 2008, Nicholls 2024 JGRE, Fischer 2011) and corrected the claims that did not survive the check. Solubility validity envelope: rebuilt from 7 partially-fabricated rows into 11 primary-source-verified rows, one per implemented composition. The H2O peridotite row previously said "1 bar to several kbar" but Sossi 2023 was at 1 atm total pressure. The H2O basalt Dixon row previously said "176-2021 bar" but the upper bound came from Berndt 2002 (a separate calibration with a different prefactor); Dixon 1995 is actually 176-717 bar pH2O. The CO row previously said "<=3 GPa" but Armstrong's experiments are at 1.2 GPa. The CH4 row previously said "1573-1873 K" but Ardia 2013 was at 1400-1450 C (1673-1723 K). Added the four implemented compositions that were missing from the table (basalt_wilson, anorthite_diopside, lunar_glass, libourel) with their actual calibration footprints. Reframed the fO2 column from "not constrained" (which conflated functional dependence with calibration footprint) to explicit per-row calibration ranges, with a disambiguation paragraph below the table. Ardia CH4 attribution: the paper's Figure 6 is Raman spectra, not the solubility best-fit. Corrected the doc to cite Eq. (8) with ln K0 = 4.93 and DeltaV = 26.85 cm^3/mol, plotted as Fig. 11. Oxygen-fugacity buffer attribution: the Fischer 2011 description previously claimed "calibrated against high-pressure (~25 GPa) experimental data" but the 6.94059 - 28180.8/T fit reproduces the 1-bar buffer curve, which Fischer 2011 takes from Chase 1998 NIST-JANAF tabulation. Rewrote the description accordingly. Oxygen-fugacity reference values: Mercury split into Fe-based (IW-2.8 to IW-4.5) and sulphur-based (IW-5.4 via Namur 2016) estimates per Cartier & Wood 2019. Mars mantle corrected from "-3 to 0" to "approximately IW" with shergottite parent melts at IW-1.0 to IW+1.9 per Wadhwa 2001. Earth mantle expanded into three depth-binned rows (8 GPa, transition zone, lower mantle) per Frost & McCammon 2008. Dropped the asteroidal-material row because Doyle 2019 reports a range across chondrite types, not a single representative value. Nicholls 2024 H2-dominated threshold corrected from DIW <= -2 to DIW <= -1 (the paper says "fO2 <= IW - 1; see also Figure 6"). Nicholls 2026 L 98-59 d range corrected from DIW ~ -1 to "between IW-4 and IW-1" (the paper's stated retrieval range). Sossi 2020 DIW = +3.5 +- 0.5 corrected to "approximately +3.5"; the +-0.5 uncertainty was not in the paper. Cross-page propagation: model.md validity-range bullet updated to recommend Hamilton 1964 (not Dixon) for surface pressures above 1 kbar, and to quote the actual extrapolation factor for Sossi/Newcombe applied at kbar pressures. How-to/proteus_coupling.md reference-values table updated to match the new oxygen_fugacity.md framing. Tutorials/firstrun.md H2-dominated threshold updated to match Nicholls 2024. All CALLIOPE tests pass (198/198 unit). Zensical docs build clean.

The four-marker breakdown in docs/Explanations/testing.md was last set when the suite had 96 unit + 25 smoke + 5 integration + 0 slow tests. The suite has grown since: pytest --collect-only against the four markers now reports 156 unit + 37 smoke + 5 integration + 4 slow (the four slow tests are the authoritative-O monotonicity-regime property tests). Updated the doc table to match the current state.

This new test file pins eleven invariants that must hold for any valid CALLIOPE result, plus three additional checks at the edges of the solubility-law calibration footprints. The eleven invariants: 1. Per-element mass conservation: atm + liquid == total for each of H, C, N, S, O. 2. Pressure positivity: every species partial pressure is >= 0. 3. VMR closure: sum of volume mixing ratios is unity over all eleven species. 4. Total-pressure consistency: P_surf == sum of species partial pressures. 5. Atmospheric mass consistency: M_atm == sum of species column masses. 6. fO2 reconstruction: in authoritative-O mode, the derived shift reproduces log10(p_O2) - log10(fO2_IW_buffer(T_magma)). 7. Modified equilibrium constant identity: p_B / p_A matches ModifiedKeq.<method>(T, fO2_shift) at the converged state for both H2O-H2 and CO2-CO couples. 8. Dissolved-mass non-negativity: every <species>_kg_liquid is >= 0. 9. CO2 stoichiometric atom counting: the C contribution from CO2 equals the molar-mass ratio C/CO2 times CO2_kg_atm. 10. Monotonicity of Gaillard S2 solubility as fO2 decreases at fixed p_S2 and T (tested at the solubility-function level to isolate the law from multi-species solver feedback). 11. Monotonicity of Dasgupta N2 solubility as fO2 decreases at fixed p_N2, p_tot, T (same isolation strategy). The three calibration-edge additions: - Dasgupta plateau at the reducing edge (dIW <= -3) where the paper's Fig. 7 notes plateau-like N solubility. Verifies both the buffered solver converges and the Dasgupta law stays numerically well-behaved down to dIW = -8. - Gaillard sulfide-saturated regime at the oxidising edge (dIW >= +4) where the sulfate regime begins and CALLIOPE's sulfide-only chemistry extrapolates. Verifies SO2 dominance over S2 at dIW >= +5 and that the dissolved-S fraction drops at oxidising conditions. - p_guess warm-start: a warm start with the cold-solve result lands in the same basin and recovers the same partial pressures. Includes a sad-path with an off-by-100x bad guess that the Monte-Carlo restart catches. Every test class is anti-happy-path: each parametric sweep covers an edge of the physical regime, and each class includes at least one sad-path companion (zero mantle mass, extreme reducing, negative S2 pressure, off-by-100x warm start, etc.). Marker discipline: classes are explicitly marked with exactly one of unit (for stoichiometric / direct-solubility-function tests that do not invoke the multi-species solver) or smoke (for tests that exercise equilibrium_atmosphere or equilibrium_atmosphere_authoritative_O). 29 unit + 57 smoke = 86 new tests, no overlap with existing markers. Total CALLIOPE test count rises from 202 to 288, all passing. Wall-time for the new file alone: 1.2 s.

Property-based fuzzing complements the fixed parametric sweeps in tests/test_invariants.py by exercising the Dasgupta N2 / Gaillard S2 / O'Neill IW / Fischer IW / ModifiedKeq helpers across the full physically-relevant input space, not just the discriminating points the deterministic suite covers. Seven hypothesis tests, each running 100-200 examples per call, covering: Dasgupta monotonicity in fO2 at random (p_N2, p_tot, T); Dasgupta finiteness; Gaillard monotonicity in fO2 at random (p_S2, T); Gaillard finiteness; O'Neill IW finiteness over T and dIW; O'Neill-vs-Fischer agreement within 1 dex over 1800-2200 K; and ModifiedKeq finiteness + positivity across all six methods. The tests are marked ``slow`` (per the existing 4-marker scheme, no new marker introduced) so the PR gate is not slowed. Wall-time for the new file alone is 1.15 s; the nightly runs them. Adds ``hypothesis >= 6.0`` to the develop optional-dependencies in pyproject.toml. Test count: 288 -> 295 (7 added, all slow tier). PR-gate count: 284 (unchanged).

The previous TestHappyPath class in tests/test_authoritative_O.py exercised only the canonical Earth-like input (dIW=4.0) and was a happy-path-only regression check: both methods asserted that the result-dict contract held at one point in the physical regime. The PROTEUS test rules require every new or modified test function to cover at least one edge case and at least one physically unreasonable input. The rewrite renames the class to TestPhysicalOutputContract (more honest about what it tests) and: - Parametrises test_returns_physical_output over six dIW values spanning the physical magma-ocean regime (-4, -2, 0, +2, +4, +6). The result-dict contract now has to hold across reducing, neutral, and oxidising endpoints, not just at the fiducial. - Parametrises test_residuals_within_tolerance over three dIW values (-4, 0, +4). Per-element residual gates have to clear at the extremes too. - Adds test_negative_T_magma_raises, the unphysical-input sad path: feeding a negative T_magma to the solver must raise (since both IW buffer formulae diverge as 1/T and T*log(T)). Net effect: one happy-path-only class (2 tests, 1 dIW) becomes one anti-happy-path contract class (10 tests covering 6 dIW values + a sad path). The other three classes in this file (TestRoundTrip, TestReproducibility, TestConvergenceFailure) already satisfy the anti-happy-path rule (TestRoundTrip parametrises over 5 dIW values, TestConvergenceFailure verifies an unreachable target raises) and need no changes. Full suite count: 295 -> 303, all passing.

…badges Two presentational fixes on the docs landing page: - Move the H1 ``# CALLIOPE`` heading above the badge block to match the layout of the sister repo (Aragog) and to give the page a clear title line before the CI status indicators. - Append ``&color=brightgreen`` to the codecov, Unit Tests, and Integration Tests badge URLs so they render in the exact same shade of green as the Docs badge regardless of the underlying status algorithm. The codecov badge's auto-gradient would otherwise drift by a few percent of saturation depending on the coverage value, and the test workflow status badges would drift if a transient flake flipped them yellow on a given main-branch run. - The Testing-suite page (docs/Explanations/testing.md) gets the same colour override on its codecov + test badges for visual consistency with the landing page.

The round-trip paragraph in docs/Explanations/authoritative_oxygen.md wrapped the result-dict key inside a math-mode tuple with ``\texttt{O\_kg\_total}``. MathJax under Zensical does not render ``\texttt`` cleanly: the underscore-escape backslashes leak through to the page as literal characters, so the prose ended up showing ``(H, C, N, S, O\_kg\_total)`` with visible backslashes between the key and each underscore. Reflow the sentence so the math tuple uses plain ``(H, C, N, S, O)`` and the ``O_kg_total`` dict key is referenced as inline code outside the math span. The intent (round-trip from buffered to authoritative-O via the recovered O budget) is unchanged; only the typesetting moves.

The "Selecting the fO2 dispatch" section in docs/How-to/proteus_coupling.md previously gave a three-line TOML snippet showing the fO2_source field in isolation, with no example of what the surrounding planet.elements block needs to look like for each mode. A first-time reader had to piece the full config together from the Choosing-the-elemental-budget and Selecting-the-fO2-dispatch sections, and the validator rule that ic_chemistry is rejected under from_O_budget was not stated anywhere. Rewrite the section so it gives: - A dispatch table mapping each fO2_source value to its input, its solved-for quantity, and the underlying CALLIOPE entry point. - A complete minimal-style TOML config for the buffered-fO2 mode (fO2_source = "user_constant", O_mode = "ic_chemistry", fO2_shift_IW as the buffer offset). - A complete minimal-style TOML config for the authoritative-O mode (fO2_source = "from_O_budget", O_mode = "FeO_mantle_wt_pct" with a worked numeric example at 8 wt% FeO, fO2_shift_IW as the initial-guess hint). - A per-mode walk-through of what the wrapper does on the first call (which target it builds, which entry point it calls, which hf_row keys it writes back). - A side-by-side decision table for choosing between the two. Highlights the validator rule (ic_chemistry is incompatible with from_O_budget because the chemistry needs an O target to invert) in the authoritative-O section so the user does not hit the config-load failure first. Cross-links to the round-trip equivalence test and the Authoritative oxygen mode explanation page for the underlying physics. Docs build clean.

The job name on .github/workflows/tests.yaml was name: CALLIOPE-check (${{ matrix.os }}, py${{ matrix.python-version }}) The template placeholders evaluate fine when the job actually runs, but the job is gated by ``if: github.event.pull_request.draft == false`` so it never dispatches on a draft PR. When GitHub skips a job at the top-level ``if:`` check the matrix context is never populated, and the unexpanded literal ``CALLIOPE-check (${{ matrix.os }}, py${{ matrix.python-version }})`` appears as the check title in the PR's "Checks" sidebar. That has been visible on PR #20 for the duration of the draft. Reduce the job name to a plain ``CALLIOPE-check``. GitHub appends the matrix values automatically when the job dispatches for real (one display row per matrix cell), and a skipped draft job now shows a clean ``CALLIOPE-check`` instead of the raw template string.

Add a new Explanations page "Backend comparison" that quantifies where the two outgassing backends agree and where they disagree on the authoritative-O closure. Five publication-quality figures, each one backed by a reusable harness in scripts/cross_backend/: Fig 1: The iron-wustite buffer parameterisations (O'Neill & Eggins 2002 the CALLIOPE default, Fischer 2011, Hirschmann composite the atmodeller default) and the Hirschmann minus O'Neill offset across magma-ocean temperatures. Fig 2: Per-backend internal round-trip. Each backend goes from buffered mode at known dIW through authoritative-O and back within solver tolerance at every (T, dIW) grid point. Fig 3: Cross-backend dIW at the Krijt et al. 2023 PPVII BSE H/C/N/S volatile inventory across T = 1800 K to 3000 K. The raw gap tracks the buffer offset; the buffer-corrected residual stays within 0.1 dex up to 2000 K and grows to ~0.3 dex by 3000 K. Fig 4: Attribution at T = 2000 K. Raw 0.42 dex gap, 0.07 dex after buffer correction, 0.21 dex after also forcing solubility alignment. The buffer is the dominant systematic. Fig 5: Earth anchor. Both backends fall inside the Frost & McCammon 2008 empirical mantle-fO2 range, with CALLIOPE at the Sossi 2020 +3.5 reference and atmodeller a fraction of a dex more reducing. The harness is reusable: any inventory, any temperature, any pinned fO2_hint can be swept by parameterising the existing scripts. Raw CSV data is committed alongside the PDF and PNG outputs to lock the provenance of the embedded figures. Volatile-O reference for Earth is derived self-consistently from a CALLIOPE buffered-mode call at the Sossi 2020 IW+3.5 baseline rather than read directly from Krijt+2023 Table 2 (the latter is the Evans 2006 redox-active oxygen, which is dominated by the mantle FeO redox imbalance rather than volatile O and is not the variable the authoritative-O entry point inverts).

Replace every inline [Author Year](URL) citation across the docs body pages with the [^cite-author-year] footnote-reference syntax, with one [^cite-...]: definition per source appended at the bottom of each page. Numbered superscript links and a per-page bibliography section now render the same way the aragog docs do. Pages converted: index, code_architecture, equilibrium_chemistry, mass_balance, model, oxygen_fugacity, proteus_coupling, solubility, authoritative_oxygen, cross_backend_comparison; How-to/usage, How-to/proteus_coupling; Tutorials/firstrun. The dedicated Reference/publications page is left as-is. It is the full bibliography for the project and intentionally uses a different expanded layout. Citation keys (cite-author-year, lowercase) are stable identifiers I can reuse across pages. Per-page bibliography entries follow the aragog format: author list, italicised paper title with the DOI link where one exists, journal + volume + pages + year, SciX link. Page ranges use the en-dash per the academic-citation carve-out.

Audited every cite-* footnote definition against my references.bib and the atmodeller and CALLIOPE source code. Seven entries had titles, authors, or journal information I had drafted from memory and got wrong. Fixed: - cite-hirschmann2008: now the Library of Experimental Phase Relations (LEPR) paper in Geochemistry, Geophysics, Geosystems 9(3), Q03011, which is the source atmodeller cites for the H08 branch of the IW buffer. Was previously a fabricated PEPI 170 entry. - cite-hirschmann2021: now Geochimica et Cosmochimica Acta 313, 74-85. Was previously a fabricated American Mineralogist entry. - cite-hirschmann2012: corrected the volume from 345 to 345-348 (this is the bound multi-issue numbering used by EPSL for this paper) and added the DOI link to the title. - cite-li2015: corrected author list and title. The paper is Li, Dasgupta, Tsuno on carbon solubility in Fe-rich-alloy / silicate systems; I had drafted a different title with different authors. - cite-yoshioka2019: corrected author list and title. The paper is Yoshioka, Nakashima, Nakamura, Shcheka, Keppler on carbon solubility in CO-CO2-graphite equilibrium; I had drafted a rheology-focused title with different authors. - cite-boulliungwood2023: changed from the Geochimica et Cosmochimica Acta 343 corrigendum to the actual Boulliung & Wood 2023 paper in Contributions to Mineralogy and Petrology 178(8), 56, which is what atmodeller's BW23 cite key points to. - cite-rogers2025: corrected the title and added the second and third author. The paper is Rogers, Young, Schlichting on hydrogen-silicate miscibility for sub-Neptune interiors; I had drafted a binodal- framing title that does not match the published article. Six other entries had verified titles and authors but were missing the DOI hyperlink on the title. Added the DOI link for Frost & McCammon 2008, Nikolaou 2019, Lebrun 2013, Salvador 2017, Wadhwa 2001, O'Neill & Mavrogenes 2002, and Cartier & Wood 2019. Verification sources (in order of precedence): my references.bib Better-BibTeX export; atmodeller/docs/refs.bib for the atmodeller- internal cite keys; targeted web search against the published article for the entries not in either bib file.

"Path C" was an internal Claude-roadmap shorthand for the authoritative-O dispatch under planet.fO2_source = "from_O_budget". It is meaningless to anyone reading the docs or source cold, and leaks the AI-assisted planning vocabulary into checked-in artefacts. Rewrote the two occurrences in plain language: - docs/Explanations/cross_backend_comparison.md: "For Path C runs" -> "When running in authoritative-O mode". - src/calliope/solve.py: the docstring of equilibrium_atmosphere_authoritative_O dropped the PROTEUS issue #677 Path C reference and now describes the science motivation directly ("whole-planet O accounting where atmospheric escape and ingassing debit the same O reservoir").

The previous version used a viridis colour ramp across four temperatures and plotted recovered vs input on an identity-line view. Two issues: the four colours all rendered as essentially the same green at marker size, and at every input dIW the four T markers stacked on top of each other so only the topmost one was visible. Two changes: - Switch the colour palette to a discrete four-colour set (indigo, teal, orange, magenta) that maps cool to warm onto the four T values. Each colour is clearly distinguishable in the legend. - Switch the y-axis to the residual (recovered minus input dIW) and add a small horizontal jitter per T so all four T markers fan out side-by-side at each input dIW. A grey band marks the +/- 0.01 dex solver tolerance. The new view also surfaces two solver-edge cases that the old identity view had visually hidden behind perfect-overlap markers: CALLIOPE at T = 3000 K input dIW = +4 lands in a secondary basin and returns -5.84; atmodeller at T = 1500 K input dIW = 0 fails to converge. Both are now shown as downward triangles at the y-axis floor, in the matching T colour, and acknowledged in the caption rather than masked. The page text now also explains why T is swept (chemistry is T-dependent: the equilibrium constants, the Dasgupta nitrogen solubility, and the Gaillard sulfur solubility all carry explicit T or fO2 dependences, so the round-trip must hold across the full calibration range).

Both panels had legends parked on top of the curves: panel (a)'s lower-right corner sat across the atmodeller and predicted-from-buffer lines at the hot end, and panel (b)'s upper-right corner crossed the raw-gap line near its zero-crossing. Move both legends to lower-left, pad the y-axis on each panel so the legend box sits below the lowest converged point, and shift the tolerance-band annotation in panel (b) to the upper-right so it no longer collides with the panel label.

Bundle Roboto-Regular / Bold / Italic alongside the cross-backend plotting scripts and register them in plot_style.apply_style so every figure on the backend-comparison page now renders in the same sans-serif the Material theme uses for body text. Switch from font.family='serif' to a sans-serif stack that prefers Roboto and falls back to Helvetica / DejaVu Sans when Roboto is unavailable, and route mathtext through Roboto too so equation labels match the surrounding prose. Replace one H08-arrow-H21 unicode glyph in fig1 with the H08 / H21 form so Roboto covers it natively. Move the (b) panel label in fig3 out of the corrected-residual line by giving the bottom panel a deeper top margin. Move the "0.07 dex" bar label in fig4 below the 0.10 dex tolerance line so the two no longer collide, and park the tolerance annotation above the dashed line where no bar reaches. In fig5, lift the Sossi 2020 anchor line above the backend lines in z-order so it stays visible even when a backend lands at +3.5 exactly, add it as a fourth legend entry, and move the legend below the plot so it stops crowding the data lines. On the backend-comparison page, swap the words "harness" and "contract" for plainer phrasing throughout (and in the three sister pages where they appear), and add an explicit Krijt+2023 H / C / N / S table with the kg values used by the figures plus a short paragraph explaining why oxygen is not taken from Krijt directly and how the volatile-O reference of 1.24e22 kg was derived.

In fig3, both panel labels were sitting close to the data at the cold end of the sweep: (a) above the CALLIOPE point at T = 1800 K (3.37 dex) and (b) right next to the corrected-residual diamond (+0.08 dex). Add more top headroom in each panel so the labels clear the curves with breathing room. In fig4, move the per-element-solver-tolerance annotation from the space between bars 1 and 2 to the space between bars 2 and 3. That puts the label closer to where the line is actually doing work (the 0.07 dex bar is the only one below threshold) and well clear of every bar-top label.

Previously parked between bar 2 and bar 3 at x = 1.5, which crowded the right edge of the figure. At x = 1.0 the annotation sits above the 0.07 dex bar (which has already topped out below the tolerance line) and the text extends well clear of bars 1 and 3 on either side.

The Krijt+2023 inventory table previously appeared as a level-3 heading with no upstream context: a reader landing on the page would see a table of element masses with no explanation of why those specific numbers belong here. Promote the section to a top-level heading ("The Earth fiducial used throughout this page") and add an introductory paragraph that states three things up front: - A cross-backend comparison only carries meaning if both solvers see the same inputs, so the page picks one shared scenario and runs both backends against it. - The scenario is Earth BSE at Phi = 1, with T_magma fixed at 2000 K for Figures 4 and 5 or swept across 1800 to 3000 K for Figure 3. - Earth is the natural anchor because the modern upper-mantle Delta-IW is empirically constrained, which lets Figure 5 actually check the predictions against petrology. The element-mass table and the oxygen-derivation paragraph follow as before.

The first-run tutorial walks the reader through Earth-like inputs and asks them to print partial pressures and plot a bar chart, but previously had no canonical answer to check their output against. Add a horizontal-bar reference figure (species ordered top-to-bottom by partial pressure, summary-box of P_surf / M_atm / mean molar mass) generated by the same code path the tutorial uses, so the reader can visually confirm their numbers agree to within a few percent. The figure is produced by scripts/tutorials/fig_firstrun_reference.py, which shares the Roboto-based styling with the cross-backend comparison figures via scripts/tutorials/_style.py. Reusable for regeneration whenever the solver defaults change.

Both the project and patch Codecov gates used target: auto, which pins the bar to the base commit's coverage (about 97.7%). That makes any PR whose new lines land below the existing average fail the patch check even when coverage is well above the 90% level we hold the repo to. Set both gates to a fixed 90% target so the check reflects the actual standard and does not ratchet itself upward toward 100%.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ff50dec17

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

The production path runs the unbounded fsolve (opt_solver=False), whose acceptance gate rejected a negative partial pressure but not one above the documented 1e7 bar upper bound. A mass budget that drives pH2O, pCO2, pN2, or pS2 above that bound could therefore be returned as a physical solution, where the trust-constr path would have enforced the same ub array. The gate now also rejects partial pressures above the ceiling, tested against the physical 1e7 bar bound captured before the per-guess bound collapse so a tiny-guess slot is not falsely rejected.

GitHub is removing the Node 20 runner the deprecated actions ran on. Move every action to its current Node-24 release: actions/checkout v6, actions/setup-python v6, codecov/codecov-action v6. actions/cache stays on v5, its latest major, which is already Node-24; there is no v6 yet.

nichollsh

This is fantastic! Thank you for making these changes, which will seem to resolve the oxygen mass balance issue raised by @IKisvardai recently.

The docs are looking great and are well-referenced. They build fine on my laptop. I have also tested running the scripts in scripts/, and found that they do indeed reproduce the plots in the documentation. An important step for showing model robustness. The tests also pass on my laptop (Fedora 44, Python 3.12 and 3.13).

Retaining the existing fixed- $\Delta \rm IW$ functionality makes sense. The new authoritative oxygen mode is an important addition and will immediately enable new science.

It's not clear to me how exactly this mode will couple, in PROTEUS, to Mariana's evolving fO2 parameterisation. This would also seem to over-determine fO2 through Fe redox partitioning. We can think about this carefully after this PR, since the changes here give flexibility for both approaches.

I have tested that this CALLIOPE update works with the existing PROTEUS main branch, using the dummy and AGNI demo configs.

I have some comments below but cannot find any major issues here. I might have more comments after the existing ones have been resolved. Super exciting stuff!

nichollsh · 2026-05-31T12:58:47Z

    """Modified equilibrium constant (includes fO2)"""

-    def __init__(self, Keq_model, fO2_model='oneill'):
+    def __init__(self, Keq_model, fO2_model='fischer'):


Seems like a good move to switch to the Fischer parameterisation. It is more recent and robust.

Seems like a good move to switch to the Fischer parameterisation. It is more recent and robust.

Agreed, that was the motivation: Fischer is the more recent fit and, per the backend-comparison page, tracks atmodeller's Hirschmann composite to within ~0.2 dex across magma-ocean temperatures where O'Neill drifts up to ~1 dex. O'Neill stays available as OxygenFugacity('oneill').

nichollsh · 2026-05-31T12:59:16Z

    """log10 oxygen fugacity as a function of temperature"""

-    def __init__(self, model='oneill'):
+    def __init__(self, model='fischer'):


Should the default model be set somewhere more globally? What happens if someone changes one but not the other? Or maybe this flexibility is desirable.

Should the default model be set somewhere more globally? What happens if someone changes one but not the other? Or maybe this flexibility is desirable.

Good catch, the duplicated default was a trap. I pulled it into a single DEFAULT_FO2_MODEL = 'fischer' constant, and both OxygenFugacity and chemistry.ModifiedKeq now default to it, so they can't be changed independently. A test asserts both signatures carry the same constant and that the bare default really dispatches to Fischer.

nichollsh · 2026-05-31T13:03:15Z

    ptot = get_total_pressure(p_d)

+    if ptot < 1e-30:
+        return 1.0


I think this is fine... although it worries a bit that setting the MMW=1 here could lead to problems if the code changes elsewhere. Would it not be possible to return MMW=0? This is more clearly a non-physical 'dummy' return than MMW=1 would be.

I think this is fine... although it worries a bit that setting the MMW=1 here could lead to problems if the code changes elsewhere. Would it not be possible to return MMW=0? This is more clearly a non-physical 'dummy' return than MMW=1 would be.

I'd keep it at 1.0. MMW is used as a divisor downstream in the mass-balance chain, so returning 0 would turn the empty-atmosphere case into a ZeroDivisionError, which is what the sentinel avoids. The 1.0 is only reached when total pressure is below 1e-30 bar, and the callers multiply it by p_d[k] = 0, so every element mass still comes out zero and the "atmosphere is empty here" signal propagates cleanly. I have kept the docstring note so the choice isn't mistaken for arbitrary.

Fine with me. Something to perhaps keep in mind for the future, but I suppose that there's no "best" answer for how to define the MMW for a non-existent atmosphere.

nichollsh · 2026-05-31T13:04:45Z

+    pH2O = 10 ** rng.uniform(low=-12, high=5)
+    pCO2 = 10 ** rng.uniform(low=-12, high=5)
+    pN2 = 10 ** rng.uniform(low=-12, high=5)
+    pS2 = 10 ** rng.uniform(low=-12, high=5)


Maybe we should put these pmin and pmax values somewhere more configurable. I can imagine wanting to set pmax > 1e5 bar for SN cases. Even just setting it via an optional function argument would be better.

Maybe we should put these pmin and pmax values somewhere more configurable. I can imagine wanting to set pmax > 1e5 bar for SN cases. Even just setting it via an optional function argument would be better.

Done both ways. The cold-start range, the solver box, and the fO2 bounds now come from named module constants (P_GUESS_MIN_BAR, P_GUESS_MAX_BAR, P_CEILING_BAR, FO2_*) instead of inline literals, and the two cold-start helpers take an optional p_max (default P_GUESS_MAX_BAR) so a sub-Neptune case can widen the draw without touching the solver box. Wiring p_max to a PROTEUS config knob is the natural next step, which I have left for the PROTEUS side so it can be set per run.

Maybe we should put these pmin and pmax values somewhere more configurable. I can imagine wanting to set pmax > 1e5 bar for SN cases. Even just setting it via an optional function argument would be better.

Now wired through to PROTEUS as well. PROTEUS PR #678 (FormingWorlds/PROTEUS) adds an [outgas.calliope] p_guess_max config field (default 1e5 bar, matching CALLIOPE's P_GUESS_MAX_BAR, and bounded to (0, 1e7], the solver box) and forwards it to both CALLIOPE entry points, so a sub-Neptune run can widen the cold-start draw from config.

On the CALLIOPE side I renamed the public parameter p_max to p_guess_max for clarity against the P_CEILING_BAR solver box, and the cold-start helpers now validate the ceiling (reject non-finite or below-floor values, which numpy's uniform would otherwise silently invert into a degenerate draw).

One caveat worth stating: the ceiling only sets where the Monte-Carlo cold start samples within the 1e7 bar box; it does not raise the maximum pressure the solver can accept, so a genuine surface pressure above 1e7 bar would still need the box itself raised, which I have left as a separate change.

nichollsh · 2026-05-31T13:05:38Z

+    P_surf = outdict['P_surf']
    for s in volatile_species:
-        outdict[s + '_vmr'] = outdict[s + '_bar'] / outdict['P_surf']
+        outdict[s + '_vmr'] = (outdict[s + '_bar'] / P_surf) if P_surf > 0.0 else 0.0


Maybe check for numerical tolerance here. What if Psurf=1e-99 or some very small floating point error?

Maybe check for numerical tolerance here. What if Psurf=1e-99 or some very small floating point error?

Fixed. The guard is now P_surf > P_SURF_FLOOR_BAR (1e-30 bar) rather than > 0.0, so a denormal or floating-point-noise surface pressure reports a mixing ratio of zero instead of dividing into it.

nichollsh · 2026-05-31T13:12:02Z

+
+The figure below is the same Earth-like inputs run through `equilibrium_atmosphere` on a clean CALLIOPE install. Your numbers should reproduce these to within a few percent (the residual reflects Monte-Carlo restart variability, not a calibration issue).
+
+![First-run reference](../assets/figures/tutorials/firstrun_reference.png)


This is awesome! The tutorials are looking great.

This is awesome! The tutorials are looking great.

Thanks, glad they're useful!

nichollsh · 2026-05-31T13:27:57Z

Is it really necessary to package the fonts? Seems like a lot of bloat for just making the plots look pretty. These are open source fonts, but still... the regular MPL font is fine.

Is it really necessary to package the fonts? Seems like a lot of bloat for just making the plots look pretty. These are open source fonts, but still... the regular MPL font is fine.

I'd like to keep these. Bundling Roboto keeps the figures rendering identically across machines (otherwise matplotlib falls back to whatever font the local install happens to have) and visually consistent with the rest of the proteus-framework.org docs. They're Apache-2.0 and only a few hundred KB, which I think is an acceptable cost for reproducible, on-brand figures. Happy to revisit if repo size becomes a real concern.

nichollsh · 2026-05-31T13:30:39Z

Cool plot! This seems to be consistent with what we'd expect from the physics, and demonstrates that the model is working robustly.

That being said, why does O2 not appear in any of the panels? Is it included or just not abundant? Not even at IW+6?

Cool plot! This seems to be consistent with what we'd expect from the physics, and demonstrates that the model is working robustly.

That being said, why does O2 not appear in any of the panels? Is it included or just not abundant? Not even at IW+6?

Great question, and you were right to push on it. O2 was being computed but I'd left it out of the reported-species set, so the figure was hiding it. I ran the solver across the full grid and read O2 out alongside everything else: at low T or reducing ΔIW its partial pressure is tiny (1e-16 to 1e-10 bar), but the IW-buffer fugacity climbs about nine orders of magnitude from 1500 to 3000 K (Fischer gives log10 fO2 ~ -2.5 at 3000 K), so at the hot, oxidising corner O2 becomes major: a few hundred bar at 3000 K / ΔIW +5, about 15% of P_surf and second only to CO2, and top-4 across the whole T >= 2700 K, ΔIW >= +4 corner. I've added O2 to the reported species, regenerated the figure (the green O2 cells now show in that corner), and documented it in the tutorial text.

Great - thanks! I think this also matches with our physical expectations from the fO2 buffer parameterisations as functions of $\Delta \rm IW$ and temperature.

nichollsh · 2026-05-31T13:31:57Z

These comparisons with atmodeller are great. However, it should document which version/commit of atmodeller is used here, in case the other model is changed in the future.

These comparisons with atmodeller are great. However, it should document which version/commit of atmodeller is used here, in case the other model is changed in the future.

Added. The backend-comparison page now records that the atmodeller results were produced with atmodeller v1.0.0, and notes a later release may shift the comparison.

nichollsh · 2026-05-31T13:35:40Z

    def fischer(self, T):
        """Fischer et al. (2011) IW (FeO equation of state, EPSL 304, 496)"""
        return 6.94059 - 28.1808 * 1e3 / T


Since this is now the default model, you should mention here that these coefficients are for the p=1bar isoline in their Figure 6 (derived from integration of their Equation 2).

Since this is now the default model, you should mention here that these coefficients are for the p=1bar isoline in their Figure 6 (derived from integration of their Equation 2).

Added to the fischer docstring: the coefficients are the p = 1 bar isoline of their Figure 6, obtained by integrating their Equation 2.

nichollsh · 2026-05-31T13:40:30Z

Two other things (sorry):

The version reported by calliope.__version__ in the Python REPL is strangely formatted? I get '26.5.13.post1.dev86+g56dbcbc39', which seems like it would break some of the checks in PROTEUS. Why not just return 26.5.13? Also, this is not consistent with 26.5.10 in the metadata files.
The package author list only includes myself (with an old email address). Please feel free to add yourself to this, since you're also a maintainer on PyPI.

timlichtenberg · 2026-05-31T21:08:54Z

Awesome, thanks for the rapid turnaround, as always. :)

The phase-diagram tutorial reported the top four volatile species but left O2 out of the candidate set. At high magma-ocean temperature and oxidising delta-IW the IW-buffer fugacity climbs steeply, so O2 stops being a trace gas: at 3000 K and delta-IW +5 it reaches second place, about 15% of the surface pressure, behind only CO2. Add O2 to the reported species, regenerate the figure, and describe the hot, oxidising behaviour in the tutorial text.

Lift the cold-start pressure range, the fO2-shift redraw range, the trust-constr box, and a surface-pressure floor into named module constants in solve.py so the cold-start guess range, the solver box, and the fO2-hint validation can no longer drift apart. Add an optional p_max to the cold-start helpers so a high-pressure (sub-Neptune) case can widen the pressure draw without changing the box. Report a volatile mixing ratio of zero when the surface pressure falls below the floor, rather than dividing into a denormal. Default the fO2 buffer through a single DEFAULT_FO2_MODEL constant shared by OxygenFugacity and ModifiedKeq, so the two cannot be changed independently. Record that the Fischer coefficients are the 1 bar isoline of their Figure 6, and note the atmodeller version used for the backend comparison.

timlichtenberg · 2026-05-31T21:36:39Z

Thanks for the thorough review, and for testing the scripts/ reproductions and the PROTEUS coupling. On the coupling to Mariana's evolving fO2 parameterisation and the possible over-determination through Fe redox partitioning: I agree it needs care and I'd like to keep it out of this PR. This PR provides both the fixed-ΔIW and authoritative-O entry points; deciding which one drives fO2 once Mariana's mantle-redox model is active is the interface we'll design on the PROTEUS side (the from_mantle_redox work, PROTEUS #653). I've addressed the inline comments above.

The authors entry packed the name and email into a single quoted string, so there was no separate email field. Split it into proper name and email fields, update the contact email, and add a second author.

The version and date-released sat at 26.05.10 while the latest tag is 26.05.13. setuptools-scm only writes src/calliope/_version.py, not CITATION.cff, so the citation metadata is bumped by hand to match the current tag.

timlichtenberg · 2026-05-31T21:54:31Z

Two other things (sorry):

The version reported by calliope.__version__ in the Python REPL is strangely formatted? I get '26.5.13.post1.dev86+g56dbcbc39', which seems like it would break some of the checks in PROTEUS. Why not just return 26.5.13? Also, this is not consistent with 26.5.10 in the metadata files.

The package author list only includes myself (with an old email address). Please feel free to add yourself to this, since you're also a maintainer on PyPI.

Thanks, both addressed.

The .post1.dev86+g… suffix is expected setuptools-scm output rather than a bug: with version_scheme = "no-guess-dev", any checkout past the latest tag (currently 26.05.13; your editable build was 86 commits past it) reports 26.5.13.post1.devN+g<hash>, while a release install (PyPI 26.5.13, or a checkout at the tag) reports the clean 26.5.13. I kept that scheme deliberately: it's honest about a build being ahead of the tag and avoids stamping a future calendar date. It also doesn't break the PROTEUS checks, coupler.py parses only the first three numeric components (26.5.13.post1.dev86+g… → (26, 5, 13)) and doctor.py uses packaging.version.Version (PEP 440 compliant); PROTEUS's own version is dev-formatted the same way. The 26.5.10 you saw was a stale hardcoded version: in CITATION.cff, which setuptools-scm doesn't touch and would have gone stale again at the next release. Rather than re-bump it by hand, I've removed the static version and date-released (and the placeholder doi) from CITATION.cff entirely, so the git tag and _version.py are the single source of version truth.
Added myself to the pyproject.toml author list with tim.lichtenberg@rug.nl, and fixed the entry that had packed your name and email into one string (your email there is now h-nicholls@pm.me).

The version and date-released fields had to be bumped by hand at every release and went stale in between. Both are optional in CFF 1.2.0, and the authoritative version already lives in the git tag (and src/calliope/_version.py via setuptools-scm), so the citation metadata no longer duplicates it. GitHub's citation widget and CFF validation both work without these fields.

The doi field held a 10.xx/xx.xx placeholder rather than a real identifier, and a non-resolving DOI is worse than none. Removed until a real Zenodo deposit exists, at which point Zenodo fills in the per-release DOI and version.

equilibrium_atmosphere and equilibrium_atmosphere_authoritative_O now accept an optional p_max (default P_GUESS_MAX_BAR) and forward it to the Monte-Carlo cold-start helpers, so a high-pressure (sub-Neptune) run can raise the guess ceiling without changing the solver box. Tests cover the forwarding on both entry points and the default.

Rename the public cold-start ceiling parameter to p_guess_max across the draw helpers and both solver entry points, so it reads unambiguously against the separate P_CEILING_BAR solver box and matches the PROTEUS config field. Reject a non-finite or below-floor ceiling at the draw helpers: an inverted log-uniform range is not caught by numpy, and a non-finite ceiling feeds inf/nan into the draw, so both now fail loudly. Clarify in the docstrings that the ceiling sets where the cold start samples within the fixed solver box, not the maximum pressure the solver can accept. Add tests for the rejection contract and for forwarding on the restart redraw, not just the initial draw.

nichollsh

Great! Thanks for implementing my suggestions. I believe that the changes since my last PR address all of these, and the tests/docs still pass on my laptop.

nichollsh · 2026-06-01T08:04:06Z

Great - thanks! I think this also matches with our physical expectations from the fO2 buffer parameterisations as functions of $\Delta \rm IW$ and temperature.

nichollsh · 2026-06-01T08:05:34Z

+# out of step. Fischer is the more recent IW parameterisation and tracks the
+# atmodeller Hirschmann composite to within ~0.2 dex across magma-ocean
+# temperatures (see docs/Explanations/cross_backend_comparison.md).
+DEFAULT_FO2_MODEL = 'fischer'


Thanks for adding this. I think it's better to have this single default global variable than multiple keyword parameters.

nichollsh · 2026-06-01T08:06:20Z

    ptot = get_total_pressure(p_d)

+    if ptot < 1e-30:
+        return 1.0


Fine with me. Something to perhaps keep in mind for the future, but I suppose that there's no "best" answer for how to define the MMW for a non-existent atmosphere.

nichollsh · 2026-06-01T08:14:27Z

Looks good. I have checked with cffconvert --validate that this still suits the cff schema.

timlichtenberg added 30 commits May 14, 2026 17:26

timlichtenberg requested a review from nichollsh May 31, 2026 10:35

timlichtenberg marked this pull request as ready for review May 31, 2026 10:35

timlichtenberg requested a review from a team as a code owner May 31, 2026 10:35

timlichtenberg changed the title ~~Add an authoritative-oxygen mode to the equilibrium chemistry solver~~ Authoritative-oxygen mode May 31, 2026

timlichtenberg added this to PROTEUS Development Roadmap May 31, 2026

timlichtenberg moved this to In Progress in PROTEUS Development Roadmap May 31, 2026

timlichtenberg assigned timlichtenberg and nichollsh May 31, 2026

chatgpt-codex-connector Bot reviewed May 31, 2026

View reviewed changes

Comment thread src/calliope/solve.py Outdated

timlichtenberg added 2 commits May 31, 2026 13:08

nichollsh requested changes May 31, 2026

View reviewed changes

timlichtenberg added 2 commits May 31, 2026 23:13

timlichtenberg added 2 commits May 31, 2026 23:46

Fix package author metadata and add a second author

3523722

The authors entry packed the name and email into a single quoted string, so there was no separate email field. Split it into proper name and email fields, update the contact email, and add a second author.

Sync CITATION.cff version to the latest release tag

08b1553

The version and date-released sat at 26.05.10 while the latest tag is 26.05.13. setuptools-scm only writes src/calliope/_version.py, not CITATION.cff, so the citation metadata is bumped by hand to match the current tag.

timlichtenberg added 4 commits May 31, 2026 23:57

Remove placeholder DOI from CITATION.cff

36a8fd9

The doi field held a 10.xx/xx.xx placeholder rather than a real identifier, and a non-resolving DOI is worse than none. Removed until a real Zenodo deposit exists, at which point Zenodo fills in the per-release DOI and version.

timlichtenberg requested a review from nichollsh May 31, 2026 22:54

nichollsh approved these changes Jun 1, 2026

View reviewed changes

timlichtenberg merged commit b71cb88 into main Jun 1, 2026
8 checks passed

github-project-automation Bot moved this from In Progress to Done in PROTEUS Development Roadmap Jun 1, 2026

timlichtenberg deleted the tl/fo2-source-framework branch June 1, 2026 10:04


		The figure below is the same Earth-like inputs run through `equilibrium_atmosphere` on a clean CALLIOPE install. Your numbers should reproduce these to within a few percent (the residual reflects Monte-Carlo restart variability, not a calibration issue).

		![First-run reference](../assets/figures/tutorials/firstrun_reference.png)

Conversation

timlichtenberg commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

What this changes

Why

Validation of changes

Notes for review

Checklist

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

nichollsh left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nichollsh commented May 31, 2026

Uh oh!

timlichtenberg commented May 31, 2026

Uh oh!

timlichtenberg commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

timlichtenberg commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nichollsh left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

timlichtenberg commented May 14, 2026 •

edited

Loading

timlichtenberg commented May 31, 2026 •

edited

Loading

timlichtenberg commented May 31, 2026 •

edited

Loading