21 commits
1cef158
Run candide PBS job scripts through the container instead of conda
cailmdaley May 30, 2026
7a87bef
felt: close cleanup-rhostats-jobscripts (D1 stale premise, D2 shipped…
cailmdaley May 31, 2026
4fc948d
Build OpenMPI 5.0.x in the image; SLURM-ify candide job scripts
cailmdaley May 31, 2026
d31d4d2
ci: publish images on every branch push, not just integration branches
cailmdaley May 31, 2026
6b2b036
felt: scrub personal wayfinding from public shapepipe store
cailmdaley May 31, 2026
e599973
Fix MPI path: thread module_config_sec through to WorkerHandler.worker
cailmdaley May 31, 2026
bf9f1e2
chore: gitignore felt index WAL sidecars (index.db-shm/-wal)
cailmdaley May 31, 2026
a03baf3
felt: correct mpi-hybrid close (two-layer bug); add exec-modes-schedu…
cailmdaley May 31, 2026
9d2e523
Merge remote-tracking branch 'origin/develop' into cleanup/candide-sc…
cailmdaley May 31, 2026
7e7b744
Fix stale module names in example/pbs/config_mpi.ini
cailmdaley May 31, 2026
be0c724
felt: mpi-hybrid — record Layer 3 (stale config) + final e2e verifica…
cailmdaley May 31, 2026
0c2103c
felt: temper MPI claims to observed-vs-inferred (canfar run history u…
cailmdaley May 31, 2026
e82de0e
felt: record SMP==MPI same-worker finding; sharpen MPI question to Ma…
cailmdaley May 31, 2026
33494d7
Propagate shapepipe_run's exit code (main must return run's value)
cailmdaley May 31, 2026
2289e6a
Add MPI world-size preflight check: fail loudly on "rank 0 of N singl…
cailmdaley May 31, 2026
8e00b8b
felt: record Layer 4 hardening (singleton guard + exit-code fix)
cailmdaley May 31, 2026
d83be28
Remove MPI singleton preflight guard from this PR (defer pending deci…
cailmdaley May 31, 2026
8a0bbe5
felt: park the singleton guard as follow-up; sharpen Martin question
cailmdaley May 31, 2026
52f0e41
docs(README): add verified candide container quickstart
cailmdaley May 31, 2026
981c8e0
docs: make README a general front door; move candide detail to contai…
cailmdaley May 31, 2026
bb48a44
Move user-facing docs to the docs-rework PR (#739)
cailmdaley May 31, 2026
15 changes: 7 additions & 8 deletions .felt/docker-uv-revert/docker-uv-revert.md
@@ -6,11 +6,11 @@ tags:
- docker
- infra
created-at: 2026-04-27T11:26:45.677512058+02:00
outcome: 'PR #719 (chore: switch Dockerfile to slim Python + uv lockfile) opened and CI-green on first try (3m31s); ready for Martin''s review. Drops conda double-install, makes pyproject SSOT + uv.lock the pinned manifest, switches WeightWatcher from sed-patched source build to Debian''s pre-patched 1.12+dfsg-3 package, adds binary smoke tests to deploy-image.yml.'
outcome: 'PR #719 (chore: switch Dockerfile to slim Python + uv lockfile) opened and CI-green on first try (3m31s); ready for review. Drops conda double-install, makes pyproject SSOT + uv.lock the pinned manifest, switches WeightWatcher from sed-patched source build to Debian''s pre-patched 1.12+dfsg-3 package, adds binary smoke tests to deploy-image.yml.'
decisions:
base:
label: Base image
rationale: Conda double-install was the actual problem; cleanest resolution is to drop conda entirely. Martin's canfar concern is satisfied as long as the slim image works on canfar.
rationale: Conda double-install was the actual problem; cleanest resolution is to drop conda entirely. The canfar deployment concern is satisfied as long as the slim image works on canfar.
default: python-slim
options:
python-slim:
@@ -50,15 +50,15 @@ decisions:
label: uv + pyproject + uv.lock; uv sync --frozen in Dockerfile
modernize:
label: Modernize package versions
rationale: 'We determined which versions MUST stay pinned: only ngmix (Axel''s stable_version branch — replacement is tracked separately). Everything else can move to current latest because uv resolved cleanly and CI smoke test still passes (3m42s). If a real pipeline run on canfar surfaces a numpy-2 / pandas-3 break, the fix is a targeted constraint + uv lock, not a wholesale revert.'
rationale: 'We determined which versions MUST stay pinned: only ngmix (pinned to a stable_version fork branch — replacement is tracked separately). Everything else can move to current latest because uv resolved cleanly and CI smoke test still passes (3m42s). If a real pipeline run on canfar surfaces a numpy-2 / pandas-3 break, the fix is a targeted constraint + uv lock, not a wholesale revert.'
default: stay-current
options:
stay-conservative:
label: Keep pre-v2 minimums (numpy 1.26, astropy 6.1, pandas 2.2); only bump when forced
excluded: true
excluded_reason: Drift between pyproject signal and lockfile reality; loses the chance to surface numpy-2/pandas-3 incompatibilities at PR time when CI is fast
stay-current:
label: Bump pyproject minimums to current major versions (numpy 2, astropy 7, pandas 3, galsim 2.8, mpi4py 4.1, etc.); pin ngmix to Axel's stable_version branch
label: Bump pyproject minimums to current major versions (numpy 2, astropy 7, pandas 3, galsim 2.8, mpi4py 4.1, etc.); pin ngmix to its stable_version fork branch
insights:
ci-fast:
claim: 'First CI run on PR #719 went green in 3m31s. uv installed 238 packages in 322ms — everything resolved to prebuilt wheels, no source compilation of galsim/mpi4py/python-pysap/etc. Massive speedup vs. previous build.'
@@ -97,11 +97,10 @@ The `--frozen` flag is the discipline mechanism: a stale lockfile cannot ship.
## Followups

- Watch CI on #719. The slim-base apt list is conjectural — galsim/mpi4py/python-pysap pull a lot of system deps and we may need to add more (`libatlas-base-dev`, `libblas-dev`, etc).
- If CI needs anything beyond what's in the apt block, that's the surface that benefits from a [[shapepipe/prs-in-flight]] note for next time.
- After this lands, [[shapepipe/prs-in-flight]] PRs #708 and #714 may need a small rebase.
- Optional: separate `Dockerfile.canfar` building on skaha if there's a concrete deployment reason. Currently conjectural — Martin floated it but we agreed slim should work on canfar.
- If CI needs anything beyond what's in the apt block, that's worth noting for next time.
- After this lands, PRs #708 and #714 may need a small rebase.
- Optional: separate `Dockerfile.canfar` building on skaha if there's a concrete deployment reason. Currently conjectural — floated as a possibility, but slim should work on canfar.

## Connections

- [[shapepipe]] — root
- [[shapepipe/prs-in-flight]] — touches the testing-scaffold xfail set and the develop-bugs PR
10 changes: 0 additions & 10 deletions .felt/fabian-coord-bug/fabian-coord-bug.md

This file was deleted.

4 changes: 2 additions & 2 deletions .felt/ngmix-update/ngmix-update.md
@@ -1,9 +1,9 @@
---
name: ngmix library upgrade + Lucy wrapper sync
name: ngmix library upgrade + wrapper sync
tags:
- shapepipe
- ngmix
- future
created-at: 2026-04-27T11:26:51.026191639+02:00
outcome: 'Future: replace Axel''s stable_version fork with upstream ngmix; reconcile with Lucy''s cleaned-up wrapper from her visit'
outcome: 'Replace the pinned ngmix fork (a stable_version branch carrying not-yet-upstreamed fixes) with upstream ngmix once those land; reconcile the wrapper afterward.'
---
76 changes: 0 additions & 76 deletions .felt/prs-in-flight/prs-in-flight.md

This file was deleted.

66 changes: 28 additions & 38 deletions .felt/shapepipe.md
@@ -1,50 +1,40 @@
---
name: ShapePipe maintenance & PRs
name: ShapePipe — project knowledge & active threads
tags:
- shapepipe
- portolan
created-at: 2026-04-27T11:26:38.71538657+02:00
outcome: 'Root: collaboration with Martin on ShapePipe — PRs, infra, future ngmix and Fabian work'
outcome: 'Root of ShapePipe''s felt store: the stack division, repo conventions, and the why behind in-flight infra/cleanup threads.'
---

ShapePipe is the UNIONS shape-measurement pipeline. I'm not the primary
maintainer (that's Martin Kilbinger); my role is collaborator helping
clean up infra, surface bugs, and keep the merge queue moving while
Martin focuses on science threads.
This is the root of ShapePipe's felt store — shared notes on architecture
decisions, conventions, and in-flight work, for the team and AI agents alike.
ShapePipe is the UNIONS galaxy shape-measurement pipeline; `CLAUDE.md` covers the
build / container / CI overview, and the fibers here carry the *why*. Start here,
then follow the links.

## Working agreement with Martin
## Stack division

Surfaced over a 2026-04-27 walking conversation. Captured in
[[shapepipe/prs-in-flight]] and the per-thread fibers below.
ShapePipe **produces** shear catalogues; `sp_validation` / `cosmo_val`
**consume** and validate them; `cs_util` holds code shared across both. A concern
about *validating* catalogues belongs downstream, not in ShapePipe.

- I review and patch his PRs; he reviews mine. Bugs found during review
go to a dedicated PR rather than getting bundled into his feature
branch (per `feedback_separate_infra_prs`).
- v2.0 was merged fast (it was ready). The skaha base it brought in is
the active source of pain → see [[shapepipe/docker-uv-revert]].
- I file the issues; Claude usually drafts the PRs in my voice.
Disclosure on Claude-only review per
`feedback_claude_only_review_disclosure`.

## Active threads

- **[[shapepipe/docker-uv-revert]]** — slim Python + uv lockfile, drop conda. PR #719 (draft).
- **[[shapepipe/prs-in-flight]]** — tracking #708 (testing scaffold), #714 (develop bugs), #719 (this one).

## Future work
## Conventions specific to this repo

- **[[shapepipe/ngmix-update]]** — replace Axel's stable_version fork
with upstream ngmix; reconcile with Lucy's wrapper.
- **[[shapepipe/fabian-coord-bug]]** — port Fabian's 1-line coord
propagation fix; first need his image-sim code on github.
- **Rho-statistics are obsolete inside ShapePipe.** PSF-systematics validation
moved downstream to `sp_validation` / `cosmo_val` (via `shear_psf_leakage`);
the stile/treecorr rho code was removed in #715. But the **meanshapes /
ellipticity focal-plane plots** (`mccd_plots_runner`) are *deliberately kept* —
they are a general PSF/star-catalogue diagnostic, not rho-stats, and feed
catalogue-paper figures. Don't delete that path along with rho-stats; see
[[shapepipe/cleanup-rhostats-jobscripts]] for where the boundary actually sits.
- Run the pipeline through the container; use `python3.12` explicitly inside it.
- **ngmix** is pinned to a fork branch until fixes land upstream — don't bump
that dependency line. [[ngmix-update]] tracks the path back to upstream.

## Conventions specific to this repo
## Active threads

- Container runs through `app` (apptainer wrapper); use `python3.12`
inside the shapepipe container (see `reference_containers`).
- ShapePipe produces; `sp_validation` consumes; `cs_util` is shared (see
`project_stack_division`).
- Rho stats are obsolete here — sp_validation/cosmo_val took over (see
`project_rho_stats_obsolete`).
- Royal "we" in PR/issue voice; specific findings attributed to Claude
by name (see `feedback_writing_voice_on_cails_behalf`).
- **[[shapepipe/ci-green-on-develop]]** / **[[shapepipe/test-suite]]** — a
tiered, in-image test suite and trustworthy CI on `develop`.
- **[[docker-uv-revert]]** — slim Python base + uv lockfile, dropping conda.
- **[[shapepipe/mpi-hybrid]]** — running hybrid MPI through the container on candide.
- **[[ngmix-update]]** — replacing the pinned ngmix fork with upstream.
2 changes: 1 addition & 1 deletion .felt/shapepipe/ci-develop-trigger/ci-develop-trigger.md
@@ -64,7 +64,7 @@ just CI. Deserves its own issue; #732 doesn't touch it.

## Knock-on

[[shapepipe/prs-in-flight]]: **#729** (actions group, bumps `setup-miniconda`
**#729** (actions group, bumps `setup-miniconda`
v3→v4) hit the layer-1 failure too — confirming the action bump alone
doesn't fix the path. #729 must rebase on top of #732 once it merges before
it can go green. The smoke-test work in [[shapepipe/smoke-test-read-only]]