Print a stable run-ID banner to stdout on run start/resume #119
asaiacai wants to merge 3 commits
Emit a fixed-format line on stdout when a run starts or resumes so
external tooling can reverse-look up a run from a training process's
stdout (e.g. when a multi-node job crashes and the operator only has the
logs):
pluto: run LV3-12 started (external_id=dhyecrvx)
Previously the only run-start output was a logger.info line that (a) went
to stderr via the logging StreamHandler, (b) printed the numeric run ID
rather than the LV3-12 display ID, and (c) wasn't a stable greppable
format. The server's create/resume response already returns displayId,
so this captures it (settings._display_id) and prints a plain stdout
line independent of the logging system. external_id is the sqid slug
parsed from the run URL.
Adds tests/test_run_banner.py covering stdout routing, the started/
resumed verbs, the consumer regex, trailing-slash URLs, and the
no-display-id / no-url fallbacks.
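For illustration, a consumer-side lookup could look like the sketch below. The regex and the `find_run_banners` helper are assumptions written against the banner format shown above; the actual consumer regex used in the tests is not reproduced in this PR text.

```python
import re
from typing import Dict, List, Optional

# Hypothetical consumer regex for the banner line; the external_id suffix is optional.
BANNER_RE = re.compile(
    r"^pluto: run (?P<display_id>\S+) (?P<verb>started|resumed)"
    r"(?: \(external_id=(?P<external_id>[^)\s]+)\))?$"
)


def find_run_banners(stdout_text: str) -> List[Dict[str, Optional[str]]]:
    """Return one dict per banner line found in captured trainer stdout."""
    matches = []
    for line in stdout_text.splitlines():
        m = BANNER_RE.match(line.strip())
        if m:
            matches.append(m.groupdict())
    return matches


if __name__ == "__main__":
    sample = (
        "wandb: setting up run abc\n"
        "pluto: run LV3-12 started (external_id=dhyecrvx)\n"
        "step 1 loss 0.5\n"
        "pluto: run LV3-12 resumed\n"
    )
    for banner in find_run_banners(sample):
        print(banner)
```

The pattern is anchored to the whole line, so log collectors that prepend a rank or timestamp prefix would need the leading `^` relaxed.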
Code Review
This pull request introduces a stable, greppable run banner printed to stdout when a run starts or resumes, allowing external tooling to reverse-look up a run. It retrieves a _display_id from the server response, parses an external_id from the run URL, and adds comprehensive unit tests. The review feedback correctly points out a potential bug where a host-only URL would incorrectly parse the hostname as the external_id due to string splitting, and suggests a more robust parsing method using urllib.parse.urlparse along with an additional test case to cover this scenario.
Use urllib.parse.urlparse to extract the path before taking the last segment, so a host-only url_view (e.g. https://pluto.trainy.ai with no run slug) omits external_id instead of falling back to the hostname. Adds a regression test for the host-only URL case. Addresses review feedback on PR #119.
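The difference between the two parsing approaches can be sketched as follows. The function names and the run URL path shape are hypothetical; only `urllib.parse.urlparse` and the host-only example URL come from the commit message.

```python
from typing import Optional
from urllib.parse import urlparse


def naive_external_id(url_view: str) -> str:
    """String splitting only: a host-only URL yields the hostname."""
    return url_view.rstrip("/").split("/")[-1]


def external_id_from_url(url_view: Optional[str]) -> Optional[str]:
    """Take the last non-empty segment of the URL path, or None if there is none."""
    if not url_view:
        return None
    path = urlparse(url_view).path
    segments = [segment for segment in path.split("/") if segment]
    return segments[-1] if segments else None


if __name__ == "__main__":
    host_only = "https://pluto.trainy.ai"
    # Hypothetical run URL shape, used only to exercise the trailing-slash case.
    run_url = "https://pluto.trainy.ai/o/org/projects/demo/dhyecrvx/"
    print(naive_external_id(host_only))
    print(external_id_from_url(host_only))
    print(external_id_from_url(run_url))
```

Working on the parsed path means the scheme and hostname can never be mistaken for a slug, and filtering empty segments also covers trailing-slash URLs.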
The previous run failed only on test_e2e_metrics_logged due to a 503 from pluto-api.trainy.ai mid-run (the e2e suite hits the live server); not related to this change. Empty commit to re-run the matrix.
What
Emit a fixed-format line to stdout when a run starts or resumes:
pluto: run LV3-12 started (external_id=dhyecrvx)
(resumed is used for the resume path.)
Why
This comes from the linum-v3 feedback (item #7). When a multi-node training job crashes, the operator often only has the process logs and doesn't remember the run ID. Their /resume-crashed-run skill reverse-looks up the run by grepping the trainer's stdout. Today it greps W&B's banner (wandb: setting up run …) and wants the equivalent for Pluto.
The only run-start output we emitted was a logger.info line (pluto/op.py), which:
- goes to stderr (StreamHandler defaults to sys.stderr), but the consumer greps stdout;
- prints the numeric run ID (151299), not the LV3-12 display ID;
- isn't a stable greppable format and can be suppressed (disable_console / notebook handler stripping).
How
No server change needed: the POST /api/runs/create (and resume) response already returns displayId (verified in pluto-server web/server/routes/runs-openapi.ts). This:
- captures displayId from the response into settings._display_id;
- adds Op._print_run_banner(verb), a plain print(..., flush=True) to stdout (deliberately independent of the logging system so it can't be suppressed and always lands on stdout);
- derives external_id as the sqid slug, i.e. the last path segment of the run URL (runUrl ends in sqidEncode(run.id)).
The existing logger.info lines are left unchanged.
Tests
tests/test_run_banner.py (6 cases, all passing):
- banner goes to stdout, not stderr;
- started / resumed verbs;
- the line matches the consumer regex;
- trailing-slash URLs;
- no displayId → silent;
- no URL → external_id omitted.
ruff check / ruff format --check clean on changed files; existing TestNoopRunStatus Op-construction tests still pass.
Notes / scope
This is the in-scope, client-side slice of the linum-v3 notes. The other items are either web-frontend (#1, 2, 9–14) or MCP/backend (#3, 8) and live in other repos. #5 (GPU metrics) is already emitted by the SDK (pluto/sys.py); #6 (image captions) and #4 (string-valued checkpoint/* metrics) also have client-side slices in pluto/compat/wandb.py that could be follow-ups if wanted.
https://claude.ai/code/session_01DEs8exGc8X6WqmLkjbqEi2
Generated by Claude Code
Note
Low Risk
Client-only observability output with no changes to auth, uploads, or server APIs beyond reading an existing response field.
Overview
Adds a greppable stdout banner when a run starts or resumes, so operators can recover the Pluto display ID (e.g. LV3-12) from trainer logs, similar to W&B's run banner.
After create/resume, the client stores displayId from the API response on Settings._display_id and calls new Op._print_run_banner, which prints to stdout (not the logger): pluto: run <display_id> started|resumed with optional external_id parsed from the last URL path segment. Missing displayId prints nothing; host-only URLs omit external_id. Existing logger.info run lines are unchanged.
tests/test_run_banner.py covers stdout vs stderr, verbs, regex compatibility, URL edge cases, and silent/no-external-id behavior.
Reviewed by Cursor Bugbot for commit 43bcbbf.