Print a stable run-ID banner to stdout on run start/resume by asaiacai · Pull Request #119 · Trainy-ai/pluto

asaiacai · 2026-06-04T21:01:22Z

What

Emit a fixed-format line to stdout when a run starts or resumes:

pluto: run LV3-12 started (external_id=dhyecrvx)

(On the resume path the verb is resumed instead of started.)

Why

This comes from the linum-v3 feedback (item #7). When a multi-node training job crashes, the operator often has only the process logs and doesn't remember the run ID. Their /resume-crashed-run skill recovers the run by grepping the trainer's stdout. Today it greps W&B's banner (wandb: setting up run …) and wants the equivalent for Pluto:

m = re.search(r"pluto:\s*run\s+(LV3-\d+)", line)
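For illustration, a consumer built around that regex might look roughly like the sketch below. The function name find_display_id and the idea of passing in an iterable of log lines are assumptions made for this example; the actual /resume-crashed-run skill is not shown in this PR.

```python
import re
from typing import Iterable, Optional

# The consumer regex quoted above; it captures the display ID (e.g. LV3-12).
BANNER_RE = re.compile(r"pluto:\s*run\s+(LV3-\d+)")


def find_display_id(lines: Iterable[str]) -> Optional[str]:
    """Return the first display ID found in the given stdout lines, if any.

    Hypothetical helper: the real skill may scan files differently.
    """
    for line in lines:
        m = BANNER_RE.search(line)
        if m:
            return m.group(1)
    return None


if __name__ == "__main__":
    captured_stdout = [
        "loading checkpoint ...",
        "pluto: run LV3-12 started (external_id=dhyecrvx)",
        "step 1 loss 2.31",
    ]
    print(find_display_id(captured_stdout))
```

Because the regex only anchors on the "pluto: run <display_id>" prefix, it matches both the started and resumed forms and ignores the optional external_id suffix.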

The only run-start output we emitted was a logger.info line (pluto/op.py), which:

goes to stderr (the logging StreamHandler defaults to sys.stderr), but the consumer greps stdout;
prints the numeric run ID (e.g. 151299), not the LV3-12 display ID;
isn't a stable, greppable format (it's prose behind the logging system, subject to log level / disable_console / notebook handler stripping).
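The stderr point can be checked with the standard library alone. The sketch below is not Pluto code: the logger name banner_demo and the message text are made up for the demonstration. It shows that a logging.StreamHandler created without a stream argument writes to sys.stderr, while a plain print lands on stdout.

```python
import contextlib
import io
import logging
from typing import Tuple


def capture_streams() -> Tuple[str, str]:
    """Emit one logger.info line and one print, and return (stdout, stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        logger = logging.getLogger("banner_demo")  # hypothetical logger name
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep the root logger out of the demo
        # No stream argument: the handler binds to the current sys.stderr.
        handler = logging.StreamHandler()
        logger.addHandler(handler)
        try:
            logger.info("run 151299 started")  # goes through logging
            print("pluto: run LV3-12 started", flush=True)  # bypasses logging
        finally:
            logger.removeHandler(handler)
    return out.getvalue(), err.getvalue()


if __name__ == "__main__":
    stdout_text, stderr_text = capture_streams()
    print("stdout:", repr(stdout_text))
    print("stderr:", repr(stderr_text))
```

A consumer that reads only stdout would therefore never see the logger.info line, regardless of log level.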

How

No server change is needed: the POST /api/runs/create (and resume) response already returns displayId (verified in pluto-server web/server/routes/runs-openapi.ts). This PR:

captures displayId from the response into settings._display_id;
adds Op._print_run_banner(verb), a plain print(..., flush=True) to stdout (deliberately independent of the logging system so it can't be suppressed and always lands on stdout);
derives external_id as the sqid slug — the last path segment of the run URL (runUrl ends in sqidEncode(run.id)).
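The steps above can be sketched as standalone functions. This is a minimal approximation, not the code in pluto/op.py: the names external_id_from_url, format_banner, and print_run_banner are hypothetical, the example run URL path is invented, and the path-based parsing reflects the urlparse approach adopted after review.

```python
from typing import Optional
from urllib.parse import urlparse


def external_id_from_url(url: Optional[str]) -> Optional[str]:
    """Return the last non-empty path segment of the run URL, if there is one."""
    if not url:
        return None
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else None


def format_banner(display_id: Optional[str], verb: str,
                  url: Optional[str] = None) -> Optional[str]:
    """Build the banner line, or None when there is no display ID."""
    if not display_id:
        return None  # no displayId: stay silent
    line = f"pluto: run {display_id} {verb}"
    external_id = external_id_from_url(url)
    if external_id:
        line += f" (external_id={external_id})"
    return line


def print_run_banner(display_id: Optional[str], verb: str,
                     url: Optional[str] = None) -> None:
    """Print the banner with a plain print so it always lands on stdout."""
    line = format_banner(display_id, verb, url)
    if line is not None:
        print(line, flush=True)


if __name__ == "__main__":
    # The URL path below is an invented example, not a documented route.
    print_run_banner("LV3-12", "started", "https://pluto.trainy.ai/runs/dhyecrvx")
    print_run_banner("LV3-12", "resumed", None)
```

Keeping the formatting separate from the print call makes the string easy to assert on in tests, while the print itself stays free of any logging configuration.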

The existing logger.info lines are left unchanged.

Tests

tests/test_run_banner.py (6 cases, all passing):

banner goes to stdout, not stderr;
started / resumed verbs;
output satisfies the documented consumer regex;
trailing-slash URL handling;
no displayId → silent;
no URL → external_id omitted.

$ python -m pytest tests/test_run_banner.py
6 passed

ruff check / ruff format --check clean on changed files; existing TestNoopRunStatus Op-construction tests still pass.

Notes / scope

This is the in-scope, client-side slice of the linum-v3 notes. The other items are either web-frontend (#1, 2, 9–14) or MCP/backend (#3, 8) and live in other repos. #5 (GPU metrics) is already emitted by the SDK (pluto/sys.py); #6 (image captions) and #4 (string-valued checkpoint/* metrics) also have client-side slices in pluto/compat/wandb.py that could be follow-ups if wanted.

https://claude.ai/code/session_01DEs8exGc8X6WqmLkjbqEi2

Generated by Claude Code

Note

Low Risk
Client-only observability output with no changes to auth, uploads, or server APIs beyond reading an existing response field.

Overview
Adds a greppable stdout banner when a run starts or resumes, so operators can recover the Pluto display ID (e.g. LV3-12) from trainer logs—similar to W&B’s run banner.

After create/resume, the client stores displayId from the API response on Settings._display_id and calls new Op._print_run_banner, which prints to stdout (not the logger): pluto: run <display_id> started|resumed with optional external_id parsed from the last URL path segment. Missing displayId prints nothing; host-only URLs omit external_id. Existing logger.info run lines are unchanged.

tests/test_run_banner.py covers stdout vs stderr, verbs, regex compatibility, URL edge cases, and silent/no-external-id behavior.

Reviewed by Cursor Bugbot for commit 43bcbbf.

Emit a fixed-format line on stdout when a run starts or resumes so external tooling can reverse-look up a run from a training process's stdout (e.g. when a multi-node job crashes and the operator only has the logs):

pluto: run LV3-12 started (external_id=dhyecrvx)

Previously the only run-start output was a logger.info line that (a) went to stderr via the logging StreamHandler, (b) printed the numeric run ID rather than the LV3-12 display ID, and (c) wasn't a stable greppable format.

The server's create/resume response already returns displayId, so this captures it (settings._display_id) and prints a plain stdout line independent of the logging system. external_id is the sqid slug parsed from the run URL.

Adds tests/test_run_banner.py covering stdout routing, the started/resumed verbs, the consumer regex, trailing-slash URLs, and the no-display-id / no-url fallbacks.

gemini-code-assist

Code Review

This pull request introduces a stable, greppable run banner printed to stdout when a run starts or resumes, allowing external tooling to reverse-look up a run. It retrieves a _display_id from the server response, parses an external_id from the run URL, and adds comprehensive unit tests. The review feedback correctly points out a potential bug where a host-only URL would incorrectly parse the hostname as the external_id due to string splitting, and suggests a more robust parsing method using urllib.parse.urlparse along with an additional test case to cover this scenario.

Use urllib.parse.urlparse to extract the path before taking the last segment, so a host-only url_view (e.g. https://pluto.trainy.ai with no run slug) omits external_id instead of falling back to the hostname. Adds a regression test for the host-only URL case. Addresses review feedback on PR #119.
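The difference between the two approaches is easy to see side by side. In the sketch below, last_segment_naive is an assumed reconstruction of the string-splitting behavior the review describes (the original code is not shown here), last_segment_from_path is a hypothetical helper using urllib.parse.urlparse, and the run URL path is invented for the example.

```python
from typing import Optional
from urllib.parse import urlparse


def last_segment_naive(url: str) -> str:
    """Assumed pre-fix behavior: split the whole URL string on '/'."""
    return url.rstrip("/").split("/")[-1]


def last_segment_from_path(url: str) -> Optional[str]:
    """Post-fix behavior: look only at the path component of the URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else None


if __name__ == "__main__":
    run_url = "https://pluto.trainy.ai/runs/dhyecrvx/"  # invented path
    host_only = "https://pluto.trainy.ai"

    # Both agree when the URL has a run slug, trailing slash or not.
    print(last_segment_naive(run_url), last_segment_from_path(run_url))

    # With a host-only URL the naive split falls back to the hostname,
    # while the path-based version reports that there is no slug.
    print(last_segment_naive(host_only), last_segment_from_path(host_only))
```

Returning None for the host-only case is what lets the banner omit external_id rather than print the hostname.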

gemini-code-assist Bot reviewed Jun 4, 2026

ci: re-trigger CI (transient 503 from production server in e2e test)

88a9606

The previous run failed only on test_e2e_metrics_logged due to a 503 from pluto-api.trainy.ai mid-run (the e2e suite hits the live server); not related to this change. Empty commit to re-run the matrix.

asaiacai wants to merge 3 commits into main from claude/magical-lovelace-Bn0Ay
