Added metrics and fixed step counter by matthewyryang · Pull Request #17 · LLM360/miles

matthewyryang · 2026-05-07T00:48:51Z

Problem

Agentic-RL W&B charts were unreadable:

- Metrics in different panels didn't share a step axis (e.g., reward/returns was plotted against W&B's auto _step while reward/rewards was plotted against rollout/step).
- The cumulative train-step number was non-monotonic under dynamic batching (values collided or went backwards).
- Per-domain training metrics couldn't be split (there was only one global pg_loss).
- The agent/* aggregator defaulted missing keys to zero, so e.g. agent/turns_sum=0 was logged even when the agent never reported turns. It looks like real data but isn't.
- Monkey-patches in ~/rl360/src/miles360/training_metrics_patches.py (432 lines) duplicated logic that now belongs in miles.

Changes

Axis alignment. Every emission of a reward/*, optimization/*, policy_shift/*, or train_inference_mismatch/* metric is now co-logged with the bare counter it is supposed to be plotted against: rollout_step on the rollout side, train_step on the train side. W&B picks the counters up via the existing define_metric registration.
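
A minimal sketch of the co-logging idea, not the code in this PR. The helper name with_step_axis is hypothetical, and the metric names in the usage lines are only illustrative; the counter names rollout_step and train_step come from the description above.

```python
from typing import Dict

# Counter names taken from the PR description; the helper itself is hypothetical.
_AXIS_KEY = {"rollout": "rollout_step", "train": "train_step"}


def with_step_axis(metrics: Dict[str, float], side: str, step: int) -> Dict[str, float]:
    """Return a copy of `metrics` with the matching bare counter added."""
    if side not in _AXIS_KEY:
        raise ValueError(f"unknown side: {side!r}")
    payload = dict(metrics)
    payload[_AXIS_KEY[side]] = step
    return payload


# Illustrative values only.
rollout_payload = with_step_axis({"reward/returns": 0.42}, side="rollout", step=7)
train_payload = with_step_axis({"optimization/pg_loss": 0.013}, side="train", step=112)
print(rollout_payload)
print(train_payload)
```

Each payload would then go to the logger in a single call, so the metric and its x-axis value always arrive together.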

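Monotone train-step. Replaced rollout_id × num_steps_per_rollout + step_id with a process-global _TRAIN_STEP_COUNTER. The old formula broke down under dynamic batching because num_steps_per_rollout shrinks across rollouts when total tokens drop, so later rollouts could reuse or fall below step numbers that had already been emitted. The counter does not depend on num_steps_per_rollout, so it stays monotone within a process.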
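
A small worked example of the failure mode, using made-up step counts (4, then 2, then 1 steps per rollout). It compares the old formula with a plain incrementing counter; the variable names are illustrative and not taken from the codebase.

```python
from itertools import count


def old_train_step(rollout_id: int, num_steps_per_rollout: int, step_id: int) -> int:
    """The formula the PR description says was replaced."""
    return rollout_id * num_steps_per_rollout + step_id


# Hypothetical run in which steps-per-rollout shrinks: 4 -> 2 -> 1.
steps_per_rollout = [4, 2, 1]

old_axis = [
    old_train_step(rollout_id, n, step_id)
    for rollout_id, n in enumerate(steps_per_rollout)
    for step_id in range(n)
]

counter = count()  # stand-in for a process-global counter
new_axis = [next(counter) for n in steps_per_rollout for _ in range(n)]

print(old_axis)  # [0, 1, 2, 3, 2, 3, 2]
print(new_axis)  # [0, 1, 2, 3, 4, 5, 6]
```

With the old formula, rollout 1 reuses steps 2 and 3 and rollout 2 drops back to 2, which is the collision and backward movement described above.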

Panel groupings. The _TRAIN_METRIC_GROUPS and _ROLLOUT_DATA_METRIC_GROUPS maps in log_utils.py route metrics into the W&B sections reward/, optimization/, policy_shift/, and train_inference_mismatch/. No emission-site changes are needed when a metric is added; just update the map.
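
A rough sketch of map-driven routing. The real contents of the maps in log_utils.py are not shown in this thread, so the entries below, the route helper, and the default_group fallback are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical contents; the real maps in log_utils.py are not shown here.
EXAMPLE_TRAIN_METRIC_GROUPS: Dict[str, str] = {
    "pg_loss": "optimization",
    "pg_clipfrac": "optimization",
    "ppo_kl": "policy_shift",
}


def route(name: str, groups: Dict[str, str], default_group: str = "train") -> str:
    """Prefix a bare metric name with its W&B section."""
    return f"{groups.get(name, default_group)}/{name}"


print(route("pg_loss", EXAMPLE_TRAIN_METRIC_GROUPS))    # optimization/pg_loss
print(route("grad_norm", EXAMPLE_TRAIN_METRIC_GROUPS))  # train/grad_norm
```

Adding a metric to a panel then means adding one entry to the map, with no change at the place where the metric is emitted.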

Per-domain fan-out. pg_loss/, pg_clipfrac/, ppo_kl/, train_rollout_logprob_*/ are now routed into
// so each panel shows one curve per domain.
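
A hedged sketch of what a per-domain fan-out could look like. The key layout (metric name, slash, domain), the helper name fan_out_by_domain, and the domain labels are assumptions for illustration only; the actual key layout is the one defined in the PR's code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fan_out_by_domain(
    metric: str, values: Sequence[float], domains: Sequence[str]
) -> Dict[str, float]:
    """Average `values` per domain and key each result as '<metric>/<domain>'."""
    if len(values) != len(domains):
        raise ValueError("values and domains must have the same length")
    buckets: Dict[str, List[float]] = defaultdict(list)
    for value, domain in zip(values, domains):
        buckets[domain].append(value)
    return {f"{metric}/{d}": sum(v) / len(v) for d, v in buckets.items()}


# Hypothetical per-sample losses tagged with hypothetical domain labels.
per_domain = fan_out_by_domain("pg_loss", [0.2, 0.4, 1.0], ["math", "code", "math"])
print(per_domain)
```

Because every domain gets its own key under the same metric, a W&B panel that matches the metric prefix draws one curve per domain.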

Sparse metric aggregation. _collect_values in src/agent360/harbor/miles/generate.py no longer defaults missing keys to 0. A fall-through loop auto-emits agent/mean for any numeric scalar that agents put in agent_metrics, so terminus-2's n_episodes, summarization_count, n_input_tokens, and n_output_tokens show up automatically. Names that no agent reports (turns, tool_calls, model_query_time*) are simply not logged, so there are no misleading zeros.
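
A minimal sketch of aggregation without default zeros. This is not the body of _collect_values; the function name aggregate_agent_metrics and the output key pattern (agent/ plus the key plus _mean) are assumptions for illustration.

```python
from numbers import Real
from typing import Any, Dict, List


def aggregate_agent_metrics(samples: List[Dict[str, Any]]) -> Dict[str, float]:
    """Mean of each numeric key over the samples that actually reported it."""
    collected: Dict[str, List[float]] = {}
    for agent_metrics in samples:
        for key, value in agent_metrics.items():
            # bool is a Real subclass; skip it so flags are not averaged.
            if isinstance(value, Real) and not isinstance(value, bool):
                collected.setdefault(key, []).append(float(value))
    # Keys nobody reported never enter `collected`, so no zero is emitted for them.
    return {f"agent/{k}_mean": sum(v) / len(v) for k, v in collected.items()}


samples = [
    {"n_episodes": 3, "n_input_tokens": 1200, "status": "ok"},
    {"n_episodes": 5},
]
out = aggregate_agent_metrics(samples)
print(out)  # {'agent/n_episodes_mean': 4.0, 'agent/n_input_tokens_mean': 1200.0}
```

Note that n_input_tokens is averaged over the one sample that reported it rather than being diluted by an implicit zero from the other, and a key such as turns never appears at all.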

Cleanup. Deleted src/sitecustomize.py and src/miles360/training_metrics_patches.py (432 lines of runtime monkey-patches); that logic is now upstream in ~/miles. Also dropped the uninformative reward/{correct,incorrect}/raw_reward panels (they were always exactly 0 or 1 by construction).

Other side effects

agent.yaml: generate_multi_samples: false, rollout_top_p: 0.95 — match the working baseline.
scale.yaml iter: max_response_len: 4096, n_samples_per_prompt: 8 — survives all-AgentError rollouts statistically, ~2× faster per turn.

Validation

Run (pd-hicache-l3 --scale iter): 10 rollouts end-to-end, no hangs.

📊 https://wandb.ai/mbzuai-llm/harbor-miles/runs/215ha1cw

PR on the RL360 side: https://github.com/LLM360/RL360/pull/308

nightlessbaron · 2026-05-18T09:09:04Z

Code review

Found 1 issue:

New define_metric calls are missing step_metric, so samples_seen, train_step, and rollout_step will be plotted against W&B's internal auto-_step instead of the rollout axis — directly contradicting the PR's stated goal of axis alignment. Every other metric registered in the same function uses an explicit step_metric (e.g. wandb.define_metric("perf/*", step_metric="rollout/step")).

https://github.com/LLM360/miles/blob/700da89d13803cbc399247e0348db389ca5298d8/miles/utils/wandb_utils.py#L179-L183

Fix: add step_metric="rollout/step" to all three calls (or step_metric="train/step" for train_step if it should track the training axis).
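
A sketch of the suggested fix, written so it runs without a W&B session: the registration function takes the define_metric callable as an argument. The function name register_counter_axes and its train_axis parameter are hypothetical; the metric names and the step_metric keyword come from the comment above.

```python
from typing import Callable, List, Tuple


def register_counter_axes(
    define_metric: Callable[..., object], train_axis: str = "rollout/step"
) -> None:
    """Register the bare counters with an explicit x-axis."""
    define_metric("samples_seen", step_metric="rollout/step")
    define_metric("rollout_step", step_metric="rollout/step")
    # train_axis lets train_step follow the training axis instead, if preferred.
    define_metric("train_step", step_metric=train_axis)


# Stand-in recorder so the sketch runs without a W&B session.
calls: List[Tuple[str, str]] = []
register_counter_axes(
    lambda name, step_metric: calls.append((name, step_metric)),
    train_axis="train/step",
)
print(calls)
```

In a real run, one would pass wandb.define_metric itself, after wandb.init() has been called.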

nightlessbaron · 2026-05-18T18:28:44Z

Follow-up: _TRAIN_STEP_COUNTER correctness

Two concerns with the global counter replacing rollout_id * num_steps_per_rollout + step_id:

1. Not resume-safe. The counter is module-level and resets to 0 on process restart. On checkpoint resume, train/step will dip back to 0 in W&B rather than continuing from where it left off. The old formula was deterministic from rollout_id and step_id, so it was implicitly resume-safe.
2. Shared across roles if actor and critic run in the same process. Both call log_train_step with different role= values but the same counter. If co-located, actor logs at steps 0, 2, 4… and critic at 1, 3, 5…, so they never share an x-axis position in W&B, making actor/critic metric comparison harder. The old formula gave both the same value for the same (rollout_id, step_id) pair.

https://github.com/LLM360/miles/blob/38677e04a6cd63ce406a9ef0b7f98e2f6f54a0a/miles/backends/training_utils/log_utils.py#L513-L515

If actor and critic are always separate Ray actors (separate processes), concern 2 is a non-issue — each process has its own counter. But concern 1 applies regardless.
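
One possible way to address concern 1, sketched under assumptions and not taken from this PR's code: keep the counter in an object whose value is written into the checkpoint and restored on resume. The class name TrainStepCounter and the next_train_step key are hypothetical.

```python
from typing import Dict


class TrainStepCounter:
    """Monotone counter whose value can be saved and restored."""

    def __init__(self, start: int = 0) -> None:
        self._next = start

    def next(self) -> int:
        """Return the current step and advance by one."""
        value = self._next
        self._next += 1
        return value

    def state_dict(self) -> Dict[str, int]:
        return {"next_train_step": self._next}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self._next = int(state["next_train_step"])


# First process: log three steps, then checkpoint.
counter = TrainStepCounter()
before = [counter.next() for _ in range(3)]
saved = counter.state_dict()

# Restarted process: a fresh counter would start at 0 again; restoring avoids the dip.
resumed = TrainStepCounter()
resumed.load_state_dict(saved)
after = [resumed.next() for _ in range(2)]
print(before, after)  # [0, 1, 2] [3, 4]
```

The saved dictionary would travel with the rest of the checkpoint state, so the x-axis continues where the previous process stopped.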

matthewyryang · 2026-05-18T23:40:08Z

Fixed the resume-safety problem and tested it:

Resumed run: https://wandb.ai/mbzuai-llm/Verification-Study/runs/33qdth6b
from: https://wandb.ai/mbzuai-llm/Verification-Study/runs/f7kbhjcy

The second issue is not a problem, as actor and critic are always separate Ray actors.

matthewyryang

works

matthewyryang force-pushed the steps-metrics branch from ffac24b to 0cf1cf7 Compare May 13, 2026 22:45

matthewyryang marked this pull request as ready for review May 14, 2026 20:13

matthewyryang requested a review from a team as a code owner May 14, 2026 20:13

matthewyryang marked this pull request as draft May 14, 2026 23:01

matthewyryang marked this pull request as ready for review May 14, 2026 23:03

matthewyryang assigned matthewyryang, DavidBellamy and nightlessbaron and unassigned matthewyryang May 14, 2026

DavidBellamy removed their assignment May 18, 2026

Matthew Yang and others added 8 commits May 20, 2026 05:38

metrics changes

0bd87fc

added change

b893fe2

removed unnecessary files

61a9c54

added

6c27a8e

fix: add step_metric to bare counter and mirrored reward/response_stats define_metric calls

3750cbb

fix: add step_metric to define_metric calls and emit rollout_step only once per rollout

6f68cea

fixed resume

506769d

simplified code

f205308

nightlessbaron force-pushed the steps-metrics branch from 5193a5f to f205308 Compare May 20, 2026 05:40

matthewyryang wants to merge 8 commits into prod from steps-metrics
