Added metrics and fixed step counter #17
ffac24b to
0cf1cf7
Code review
Found 1 issue:
Fix: add |
Follow-up: Two concerns with the global counter replacing
If actor and critic are always separate Ray actors (separate processes), concern 2 is a non-issue, since each process has its own counter. But concern 1 applies regardless.
Fixed the resume-safety problem and tested it with a resumed run: https://wandb.ai/mbzuai-llm/Verification-Study/runs/33qdth6b The second concern is not an issue, because actor and critic are always separate Ray actors.
…ts define_metric calls
…y once per rollout
5193a5f to
f205308
Problem
Agentic-RL W&B charts were unreadable:
plotted against rollout/step)
looks like real data, isn't
Changes
Axis alignment. Every emission of a reward/*, optimization/*, policy_shift/*, or train_inference_mismatch/* metric is now co-logged with
the bare counter it is meant to be plotted against: rollout_step on the rollout side, train_step on the train side. W&B picks the counters up via
the existing define_metric registration.
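A minimal sketch of the co-logging idea, not the code in this PR. The helper name with_axis and the metric names reward/mean and optimization/grad_norm are hypothetical; only rollout_step and train_step come from the description above.

```python
from typing import Dict


def with_axis(metrics: Dict[str, float], axis_name: str, axis_value: int) -> Dict[str, float]:
    """Return a copy of `metrics` that also carries the counter it plots against."""
    payload = dict(metrics)          # copy so the caller's dict is not mutated
    payload[axis_name] = axis_value  # the bare counter travels in the same payload
    return payload


# Rollout-side emission carries rollout_step; train-side emission carries train_step.
# The metric names here are made up for illustration.
rollout_payload = with_axis({"reward/mean": 0.42}, "rollout_step", 7)
train_payload = with_axis({"optimization/grad_norm": 1.3}, "train_step", 112)

# Each payload would then go to wandb.log(payload). A registration such as
# wandb.define_metric("reward/*", step_metric="rollout_step") is what lets W&B
# use the co-logged counter as the x-axis; the exact patterns used in this PR
# are not shown in the description, so that call is an assumption.
```

Because the counter is in the same payload as the metric, W&B never has to pair a metric with an axis value logged in a different call.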
Monotone train-step. Replaced rollout_id × num_steps_per_rollout + step_id with a process-global _TRAIN_STEP_COUNTER. The old formula
broke under dynamic batching: num_steps_per_rollout shrinks when a rollout's total token count drops, so a later rollout could produce step values that repeat or fall below earlier ones. The counter does not depend on num_steps_per_rollout, so it stays monotone.
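The failure mode can be reproduced with a small worked example. This is a sketch under assumed numbers (4 train steps in rollout 0, 2 in rollout 1); next_train_step is a hypothetical accessor, and only the formula and the name _TRAIN_STEP_COUNTER come from the description.

```python
def old_train_step(rollout_id: int, num_steps_per_rollout: int, step_id: int) -> int:
    """The replaced formula: rollout_id * num_steps_per_rollout + step_id."""
    return rollout_id * num_steps_per_rollout + step_id


_TRAIN_STEP_COUNTER = 0  # process-global, as in the PR description


def next_train_step() -> int:
    """Hypothetical accessor: return the current value, then advance by one."""
    global _TRAIN_STEP_COUNTER
    step = _TRAIN_STEP_COUNTER
    _TRAIN_STEP_COUNTER += 1
    return step


# Assumed scenario: dynamic batching packs rollout 0 into 4 steps, rollout 1 into 2.
steps_per_rollout = [4, 2]

old_axis = [
    old_train_step(rollout_id, n_steps, step_id)
    for rollout_id, n_steps in enumerate(steps_per_rollout)
    for step_id in range(n_steps)
]
new_axis = [next_train_step() for n_steps in steps_per_rollout for _ in range(n_steps)]

print(old_axis)  # [0, 1, 2, 3, 2, 3]: rollout 1 overwrites x = 2 and x = 3
print(new_axis)  # [0, 1, 2, 3, 4, 5]
```

Note that a plain in-memory counter restarts at zero when a run is resumed, which is the resume-safety concern raised in the review above; the sketch does not attempt to show how the PR addressed it.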
Panel groupings. The _TRAIN_METRIC_GROUPS and _ROLLOUT_DATA_METRIC_GROUPS maps in log_utils.py route metrics into the W&B sections reward/,
optimization/, policy_shift/, and train_inference_mismatch/. Adding a metric requires no emission-site changes, only an update to the map.
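One way such a routing map could work is sketched below. The map contents, the name EXAMPLE_METRIC_GROUPS, and the function route_metrics are all hypothetical; the real maps in log_utils.py may be shaped differently and may assign these metrics to other sections.

```python
from typing import Dict

# Hypothetical map: bare metric name -> W&B section. Not the PR's actual contents.
EXAMPLE_METRIC_GROUPS: Dict[str, str] = {
    "grad_norm": "optimization",
    "ppo_kl": "policy_shift",
}


def route_metrics(metrics: Dict[str, float], groups: Dict[str, str]) -> Dict[str, float]:
    """Prefix each metric with its section; unmapped metrics keep their bare name."""
    routed = {}
    for name, value in metrics.items():
        section = groups.get(name)
        key = f"{section}/{name}" if section is not None else name
        routed[key] = value
    return routed


routed = route_metrics({"grad_norm": 1.3, "ppo_kl": 0.02, "lr": 1e-6}, EXAMPLE_METRIC_GROUPS)
print(routed)
```

With this shape, moving a metric to another panel or adding a new one is a one-line change to the map, and the code that emits the metric stays untouched.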
Per-domain fan-out. pg_loss/, pg_clipfrac/, ppo_kl/, train_rollout_logprob_*/ are now routed into
// so each panel shows one curve per domain.
Sparse metric aggregation. _collect_values in src/agent360/harbor/miles/generate.py no longer defaults missing keys to 0. A fall-through
loop auto-emits agent/mean for any numeric scalar that agents put in agent_metrics, so terminus-2's n_episodes, summarization_count,
n_input_tokens, and n_output_tokens show up automatically. Names that no agent reports (turns, tool_calls, model_query_time*) are simply not
logged, so no misleading zeros appear.
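The difference between defaulting to 0 and skipping missing keys can be sketched as follows. The function aggregate_present is hypothetical and is not the PR's _collect_values; it returns means keyed by bare metric name and leaves the W&B key format out.

```python
from numbers import Real
from typing import Any, Dict, List


def aggregate_present(per_sample: List[Dict[str, Any]]) -> Dict[str, float]:
    """Mean of each numeric scalar over the samples that actually report it."""
    collected: Dict[str, List[float]] = {}
    for sample_metrics in per_sample:
        for name, value in sample_metrics.items():
            # Skip non-numeric values; bool is excluded although it subclasses int.
            if isinstance(value, bool) or not isinstance(value, Real):
                continue
            collected.setdefault(name, []).append(float(value))
    return {name: sum(values) / len(values) for name, values in collected.items()}


# Assumed data: only the first sample reports summarization_count.
samples = [
    {"n_episodes": 3, "summarization_count": 1},
    {"n_episodes": 5},
    {"note": "non-numeric values are ignored"},
]
means = aggregate_present(samples)
print(means)  # {'n_episodes': 4.0, 'summarization_count': 1.0}
```

Had missing keys defaulted to 0, summarization_count would average to 1/3 here, and a name such as turns that no sample reports would appear as a flat line at 0 instead of being absent.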
Cleanup. Deleted src/sitecustomize.py and src/miles360/training_metrics_patches.py (432 lines of runtime monkey-patches); that logic is
now upstream in ~/miles. Also dropped the uninformative reward/{correct,incorrect}/raw_reward panels, which were always exactly 0 or 1 by
construction.
Other side effects
Validation
Run (pd-hicache-l3 --scale iter): 10 rollouts end-to-end, no hangs.
📊 https://wandb.ai/mbzuai-llm/harbor-miles/runs/215ha1cw
PR on the RL360 side: https://github.com/LLM360/RL360/pull/308