
Seamless training UX: configurable persisted splits, full hyperparameter/augmentation control, rich metric visualization, one-click evaluation #18

Merged
whittenator merged 2 commits into main from claude/training-ux-analysis-RL7V1
May 30, 2026


@whittenator

Owner

Why

The training flow worked end-to-end but had four seams that made it neither fully controllable nor truly intuitive:

  1. Splits were inconsistent and invisible. train_task hardcoded a round-robin split (split = "val" if i % 5 == 4 else "train") and never held out a test set, while evaluation read asset.meta_data.split. So the "test set" you'd evaluate on was never actually unseen during training, and nothing was persisted to visualize.
  2. Hyperparameters/augmentations were a subset. Only ~11 aug knobs and a handful of HPs were reachable; optimizer, patience, cos_lr, close_mosaic, dropout, label smoothing, loss gains, perspective, copy_paste, erasing, etc. were unreachable.
  3. Completed-run metrics barely rendered. Ultralytics emits keys like metrics/mAP50(B), train/box_loss, val/box_loss, lr/pg0 — but the frontend looked for clean keys (mAP50, box_loss), so most lines never rendered, and losses + mAP were crushed onto one shared 0–1 axis. Native Ultralytics plots (PR curve, confusion matrix, results grid) were discarded.
  4. Test-set evaluation wasn't discoverable from a completed run.
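To make seam 3 concrete, here is a minimal Python sketch of the kind of key normalization that closes the gap between Ultralytics' raw keys and clean chart keys. The rule used here (drop the `metrics/` prefix and the `(B)` task suffix, join the rest with `_`) and the function names are assumptions for illustration; the PR's actual `_normalize_metrics` mapping is not shown in this description and may differ.

```python
import re

# Hypothetical rule, not the PR's actual _normalize_metrics:
# strip a trailing task suffix such as "(B)", drop the "metrics/" prefix,
# and replace the remaining "/" with "_" so train and val losses stay distinct.
_TASK_SUFFIX = re.compile(r"\([A-Z]\)$")


def normalize_key(raw: str) -> str:
    """Map one raw Ultralytics metric key to a clean chart key."""
    key = _TASK_SUFFIX.sub("", raw.strip())
    if key.startswith("metrics/"):
        key = key[len("metrics/"):]
    return key.replace("/", "_")


def normalize_metrics(row: dict[str, float]) -> dict[str, float]:
    """Normalize every key in one epoch's metric row; values are untouched."""
    return {normalize_key(k): v for k, v in row.items()}


# Example values are made up for illustration.
row = {
    "metrics/mAP50(B)": 0.61,
    "train/box_loss": 1.2,
    "val/box_loss": 1.4,
    "lr/pg0": 0.001,
}
clean = normalize_metrics(row)
print(clean)
```

Keeping the `train_` and `val_` prefixes (rather than collapsing both to `box_loss`) is what lets losses be charted as separate train and val series on their own axis.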

What changed

Backend

  • New services/split_service.py: deterministic, seeded, optionally class-stratified split assignment persisted into the existing asset.meta_data.split; hash-based fallback so runs are always reproducible.
  • jobs/tasks/training.py: honors the persisted split and holds out test assets entirely; ULTRALYTICS_TRAIN_ARGS allow-list forwards every tunable knob with plots=True; _normalize_metrics maps Ultralytics keys → clean charted keys; persists {epochs, split, summary, plots} in metrics_json and uploads plot PNGs to MinIO.
  • New API: GET/POST /api/datasets/{id}/versions/{vid}/split; a ?split= filter on the asset list; /runs/{id}/metrics now returns summary/plots/split (plus presigned plot URLs); a /runs/{id}/plots/{name} endpoint streams a single plot; run detail exposes artifacts.
  • services/storage.py: put_bytes / get_bytes helpers.
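The hash-based fallback mentioned for `split_service.py` can be sketched roughly as follows. This is an illustrative assumption, not the PR's implementation: the function name `assign_split`, the default ratios, and the use of SHA-256 are all hypothetical, and the class-stratified path is omitted.

```python
import hashlib

SPLITS = ("train", "val", "test")


def assign_split(
    asset_id: str,
    seed: int = 0,
    ratios: tuple[float, float, float] = (0.7, 0.2, 0.1),  # assumed defaults
) -> str:
    """Deterministically map an asset to train/val/test from (seed, asset_id)."""
    train, val, _test = ratios
    digest = hashlib.sha256(f"{seed}:{asset_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale to [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < train:
        return "train"
    if u < train + val:
        return "val"
    return "test"


# The same seed and ID always produce the same split, independent of the
# order in which assets are listed.
asset_ids = [f"asset-{i}" for i in range(1000)]
assignment = {a: assign_split(a, seed=42) for a in asset_ids}
counts = {s: sum(1 for v in assignment.values() if v == s) for s in SPLITS}
print(counts)
```

Because the assignment depends only on the seed and the asset ID, adding or reordering assets does not move existing assets between splits, which is what keeps a held-out test set genuinely unseen across runs. The realized ratios are approximate for small datasets.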

Frontend

  • Config-driven, grouped training form (Core / Optimizer & Schedule / Regularization & Loss / Augmentation) with per-section reset — adding a knob is a one-line change.
  • New SplitPanel with ratio/seed/stratify controls, a stacked split bar, and per-class-per-split breakdown; the split is persisted on launch so training trains on exactly what you see.
  • Rebuilt run detail: separate per-unit charts (loss train vs val, mAP50/mAP50-95, P/R, LR), summary stat tiles, native-plot gallery with click-to-zoom, the split bar, and a one-click Run Evaluation (prefills the eval form with the model + version, split=test) plus an "Evaluations of this model" list.
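Since a knob added to the form only takes effect if the backend forwards it, the allow-list is the other half of that one-line change. A minimal Python sketch of the filtering follows; the set below is an illustrative subset built from argument names mentioned in this PR, and `ALLOWED_TRAIN_ARGS` and `filter_train_args` are hypothetical names, not the PR's `ULTRALYTICS_TRAIN_ARGS` code.

```python
# Illustrative subset only; the real allow-list in the PR is not shown here.
ALLOWED_TRAIN_ARGS = frozenset(
    {
        "epochs", "batch", "imgsz",
        "optimizer", "patience", "cos_lr", "close_mosaic",
        "dropout", "label_smoothing",
        "perspective", "copy_paste", "erasing",
    }
)


def filter_train_args(requested: dict[str, object]) -> dict[str, object]:
    """Keep only allow-listed, explicitly set knobs and force plots=True."""
    forwarded = {
        key: value
        for key, value in requested.items()
        if key in ALLOWED_TRAIN_ARGS and value is not None
    }
    # The PR states that plots=True is always passed so native plots are produced.
    forwarded["plots"] = True
    return forwarded


# "project" is not allow-listed and "dropout" is unset, so both are dropped.
request = {"epochs": 1, "cos_lr": True, "dropout": None, "project": "/tmp/out"}
train_args = filter_train_args(request)
print(train_args)
```

An allow-list (rather than a deny-list) means an unknown or misspelled key sent by the client is silently dropped instead of reaching the trainer.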

Notes

  • No DB migration — everything reuses existing JSON/Text columns (asset.meta_data, experiment_runs.metrics_json / artifacts).
  • Gold-standard CV metrics chosen for visualization: train/val box·cls·dfl losses, mAP@50 and mAP@50-95, precision/recall/F1, LR schedule, plus the native PR/P/R/F1 curves and confusion matrix.

Verification

  • ruff / black / isort clean on all changed backend files.
  • Backend unit tests pass: new tests/unit/test_split_service.py (ratios, determinism, slice-sum, persistence, reproducibility), plus existing asset and training-service suites.
  • tsc and prettier clean on the three changed frontend files. (The 9 remaining tsc errors are pre-existing on main in upload.tsx / projects/[projectId]/index.tsx and are unrelated to this PR.)
  • Specs added under specs/training-ux/.

⚠️ The metric-rendering and plot-upload paths are fully exercised only against a real Ultralytics run plus MinIO, which couldn't run in CI here. The logic is unit-covered and import-safe; a 1-epoch yolov8n smoke test on a small dataset (per the spec's verification section) is the recommended final check before merge.

https://claude.ai/code/session_01DWbceFZLMEB9F76zTJNhrp


Generated by Claude Code

claude added 2 commits May 30, 2026 13:25
…sisted splits UI, rich metrics + native plots, one-click eval

- Frontend: grouped config-driven hyperparameter/augmentation form, SplitPanel
  with ratio/seed/stratify + per-class breakdown, multi-panel run-detail charts
  (loss/mAP/PR/LR), summary tiles, native plot gallery, Run Evaluation button
- Backend: split GET/POST endpoints + ?split= asset filter, metric-key
  normalization, summary/plots/split in metrics_json, plot streaming endpoint,
  run detail exposes artifacts
- Tests: split_service unit coverage; specs/training-ux docs
@whittenator whittenator merged commit bbdbb7b into main May 30, 2026
3 checks passed
@whittenator whittenator deleted the claude/training-ux-analysis-RL7V1 branch May 30, 2026 19:37
2 participants