Seamless training UX: configurable persisted splits, full hyperparameter/augmentation control, rich metric visualization, one-click evaluation by whittenator · Pull Request #18 · whittenator/VisionForge

whittenator · 2026-05-30T19:29:42Z

Why

The training flow worked end-to-end but had four seams that made it neither fully controllable nor truly intuitive:

Splits were inconsistent and invisible. train_task hardcoded a round-robin split (split = "val" if i % 5 == 4 else "train") and never held out a test set, while evaluation read asset.meta_data.split. So the "test set" you'd evaluate on was never actually unseen during training, and nothing was persisted to visualize.
Hyperparameters/augmentations were a subset. Only ~11 aug knobs and a handful of HPs were reachable; optimizer, patience, cos_lr, close_mosaic, dropout, label smoothing, loss gains, perspective, copy_paste, erasing, etc. were unreachable.
Completed-run metrics barely rendered. Ultralytics emits keys like metrics/mAP50(B), train/box_loss, val/box_loss, lr/pg0 — but the frontend looked for clean keys (mAP50, box_loss), so most lines never rendered, and losses + mAP were crushed onto one shared 0–1 axis. Native Ultralytics plots (PR curve, confusion matrix, results grid) were discarded.
Test-set evaluation wasn't discoverable from a completed run.
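To make the first seam concrete, the sketch below replays the hardcoded rule quoted above over ten hypothetical asset indices. The helper name `old_split` is made up for illustration; only the conditional expression comes from the original `train_task`.

```python
from collections import Counter


def old_split(i: int) -> str:
    # Rule quoted from train_task: every fifth asset goes to val, the rest to train.
    return "val" if i % 5 == 4 else "train"


# Ten hypothetical asset indices, counted per split.
counts = Counter(old_split(i) for i in range(10))
print(dict(counts))    # {'train': 8, 'val': 2}
print(counts["test"])  # 0
```

The rule can only ever produce an 80/20 train/val split, so an asset tagged `test` in `asset.meta_data.split` would still land in train or val.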

What changed

Backend

New services/split_service.py: deterministic, seeded, optionally class-stratified split assignment persisted into the existing asset.meta_data.split; hash-based fallback so runs are always reproducible.
jobs/tasks/training.py: honors the persisted split and holds out test assets entirely; ULTRALYTICS_TRAIN_ARGS allow-list forwards every tunable knob with plots=True; _normalize_metrics maps Ultralytics keys → clean charted keys; persists {epochs, split, summary, plots} in metrics_json and uploads plot PNGs to MinIO.
New API: GET/POST /api/datasets/{id}/versions/{vid}/split; a ?split= filter on the asset list; /runs/{id}/metrics now returns summary/plots/split (+ presigned plot URLs); a streaming /runs/{id}/plots/{name} endpoint; run detail exposes artifacts.
services/storage.py: put_bytes / get_bytes helpers.
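As a rough illustration of what "deterministic, seeded, optionally class-stratified" assignment with a hash-based fallback could look like, here is a minimal sketch. It is not the code in services/split_service.py: the function names, the 70/20/10 default ratios, and the `(asset_id, class_name)` input shape are all assumptions made for this example.

```python
import hashlib
import random
from collections import defaultdict

SPLITS = ("train", "val", "test")


def hash_split(asset_id, seed=42, ratios=(0.7, 0.2, 0.1)):
    """Hypothetical fallback: derive a split from a stable hash of (seed, asset_id)."""
    digest = hashlib.sha256(f"{seed}:{asset_id}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform value in [0, 1)
    if u < ratios[0]:
        return "train"
    return "val" if u < ratios[0] + ratios[1] else "test"


def assign_splits(assets, seed=42, ratios=(0.7, 0.2, 0.1), stratify=False):
    """assets: iterable of (asset_id, class_name). Returns {asset_id: split}."""
    groups = defaultdict(list)
    for asset_id, class_name in assets:
        # Without stratification every asset falls into one shared group.
        groups[class_name if stratify else "*"].append(asset_id)

    assignment = {}
    for group in sorted(groups):
        ids = sorted(groups[group])  # sort first so input order cannot matter
        random.Random(f"{seed}:{group}").shuffle(ids)  # seeded, per-group shuffle
        n_train = round(len(ids) * ratios[0])
        n_val = min(round(len(ids) * ratios[1]), len(ids) - n_train)
        for pos, asset_id in enumerate(ids):
            if pos < n_train:
                assignment[asset_id] = "train"
            elif pos < n_train + n_val:
                assignment[asset_id] = "val"
            else:
                assignment[asset_id] = "test"
    return assignment


assets = [(f"asset-{i:02d}", "cat" if i % 2 else "dog") for i in range(20)]
first = assign_splits(assets, seed=7, stratify=True)
second = assign_splits(list(reversed(assets)), seed=7, stratify=True)
print(first == second)  # True
```

Sorting the IDs before the seeded shuffle means the same seed gives the same assignment regardless of the order in which assets are listed, which is the property a persisted, reproducible split needs.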

Frontend

Config-driven, grouped training form (Core / Optimizer & Schedule / Regularization & Loss / Augmentation) with per-section reset — adding a knob is a one-line change.
New SplitPanel with ratio/seed/stratify controls, a stacked split bar, and per-class-per-split breakdown; the split is persisted on launch so training trains on exactly what you see.
Rebuilt run detail: separate per-unit charts (loss train vs val, mAP50/mAP50-95, P/R, LR), summary stat tiles, native-plot gallery with click-to-zoom, the split bar, and a one-click Run Evaluation (prefills the eval form with the model + version, split=test) plus an "Evaluations of this model" list.

Notes

No DB migration — everything reuses existing JSON/Text columns (asset.meta_data, experiment_runs.metrics_json / artifacts).
Gold-standard CV metrics chosen for visualization: train/val box·cls·dfl losses, mAP@50 and mAP@50-95, precision/recall/F1, LR schedule, plus the native PR/P/R/F1 curves and confusion matrix.
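The metrics above arrive from Ultralytics under keys such as metrics/mAP50(B) and train/box_loss, so some mapping to chart-friendly names is needed. The sketch below shows one possible mapping plus an F1 value derived from precision and recall. The helper names and the exact target key names (for example `train_box_loss`) are assumptions for illustration, not the PR's `_normalize_metrics` implementation.

```python
import re


def normalize_key(raw: str) -> str:
    """Map an Ultralytics-style key to a flat, chart-friendly name (assumed scheme)."""
    key = re.sub(r"\([A-Z]\)$", "", raw.strip())  # drop a task suffix such as "(B)"
    if key.startswith("metrics/"):
        key = key[len("metrics/"):]
    return key.replace("/", "_")


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def normalize_row(row: dict) -> dict:
    """Normalize one epoch's metrics and add f1 when precision and recall exist."""
    clean = {normalize_key(k): v for k, v in row.items()}
    if "precision" in clean and "recall" in clean:
        clean["f1"] = f1_score(clean["precision"], clean["recall"])
    return clean


epoch = {
    "metrics/mAP50(B)": 0.61,
    "metrics/precision(B)": 0.5,
    "metrics/recall(B)": 0.5,
    "train/box_loss": 1.2,
    "lr/pg0": 0.001,
}
print(normalize_row(epoch))
```

Keeping the train/ and val/ prefixes in the normalized name lets the frontend plot train and val losses as separate series on the same chart.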

Verification

ruff / black / isort clean on all changed backend files.
Backend unit tests pass: new tests/unit/test_split_service.py (ratios, determinism, slice-sum, persistence, reproducibility), plus existing asset and training-service suites.
tsc and prettier clean on the three changed frontend files. (The 9 remaining tsc errors are pre-existing on main in upload.tsx / projects/[projectId]/index.tsx and are unrelated to this PR.)
Specs added under specs/training-ux/.

⚠️ The metric-rendering and plot-upload paths are fully exercised only by a real Ultralytics run against MinIO, which couldn't run in CI here. The logic is unit-covered and import-safe; a 1-epoch yolov8n smoke test on a small dataset (per the spec's verification section) is the recommended final check before merge.

https://claude.ai/code/session_01DWbceFZLMEB9F76zTJNhrp

Generated by Claude Code

…sisted splits UI, rich metrics + native plots, one-click eval
- Frontend: grouped config-driven hyperparameter/augmentation form, SplitPanel with ratio/seed/stratify + per-class breakdown, multi-panel run-detail charts (loss/mAP/PR/LR), summary tiles, native plot gallery, Run Evaluation button
- Backend: split GET/POST endpoints + ?split= asset filter, metric-key normalization, summary/plots/split in metrics_json, plot streaming endpoint, run detail exposes artifacts
- Tests: split_service unit coverage; specs/training-ux docs

claude added 2 commits May 30, 2026 13:25

feat(training): persisted configurable splits, full HP/aug passthroug…

12bbc79

…h, richer metrics capture

whittenator merged commit bbdbb7b into main May 30, 2026
3 checks passed

whittenator deleted the claude/training-ux-analysis-RL7V1 branch May 30, 2026 19:37