feat(datasets): unified create/import/add-imagery flow by whittenator · Pull Request #17 · whittenator/VisionForge

whittenator · 2026-05-30T11:20:54Z

Add a 3-step "New Dataset" wizard (/datasets/new) that combines dataset
creation, image upload, and labeled-archive import in one flow, plus a
shared "Add Data" panel on the dataset detail page for ongoing ingestion
into the open (unlocked) version.

- Extract reusable ImageUploader, ArchiveImporter, and DataSourcePanel
  components from the legacy upload/import pages.
- Backend: add dataset_service.latest_open_version() and surface
  latest_version_id / open_version_id on the dataset detail response so
  new data lands in the correct editable version.
- Repoint dataset list entry points to the new wizard; legacy
  upload/import/version pages remain for deep links.
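
The version-selection idea in the backend item can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `DatasetVersion` shape, its `number` and `locked` fields, and the `detail_fields` helper are all hypothetical; only `latest_open_version`, `latest_version_id`, and `open_version_id` are names taken from the description.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    # Hypothetical shape; the real model's fields are not shown in this PR.
    id: int
    number: int
    locked: bool


def latest_version(versions: Iterable[DatasetVersion]) -> Optional[DatasetVersion]:
    """Return the highest-numbered version, locked or not."""
    return max(versions, key=lambda v: v.number, default=None)


def latest_open_version(versions: Iterable[DatasetVersion]) -> Optional[DatasetVersion]:
    """Return the highest-numbered unlocked version, or None if all are locked."""
    return max((v for v in versions if not v.locked), key=lambda v: v.number, default=None)


def detail_fields(versions: List[DatasetVersion]) -> dict:
    """Build the two ids the detail response is described as surfacing."""
    latest = latest_version(versions)
    open_ = latest_open_version(versions)
    return {
        "latest_version_id": latest.id if latest else None,
        "open_version_id": open_.id if open_ else None,
    }
```

Keeping the two ids separate lets a client show the newest version while still routing uploads to the newest editable one, which may be older.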

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg


Add a detailed dataset metrics view: coverage/label-status, review status, per-class balance with imbalance ratio and unused-class detection, annotations-per-image, bbox area/aspect distributions, image resolution, annotation types, and 30-day labeling velocity.

- Backend: asset_service.get_dataset_metrics() aggregates in SQL with a capped Python pass for box-geometry histograms; exposed at GET /api/datasets/{id}/metrics. Unit-tested (3 cases).
- Frontend: hand-coded SVG/HUD chart primitives (StatTile, BarChart, Histogram, DonutChart, Sparkline) + /datasets/:id/metrics page.
- Wire the unified create flow: routes for /datasets/new and /datasets/:id/metrics, "+ ADD DATA" panel and METRICS link on the dataset detail page, and repointed list CTAs to the new wizard.

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
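
For readers unfamiliar with the per-class balance metric, here is a small sketch of one plausible way to compute it. The definitions are assumptions, since the commit does not spell them out: the imbalance ratio is taken as the largest class count divided by the smallest non-zero count, and a class is "unused" when it has no annotations. The `class_balance` helper is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def class_balance(
    class_names: List[str], counts: Dict[str, int]
) -> Tuple[Optional[float], List[str]]:
    """Return (imbalance_ratio, unused_classes).

    Assumed definitions: ratio = largest count / smallest non-zero count;
    a class is "unused" when it has no annotations at all.
    """
    # Classes declared on the dataset but absent (or zero) in the counts.
    unused = [name for name in class_names if counts.get(name, 0) == 0]
    # Only classes with at least one annotation take part in the ratio.
    used = [counts[name] for name in class_names if counts.get(name, 0) > 0]
    ratio = max(used) / min(used) if used else None
    return ratio, unused
```

With classes `["car", "person", "bike"]` and counts `{"car": 90, "person": 30}`, this returns a ratio of `3.0` and reports `["bike"]` as unused.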

The per-class / per-image / velocity aggregations select only column expressions, so SQLAlchemy could not infer the join's left side and raised InvalidRequestError. Anchor the join with select_from(Annotation).

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
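
The fix pattern looks roughly like the sketch below, which requires SQLAlchemy 1.4 or later. The `Asset` and `Annotation` models here are simplified stand-ins with hypothetical columns, and `per_class_counts` is an illustrative helper, not the project's `get_dataset_metrics()`. The point is only that `select_from(Annotation)` names the left side of the join explicitly instead of leaving SQLAlchemy to infer it from the selected columns.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Asset(Base):
    # Simplified, hypothetical model for illustration.
    __tablename__ = "assets"
    id = Column(Integer, primary_key=True)
    dataset_id = Column(Integer, nullable=False)


class Annotation(Base):
    # Simplified, hypothetical model for illustration.
    __tablename__ = "annotations"
    id = Column(Integer, primary_key=True)
    asset_id = Column(Integer, ForeignKey("assets.id"), nullable=False)
    class_name = Column(String, nullable=False)


def per_class_counts(session: Session, dataset_id: int) -> dict:
    """Count annotations per class for one dataset."""
    stmt = (
        select(Annotation.class_name, func.count(Annotation.id))
        .select_from(Annotation)  # explicit left side of the join
        .join(Asset, Annotation.asset_id == Asset.id)
        .where(Asset.dataset_id == dataset_id)
        .group_by(Annotation.class_name)
    )
    return {name: n for name, n in session.execute(stmt).all()}
```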

…actor

One-click "Suggest" in the annotator runs the latest (or user-selected) model trained on the asset's dataset and overlays predicted boxes the annotator can accept, accept-and-edit, or reject per-suggestion or in bulk. Accepted suggestions become normal annotations saved through the existing bulk path; nothing is persisted until accepted.

Backend:
- suggestion_service.suggest_annotations / candidate_artifacts_for_dataset resolve the dataset's succeeded-run artifacts (newest first, with an override that must belong to the dataset), fetch image bytes via the SSRF-safe asset fetch, and map inference_service.predict detections to box suggestions. NoModelError -> 404, SuggestionError -> 400, InferenceError -> 502.
- POST /api/annotations/suggest and GET /api/annotations/suggest/artifacts.
- Hermetic unit tests (predict + fetch monkeypatched).

Frontend:
- Rewrote the legacy single-file Annotator.jsx as typed TS modules (Annotator.tsx, types.ts, useSuggestions.ts), preserving all canvas drawing, undo/redo, keyboard shortcuts, bulk-save semantics, and the data-testid hooks the Playwright specs depend on.
- Suggestion bar with model dropdown, dashed scored overlays, a per-suggestion accept/edit/reject list, and accept-all/reject-all.

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
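
The artifact resolution and error mapping described under "Backend" could be sketched as below. This is an illustration under stated assumptions, not the actual `suggestion_service` code: the `Artifact` shape, `pick_artifact`, `status_for`, and the 500 fallback for unlisted errors are hypothetical. The three exception names and their status codes come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


class NoModelError(Exception):
    """No succeeded-run artifact exists for the dataset."""


class SuggestionError(Exception):
    """The request is invalid, e.g. the override is from another dataset."""


class InferenceError(Exception):
    """The model failed while predicting."""


@dataclass(frozen=True)
class Artifact:
    # Hypothetical shape for illustration only.
    id: int
    dataset_id: int
    created_at: float  # e.g. a Unix timestamp


def pick_artifact(
    candidates: List[Artifact], dataset_id: int, override_id: Optional[int] = None
) -> Artifact:
    """Choose the newest artifact, or a valid user-selected override."""
    own = sorted(
        (a for a in candidates if a.dataset_id == dataset_id),
        key=lambda a: a.created_at,
        reverse=True,
    )
    if not own:
        raise NoModelError(f"no trained model for dataset {dataset_id}")
    if override_id is None:
        return own[0]
    for artifact in own:
        if artifact.id == override_id:
            return artifact
    raise SuggestionError(f"artifact {override_id} does not belong to dataset {dataset_id}")


# Status codes as listed in the commit message.
STATUS_BY_ERROR = {NoModelError: 404, SuggestionError: 400, InferenceError: 502}


def status_for(exc: Exception) -> int:
    """Map a service error to its HTTP status; the 500 fallback is an assumption."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return 500
```

Rejecting an override that belongs to another dataset, rather than silently falling back to the newest model, keeps a user from running a model on data it was never trained for.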

- suggestion_service now loads asset image bytes from object storage via a MinIO-backed _load_image_bytes() (with an HTTP/SSRF-safe fallback), instead of mis-calling the URL-only fetch_asset_bytes with an Asset.
- Unit tests register ORM models before create_all and set the required ExperimentRun.owner_id, so they pass standalone. Full unit suite green (26 passed, 1 skipped).

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg


claude added 6 commits May 30, 2026 01:21

style(annotate): fix import ordering in suggestion_service

d6d6f67

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg

whittenator merged commit 0741a5c into main May 30, 2026
3 checks passed

whittenator deleted the claude/datasets-annotator-improvements-lYwUI branch May 30, 2026 11:25
