feat(datasets): unified create/import/add-imagery flow #17
Merged
Add a 3-step "New Dataset" wizard (/datasets/new) that combines dataset
creation, image upload, and labeled-archive import in one flow, plus a
shared "Add Data" panel on the dataset detail page for ongoing ingestion
into the open (unlocked) version.
- Extract reusable ImageUploader, ArchiveImporter, and DataSourcePanel
components from the legacy upload/import pages.
- Backend: add dataset_service.latest_open_version() and surface
latest_version_id / open_version_id on the dataset detail response so
new data lands in the correct editable version.
- Repoint dataset list entry points to the new wizard; legacy
upload/import/version pages remain for deep links.
https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
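A minimal sketch of how the open-version lookup could behave, for readers who have not opened the diff. The `DatasetVersion` shape (`number`, `locked`) and the `detail_version_fields` helper are assumptions made for illustration; only the names `latest_open_version`, `latest_version_id`, and `open_version_id` come from the commit message, and the real service presumably queries the database rather than a list.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical stand-in for the ORM model; field names are assumed."""
    id: int
    number: int   # assumed to increase with each new version
    locked: bool  # a locked version no longer accepts new data


def latest_open_version(versions: Sequence[DatasetVersion]) -> Optional[DatasetVersion]:
    """Return the highest-numbered unlocked version, or None if all are locked."""
    open_versions = [v for v in versions if not v.locked]
    return max(open_versions, key=lambda v: v.number, default=None)


def detail_version_fields(versions: Sequence[DatasetVersion]) -> Dict[str, Optional[int]]:
    """Build the two version ids surfaced on the dataset detail response."""
    latest = max(versions, key=lambda v: v.number, default=None)
    open_version = latest_open_version(versions)
    return {
        "latest_version_id": latest.id if latest else None,
        "open_version_id": open_version.id if open_version else None,
    }
```

Keeping the two ids separate lets the UI show the newest version while still routing uploads to a version that can be edited, which may be an older one if the newest is locked.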
Add a detailed dataset metrics view: coverage/label-status, review
status, per-class balance with imbalance ratio and unused-class
detection, annotations-per-image, bbox area/aspect distributions, image
resolution, annotation types, and 30-day labeling velocity.
- Backend: asset_service.get_dataset_metrics() aggregates in SQL with a
capped Python pass for box-geometry histograms; exposed at
GET /api/datasets/{id}/metrics. Unit-tested (3 cases).
- Frontend: hand-coded SVG/HUD chart primitives (StatTile, BarChart,
Histogram, DonutChart, Sparkline) + /datasets/:id/metrics page.
- Wire the unified create flow: routes for /datasets/new and
/datasets/:id/metrics, "+ ADD DATA" panel and METRICS link on the
dataset detail page, and repointed list CTAs to the new wizard.
https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
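To make the per-class balance metric concrete, here is a small sketch in plain Python. The commit message does not define the imbalance ratio, so this assumes it is the largest class count divided by the smallest non-zero class count; the function name `class_balance` and the returned keys are likewise hypothetical, and the real implementation aggregates in SQL.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Sequence


def class_balance(declared_classes: Sequence[str],
                  annotation_classes: Iterable[str]) -> Dict[str, Any]:
    """Summarize per-class annotation counts for a dataset.

    declared_classes: class names configured on the dataset.
    annotation_classes: one class name per annotation.
    """
    counts = Counter(annotation_classes)
    per_class = {name: counts.get(name, 0) for name in declared_classes}

    # Assumed definition: max count over min non-zero count.
    used = [c for c in per_class.values() if c > 0]
    imbalance_ratio = max(used) / min(used) if used else None

    # Declared classes that no annotation uses.
    unused = [name for name, c in per_class.items() if c == 0]

    return {
        "per_class": per_class,
        "imbalance_ratio": imbalance_ratio,
        "unused_classes": unused,
    }
```

Excluding zero-count classes from the ratio avoids a division by zero and keeps unused-class detection as a separate signal.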
The per-class / per-image / velocity aggregations select only column
expressions, so SQLAlchemy could not infer the join's left side and
raised InvalidRequestError. Anchor the join with
select_from(Annotation).
https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
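The fix described above can be sketched as follows, assuming SQLAlchemy 1.4 or later. The two models are simplified stand-ins (the column names `asset_id`, `class_name`, and `dataset_id` are assumptions, not the project's schema), and `per_class_counts` is a hypothetical helper. The point is only the explicit `select_from(Annotation)`, which names the left side of the join instead of leaving SQLAlchemy to infer it from the selected columns.

```python
from typing import Dict

from sqlalchemy import Column, ForeignKey, Integer, String, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Asset(Base):
    """Simplified stand-in for the project's Asset model."""
    __tablename__ = "assets"
    id = Column(Integer, primary_key=True)
    dataset_id = Column(Integer, nullable=False)


class Annotation(Base):
    """Simplified stand-in for the project's Annotation model."""
    __tablename__ = "annotations"
    id = Column(Integer, primary_key=True)
    asset_id = Column(Integer, ForeignKey("assets.id"), nullable=False)
    class_name = Column(String, nullable=False)


def per_class_counts(session: Session, dataset_id: int) -> Dict[str, int]:
    """Count annotations per class for one dataset."""
    stmt = (
        select(Annotation.class_name, func.count(Annotation.id))
        .select_from(Annotation)  # explicit left side of the join
        .join(Asset, Asset.id == Annotation.asset_id)
        .where(Asset.dataset_id == dataset_id)
        .group_by(Annotation.class_name)
    )
    return {name: count for name, count in session.execute(stmt)}
```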
…actor
One-click "Suggest" in the annotator runs the latest (or user-selected)
model trained on the asset's dataset and overlays predicted boxes the
annotator can accept, accept-and-edit, or reject per-suggestion or in
bulk. Accepted suggestions become normal annotations saved through the
existing bulk path; nothing is persisted until accepted.
Backend:
- suggestion_service.suggest_annotations /
candidate_artifacts_for_dataset resolve the dataset's succeeded-run
artifacts (newest first, with an override that must belong to the
dataset), fetch image bytes via the SSRF-safe asset fetch, and map
inference_service.predict detections to box suggestions.
NoModelError -> 404, SuggestionError -> 400, InferenceError -> 502.
- POST /api/annotations/suggest and
GET /api/annotations/suggest/artifacts.
- Hermetic unit tests (predict + fetch monkeypatched).
Frontend:
- Rewrote the legacy single-file Annotator.jsx as typed TS modules
(Annotator.tsx, types.ts, useSuggestions.ts), preserving all canvas
drawing, undo/redo, keyboard shortcuts, bulk-save semantics, and the
data-testid hooks the Playwright specs depend on.
- Suggestion bar with model dropdown, dashed scored overlays, a
per-suggestion accept/edit/reject list, and accept-all/reject-all.
https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
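The artifact resolution and error mapping described above might look roughly like this. It is a sketch under stated assumptions: the `Artifact` fields, the `pick_artifact` and `status_for` helpers, and the choice of SuggestionError for an override outside the dataset are all hypothetical. Only the three exception names and their status codes (404, 400, 502) come from the commit message.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Sequence


class NoModelError(Exception):
    """No succeeded-run artifact exists for the dataset."""


class SuggestionError(Exception):
    """The request is invalid, e.g. a bad override."""


class InferenceError(Exception):
    """The model failed while predicting."""


# Exception-to-HTTP-status mapping as stated in the commit message.
STATUS_BY_ERROR = {NoModelError: 404, SuggestionError: 400, InferenceError: 502}


@dataclass(frozen=True)
class Artifact:
    """Hypothetical artifact record; field names are assumed."""
    id: int
    dataset_id: int
    run_status: str
    created_at: datetime


def candidate_artifacts(artifacts: Sequence[Artifact], dataset_id: int) -> List[Artifact]:
    """Succeeded-run artifacts for the dataset, newest first."""
    ok = [a for a in artifacts
          if a.dataset_id == dataset_id and a.run_status == "succeeded"]
    return sorted(ok, key=lambda a: a.created_at, reverse=True)


def pick_artifact(artifacts: Sequence[Artifact], dataset_id: int,
                  override_id: Optional[int] = None) -> Artifact:
    """Pick the newest candidate, or a user-selected one from the same dataset."""
    candidates = candidate_artifacts(artifacts, dataset_id)
    if override_id is not None:
        for artifact in candidates:
            if artifact.id == override_id:
                return artifact
        # Assumed: an override outside the dataset is a client error.
        raise SuggestionError(f"artifact {override_id} does not belong to dataset {dataset_id}")
    if not candidates:
        raise NoModelError(f"no trained model for dataset {dataset_id}")
    return candidates[0]


def status_for(exc: Exception) -> int:
    """Translate a service exception into an HTTP status code."""
    for exc_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, exc_type):
            return status
    return 500
```

Validating the override against the dataset's own candidates means a caller cannot run an arbitrary artifact against an asset by guessing its id.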
- suggestion_service now loads asset image bytes from object storage via
a MinIO-backed _load_image_bytes() (with an HTTP/SSRF-safe fallback),
instead of mis-calling the URL-only fetch_asset_bytes with an Asset.
- Unit tests register ORM models before create_all and set the required
ExperimentRun.owner_id, so they pass standalone.
Full unit suite green (26 passed, 1 skipped).
https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg