
feat(datasets): unified create/import/add-imagery flow #17

Merged
whittenator merged 6 commits into main from claude/datasets-annotator-improvements-lYwUI
May 30, 2026


@whittenator

Owner

Add a 3-step "New Dataset" wizard (/datasets/new) that combines dataset
creation, image upload, and labeled-archive import in one flow, plus a
shared "Add Data" panel on the dataset detail page for ongoing ingestion
into the open (unlocked) version.

  • Extract reusable ImageUploader, ArchiveImporter, and DataSourcePanel
    components from the legacy upload/import pages.
  • Backend: add dataset_service.latest_open_version() and surface
    latest_version_id / open_version_id on the dataset detail response so
    new data lands in the correct editable version.
  • Repoint dataset list entry points to the new wizard; legacy
    upload/import/version pages remain for deep links.
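A minimal sketch of how an open version might be picked and surfaced, assuming each version carries a monotonically increasing number and a locked flag. The `DatasetVersion` fields and the `detail_version_fields` helper are hypothetical; only the names `latest_open_version`, `latest_version_id`, and `open_version_id` come from this PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DatasetVersion:
    id: int
    number: int   # hypothetical: increases with every new version
    locked: bool  # hypothetical: True once the version is frozen


def latest_version(versions: Sequence[DatasetVersion]) -> Optional[DatasetVersion]:
    """Return the highest-numbered version, locked or not."""
    return max(versions, key=lambda v: v.number, default=None)


def latest_open_version(versions: Sequence[DatasetVersion]) -> Optional[DatasetVersion]:
    """Return the highest-numbered unlocked version, or None if all are locked."""
    open_versions = [v for v in versions if not v.locked]
    return max(open_versions, key=lambda v: v.number, default=None)


def detail_version_fields(versions: Sequence[DatasetVersion]) -> dict:
    """Build the two ids the detail response exposes (assumed shape)."""
    latest = latest_version(versions)
    open_ = latest_open_version(versions)
    return {
        "latest_version_id": latest.id if latest else None,
        "open_version_id": open_.id if open_ else None,
    }
```

Under these assumptions, a client would send new uploads to `open_version_id` and treat a `None` value as "no editable version exists yet".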

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg

claude added 6 commits May 30, 2026 01:21
Add a 3-step "New Dataset" wizard (/datasets/new) that combines dataset
creation, image upload, and labeled-archive import in one flow, plus a
shared "Add Data" panel on the dataset detail page for ongoing ingestion
into the open (unlocked) version.

- Extract reusable ImageUploader, ArchiveImporter, and DataSourcePanel
  components from the legacy upload/import pages.
- Backend: add dataset_service.latest_open_version() and surface
  latest_version_id / open_version_id on the dataset detail response so
  new data lands in the correct editable version.
- Repoint dataset list entry points to the new wizard; legacy
  upload/import/version pages remain for deep links.

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
Add a detailed dataset metrics view: coverage/label-status, review
status, per-class balance with imbalance ratio and unused-class
detection, annotations-per-image, bbox area/aspect distributions, image
resolution, annotation types, and 30-day labeling velocity.

- Backend: asset_service.get_dataset_metrics() aggregates in SQL with a
  capped Python pass for box-geometry histograms; exposed at
  GET /api/datasets/{id}/metrics. Unit-tested (3 cases).
- Frontend: hand-coded SVG/HUD chart primitives (StatTile, BarChart,
  Histogram, DonutChart, Sparkline) + /datasets/:id/metrics page.
- Wire the unified create flow: routes for /datasets/new and
  /datasets/:id/metrics, "+ ADD DATA" panel and METRICS link on the
  dataset detail page, and repointed list CTAs to the new wizard.
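The per-class balance figures could be derived roughly as below. This is an illustrative sketch, not the code in asset_service.get_dataset_metrics(): the function name `class_balance` is hypothetical, and the imbalance ratio is assumed to be the largest class count divided by the smallest non-zero class count.

```python
from collections import Counter
from typing import Iterable, Sequence


def class_balance(labels: Iterable[str], known_classes: Sequence[str]) -> dict:
    """Summarise per-class annotation counts for a dataset.

    labels: one class name per annotation.
    known_classes: every class defined on the dataset, used or not.
    """
    counts = Counter(labels)
    per_class = {name: counts.get(name, 0) for name in known_classes}

    # Assumed definition: max count over min non-zero count.
    used = [n for n in per_class.values() if n > 0]
    imbalance_ratio = max(used) / min(used) if used else None

    # Classes that are defined but never annotated.
    unused = [name for name, n in per_class.items() if n == 0]

    return {
        "per_class": per_class,
        "imbalance_ratio": imbalance_ratio,
        "unused_classes": unused,
    }
```

Excluding zero-count classes from the ratio avoids a division by zero and lets unused classes be reported separately.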

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
The per-class / per-image / velocity aggregations select only column
expressions, so SQLAlchemy could not infer the join's left side and
raised InvalidRequestError. Anchor the join with select_from(Annotation).
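The fix can be sketched as follows, assuming SQLAlchemy 1.4+ and a simplified schema. The `Asset` and `Annotation` columns and the `per_class_counts` helper are hypothetical stand-ins for the project's models. In a query this small SQLAlchemy may infer the left side on its own, so the sketch only shows where select_from() goes.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Asset(Base):
    __tablename__ = "assets"
    id = Column(Integer, primary_key=True)
    dataset_id = Column(Integer, nullable=False)


class Annotation(Base):
    __tablename__ = "annotations"
    id = Column(Integer, primary_key=True)
    asset_id = Column(Integer, ForeignKey("assets.id"), nullable=False)
    label = Column(String, nullable=False)


def per_class_counts(session: Session, dataset_id: int) -> dict:
    """Count annotations per label for one dataset."""
    stmt = (
        select(Annotation.label, func.count(Annotation.id))
        # Explicit left side: the join starts from Annotation.
        .select_from(Annotation)
        .join(Asset, Asset.id == Annotation.asset_id)
        .where(Asset.dataset_id == dataset_id)
        .group_by(Annotation.label)
    )
    return {label: count for label, count in session.execute(stmt)}
```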

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
…actor

One-click "Suggest" in the annotator runs the latest (or user-selected)
model trained on the asset's dataset and overlays predicted boxes the
annotator can accept, accept-and-edit, or reject per-suggestion or in
bulk. Accepted suggestions become normal annotations saved through the
existing bulk path; nothing is persisted until accepted.
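The "nothing is persisted until accepted" rule can be illustrated with a small sketch. The real logic lives in the TypeScript annotator; Python is used here only for illustration. The `Suggestion` fields, the payload shape, and the `accepted_payloads` helper are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Suggestion:
    id: str
    label: str
    score: float
    box: Tuple[float, float, float, float]  # assumed (x, y, width, height)


def accepted_payloads(
    suggestions: Iterable[Suggestion],
    accepted_ids: Set[str],
    asset_id: int,
) -> List[dict]:
    """Turn accepted suggestions into ordinary annotation payloads.

    Rejected and still-pending suggestions are left out, so they never
    reach the bulk-save request.
    """
    return [
        {"asset_id": asset_id, "label": s.label, "box": list(s.box)}
        for s in suggestions
        if s.id in accepted_ids
    ]
```

The model score is dropped on purpose in this sketch: once accepted, a suggestion is treated like any hand-drawn box.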

Backend:
- suggestion_service.suggest_annotations / candidate_artifacts_for_dataset
  resolve the dataset's succeeded-run artifacts (newest first, with an
  override that must belong to the dataset), fetch image bytes via the
  SSRF-safe asset fetch, and map inference_service.predict detections to
  box suggestions. NoModelError -> 404, SuggestionError -> 400,
  InferenceError -> 502.
- POST /api/annotations/suggest and GET /api/annotations/suggest/artifacts.
- Hermetic unit tests (predict + fetch monkeypatched).
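Artifact resolution and the error-to-status mapping described above might look like this sketch. The three exception names and status codes come from the commit message. The `Artifact` fields, the `resolve_artifact` and `status_for` helpers, the 500 fallback, and the choice of SuggestionError for a foreign override are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


class NoModelError(Exception):
    """No succeeded-run artifact exists for the dataset."""


class SuggestionError(Exception):
    """The suggestion request itself is invalid."""


class InferenceError(Exception):
    """The inference backend failed."""


STATUS_BY_ERROR = {NoModelError: 404, SuggestionError: 400, InferenceError: 502}


def status_for(exc: Exception) -> int:
    """Map a service exception to an HTTP status (500 is an assumed default)."""
    for exc_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, exc_type):
            return status
    return 500


@dataclass(frozen=True)
class Artifact:
    id: int
    dataset_id: int
    created_at: float  # hypothetical timestamp used for newest-first ordering


def resolve_artifact(
    candidates: Sequence[Artifact],
    dataset_id: int,
    override_id: Optional[int] = None,
) -> Artifact:
    """Pick the newest artifact for the dataset, or a validated override."""
    owned = sorted(
        (a for a in candidates if a.dataset_id == dataset_id),
        key=lambda a: a.created_at,
        reverse=True,
    )
    if override_id is not None:
        for artifact in owned:
            if artifact.id == override_id:
                return artifact
        # Assumed: an override from another dataset is a client error.
        raise SuggestionError("artifact does not belong to this dataset")
    if not owned:
        raise NoModelError("no trained model for this dataset")
    return owned[0]
```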

Frontend:
- Rewrote the legacy single-file Annotator.jsx as typed TS modules
  (Annotator.tsx, types.ts, useSuggestions.ts), preserving all canvas
  drawing, undo/redo, keyboard shortcuts, bulk-save semantics, and the
  data-testid hooks the Playwright specs depend on.
- Suggestion bar with model dropdown, dashed scored overlays, a
  per-suggestion accept/edit/reject list, and accept-all/reject-all.

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
- suggestion_service now loads asset image bytes from object storage via
  a MinIO-backed _load_image_bytes() (with an HTTP/SSRF-safe fallback),
  instead of mis-calling the URL-only fetch_asset_bytes with an Asset.
- Unit tests register ORM models before create_all and set the required
  ExperimentRun.owner_id, so they pass standalone. Full unit suite green
  (26 passed, 1 skipped).
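The storage-first, HTTP-fallback loading order could be sketched as below. This is not the actual _load_image_bytes(): the signature, the injected `read_object` and `fetch_url` callables, and `ImageLoadError` are hypothetical, chosen so the sketch runs without MinIO or a network.

```python
from typing import Callable, Optional


class ImageLoadError(Exception):
    """Raised when neither object storage nor the URL yields bytes."""


def load_image_bytes(
    storage_key: Optional[str],
    url: Optional[str],
    read_object: Callable[[str], Optional[bytes]],
    fetch_url: Callable[[str], bytes],
) -> bytes:
    """Load image bytes from object storage, falling back to a URL fetch.

    read_object stands in for a MinIO-backed reader that returns None when
    the object is missing; fetch_url stands in for the SSRF-safe HTTP fetch.
    """
    if storage_key:
        data = read_object(storage_key)
        if data is not None:
            return data
    if url:
        return fetch_url(url)
    raise ImageLoadError("asset has no readable storage object or URL")
```

Passing the two readers in as callables also matches the hermetic-test approach mentioned earlier, since a test can substitute an in-memory dict and a stub fetcher.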

https://claude.ai/code/session_01LJ7JL1pztkSpkfhEdu8ivg
@whittenator whittenator merged commit 0741a5c into main May 30, 2026
3 checks passed
@whittenator whittenator deleted the claude/datasets-annotator-improvements-lYwUI branch May 30, 2026 11:25

2 participants