Skip to content

perf(track): bulk metadata upserts + drain pacing + single-pass scrub#129

Open
trentshaines wants to merge 2 commits into
mainfrom
fleetcode-upload-revamp
Open

perf(track): bulk metadata upserts + drain pacing + single-pass scrub#129
trentshaines wants to merge 2 commits into
mainfrom
fleetcode-upload-revamp

Conversation

@trentshaines

Copy link
Copy Markdown
Contributor

Summary

Three SDK-side changes that together remove the dispatch ceiling on flt track uploads. The companion orchestrator PR adds the POST /v1/track/sessions/bulk endpoint this client targets.

  • Drain pacing: daemon main loop now tight-loops drain_once until the queue is empty rather than doing one drain per 10s sleep. The 8-thread upload pool is now actually saturable.
  • Drainer batch size: 32 → 100 to match the server's /v1/track/upload-urls cap (which the existing comment already acknowledged).
  • Bulk metadata upserts: workers buffer per-upload metadata; the main loop flushes via a new POST /v1/track/sessions/bulk each iteration and on graceful shutdown. Halves per-file network round trips.
  • Scrubber single-pass for uploads: scrub_bytes now uses a new scrub_text that skips the hits-enumeration pass (which the upload path discarded anyway). scrub() retains the full API for flt track inspect-style callers.

Failure semantics:

  • Failed bulk flush leaves items in the buffer for the next flush; reconcile re-queues anything lost on crash.
  • Daemon uses the bulk endpoint exclusively for new metadata writes — needs the orchestrator side merged first.

Test plan

  • All 525 track tests pass (pytest tests/track)
  • Full SDK suite passes (pytest) — 735 passed, 11 skipped
  • Added tests: bulk client serializes + chunks per server cap; daemon coalesces many uploads into one bulk request; daemon retains items on flush failure and retries successfully; scrub_text matches scrub().text on diverse inputs; _drain_queue tight-loops until claimed=0
  • Smoke-test against staging orchestrator once the bulk endpoint lands

Three changes that together remove the dispatch ceiling on uploads.

Drain pacing: the daemon main loop did one drain_once per iteration then
slept 10s, capping dispatch at ~32 files per 10s regardless of how fast
the 8 worker threads finished. _drain_queue now tight-loops drain_once
until the queue is empty, then returns to the main loop. The thread
pool's worker count is the actual concurrency ceiling.

Drainer batch size 32 → 100 to match the server's /v1/track/upload-urls
cap (called out in the existing comment but not enforced).

Bulk metadata upserts: workers were each making a synchronous
POST /v1/track/sessions/{id} after every successful S3 PUT, doubling
per-file network round trips. Workers now buffer per-upload metadata in
the daemon; the main loop flushes via a new POST /v1/track/sessions/bulk
each iteration and on graceful shutdown. Failed flushes leave items
buffered for retry; reconcile re-queues anything lost on crash.

Scrubber single-pass for the upload path: scrub() ran every regex twice
(once to enumerate hits for `flt track inspect`, once for substitution)
even when the upload path discarded the hits. Added scrub_text() that
skips the hit-enumeration pass; scrub_bytes uses it. scrub() retains
the hits API for non-upload callers.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 15df700. Configure here.

Comment thread fleet/track/daemon.py Outdated
Per cursor-bot review on PR #129. The bulk client chunks at
SERVER_BULK_UPSERT_BATCH_CAP internally; if a later chunk fails after
earlier ones succeed, the previous code re-buffered ALL items for retry —
meaning successive flushes silently re-sent rows the server already had.

Fix:
- Add `BulkUpsertPartialFailure(TrackAPIError)` carrying `unsent_items`
  (the failing chunk plus everything after it).
- `upsert_sessions_bulk` raises it on per-chunk failure, leaving committed
  earlier chunks alone.
- Daemon's `_flush_metadata_buffer` catches it: marks the sent prefix
  as indexed, re-buffers only the unsent tail.

Tests: partial-failure exception carries the right tail; first-chunk
failure marks all items unsent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant