perf(track): bulk metadata upserts + drain pacing + single-pass scrub by trentshaines · Pull Request #129 · fleet-ai/fleet-sdk

trentshaines · 2026-05-08T01:46:30Z

Summary

Three SDK-side changes that together remove the dispatch ceiling on flt track uploads. The companion orchestrator PR adds the POST /v1/track/sessions/bulk endpoint this client targets.

Drain pacing: daemon main loop now tight-loops drain_once until the queue is empty rather than doing one drain per 10s sleep. The 8-thread upload pool is now actually saturable.
Drainer batch size: 32 → 100 to match the server's /v1/track/upload-urls cap (which the existing comment already acknowledged).
Bulk metadata upserts: workers buffer per-upload metadata; the main loop flushes via a new POST /v1/track/sessions/bulk each iteration and on graceful shutdown. Halves per-file network round trips.
Scrubber single-pass for uploads: scrub_bytes now uses a new scrub_text that skips the hits-enumeration pass (which the upload path discarded anyway). scrub() retains the full API for flt track inspect-style callers.

Failure semantics:

Failed bulk flush leaves items in the buffer for the next flush; reconcile re-queues anything lost on crash.
Daemon uses the bulk endpoint exclusively for new metadata writes — needs the orchestrator side merged first.

Test plan

All 525 track tests pass (pytest tests/track)
Full SDK suite passes (pytest) — 735 passed, 11 skipped
Added tests: bulk client serializes + chunks per server cap; daemon coalesces many uploads into one bulk request; daemon retains items on flush failure and retries successfully; scrub_text matches scrub().text on diverse inputs; _drain_queue tight-loops until claimed=0
Smoke-test against staging orchestrator once the bulk endpoint lands

Three changes that together remove the dispatch ceiling on uploads. Drain pacing: the daemon main loop did one drain_once per iteration then slept 10s, capping dispatch at ~32 files per 10s regardless of how fast the 8 worker threads finished. _drain_queue now tight-loops drain_once until the queue is empty, then returns to the main loop. The thread pool's worker count is the actual concurrency ceiling. Drainer batch size 32 → 100 to match the server's /v1/track/upload-urls cap (called out in the existing comment but not enforced). Bulk metadata upserts: workers were each making a synchronous POST /v1/track/sessions/{id} after every successful S3 PUT, doubling per-file network round trips. Workers now buffer per-upload metadata in the daemon; the main loop flushes via a new POST /v1/track/sessions/bulk each iteration and on graceful shutdown. Failed flushes leave items buffered for retry; reconcile re-queues anything lost on crash. Scrubber single-pass for the upload path: scrub() ran every regex twice (once to enumerate hits for `flt track inspect`, once for substitution) even when the upload path discarded the hits. Added scrub_text() that skips the hit-enumeration pass; scrub_bytes uses it. scrub() retains the hits API for non-upload callers.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 15df700. Configure here.}

Per cursor-bot review on PR #129. The bulk client chunks at SERVER_BULK_UPSERT_BATCH_CAP internally; if a later chunk fails after earlier ones succeed, the previous code re-buffered ALL items for retry — meaning successive flushes silently re-sent rows the server already had. Fix: - Add `BulkUpsertPartialFailure(TrackAPIError)` carrying `unsent_items` (the failing chunk plus everything after it). - `upsert_sessions_bulk` raises it on per-chunk failure, leaving committed earlier chunks alone. - Daemon's `_flush_metadata_buffer` catches it: marks the sent prefix as indexed, re-buffers only the unsent tail. Tests: partial-failure exception carries the right tail; first-chunk failure marks all items unsent.

cursor Bot reviewed May 8, 2026

View reviewed changes

Comment thread fleet/track/daemon.py Outdated

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(track): bulk metadata upserts + drain pacing + single-pass scrub#129

perf(track): bulk metadata upserts + drain pacing + single-pass scrub#129
trentshaines wants to merge 2 commits into
mainfrom
fleetcode-upload-revamp

trentshaines commented May 8, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

trentshaines commented May 8, 2026

Summary

Test plan

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant