perf(track): bulk metadata upserts + drain pacing + single-pass scrub#129
Open
trentshaines wants to merge 2 commits into
Open
perf(track): bulk metadata upserts + drain pacing + single-pass scrub#129trentshaines wants to merge 2 commits into
trentshaines wants to merge 2 commits into
Conversation
Three changes that together remove the dispatch ceiling on uploads.
Drain pacing: the daemon main loop did one drain_once per iteration then
slept 10s, capping dispatch at ~32 files per 10s regardless of how fast
the 8 worker threads finished. _drain_queue now tight-loops drain_once
until the queue is empty, then returns to the main loop. The thread
pool's worker count is the actual concurrency ceiling.
Drainer batch size 32 → 100 to match the server's /v1/track/upload-urls
cap (called out in the existing comment but not enforced).
Bulk metadata upserts: workers were each making a synchronous
POST /v1/track/sessions/{id} after every successful S3 PUT, doubling
per-file network round trips. Workers now buffer per-upload metadata in
the daemon; the main loop flushes via a new POST /v1/track/sessions/bulk
each iteration and on graceful shutdown. Failed flushes leave items
buffered for retry; reconcile re-queues anything lost on crash.
Scrubber single-pass for the upload path: scrub() ran every regex twice
(once to enumerate hits for `flt track inspect`, once for substitution)
even when the upload path discarded the hits. Added scrub_text() that
skips the hit-enumeration pass; scrub_bytes uses it. scrub() retains
the hits API for non-upload callers.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 15df700. Configure here.
Per cursor-bot review on PR #129. The bulk client chunks at SERVER_BULK_UPSERT_BATCH_CAP internally; if a later chunk fails after earlier ones succeed, the previous code re-buffered ALL items for retry — meaning successive flushes silently re-sent rows the server already had. Fix: - Add `BulkUpsertPartialFailure(TrackAPIError)` carrying `unsent_items` (the failing chunk plus everything after it). - `upsert_sessions_bulk` raises it on per-chunk failure, leaving committed earlier chunks alone. - Daemon's `_flush_metadata_buffer` catches it: marks the sent prefix as indexed, re-buffers only the unsent tail. Tests: partial-failure exception carries the right tail; first-chunk failure marks all items unsent.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Three SDK-side changes that together remove the dispatch ceiling on
flt trackuploads. The companion orchestrator PR adds thePOST /v1/track/sessions/bulkendpoint this client targets.drain_onceuntil the queue is empty rather than doing one drain per 10s sleep. The 8-thread upload pool is now actually saturable./v1/track/upload-urlscap (which the existing comment already acknowledged).POST /v1/track/sessions/bulkeach iteration and on graceful shutdown. Halves per-file network round trips.scrub_bytesnow uses a newscrub_textthat skips the hits-enumeration pass (which the upload path discarded anyway).scrub()retains the full API forflt track inspect-style callers.Failure semantics:
Test plan
pytest tests/track)pytest) — 735 passed, 11 skipped