Skip to content

fix(embedding): not-found metric classification + backfill pagination#91

Merged
BhagyaAmarasinghe merged 2 commits into
mainfrom
fix/embedding-backfill-and-notfound
Jun 24, 2026
Merged

fix(embedding): not-found metric classification + backfill pagination#91
BhagyaAmarasinghe merged 2 commits into
mainfrom
fix/embedding-backfill-and-notfound

Conversation

@xernobyl

Copy link
Copy Markdown
Contributor

What does this PR do?

Two independent hardening fixes to the embedding enrichment pipeline, both pre-existing on main and surfaced while reviewing the translation pipeline in #90. One commit each.

  1. fix: a not-found record is no longer recorded as a terminal failure. In FeedbackEmbeddingWorker.Work, a not-found GetFeedbackRecord — a benign race where the record was deleted or its tenant purged between enqueue and processing — recorded failed_final plus a get_record_failed worker error before the not-found short-circuit returned nil. That inflates hub_embedding_outcomes_total{status="failed_final"} and hub_embedding_worker_errors_total and can trip false alerts; it was also inconsistent with the same worker's not-found-on-write path, which already records skipped. It now records skipped (no worker error), and reserves the failure metrics for genuine errors.

  2. perf: the embedding backfill streams in keyset pages. BackfillEmbeddings / ListFeedbackRecordIDsForBackfill materialized every eligible record id in one slice; on a large deployment that is a memory spike when the CLI runs. They now keyset-paginate (fr.id > cursor ORDER BY fr.id LIMIT n), enqueuing page by page so the full set is never held in memory at once.

No API or behavior change; the cmd/backfill-embeddings interface is unchanged (it now streams internally).

How should this be tested?

  • make build
  • make fmt && make lint → 0 issues
  • make tests (integration tests in tests/, DATABASE_URL → a test DB with migrations + River applied), including:
    • TestFeedbackEmbeddingWorker_GetNotFoundRecordsSkipped — a not-found GET records skipped, not failed_final and not a worker error
    • tests/embedding_backfill_test.go — keyset pagination (bounded pages, strictly-ascending ids, no id on two pages, completeness) and the service backfill streaming every eligible record
  • Manual: run cmd/backfill-embeddings over a DB with un-embedded records → enqueues them in keyset pages (no behavior or CLI change).

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Repository Guidelines
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand bits
  • Ran make build
  • Ran make tests (integration tests in tests/)
  • Ran make fmt and make lint; no new warnings
  • Removed debug prints / temporary logging
  • Merged the latest changes from main onto my branch (branch is cut from current main)
  • If database schema changed: n/a — no schema change

Appreciated

  • If API changed: n/a — no API change
  • If API behavior changed: n/a
  • Updated docs in docs/ if changes were necessary: n/a
  • Ran make tests-coverage for meaningful logic changes

xernobyl added 2 commits June 23, 2026 15:42
When GetFeedbackRecord returns not-found at the start of Work — a benign race where
the record was deleted or its tenant purged between enqueue and processing — the
worker recorded terminal-failure metrics (a get_record_failed worker error plus a
failed_final outcome) before the not-found short-circuit. That inflates the failure
metrics and can trip false alerts, and it was inconsistent with the same worker's
not-found-on-write path, which already records skipped.

Check not-found first and record it as skipped (no worker error); record the failure
metrics only for genuine errors.
ListFeedbackRecordIDsForBackfill and BackfillEmbeddings materialized every eligible
record id in one slice; on a large deployment that is a memory spike when the CLI runs.
They now keyset-paginate (id > cursor LIMIT n), enqueuing page by page so the full set
is never held in memory at once.
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e8eec03c-fb4b-404a-a8a9-3d5f3772aafb

📥 Commits

Reviewing files that changed from the base of the PR and between 4141d58 and 2d5fb56.

📒 Files selected for processing (5)
  • internal/repository/embeddings_repository.go
  • internal/service/feedback_records_service.go
  • internal/workers/feedback_embedding.go
  • internal/workers/feedback_embedding_test.go
  • tests/embedding_backfill_test.go

Walkthrough

ListFeedbackRecordIDsForBackfill in the embeddings repository is updated to accept an afterID cursor and a limit, with the SQL rewritten to add fr.id > $2, ORDER BY fr.id, and LIMIT $3. The FeedbackRecordsRepository interface is updated to match, and BackfillEmbeddings is rewritten from a single full fetch to a keyset pagination loop with a page size of 500. Separately, FeedbackEmbeddingWorker.Work now intercepts ErrNotFound from GetFeedbackRecord as a skipped outcome, records metrics, and returns nil, replacing the prior downstream suppression branch. Integration tests for pagination correctness and end-to-end backfill streaming are added, along with a unit test for the skipped outcome.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes both main changes: fixing not-found metric classification and implementing backfill pagination, which are the primary objectives of this PR.
Description check ✅ Passed The description comprehensively addresses the template requirements, including clear objectives, detailed testing instructions, and a completed checklist covering all required items.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@BhagyaAmarasinghe BhagyaAmarasinghe left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@BhagyaAmarasinghe BhagyaAmarasinghe added this pull request to the merge queue Jun 24, 2026
Merged via the queue into main with commit 0b66636 Jun 24, 2026
11 checks passed
@BhagyaAmarasinghe BhagyaAmarasinghe deleted the fix/embedding-backfill-and-notfound branch June 24, 2026 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants