Skip to content

feat: add tenant settings for language enrichment#89

Merged
xernobyl merged 33 commits into
mainfrom
feat/ENG-1254_tenant-settings-hub-language-enrichment
Jun 26, 2026
Merged

feat: add tenant settings for language enrichment#89
xernobyl merged 33 commits into
mainfrom
feat/ENG-1254_tenant-settings-hub-language-enrichment

Conversation

@xernobyl

@xernobyl xernobyl commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

First task in the Language consolidation chain (project: Hub Native Enrichment). Adds a tenant-scoped settings store so language enrichment can be configured per tenant — starting with target_language (a normalized BCP-47 locale, e.g. en-US) — and, following review, wires the consumption side so the setting is usable end to end (see Added during review below). Pairs with ENG-1255 (translate feedback records), which reads these settings.

Fixes ENG-1254

Design

  • Storage: one row per tenant in a new tenant_settings table, keyed by tenant_id (natural PK, no surrogate id), with a single open-ended settings JSONB column mapped to a typed EnrichmentSettings struct — new settings are added as struct fields with no schema migration.
  • API: GET/PUT/PATCH /v1/tenants/{tenant_id}/settings. PUT is a full replace. PATCH is an RFC 7396 JSON Merge Patch (application/merge-patch+json; application/json also accepted): a member with a value sets that setting, an explicit JSON null removes the key, and an omitted member is left unchanged — and it creates the row when the tenant has none. An empty string is rejected (a value must be a valid locale; clear via null). Both verbs normalize the locale via golang.org/x/text/language (en-usen-US). An unconfigured tenant returns 200 with empty settings, not 404 — consumers decide the fallback.
  • Body cap: PUT and PATCH bodies are capped at 8 KiB (http.MaxBytesReader); an oversized body returns 413 with a dedicated content_too_large problem code (added to the shared RFC 9457 response layer + the OpenAPI ErrorModel enum).
  • Tenant scoping: tenant_id is path-only (the body has no tenant_id; unknown fields are rejected); reads are WHERE tenant_id = $1; writes are INSERT … ON CONFLICT (tenant_id) behind the existing shared tenant write-lock. The tenant-data purge now also deletes tenant_settings.
  • Reuse: the tenant write-lock helpers, tenant-id normalization, the x/text dependency, and the response/validation helpers — no new dependencies. PUT and PATCH share one body-decode/validate helper (decodeSettingsBody) and one locked-write helper (writeSettings). The merge patch is applied in SQL — (settings || set) - removeKeys::text[] — so null members delete keys while unmentioned keys survive; a small generic Optional[T] distinguishes absent / null / value when decoding the patch body.

Trust boundary: like every existing Hub endpoint, the service authenticates with a single static API key and the caller passes tenant_id; there is no per-tenant authn here — isolation is data-scoping (consistent with the ENG-1289 conclusion).

Deferred to follow-ups (out of scope):

  • A repo-wide request body-limit middleware (the 8 KiB cap here is per-handler; other endpoints remain uncapped).
  • Making the settings cache multi-replica-coherent (TTL/Redis): the eviction added below is per-process, so cross-replica staleness is still TTL-bounded.

API examples

PUT /v1/tenants/org-123/settings    {"target_language":"de-de"}
→ 200 {"tenant_id":"org-123","settings":{"target_language":"de-DE"}}

PATCH /v1/tenants/org-123/settings  {"target_language":"fr-fr"}   (merge; unspecified keys preserved)
→ 200 {"tenant_id":"org-123","settings":{"target_language":"fr-FR"}}

PATCH /v1/tenants/org-123/settings  {"target_language":null}      (RFC 7396 null removes the key)
→ 200 {"tenant_id":"org-123","settings":{}}

GET /v1/tenants/never-configured/settings
→ 200 {"tenant_id":"never-configured","settings":{}}

PUT /v1/tenants/org-123/settings    (body > 8 KiB)
→ 413 {"code":"content_too_large","title":"Request Entity Too Large", ...}

Added during review

Reviewer feedback (thanks @BhagyaAmarasinghe) surfaced gaps between a bare settings store and a working enrichment path. Addressed in this PR:

  • Deployment default language. New TRANSLATION_DEFAULT_LANGUAGE env (BCP-47, canonicalized at load) — the fallback target for tenants with no target_language of their own; empty keeps translation strictly per-tenant opt-in. It is threaded through the service so a tenant's effective target — COALESCE(its own target_language, TRANSLATION_DEFAULT_LANGUAGE) — is resolved consistently at the enqueue gate, both backfills (global and per-tenant), and the SetTranslation write-guard. So an operator can turn translation on for every tenant with one env var; the guard persists a default-resolved write while still superseding a stale former-explicit-target write after a tenant clears its target; and clearing a tenant's target re-translates its existing records to the default.
  • Cache coherence. A settings write now evicts the tenant's cached settings (composed with the translation backfill listener), so a changed or newly enabled target is seen by the enqueue gate immediately instead of after the cache TTL.
  • Stale translation clear. An update to value_text/language now clears value_text_translated/translation_lang_key — but only on an actual change — so an edited record can't keep a stale translation; the row falls back to the original text and stays recoverable by a backfill.

On ENG-1254's original contract: the ticket sketched translation_enabled + target_language + env-default + explicit disabled-vs-default. We simplified to env-based enablement (TRANSLATION_PROVIDER + TRANSLATION_MODEL switch the whole deployment) + target_language + TRANSLATION_DEFAULT_LANGUAGE fallback — no per-tenant translation_enabled. Opt-in is "the tenant has a target" (or the deployment default), which removes a redundant flag and the ambiguous enabled-but-no-target state. ENG-1254 has been updated to this design.

How should this be tested?

Config: API_KEY=<key>, DATABASE_URL=postgres://postgres:postgres@localhost:5432/test_db?sslmode=disable. To exercise translation end to end, also set TRANSLATION_PROVIDER + TRANSLATION_MODEL (provider creds) and optionally TRANSLATION_DEFAULT_LANGUAGE.

Automated:

  • make init-db && make river-migrate (applies schema incl. 012_tenant_settings.sql)
  • make tests — integration tests in tests/: round-trip, defaults (200 not 404), locale normalization, tenant A↔B isolation, body-tenant_id rejected, invalid locale → 400 (PUT and PATCH), purge-removes-settings, 413 on an oversized body (PUT and PATCH, asserts the content_too_large code), and the RFC 7396 PATCH paths — value-merge preserves other keys, null removes a key (siblings kept, key deleted from the JSONB), an empty string is rejected (400), and PATCH creates the row for a brand-new tenant. Plus an Optional[T] tri-state unit test (absent/null/value).
  • Default-language fallback (added during review): config validation/canonicalization of TRANSLATION_DEFAULT_LANGUAGE (unit), the enqueue-gate fallback + per-tenant override (provider unit), the default-aware global and per-tenant backfills, and the write-guard — a default-resolved write persists, while a stale former-explicit-target write (after a tenant clears its target) is superseded (integration in tests/).
  • make tests-coverage — new code: service 100%, handler 100%, Optional 100%, repository file ~85%

Manual smoke (server on :8099):

curl -X PUT -H "Authorization: Bearer $API_KEY" -H 'Content-Type: application/json' \
  -d '{"target_language":"de-de"}' localhost:8099/v1/tenants/org-123/settings   # 200, de-DE
curl -X PATCH -H "Authorization: Bearer $API_KEY" -H 'Content-Type: application/merge-patch+json' \
  -d '{"target_language":"fr-fr"}' localhost:8099/v1/tenants/org-123/settings   # 200, fr-FR (merge)
curl -X PATCH -H "Authorization: Bearer $API_KEY" -H 'Content-Type: application/merge-patch+json' \
  -d '{"target_language":null}'   localhost:8099/v1/tenants/org-123/settings    # 200, {} (removed)
curl -H "Authorization: Bearer $API_KEY" localhost:8099/v1/tenants/unset/settings # 200, {}
curl localhost:8099/v1/tenants/org-123/settings                                   # 401 (no key)

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Repository Guidelines
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand bits (tenant write-lock rationale, defaults-when-unset, the //nolint for ErrNoRows, the RFC 7396 set/remove SQL, the default-language write-guard)
  • Ran make build
  • Ran make tests (integration tests in tests/, against test_db)
  • Ran make fmt and make lint; no new warnings (0 issues)
  • Removed debug prints / temporary logging
  • Merged the latest changes from main onto my branch with git pull origin main — branched off origin/main@4141d58; rebase before merge
  • If database schema changed: added migration in migrations/ with goose annotations and ran make migrate-validate (012_tenant_settings.sql; also applied via goose up)

Appreciated

  • If API changed: added or updated OpenAPI spec and ran contract tests (make tests)
  • If API behavior changed: added request/response examples to this PR
  • Updated docs in docs/ if changes were necessary — companion hub-api-docs PRs (stainless-sdks/hub-api-docs#8 tenant settings, stainless-sdks/hub-api-docs#9 translated feedback incl. the TRANSLATION_DEFAULT_LANGUAGE env); the API reference is auto-generated from the OpenAPI spec
  • Ran make tests-coverage for meaningful logic changes

xernobyl added 2 commits June 19, 2026 17:39
Add a tenant-scoped settings store so language enrichment can be
configured per tenant, starting with target_language (a normalized
BCP-47 locale, e.g. en-US). First task in the language consolidation
chain; unblocks per-tenant translated enrichment (ENG-1255).

- migration 012: tenant_settings keyed by tenant_id (natural PK) with
  an open-ended settings JSONB, so new settings are added as struct
  fields without a schema migration
- repository: tenant-scoped Get and a lock-serialized upsert
  (INSERT ... ON CONFLICT (tenant_id)) reusing the tenant write lock
- service: GetSettings returns documented defaults when unset (never
  404); target_language is validated/normalized via golang.org/x/text
- HTTP: GET/PUT /v1/tenants/{tenant_id}/settings; tenant_id is
  path-only so a request can only ever address its own tenant
- tenant data purge now also removes tenant_settings
- OpenAPI spec, unit tests, and integration tests (round-trip,
  defaults, tenant isolation, purge cleanup)

ENG-1254
Cap PUT /v1/tenants/{tenant_id}/settings request bodies at 8 KiB via
http.MaxBytesReader; an oversized body is rejected with 413 before it
is read into memory. Add a dedicated content_too_large problem code and
type to the shared response layer so a 413 is distinguishable from a
malformed-body 400 via the machine-readable `code` clients branch on,
and document it in the OpenAPI ErrorModel enum and the PUT responses.

- handler: 8 KiB MaxBytesReader -> 413; malformed JSON still -> 400
- response: CodeContentTooLarge / ProblemTypeContentTooLarge + 413 arms
  in codeForStatus/problemTypeForStatus (+ unit test)
- openapi: 413 response on PUT settings; content_too_large in the
  ErrorModel code enum
- tests: integration test asserts the 413 status and content_too_large code

ENG-1254
@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown

✱ Stainless preview builds

This PR will update the hub SDKs with the following commit message.

feat: add tenant settings for language enrichment
hub-openapi studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅

hub-typescript studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅build ✅lint ✅test ✅


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-06-26 19:18:07 UTC

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This pull request introduces a tenant-scoped enrichment settings API. A new tenant_settings PostgreSQL table is added via a Goose migration, storing per-tenant settings as JSONB. Go model types (EnrichmentSettings, TenantSettings, UpdateTenantSettingsRequest, PatchTenantSettingsRequest) are defined, followed by TenantSettingsRepository (Get, Upsert, Patch with JSONB merge), TenantSettingsService (BCP-47 locale normalization via normalizeTargetLanguage), and TenantSettingsHandler (GET/PUT/PATCH with body size capping and unknown-field rejection). A new content_too_large RFC 9457 problem type and code handle HTTP 413. Routes are registered at /v1/tenants/{tenant_id}/settings. Tenant purge (DeleteByTenant) is extended to also delete tenant_settings rows. The OpenAPI spec documents all three operations and new schemas. Unit and integration tests cover the full stack.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adding tenant-scoped settings support for language enrichment configuration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description follows the template well, with scope, issue link, testing steps, examples, and checklist items filled in.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Add PATCH /v1/tenants/{tenant_id}/settings for partial updates: only the
fields present in the body change, omitted fields are left untouched, and an
empty string clears a field. PUT remains a full replace.

Implemented as an atomic top-level JSONB merge (settings = settings || patch)
inside the existing shared tenant write lock, so concurrent writes don't race
and settings the patch doesn't mention survive. Pointer fields in the request
distinguish "absent" (leave unchanged) from "set" (including empty = clear).

- repository: Patch (|| merge) + a shared writeSettings helper used by both
  Upsert and Patch
- handler: Patch + a shared decodeSettingsBody helper (8KB cap, 413,
  unknown-field rejection) used by both Update and Patch
- openapi: PATCH operation + PatchTenantSettingsInputBody schema
- tests: service unit tests + integration tests, including a merge test
  proving a key the patch omits is preserved

ENG-1254

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@migrations/012_tenant_settings.sql`:
- Line 1: The Goose migration annotations in this file are using capitalized
forms that do not match the project's coding guidelines. Change the `-- +goose
Up` annotation to `-- +goose up` (lowercase) and the `-- +goose Down` annotation
to `-- +goose down` (lowercase) to maintain consistency with the documented
style guidelines. Both annotations should use lowercase variants throughout the
migration file.

In `@tests/tenant_settings_test.go`:
- Around line 341-349: The TestTenantSettings_PatchBodyTooLargeRejected function
currently only asserts the HTTP status code but does not verify the RFC 9457
problem code in the response body, unlike the corresponding PUT test. After
asserting the StatusCode is http.StatusRequestEntityTooLarge, you need to read
and parse the resp.Body to extract the code field and assert it equals the
expected RFC 9457 code value (likely "content_too_large" to match the PUT test
behavior). This prevents regressions where PATCH returns the wrong problem code
despite having the correct status code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8f3c484b-e03a-4723-9807-cc8da50c5836

📥 Commits

Reviewing files that changed from the base of the PR and between 4141d58 and e54efb6.

📒 Files selected for processing (16)
  • cmd/api/app.go
  • cmd/api/app_test.go
  • internal/api/handlers/tenant_settings_handler.go
  • internal/api/response/problem.go
  • internal/api/response/response_test.go
  • internal/models/tenant_settings.go
  • internal/repository/tenant_data_repository.go
  • internal/repository/tenant_data_repository_test.go
  • internal/repository/tenant_settings_repository.go
  • internal/service/tenant_settings_service.go
  • internal/service/tenant_settings_service_test.go
  • migrations/012_tenant_settings.sql
  • openapi.yaml
  • tests/helpers.go
  • tests/integration_test.go
  • tests/tenant_settings_test.go

Comment thread migrations/012_tenant_settings.sql Outdated
Comment thread tests/tenant_settings_test.go
xernobyl added 5 commits June 22, 2026 13:55
Add an integration test for the PATCH insert branch (a tenant with no
settings row yet): PATCH creates the row with the patched field and a
subsequent GET returns it. The other PATCH tests all seed a row first,
so this is the only coverage of the advertised create-on-new-tenant path.

ENG-1254
Extend TestTenantSettings_InvalidLocaleRejected to drive both PUT and
PATCH. Both verbs normalize target_language through the same path, so
both must reject an invalid locale with 400; previously only PUT was
exercised, leaving the PATCH handler's error-response branch (77.8%,
below the 80% bar) untested. The PATCH handler is now fully covered.

ENG-1254
AGENTS.md documents `-- +goose up` / `-- +goose down` (lowercase) as the
migration annotation style; 012 used the capitalized form. goose treats
the keywords case-insensitively, so this is style-only — no re-run, the
applied schema is unchanged.

ENG-1254
The PATCH oversized-body test only checked the 413 status, while the PUT
test also asserts the RFC 9457 `content_too_large` problem code. Mirror
it so a PATCH returning the wrong code under a correct 413 is caught.

ENG-1254
CleanupTestData has no callers anywhere in the repo (there is no
TestMain, and setupTestServer's cleanup never invokes it), so the
DELETE FROM tenant_settings line it was extended with never ran — dead
code giving false confidence of cleanup. The settings tests isolate via
UUID-unique tenant ids (testTenantID), like the rest of the suite, so
no cleanup is needed. Reverts tests/helpers.go to its prior state.

ENG-1254
@xernobyl xernobyl marked this pull request as draft June 22, 2026 14:28
xernobyl added 2 commits June 22, 2026 15:21
PATCH now follows JSON Merge Patch (RFC 7396): a member with a value sets
that setting, an explicit JSON null removes the key, and an omitted member
is left unchanged. Previously an empty string cleared a field and there was
no way to remove a key (the JSONB `||` merge could only add or overwrite).

- models: add a small generic Optional[T] that distinguishes absent vs null
  vs value when decoding a merge-patch body; PatchTenantSettingsRequest uses
  it instead of *string.
- service: translate the typed request into keys-to-set and keys-to-remove;
  reject an explicit empty string (clear via null), keeping the value
  contract "a valid BCP-47 locale, or null to remove".
- repository: Patch applies (settings || set) - removeKeys::text[], so null
  members delete keys while unmentioned keys survive; writeSettings takes the
  extra bind. Upsert (full replace) is unchanged.
- openapi: declare application/merge-patch+json (application/json still
  accepted), make target_language nullable, document the null-removes rule.
- tests: Optional tri-state unit test; service set/remove/empty-rejected
  cases; integration PatchNullRemovesField (key deleted, siblings kept) and
  PatchEmptyStringRejected.

ENG-1254
Spectral 6.16.0 (nimma) crashes with "Cannot read properties of null
(reading 'enum')" when a lint rule traverses a literal null leaf in an
example value — which the merge-patch `remove` example introduced
(target_language: null). Drop the structured null example and show the
null-removes form in the operation description instead. SDK codegen still
learns nullability from the schema (target_language type: [string, "null"]),
so the contract is unchanged.

ENG-1254
@xernobyl xernobyl marked this pull request as ready for review June 22, 2026 15:54
xernobyl added 13 commits June 22, 2026 16:04
Switching the PATCH field to Optional[string] dropped the struct-tag
validation (no_null_bytes, max=35) — go-playground's validator can't reach
through the custom type — so PATCH accepted null bytes and over-long values
that PUT and the OpenAPI maxLength:35 still reject. Enforce the same bounds
in the PATCH value path (normalizeProvidedTargetLanguage), restoring
PUT/PATCH parity and honoring the contract.

Also pin settingKeyTargetLanguage to the EnrichmentSettings json tag with a
unit test, so a tag rename can't silently break PATCH null-removal.

ENG-1254
ENG-1255 (persistence): scaffold language-enrichment storage. Adds the
value_text_translated + translation_lang_key columns (migration 013),
surfaces them as read-only fields on FeedbackRecord and the OpenAPI output
schema, includes them in every feedback-record read path, and adds a
tenant-write-locked SetTranslation repo method that does not publish a
domain event (so writing a translation can't loop the enrichment pipeline).
The async translation worker that populates them follows.

ENG-1255
ENG-1255: add TranslationConfig (TRANSLATION_* env; translation stays disabled
unless provider and model are set) and a per-process TTL+LRU tenant-settings
cache — the cache deferred from ENG-1254. The cache wraps the settings accessor
so the translation enqueue gate and worker can resolve a tenant's target
language without a DB read per feedback event; staleness is TTL-bounded and
self-corrects because the worker records the target it actually used. Registers
a bounded "tenant_settings" cache metric label. TRANSLATION_BASE_URL is
normalized/validated like EMBEDDING_BASE_URL.

ENG-1255
Re-review follow-ups on the ENG-1255 foundation:
- Unify the feedback-record column/scan duplication onto one feedbackRecordColumns
  const + the existing scanFeedbackRecord helper. The repo previously inlined 4
  column lists + 4 scans while taxonomy_repository.go had its own scanner; that
  scanner is now the single one (extended to the two translation columns) and the
  taxonomy node-records query selects them too. Collapses 8 sites to 2 so column
  and scan order cannot drift (a silent runtime scan error otherwise).
- config: default TENANT_SETTINGS_CACHE_TTL in applyDefaults too, mirroring the
  other nested DurationSec defaults, so a dropped env-default tag can't silently
  disable the cache; disable explicitly via TENANT_SETTINGS_CACHE_SIZE=0.
- observability: cache metric descriptions now list tenant_settings as a label.
- tests: add a SetTranslation integration test (persist, clear, NotFound) — the
  highest-risk new write path, covering the value/NULL round-trip.

ENG-1255
ENG-1255: the LLM seam for translation enrichment. Adds a TranslationClient
interface (Translate(TranslateRequest) -> text) and a provider-agnostic factory
mirroring the embedding factory — openai / google / google-gemini, explicit
TRANSLATION_PROVIDER (no default), validating API key, base URL, and Gemini
project+location. The OpenAI and Google SDK wrappers gain a low-level Translate
(chat completions / generate-content at temperature 0); a prompt adapter builds
a Formbricks-style "professional translator" prompt using human-readable language
names (x/text/language/display), falling back to "original language" when the
source is unknown. Unit tests cover config validation, the language-name helper,
and prompt rendering.

ENG-1255
ENG-1255: the enqueue side of translation enrichment, mirroring the embedding
provider. Adds FeedbackTranslationArgs (River job, unique by record + target +
value_text hash) and TranslationProvider — on a feedback-record create (non-empty
open text) or an update that changed value_text, it enqueues a job, but only for
text fields with non-empty value_text whose tenant has a target language
configured (read via the settings cache). Failures are logged and swallowed so
record ingestion is never blocked.

Consolidates the two identical per-feature River inserter interfaces into one
shared RiverJobInserter, and reuses TenantSettingsReader for the target lookup
(the iface linter flagged the duplicates). Unit tests cover enqueue eligibility,
target gating, update-changed-fields, and the error/skip paths.

ENG-1255
ENG-1255: the worker side. FeedbackTranslationWorker loads the record, translates
value_text into the job's target language via the TranslationClient — or copies
value_text verbatim when the source language already shares the target's base
language (no LLM call) — and persists it through FeedbackRecordsService.SetTranslation.
Mirrors the embedding worker's error handling: a missing record completes the job, a
tenant write conflict retries, a provider error retries then fails on the final
attempt; value_text that became empty since enqueue is skipped. Registered in the
River wiring, gated on a configured TranslationClient.

Adds FeedbackRecordsService.SetTranslation (+ the repository interface method) and 8
worker unit tests (translate, source==target copy, skip-empty, not-found, provider
retry/fail, tenant-write-conflict, record-gone-on-write).

ENG-1255
Address review findings on the translation pipeline:

- Clear-on-empty was unwired: editing value_text to empty left a stale
  value_text_translated forever. The provider now enqueues on an update that
  empties value_text (mirroring the embedding provider) and the worker clears the
  translation (SetTranslation nil) instead of skipping; the repository nulls both
  translation columns when the translation is nil.
- The source==target short-circuit compared only the base language, so it would
  copy zh-Hans text as a zh-Hant "translation". It now compares base AND script
  (sameLanguageAndScript): different scripts translate, regional variants
  (en-US/en-GB) still copy.
- Update the openai/googleai package docs to mention chat/generate-content.

Adds provider (update-to-empty enqueues a clear) and worker (clears on empty,
translates across scripts) tests.

ENG-1255
Address the full-feature review:

- Undetermined source ("und") was coerced to a guessed base by likely-subtags, so
  the source==target short-circuit copied text untranslated. Guard against
  language.Und in sameLanguageAndScript so an undetermined tag always translates.
- A source-language correction never re-translated: the provider only re-enqueued
  on value_text changes and the dedup hash ignored the source language. The provider
  now also enqueues on a language change, and the dedup hash folds in the source
  language (translationContentHash), so a correction produces a fresh job.
- Drop the stray "backfill" mention from the job-args doc (no translation backfill
  exists) and document the TRANSLATION_* and TENANT_SETTINGS_CACHE_* env vars in
  .env.example, mirroring EMBEDDING_*.

Tests: provider re-enqueues on a language change + the content hash varies by source
language; worker translates (not copies) an "und" source.

ENG-1255
ENG-1255: activate the translation pipeline, gated on TRANSLATION_PROVIDER +
TRANSLATION_MODEL (disabled otherwise), mirroring the embedding wiring.

- cmd/api: register the FeedbackTranslationWorker + translations queue (so the
  combined api process also translates, like embeddings) and register the
  TranslationProvider with the message manager, resolving the tenant target
  language through a short-TTL CachedTenantSettings over tenant settings.
- cmd/worker: build the TranslationClient and populate RiverDeps so the dedicated
  worker process translates jobs.

End to end: the provider enqueues on a feedback-record create/update for a
text field whose tenant has a target language; the worker translates value_text
(or copies when source==target) and persists it.

ENG-1255
ENG-1255: re-translate existing records when translation is first enabled or a
tenant changes its target language.

- repository.ListTranslationBackfillTargets joins tenant_settings to find text
  records with non-empty value_text whose tenant has a target language and whose
  stored translation_lang_key differs from it (never translated, or stale).
- FeedbackRecordsService.BackfillTranslations enqueues a "backfill" job per target;
  the inserter/queue/attempts are caller-provided (a one-off command), so the shared
  service keeps no backfill-only dependency.
- cmd/backfill-translations wires the client + a River producer and runs it,
  mirroring cmd/backfill-embeddings. Gated on TRANSLATION_PROVIDER+MODEL.

Tests: service unit (enqueues one job per target; repo error propagates) and a
Postgres integration test for the backfill query (untranslated/stale included,
already-current and no-target excluded).

ENG-1255
Drive FeedbackTranslationWorker end to end against Postgres with a fake
TranslationClient: translate + persist (source value_text preserved), copy
verbatim when source base+script matches the target (no provider call), and clear
a stale translation when value_text is empty. Complements the repo-level
SetTranslation and backfill-query integration tests.

Also clarify the cmd/api comment: the API registers the translation worker only to
satisfy River's insert-time validation; jobs are processed by hub-worker, not in
the API process (mirrors the embedding wiring).

ENG-1255
Mirror the embedding pipeline's metrics on the translation provider and worker so the
new enrichment path is observable in production:

  hub_translation_jobs_enqueued_total
  hub_translation_provider_errors_total{reason}
  hub_translation_outcomes_total{status}
  hub_translation_worker_errors_total{reason}
  hub_translation_duration_seconds{status}

TranslationMetrics mirrors EmbeddingMetrics (nil when metrics are disabled) and is wired
through NewMetrics, RiverDeps, cmd/api, and cmd/worker. The provider records enqueue
counts plus settings-read/enqueue errors; the worker records outcome + duration + worker
errors at every branch (success, skip, clear, retry, failed_final). Backfill stays
producer-only (nil metrics).

Bounded label sets gate every reason/status to "other". Unlike the embedding worker —
which emits "tenant_write_conflict" without listing it, so it buckets to "other" — the
translation worker's reason set includes it, so the conflict metric is labeled accurately.

Covered by observability metric tests (real SDK manual reader), provider/worker recording
tests, and the existing pipeline tests.

ENG-1255
…y guard

Worker: a not-found GetFeedbackRecord is a benign delete/purge race, not a terminal
failure. Record it as "skipped" (consistent with the not-found-on-write path) instead
of "failed_final" plus a worker error, so it no longer trips failure alerts.

Config: honor an explicit TENANT_SETTINGS_CACHE_SIZE=0 to disable the cache (the
documented behavior; NewCachedTenantSettings already treats size <= 0 as "no caching").
The previous `<= 0` default reset 0 to 2048, so disable could never take effect — now
the size is defaulted only when the env var is unset.

Service: SetTranslation rejects a (translated, "") pair via ErrTranslationLangKeyRequired
so a translation can never persist without the locale it was produced in; clearing (nil
translation) still passes through.

Also: clear-path integration tests now assert translation_lang_key is nulled too, and the
NewMetrics doc comment lists CacheMetrics.

ENG-1255
@xernobyl xernobyl marked this pull request as draft June 23, 2026 14:36
xernobyl added 3 commits June 23, 2026 15:24
Changing a tenant's target_language previously had no effect on existing feedback
records — only newly created/edited records were translated, and refreshing the
backlog required the global backfill CLI. A settings change now automatically
triggers a per-tenant re-translation backfill.

Design (clean separation; generalizes to future enrichment settings):
- TenantSettingsService gains a translation-free SettingsChangeListener port and
  fires it after a successful write with the keys that changed (PUT: all settable
  keys; PATCH: keys present, incl. a null removal; skipped on a no-op PATCH). A
  listener issue never fails the settings write.
- EnrichmentSettingsListener (adapter) maps a changed key to the enrichment backfill
  it triggers via a config-built handler map: target_language enqueues a durable
  TenantTranslationBackfillArgs job. Not routed through the webhook-coupled, lossy
  event bus. Adding a future setting (e.g. sentiment_enabled) is one map entry.
- TenantTranslationBackfillWorker keyset-paginates the tenant's stale records and
  enqueues the existing per-record FeedbackTranslationArgs jobs, off the request
  path. Unique by TenantID (one in-flight backfill per tenant); crash-safe via the
  idempotent "translation_lang_key IS DISTINCT FROM target_language" query.
- Repo ListTranslationBackfillTargetsForTenant (keyset, shares SQL with the global
  query) + service BackfillTranslationsForTenant.

Clearing target_language leaves existing translations in place (the query's
non-empty guard makes the triggered backfill a no-op). The global backfill CLI
remains the guaranteed recovery path.

ENG-1255
BackfillTranslations materialized every eligible target across all tenants in one
slice; on a large deployment that is a memory spike when the CLI runs. It now keyset-
paginates (id > cursor LIMIT n) like the per-tenant backfill, with the shared loop
factored into backfillTranslationsPaged so both paths stream identically.

ENG-1255
A provider 429 was treated as a generic error, so the translation worker
burned through River's retry attempts on the fast backoff and dropped the
work as failed_final, ignoring the provider's Retry-After / RetryInfo hint.

The openai and google clients now classify a 429 (RESOURCE_EXHAUSTED) as a
shared huberrors.RateLimitError carrying the provider's retry-after hint, and
the worker snoozes for that delay (river.JobSnooze) instead of failing.
Snoozing re-queues without consuming an attempt, so a burst against a
rate-limited model defers rather than drops work. The delay is clamped
(5s-5min, default 30s) and a per-job window (1h) bounds indefinite snoozing
against a standing quota; past it the job fails normally and a backfill
recovers it. A rate_limited worker-error metric records the deferral.
@xernobyl xernobyl marked this pull request as ready for review June 23, 2026 17:20
xernobyl and others added 2 commits June 24, 2026 15:32
…tion

A feedback translation job carries the tenant's target language captured at
enqueue time. If the target changed (or was read from a stale settings cache),
an out-of-order older job could finish after a newer-target job and overwrite
value_text_translated / translation_lang_key with the stale target.

SetTranslation now persists only while the tenant's current target_language
still matches the job's target (atomic UPDATE ... FROM tenant_settings); a
stale write matches no row and returns huberrors.ErrTranslationSuperseded,
which the worker records as a benign skip. The clear path stays unconditional.

Adds a worker unit test, an end-to-end worker/repo regression test, and a
distinct "superseded" worker-error metric label.
…ck-records

feat: enrich feedback records with translated text
Comment thread internal/service/tenant_settings_service.go
Comment thread internal/repository/feedback_records_repository.go Outdated
Comment thread internal/models/tenant_settings.go
xernobyl added 2 commits June 26, 2026 13:56
…anslation on edit

Addresses two review findings on the tenant-settings / translation enqueue path:

- Settings cache staleness: CachedTenantSettings now implements SettingsChangeListener
  and is composed with the backfill listener, so a settings write evicts the tenant's
  cached entry. A freshly enabled/changed target is visible to the enqueue gate
  immediately instead of after TTL, so records created in the former staleness window
  are no longer skipped. (Eviction is per-process; cross-replica stays TTL-bounded.)

- Stale translation after a content edit: an Update that changes value_text or language
  now clears value_text_translated / translation_lang_key, but only when the value
  actually changes (CASE ... IS DISTINCT FROM the pre-update column). The row falls back
  to the original and becomes a backfill target, while an unchanged re-send keeps the
  valid translation (avoiding a deduped re-translation stranding it).

Adds buildUpdateQuery + cache-eviction unit tests and an Update integration test.
Tenants with no target_language of their own fall back to a deployment-wide
default (TRANSLATION_DEFAULT_LANGUAGE); an empty default keeps translation
per-tenant opt-in. The fallback is applied at the enqueue gate and the global
backfill. The SetTranslation write guard now also accepts a default-resolved
write for a tenant with no stored target, while still rejecting stale writes
for tenants that set an explicit target.
Comment thread internal/models/tenant_settings.go
Comment thread internal/repository/feedback_records_repository.go
Comment thread internal/repository/feedback_records_repository.go Outdated
xernobyl added 2 commits June 26, 2026 16:02
The SetTranslation write-guard and the settings-change (per-tenant) backfill now
resolve the tenant's effective target as COALESCE(its own target_language,
TRANSLATION_DEFAULT_LANGUAGE), threaded through the service. A stale job carrying
a tenant's former explicit target no longer writes once the tenant falls back to
the default, and clearing a target re-translates existing records to the default
instead of no-opping.
@xernobyl xernobyl enabled auto-merge June 26, 2026 16:36
@xernobyl xernobyl added this pull request to the merge queue Jun 26, 2026
Merged via the queue into main with commit 0f4e36f Jun 26, 2026
12 checks passed
@xernobyl xernobyl deleted the feat/ENG-1254_tenant-settings-hub-language-enrichment branch June 26, 2026 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants