feat: add tenant settings for language enrichment by xernobyl · Pull Request #89 · formbricks/hub

xernobyl · 2026-06-22T11:15:16Z

What does this PR do?

First task in the Language consolidation chain (project: Hub Native Enrichment). Adds a tenant-scoped settings store so language enrichment can be configured per tenant — starting with target_language (a normalized BCP-47 locale, e.g. en-US) — and, following review, wires the consumption side so the setting is usable end to end (see Added during review below). Pairs with ENG-1255 (translate feedback records), which reads these settings.

Fixes ENG-1254

Design

Storage: one row per tenant in a new tenant_settings table, keyed by tenant_id (natural PK, no surrogate id), with a single open-ended settings JSONB column mapped to a typed EnrichmentSettings struct — new settings are added as struct fields with no schema migration.
API: GET/PUT/PATCH /v1/tenants/{tenant_id}/settings. PUT is a full replace. PATCH is an RFC 7396 JSON Merge Patch (application/merge-patch+json; application/json also accepted): a member with a value sets that setting, an explicit JSON null removes the key, and an omitted member is left unchanged — and it creates the row when the tenant has none. An empty string is rejected (a value must be a valid locale; clear via null). Both verbs normalize the locale via golang.org/x/text/language (en-us → en-US). An unconfigured tenant returns 200 with empty settings, not 404 — consumers decide the fallback.
Body cap: PUT and PATCH bodies are capped at 8 KiB (http.MaxBytesReader); an oversized body returns 413 with a dedicated content_too_large problem code (added to the shared RFC 9457 response layer + the OpenAPI ErrorModel enum).
Tenant scoping: tenant_id is path-only (the body has no tenant_id; unknown fields are rejected); reads are WHERE tenant_id = $1; writes are INSERT … ON CONFLICT (tenant_id) behind the existing shared tenant write-lock. The tenant-data purge now also deletes tenant_settings.
Reuse: the tenant write-lock helpers, tenant-id normalization, the x/text dependency, and the response/validation helpers — no new dependencies. PUT and PATCH share one body-decode/validate helper (decodeSettingsBody) and one locked-write helper (writeSettings). The merge patch is applied in SQL — (settings || set) - removeKeys::text[] — so null members delete keys while unmentioned keys survive; a small generic Optional[T] distinguishes absent / null / value when decoding the patch body.

Trust boundary: like every existing Hub endpoint, the service authenticates with a single static API key and the caller passes tenant_id; there is no per-tenant authn here — isolation is data-scoping (consistent with the ENG-1289 conclusion).

Deferred to follow-ups (out of scope):

A repo-wide request body-limit middleware (the 8 KiB cap here is per-handler; other endpoints remain uncapped).
Making the settings cache multi-replica-coherent (TTL/Redis): the eviction added below is per-process, so cross-replica staleness is still TTL-bounded.

API examples

PUT /v1/tenants/org-123/settings    {"target_language":"de-de"}
→ 200 {"tenant_id":"org-123","settings":{"target_language":"de-DE"}}

PATCH /v1/tenants/org-123/settings  {"target_language":"fr-fr"}   (merge; unspecified keys preserved)
→ 200 {"tenant_id":"org-123","settings":{"target_language":"fr-FR"}}

PATCH /v1/tenants/org-123/settings  {"target_language":null}      (RFC 7396 null removes the key)
→ 200 {"tenant_id":"org-123","settings":{}}

GET /v1/tenants/never-configured/settings
→ 200 {"tenant_id":"never-configured","settings":{}}

PUT /v1/tenants/org-123/settings    (body > 8 KiB)
→ 413 {"code":"content_too_large","title":"Request Entity Too Large", ...}

Added during review

Reviewer feedback (thanks @BhagyaAmarasinghe) surfaced gaps between a bare settings store and a working enrichment path. Addressed in this PR:

Deployment default language. New TRANSLATION_DEFAULT_LANGUAGE env (BCP-47, canonicalized at load) — the fallback target for tenants with no target_language of their own; empty keeps translation strictly per-tenant opt-in. It is threaded through the service so a tenant's effective target — COALESCE(its own target_language, TRANSLATION_DEFAULT_LANGUAGE) — is resolved consistently at the enqueue gate, both backfills (global and per-tenant), and the SetTranslation write-guard. So an operator can turn translation on for every tenant with one env var; the guard persists a default-resolved write while still superseding a stale former-explicit-target write after a tenant clears its target; and clearing a tenant's target re-translates its existing records to the default.
Cache coherence. A settings write now evicts the tenant's cached settings (composed with the translation backfill listener), so a changed or newly enabled target is seen by the enqueue gate immediately instead of after the cache TTL.
Stale translation clear. An update to value_text/language now clears value_text_translated/translation_lang_key — but only on an actual change — so an edited record can't keep a stale translation; the row falls back to the original text and stays recoverable by a backfill.

On ENG-1254's original contract: the ticket sketched translation_enabled + target_language + env-default + explicit disabled-vs-default. We simplified to env-based enablement (TRANSLATION_PROVIDER + TRANSLATION_MODEL switch the whole deployment) + target_language + TRANSLATION_DEFAULT_LANGUAGE fallback — no per-tenant translation_enabled. Opt-in is "the tenant has a target" (or the deployment default), which removes a redundant flag and the ambiguous enabled-but-no-target state. ENG-1254 has been updated to this design.

How should this be tested?

Config: API_KEY=<key>, DATABASE_URL=postgres://postgres:postgres@localhost:5432/test_db?sslmode=disable. To exercise translation end to end, also set TRANSLATION_PROVIDER + TRANSLATION_MODEL (provider creds) and optionally TRANSLATION_DEFAULT_LANGUAGE.

Automated:

make init-db && make river-migrate (applies schema incl. 012_tenant_settings.sql)
make tests — integration tests in tests/: round-trip, defaults (200 not 404), locale normalization, tenant A↔B isolation, body-tenant_id rejected, invalid locale → 400 (PUT and PATCH), purge-removes-settings, 413 on an oversized body (PUT and PATCH, asserts the content_too_large code), and the RFC 7396 PATCH paths — value-merge preserves other keys, null removes a key (siblings kept, key deleted from the JSONB), an empty string is rejected (400), and PATCH creates the row for a brand-new tenant. Plus an Optional[T] tri-state unit test (absent/null/value).
Default-language fallback (added during review): config validation/canonicalization of TRANSLATION_DEFAULT_LANGUAGE (unit), the enqueue-gate fallback + per-tenant override (provider unit), the default-aware global and per-tenant backfills, and the write-guard — a default-resolved write persists, while a stale former-explicit-target write (after a tenant clears its target) is superseded (integration in tests/).
make tests-coverage — new code: service 100%, handler 100%, Optional 100%, repository file ~85%

Manual smoke (server on :8099):

curl -X PUT -H "Authorization: Bearer $API_KEY" -H 'Content-Type: application/json' \
  -d '{"target_language":"de-de"}' localhost:8099/v1/tenants/org-123/settings   # 200, de-DE
curl -X PATCH -H "Authorization: Bearer $API_KEY" -H 'Content-Type: application/merge-patch+json' \
  -d '{"target_language":"fr-fr"}' localhost:8099/v1/tenants/org-123/settings   # 200, fr-FR (merge)
curl -X PATCH -H "Authorization: Bearer $API_KEY" -H 'Content-Type: application/merge-patch+json' \
  -d '{"target_language":null}'   localhost:8099/v1/tenants/org-123/settings    # 200, {} (removed)
curl -H "Authorization: Bearer $API_KEY" localhost:8099/v1/tenants/unset/settings # 200, {}
curl localhost:8099/v1/tenants/org-123/settings                                   # 401 (no key)

Checklist

Required

Appreciated

If API changed: added or updated OpenAPI spec and ran contract tests (make tests)
If API behavior changed: added request/response examples to this PR
Updated docs in docs/ if changes were necessary — companion hub-api-docs PRs (stainless-sdks/hub-api-docs#8 tenant settings, stainless-sdks/hub-api-docs#9 translated feedback incl. the TRANSLATION_DEFAULT_LANGUAGE env); the API reference is auto-generated from the OpenAPI spec
Ran make tests-coverage for meaningful logic changes

Add a tenant-scoped settings store so language enrichment can be configured per tenant, starting with target_language (a normalized BCP-47 locale, e.g. en-US). First task in the language consolidation chain; unblocks per-tenant translated enrichment (ENG-1255). - migration 012: tenant_settings keyed by tenant_id (natural PK) with an open-ended settings JSONB, so new settings are added as struct fields without a schema migration - repository: tenant-scoped Get and a lock-serialized upsert (INSERT ... ON CONFLICT (tenant_id)) reusing the tenant write lock - service: GetSettings returns documented defaults when unset (never 404); target_language is validated/normalized via golang.org/x/text - HTTP: GET/PUT /v1/tenants/{tenant_id}/settings; tenant_id is path-only so a request can only ever address its own tenant - tenant data purge now also removes tenant_settings - OpenAPI spec, unit tests, and integration tests (round-trip, defaults, tenant isolation, purge cleanup) ENG-1254

Cap PUT /v1/tenants/{tenant_id}/settings request bodies at 8 KiB via http.MaxBytesReader; an oversized body is rejected with 413 before it is read into memory. Add a dedicated content_too_large problem code and type to the shared response layer so a 413 is distinguishable from a malformed-body 400 via the machine-readable `code` clients branch on, and document it in the OpenAPI ErrorModel enum and the PUT responses. - handler: 8 KiB MaxBytesReader -> 413; malformed JSON still -> 400 - response: CodeContentTooLarge / ProblemTypeContentTooLarge + 413 arms in codeForStatus/problemTypeForStatus (+ unit test) - openapi: 413 response on PUT settings; content_too_large in the ErrorModel code enum - tests: integration test asserts the 413 status and content_too_large code ENG-1254

github-actions · 2026-06-22T11:15:56Z

✱ Stainless preview builds

This PR will update the hub SDKs with the following commit message.

feat: add tenant settings for language enrichment

✅ hub-openapi studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅

✅ hub-typescript studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅ → build ✅ → lint ✅ → test ✅

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-06-26 19:18:07 UTC

coderabbitai · 2026-06-22T11:18:42Z

Walkthrough

This pull request introduces a tenant-scoped enrichment settings API. A new tenant_settings PostgreSQL table is added via a Goose migration, storing per-tenant settings as JSONB. Go model types (EnrichmentSettings, TenantSettings, UpdateTenantSettingsRequest, PatchTenantSettingsRequest) are defined, followed by TenantSettingsRepository (Get, Upsert, Patch with JSONB merge), TenantSettingsService (BCP-47 locale normalization via normalizeTargetLanguage), and TenantSettingsHandler (GET/PUT/PATCH with body size capping and unknown-field rejection). A new content_too_large RFC 9457 problem type and code handle HTTP 413. Routes are registered at /v1/tenants/{tenant_id}/settings. Tenant purge (DeleteByTenant) is extended to also delete tenant_settings rows. The OpenAPI spec documents all three operations and new schemas. Unit and integration tests cover the full stack.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: adding tenant-scoped settings support for language enrichment configuration.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description follows the template well, with scope, issue link, testing steps, examples, and checklist items filled in.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

Add PATCH /v1/tenants/{tenant_id}/settings for partial updates: only the fields present in the body change, omitted fields are left untouched, and an empty string clears a field. PUT remains a full replace. Implemented as an atomic top-level JSONB merge (settings = settings || patch) inside the existing shared tenant write lock, so concurrent writes don't race and settings the patch doesn't mention survive. Pointer fields in the request distinguish "absent" (leave unchanged) from "set" (including empty = clear). - repository: Patch (|| merge) + a shared writeSettings helper used by both Upsert and Patch - handler: Patch + a shared decodeSettingsBody helper (8KB cap, 413, unknown-field rejection) used by both Update and Patch - openapi: PATCH operation + PatchTenantSettingsInputBody schema - tests: service unit tests + integration tests, including a merge test proving a key the patch omits is preserved ENG-1254

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@migrations/012_tenant_settings.sql`:
- Line 1: The Goose migration annotations in this file are using capitalized
forms that do not match the project's coding guidelines. Change the `-- +goose
Up` annotation to `-- +goose up` (lowercase) and the `-- +goose Down` annotation
to `-- +goose down` (lowercase) to maintain consistency with the documented
style guidelines. Both annotations should use lowercase variants throughout the
migration file.

In `@tests/tenant_settings_test.go`:
- Around line 341-349: The TestTenantSettings_PatchBodyTooLargeRejected function
currently only asserts the HTTP status code but does not verify the RFC 9457
problem code in the response body, unlike the corresponding PUT test. After
asserting the StatusCode is http.StatusRequestEntityTooLarge, you need to read
and parse the resp.Body to extract the code field and assert it equals the
expected RFC 9457 code value (likely "content_too_large" to match the PUT test
behavior). This prevents regressions where PATCH returns the wrong problem code
despite having the correct status code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8f3c484b-e03a-4723-9807-cc8da50c5836

📥 Commits

Reviewing files that changed from the base of the PR and between 4141d58 and e54efb6.

📒 Files selected for processing (16)

cmd/api/app.go
cmd/api/app_test.go
internal/api/handlers/tenant_settings_handler.go
internal/api/response/problem.go
internal/api/response/response_test.go
internal/models/tenant_settings.go
internal/repository/tenant_data_repository.go
internal/repository/tenant_data_repository_test.go
internal/repository/tenant_settings_repository.go
internal/service/tenant_settings_service.go
internal/service/tenant_settings_service_test.go
migrations/012_tenant_settings.sql
openapi.yaml
tests/helpers.go
tests/integration_test.go
tests/tenant_settings_test.go

Add an integration test for the PATCH insert branch (a tenant with no settings row yet): PATCH creates the row with the patched field and a subsequent GET returns it. The other PATCH tests all seed a row first, so this is the only coverage of the advertised create-on-new-tenant path. ENG-1254

Extend TestTenantSettings_InvalidLocaleRejected to drive both PUT and PATCH. Both verbs normalize target_language through the same path, so both must reject an invalid locale with 400; previously only PUT was exercised, leaving the PATCH handler's error-response branch (77.8%, below the 80% bar) untested. The PATCH handler is now fully covered. ENG-1254

AGENTS.md documents `-- +goose up` / `-- +goose down` (lowercase) as the migration annotation style; 012 used the capitalized form. goose treats the keywords case-insensitively, so this is style-only — no re-run, the applied schema is unchanged. ENG-1254

The PATCH oversized-body test only checked the 413 status, while the PUT test also asserts the RFC 9457 `content_too_large` problem code. Mirror it so a PATCH returning the wrong code under a correct 413 is caught. ENG-1254

CleanupTestData has no callers anywhere in the repo (there is no TestMain, and setupTestServer's cleanup never invokes it), so the DELETE FROM tenant_settings line it was extended with never ran — dead code giving false confidence of cleanup. The settings tests isolate via UUID-unique tenant ids (testTenantID), like the rest of the suite, so no cleanup is needed. Reverts tests/helpers.go to its prior state. ENG-1254

PATCH now follows JSON Merge Patch (RFC 7396): a member with a value sets that setting, an explicit JSON null removes the key, and an omitted member is left unchanged. Previously an empty string cleared a field and there was no way to remove a key (the JSONB `||` merge could only add or overwrite). - models: add a small generic Optional[T] that distinguishes absent vs null vs value when decoding a merge-patch body; PatchTenantSettingsRequest uses it instead of *string. - service: translate the typed request into keys-to-set and keys-to-remove; reject an explicit empty string (clear via null), keeping the value contract "a valid BCP-47 locale, or null to remove". - repository: Patch applies (settings || set) - removeKeys::text[], so null members delete keys while unmentioned keys survive; writeSettings takes the extra bind. Upsert (full replace) is unchanged. - openapi: declare application/merge-patch+json (application/json still accepted), make target_language nullable, document the null-removes rule. - tests: Optional tri-state unit test; service set/remove/empty-rejected cases; integration PatchNullRemovesField (key deleted, siblings kept) and PatchEmptyStringRejected. ENG-1254

Spectral 6.16.0 (nimma) crashes with "Cannot read properties of null (reading 'enum')" when a lint rule traverses a literal null leaf in an example value — which the merge-patch `remove` example introduced (target_language: null). Drop the structured null example and show the null-removes form in the operation description instead. SDK codegen still learns nullability from the schema (target_language type: [string, "null"]), so the contract is unchanged. ENG-1254

Switching the PATCH field to Optional[string] dropped the struct-tag validation (no_null_bytes, max=35) — go-playground's validator can't reach through the custom type — so PATCH accepted null bytes and over-long values that PUT and the OpenAPI maxLength:35 still reject. Enforce the same bounds in the PATCH value path (normalizeProvidedTargetLanguage), restoring PUT/PATCH parity and honoring the contract. Also pin settingKeyTargetLanguage to the EnrichmentSettings json tag with a unit test, so a tag rename can't silently break PATCH null-removal. ENG-1254

ENG-1255 (persistence): scaffold language-enrichment storage. Adds the value_text_translated + translation_lang_key columns (migration 013), surfaces them as read-only fields on FeedbackRecord and the OpenAPI output schema, includes them in every feedback-record read path, and adds a tenant-write-locked SetTranslation repo method that does not publish a domain event (so writing a translation can't loop the enrichment pipeline). The async translation worker that populates them follows. ENG-1255

ENG-1255: add TranslationConfig (TRANSLATION_* env; translation stays disabled unless provider and model are set) and a per-process TTL+LRU tenant-settings cache — the cache deferred from ENG-1254. The cache wraps the settings accessor so the translation enqueue gate and worker can resolve a tenant's target language without a DB read per feedback event; staleness is TTL-bounded and self-corrects because the worker records the target it actually used. Registers a bounded "tenant_settings" cache metric label. TRANSLATION_BASE_URL is normalized/validated like EMBEDDING_BASE_URL. ENG-1255

Re-review follow-ups on the ENG-1255 foundation: - Unify the feedback-record column/scan duplication onto one feedbackRecordColumns const + the existing scanFeedbackRecord helper. The repo previously inlined 4 column lists + 4 scans while taxonomy_repository.go had its own scanner; that scanner is now the single one (extended to the two translation columns) and the taxonomy node-records query selects them too. Collapses 8 sites to 2 so column and scan order cannot drift (a silent runtime scan error otherwise). - config: default TENANT_SETTINGS_CACHE_TTL in applyDefaults too, mirroring the other nested DurationSec defaults, so a dropped env-default tag can't silently disable the cache; disable explicitly via TENANT_SETTINGS_CACHE_SIZE=0. - observability: cache metric descriptions now list tenant_settings as a label. - tests: add a SetTranslation integration test (persist, clear, NotFound) — the highest-risk new write path, covering the value/NULL round-trip. ENG-1255

ENG-1255: the LLM seam for translation enrichment. Adds a TranslationClient interface (Translate(TranslateRequest) -> text) and a provider-agnostic factory mirroring the embedding factory — openai / google / google-gemini, explicit TRANSLATION_PROVIDER (no default), validating API key, base URL, and Gemini project+location. The OpenAI and Google SDK wrappers gain a low-level Translate (chat completions / generate-content at temperature 0); a prompt adapter builds a Formbricks-style "professional translator" prompt using human-readable language names (x/text/language/display), falling back to "original language" when the source is unknown. Unit tests cover config validation, the language-name helper, and prompt rendering. ENG-1255

ENG-1255: the enqueue side of translation enrichment, mirroring the embedding provider. Adds FeedbackTranslationArgs (River job, unique by record + target + value_text hash) and TranslationProvider — on a feedback-record create (non-empty open text) or an update that changed value_text, it enqueues a job, but only for text fields with non-empty value_text whose tenant has a target language configured (read via the settings cache). Failures are logged and swallowed so record ingestion is never blocked. Consolidates the two identical per-feature River inserter interfaces into one shared RiverJobInserter, and reuses TenantSettingsReader for the target lookup (the iface linter flagged the duplicates). Unit tests cover enqueue eligibility, target gating, update-changed-fields, and the error/skip paths. ENG-1255

ENG-1255: the worker side. FeedbackTranslationWorker loads the record, translates value_text into the job's target language via the TranslationClient — or copies value_text verbatim when the source language already shares the target's base language (no LLM call) — and persists it through FeedbackRecordsService.SetTranslation. Mirrors the embedding worker's error handling: a missing record completes the job, a tenant write conflict retries, a provider error retries then fails on the final attempt; value_text that became empty since enqueue is skipped. Registered in the River wiring, gated on a configured TranslationClient. Adds FeedbackRecordsService.SetTranslation (+ the repository interface method) and 8 worker unit tests (translate, source==target copy, skip-empty, not-found, provider retry/fail, tenant-write-conflict, record-gone-on-write). ENG-1255

Address review findings on the translation pipeline: - Clear-on-empty was unwired: editing value_text to empty left a stale value_text_translated forever. The provider now enqueues on an update that empties value_text (mirroring the embedding provider) and the worker clears the translation (SetTranslation nil) instead of skipping; the repository nulls both translation columns when the translation is nil. - The source==target short-circuit compared only the base language, so it would copy zh-Hans text as a zh-Hant "translation". It now compares base AND script (sameLanguageAndScript): different scripts translate, regional variants (en-US/en-GB) still copy. - Update the openai/googleai package docs to mention chat/generate-content. Adds provider (update-to-empty enqueues a clear) and worker (clears on empty, translates across scripts) tests. ENG-1255

Address the full-feature review: - Undetermined source ("und") was coerced to a guessed base by likely-subtags, so the source==target short-circuit copied text untranslated. Guard against language.Und in sameLanguageAndScript so an undetermined tag always translates. - A source-language correction never re-translated: the provider only re-enqueued on value_text changes and the dedup hash ignored the source language. The provider now also enqueues on a language change, and the dedup hash folds in the source language (translationContentHash), so a correction produces a fresh job. - Drop the stray "backfill" mention from the job-args doc (no translation backfill exists) and document the TRANSLATION_* and TENANT_SETTINGS_CACHE_* env vars in .env.example, mirroring EMBEDDING_*. Tests: provider re-enqueues on a language change + the content hash varies by source language; worker translates (not copies) an "und" source. ENG-1255

ENG-1255: activate the translation pipeline, gated on TRANSLATION_PROVIDER + TRANSLATION_MODEL (disabled otherwise), mirroring the embedding wiring. - cmd/api: register the FeedbackTranslationWorker + translations queue (so the combined api process also translates, like embeddings) and register the TranslationProvider with the message manager, resolving the tenant target language through a short-TTL CachedTenantSettings over tenant settings. - cmd/worker: build the TranslationClient and populate RiverDeps so the dedicated worker process translates jobs. End to end: the provider enqueues on a feedback-record create/update for a text field whose tenant has a target language; the worker translates value_text (or copies when source==target) and persists it. ENG-1255

ENG-1255: re-translate existing records when translation is first enabled or a tenant changes its target language. - repository.ListTranslationBackfillTargets joins tenant_settings to find text records with non-empty value_text whose tenant has a target language and whose stored translation_lang_key differs from it (never translated, or stale). - FeedbackRecordsService.BackfillTranslations enqueues a "backfill" job per target; the inserter/queue/attempts are caller-provided (a one-off command), so the shared service keeps no backfill-only dependency. - cmd/backfill-translations wires the client + a River producer and runs it, mirroring cmd/backfill-embeddings. Gated on TRANSLATION_PROVIDER+MODEL. Tests: service unit (enqueues one job per target; repo error propagates) and a Postgres integration test for the backfill query (untranslated/stale included, already-current and no-target excluded). ENG-1255

Drive FeedbackTranslationWorker end to end against Postgres with a fake TranslationClient: translate + persist (source value_text preserved), copy verbatim when source base+script matches the target (no provider call), and clear a stale translation when value_text is empty. Complements the repo-level SetTranslation and backfill-query integration tests. Also clarify the cmd/api comment: the API registers the translation worker only to satisfy River's insert-time validation; jobs are processed by hub-worker, not in the API process (mirrors the embedding wiring). ENG-1255

Mirror the embedding pipeline's metrics on the translation provider and worker so the new enrichment path is observable in production: hub_translation_jobs_enqueued_total hub_translation_provider_errors_total{reason} hub_translation_outcomes_total{status} hub_translation_worker_errors_total{reason} hub_translation_duration_seconds{status} TranslationMetrics mirrors EmbeddingMetrics (nil when metrics are disabled) and is wired through NewMetrics, RiverDeps, cmd/api, and cmd/worker. The provider records enqueue counts plus settings-read/enqueue errors; the worker records outcome + duration + worker errors at every branch (success, skip, clear, retry, failed_final). Backfill stays producer-only (nil metrics). Bounded label sets gate every reason/status to "other". Unlike the embedding worker — which emits "tenant_write_conflict" without listing it, so it buckets to "other" — the translation worker's reason set includes it, so the conflict metric is labeled accurately. Covered by observability metric tests (real SDK manual reader), provider/worker recording tests, and the existing pipeline tests. ENG-1255

…y guard Worker: a not-found GetFeedbackRecord is a benign delete/purge race, not a terminal failure. Record it as "skipped" (consistent with the not-found-on-write path) instead of "failed_final" plus a worker error, so it no longer trips failure alerts. Config: honor an explicit TENANT_SETTINGS_CACHE_SIZE=0 to disable the cache (the documented behavior; NewCachedTenantSettings already treats size <= 0 as "no caching"). The previous `<= 0` default reset 0 to 2048, so disable could never take effect — now the size is defaulted only when the env var is unset. Service: SetTranslation rejects a (translated, "") pair via ErrTranslationLangKeyRequired so a translation can never persist without the locale it was produced in; clearing (nil translation) still passes through. Also: clear-path integration tests now assert translation_lang_key is nulled too, and the NewMetrics doc comment lists CacheMetrics. ENG-1255

Changing a tenant's target_language previously had no effect on existing feedback records — only newly created/edited records were translated, and refreshing the backlog required the global backfill CLI. A settings change now automatically triggers a per-tenant re-translation backfill. Design (clean separation; generalizes to future enrichment settings): - TenantSettingsService gains a translation-free SettingsChangeListener port and fires it after a successful write with the keys that changed (PUT: all settable keys; PATCH: keys present, incl. a null removal; skipped on a no-op PATCH). A listener issue never fails the settings write. - EnrichmentSettingsListener (adapter) maps a changed key to the enrichment backfill it triggers via a config-built handler map: target_language enqueues a durable TenantTranslationBackfillArgs job. Not routed through the webhook-coupled, lossy event bus. Adding a future setting (e.g. sentiment_enabled) is one map entry. - TenantTranslationBackfillWorker keyset-paginates the tenant's stale records and enqueues the existing per-record FeedbackTranslationArgs jobs, off the request path. Unique by TenantID (one in-flight backfill per tenant); crash-safe via the idempotent "translation_lang_key IS DISTINCT FROM target_language" query. - Repo ListTranslationBackfillTargetsForTenant (keyset, shares SQL with the global query) + service BackfillTranslationsForTenant. Clearing target_language leaves existing translations in place (the query's non-empty guard makes the triggered backfill a no-op). The global backfill CLI remains the guaranteed recovery path. ENG-1255

BackfillTranslations materialized every eligible target across all tenants in one slice; on a large deployment that is a memory spike when the CLI runs. It now keyset- paginates (id > cursor LIMIT n) like the per-tenant backfill, with the shared loop factored into backfillTranslationsPaged so both paths stream identically. ENG-1255

A provider 429 was treated as a generic error, so the translation worker burned through River's retry attempts on the fast backoff and dropped the work as failed_final, ignoring the provider's Retry-After / RetryInfo hint. The openai and google clients now classify a 429 (RESOURCE_EXHAUSTED) as a shared huberrors.RateLimitError carrying the provider's retry-after hint, and the worker snoozes for that delay (river.JobSnooze) instead of failing. Snoozing re-queues without consuming an attempt, so a burst against a rate-limited model defers rather than drops work. The delay is clamped (5s-5min, default 30s) and a per-job window (1h) bounds indefinite snoozing against a standing quota; past it the job fails normally and a backfill recovers it. A rate_limited worker-error metric records the deferral.

…tion A feedback translation job carries the tenant's target language captured at enqueue time. If the target changed (or was read from a stale settings cache), an out-of-order older job could finish after a newer-target job and overwrite value_text_translated / translation_lang_key with the stale target. SetTranslation now persists only while the tenant's current target_language still matches the job's target (atomic UPDATE ... FROM tenant_settings); a stale write matches no row and returns huberrors.ErrTranslationSuperseded, which the worker records as a benign skip. The clear path stays unconditional. Adds a worker unit test, an end-to-end worker/repo regression test, and a distinct "superseded" worker-error metric label.

…ck-records feat: enrich feedback records with translated text

…anslation on edit Addresses two review findings on the tenant-settings / translation enqueue path: - Settings cache staleness: CachedTenantSettings now implements SettingsChangeListener and is composed with the backfill listener, so a settings write evicts the tenant's cached entry. A freshly enabled/changed target is visible to the enqueue gate immediately instead of after TTL, so records created in the former staleness window are no longer skipped. (Eviction is per-process; cross-replica stays TTL-bounded.) - Stale translation after a content edit: an Update that changes value_text or language now clears value_text_translated / translation_lang_key, but only when the value actually changes (CASE ... IS DISTINCT FROM the pre-update column). The row falls back to the original and becomes a backfill target, while an unchanged re-send keeps the valid translation (avoiding a deduped re-translation stranding it). Adds buildUpdateQuery + cache-eviction unit tests and an Update integration test.

Tenants with no target_language of their own fall back to a deployment-wide default (TRANSLATION_DEFAULT_LANGUAGE); an empty default keeps translation per-tenant opt-in. The fallback is applied at the enqueue gate and the global backfill. The SetTranslation write guard now also accepts a default-resolved write for a tenant with no stored target, while still rejecting stale writes for tenants that set an explicit target.

The SetTranslation write-guard and the settings-change (per-tenant) backfill now resolve the tenant's effective target as COALESCE(its own target_language, TRANSLATION_DEFAULT_LANGUAGE), threaded through the service. A stale job carrying a tenant's former explicit target no longer writes once the tenant falls back to the default, and clearing a target re-translates existing records to the default instead of no-opping.

…settings-hub-language-enrichment

xernobyl added 2 commits June 19, 2026 17:39

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread migrations/012_tenant_settings.sql Outdated

Comment thread tests/tenant_settings_test.go

xernobyl added 5 commits June 22, 2026 13:55

test: assert content_too_large code on oversized PATCH body

28b5293

The PATCH oversized-body test only checked the 413 status, while the PUT test also asserts the RFC 9457 `content_too_large` problem code. Mirror it so a PATCH returning the wrong code under a correct 413 is caught. ENG-1254

xernobyl marked this pull request as draft June 22, 2026 14:28

xernobyl added 2 commits June 22, 2026 15:21

xernobyl requested a review from BhagyaAmarasinghe June 22, 2026 15:53

xernobyl marked this pull request as ready for review June 22, 2026 15:54

xernobyl added 13 commits June 22, 2026 16:04

xernobyl mentioned this pull request Jun 23, 2026

feat: enrich feedback records with translated text #90

Merged

14 tasks

xernobyl marked this pull request as draft June 23, 2026 14:36

xernobyl added 3 commits June 23, 2026 15:24

xernobyl marked this pull request as ready for review June 23, 2026 17:20

xernobyl and others added 2 commits June 24, 2026 15:32

Merge pull request #90 from formbricks/feat/ENG-1255_translate-feedba…

4b0575d

…ck-records feat: enrich feedback records with translated text

BhagyaAmarasinghe requested changes Jun 26, 2026

View reviewed changes

Comment thread internal/service/tenant_settings_service.go

Comment thread internal/repository/feedback_records_repository.go Outdated

Comment thread internal/models/tenant_settings.go

xernobyl added 2 commits June 26, 2026 13:56

xernobyl requested a review from BhagyaAmarasinghe June 26, 2026 15:12

BhagyaAmarasinghe reviewed Jun 26, 2026

View reviewed changes

Comment thread internal/models/tenant_settings.go

Comment thread internal/repository/feedback_records_repository.go

Comment thread internal/repository/feedback_records_repository.go Outdated

xernobyl added 2 commits June 26, 2026 16:02

Merge remote-tracking branch 'origin/main' into feat/ENG-1254_tenant-…

2b7d5bc

…settings-hub-language-enrichment

xernobyl requested a review from BhagyaAmarasinghe June 26, 2026 16:29

xernobyl enabled auto-merge June 26, 2026 16:36

BhagyaAmarasinghe approved these changes Jun 26, 2026

View reviewed changes

xernobyl added this pull request to the merge queue Jun 26, 2026

Merged via the queue into main with commit 0f4e36f Jun 26, 2026
12 checks passed

xernobyl deleted the feat/ENG-1254_tenant-settings-hub-language-enrichment branch June 26, 2026 19:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add tenant settings for language enrichment#89

feat: add tenant settings for language enrichment#89
xernobyl merged 33 commits into
mainfrom
feat/ENG-1254_tenant-settings-hub-language-enrichment

xernobyl commented Jun 22, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

xernobyl commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Added during review

How should this be tested?

Checklist

Required

Appreciated

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✱ Stainless preview builds

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xernobyl commented Jun 22, 2026 •

edited

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading