feat(ha): Add per-tenant configurable failover timeout #7481

Open
yeya24 wants to merge 1 commit into cortexproject:master from yeya24:ha-failover-timeout-per-tenant

yeya24 commented May 6, 2026

Add a per-tenant runtime override for the HA tracker failover timeout via the ha_tracker_failover_timeout field in the limits config (flag: -distributor.ha-tracker.failover-timeout-override). When set to a non-zero value for a tenant, it overrides the global -distributor.ha-tracker.failover-timeout.

This allows operators to configure different failover timeouts for different tenants based on their HA setup requirements.

What this PR does:

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

yeya24 force-pushed the ha-failover-timeout-per-tenant branch 2 times, most recently from c202aa9 to 5ce8f5f on May 6, 2026 04:43
SungJin1212 (Member)

Why don't you extend the existing distributor.ha-tracker.failover-timeout flag?

yeya24 commented May 7, 2026

Let me rename the new flag to use the same name as distributor.ha-tracker.failover-timeout.

yeya24 force-pushed the ha-failover-timeout-per-tenant branch from 5ce8f5f to 7f44369 on May 7, 2026 04:36
pull-request-size (bot) added size/L and removed size/M labels May 7, 2026
yeya24 force-pushed the ha-failover-timeout-per-tenant branch from 7f44369 to 067b6fb on May 7, 2026 05:10
Move -distributor.ha-tracker.failover-timeout from HATrackerConfig (global)
to the per-tenant Limits struct. The flag name and default value (30s)
remain the same, but it can now be overridden per-tenant via runtime config:

  overrides:
    "tenant-1":
      ha_tracker_failover_timeout: 60s

Signed-off-by: Ben Ye <benye@amazon.com>
yeya24 force-pushed the ha-failover-timeout-per-tenant branch from 067b6fb to af59017 on May 7, 2026 05:20
yeya24 commented May 7, 2026

Updated to reuse the same config name, but make it per-tenant.
