TSI adaptive cache size #7312

**ETA:** InfluxDB 1.13.0, early August 2026
**PR:** [feat: add statistics and adaptive growth to TagValueSeriesIDCache](https://github.com/influxdata/influxdb/pull/27480)

================================================================
Adaptive sizing for the TSI tag-value series-ID cache
Commit e1f7dcc265 (PR #27480)
================================================================

WHAT THIS CACHE IS

The TSI index keeps an in-memory LRU cache mapping a
{measurement, tag key, tag value} tuple to its set of series IDs.
It lets queries that repeatedly filter on the same tag predicates
skip merging many on-disk bitmaps. The cache lives per TSI index
partition. Before this change it had a single fixed capacity
(series-id-set-cache-size, default 100 entries).

This change adds two things:

  1. Statistics. The cache now reports hit/miss/eviction/size/
     capacity counters so you can see whether it is helping.
  2. Adaptive sizing. The cache can grow and shrink itself
     between an operator-set floor and ceiling, instead of
     staying at one fixed size.

Default behavior is UNCHANGED. Adaptive sizing is off unless you
explicitly turn it on.

----------------------------------------------------------------
CONFIGURATION PARAMETERS  ([data] section)
----------------------------------------------------------------

series-id-set-cache-size           (existing; default 100)
  The fixed capacity when adaptive sizing is off. When adaptive
  sizing is on, this becomes the FLOOR: the starting capacity and
  the smallest size the cache will ever shrink back to. 0 disables
  the cache entirely. Must be > 0 to use adaptive sizing.

series-id-set-cache-max-size       (new; default 0 = off)
  The CEILING for adaptive growth. The cache will never grow past
  this many entries. Must be > series-id-set-cache-size. 0 means
  adaptive sizing is disabled.

series-id-set-cache-target-hit-rate  (new; default 0.0 = off)
  The hit rate you want the cache to achieve, as a fraction.
  0.0 disables adaptive sizing; to enable it, use a value strictly
  between 0.0 and 1.0. The cache grows toward max-size while its
  measured hit rate is below this target. 1.0 is rejected because
  it is unachievable: the cache would never stop trying to grow.

series-id-set-cache-shrink-conservatism  (new; default 2.5)
  How reluctant the shrink logic is to give memory back, measured
  in standard deviations. Range [0.0, +Inf). Higher = more
  conservative = holds onto memory longer and resists rapid
  grow/shrink oscillation. Only consulted when adaptive sizing is
  on. The default of 2.5 is deliberately conservative; see below.

ENABLING / DISABLING

Adaptive sizing turns on only when BOTH max-size and
target-hit-rate are set to non-zero values. Set both to 0 (the
default) to keep the old fixed-capacity behavior. Setting exactly
one of them is a configuration error and the server will refuse
to start.
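
As a sketch of what enabling the feature could look like in the
configuration file, the fragment below sets all four keys in the
[data] section. The numeric values are illustrative assumptions
(a 10x ceiling and a 0.9 target, in line with the starting points
under PICKING VALUES), not recommendations taken from the PR.

```toml
[data]
  # Floor: starting capacity and the smallest size the cache shrinks to.
  series-id-set-cache-size = 100
  # Ceiling: must be greater than the floor; 0 would disable adaptive sizing.
  series-id-set-cache-max-size = 1000
  # Target hit rate, strictly between 0.0 and 1.0 when adaptive sizing is on.
  series-id-set-cache-target-hit-rate = 0.9
  # Shrink reluctance in standard deviations (2.5 is the default).
  series-id-set-cache-shrink-conservatism = 2.5
```

Setting both series-id-set-cache-max-size and
series-id-set-cache-target-hit-rate back to 0 returns the cache to
fixed-capacity behavior.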

Other validation, all checked at startup:
  - max-size must be >= 0
  - target-hit-rate must be in [0.0, 1.0)
  - if adaptive: size must be > 0 and max-size > size
  - shrink-conservatism must be a finite value >= 0.0
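
The rules above can be restated as a small checker. This is a
minimal Python sketch for reasoning about a config before
deploying it, not the server's validation code; the CacheConfig
class, the validate function, and the message strings are all
hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheConfig:
    """Hypothetical container for the four [data] settings."""
    size: int = 100                    # series-id-set-cache-size
    max_size: int = 0                  # series-id-set-cache-max-size
    target_hit_rate: float = 0.0       # series-id-set-cache-target-hit-rate
    shrink_conservatism: float = 2.5   # series-id-set-cache-shrink-conservatism


def validate(cfg: CacheConfig) -> list[str]:
    """Return a list of problems; an empty list means the config passes."""
    problems = []
    if cfg.max_size < 0:
        problems.append("max-size must be >= 0")
    if not (0.0 <= cfg.target_hit_rate < 1.0):
        problems.append("target-hit-rate must be in [0.0, 1.0)")
    if not (math.isfinite(cfg.shrink_conservatism)
            and cfg.shrink_conservatism >= 0.0):
        problems.append("shrink-conservatism must be a finite value >= 0.0")

    # Adaptive sizing needs BOTH knobs; setting exactly one is an error.
    max_set = cfg.max_size != 0
    target_set = cfg.target_hit_rate != 0.0
    if max_set != target_set:
        problems.append(
            "max-size and target-hit-rate must both be set or both be 0")
    elif max_set and target_set:
        if cfg.size <= 0:
            problems.append("adaptive sizing requires size > 0")
        elif cfg.max_size <= cfg.size:
            problems.append("adaptive sizing requires max-size > size")
    return problems
```

For example, validate(CacheConfig(max_size=1000)) reports the
"both set or both 0" problem, because target-hit-rate was left at
its default of 0.0.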

Note on robustness: if the cache constructor itself is handed
bad values, it logs each problem and falls back to a working
fixed-size cache rather than crashing. A bad config is still
caught earlier, at startup validation.

----------------------------------------------------------------
WHEN THE CACHE GROWS
----------------------------------------------------------------

Growth is driven by eviction pressure (the write/insert path).

  - Every time the cache is full and a new entry forces out the
    least-recently-used entry, that is a "forced eviction."
  - After roughly one full turnover (about `capacity` forced
    evictions), the cache samples its hit rate over that window.
  - It DOUBLES its capacity (clamped to max-size) when ALL of:
      * the window saw at least 100 Gets (a noise floor, so a
        tiny or write-only window does not trigger growth),
      * the windowed hit rate is below target-hit-rate,
      * capacity is still below max-size.

So growth happens only when the cache is actively churning AND
not meeting your hit-rate target. A cache that is missing its
target but never evicting (because it is not full) will not grow.
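
The growth rule can be sketched as a pure function. This is an
illustrative Python model of the three conditions listed above,
not the cache's actual code; the function name and parameters are
hypothetical, and it assumes the caller only invokes it after
roughly `capacity` forced evictions have accumulated.

```python
MIN_GETS_FOR_GROWTH = 100  # noise floor stated in the text


def next_capacity_after_window(capacity: int, max_size: int,
                               target_hit_rate: float,
                               window_hits: int, window_misses: int) -> int:
    """Capacity after a growth check at the end of one eviction window."""
    gets = window_hits + window_misses
    if gets < MIN_GETS_FOR_GROWTH:
        return capacity            # too few reads to trust the hit rate
    if window_hits / gets >= target_hit_rate:
        return capacity            # already meeting the target
    if capacity >= max_size:
        return capacity            # already at the ceiling
    return min(capacity * 2, max_size)  # double, clamped to max-size
```

With a floor of 100 and a ceiling of 1000, a cache that keeps
missing its target would step through 100, 200, 400, 800, and
then 1000 under this model.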

----------------------------------------------------------------
WHEN THE CACHE SHRINKS
----------------------------------------------------------------

Shrinking is driven by the read path (Gets), so it can fire even
when the cache is quiet and nothing is being evicted.

The shrink check runs after a self-tuning observation window,
counted in Gets. The window is long enough that an entry left
untouched is genuinely cold, not merely unsampled: roughly 3x the
occupancy at a target of 0.95, scaling with the target. At the end
of a window the cache may shrink if BOTH gates pass:

  - Quiet gate: forced evictions during the window are below a
    statistical threshold derived from the target hit rate and
    shrink-conservatism (that is, the cache is performing at least
    as well as a cache running at target should).
  - Hit-rate gate: the windowed hit rate is at or above target.

When both pass, it shrinks in one of two ways:

  - Slack reclaim: if capacity is above current occupancy, it
    simply drops the unused headroom down to occupancy.
  - Cold-tail trim: if the cache is full, it trims toward the
    observed working set ("warm" entries touched this window),
    shedding only least-recently-used entries that went untouched.

Shrinking never goes below series-id-set-cache-size (the floor).
Each shrink event is bounded (at most half the cache, and at most
1024 entries per event) so the write lock is not held too long;
further decay continues over later windows.
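
The two gates and the two shrink modes can be sketched as follows.
This is an illustrative Python model under stated assumptions, not
the cache's actual code. The function and parameter names are
hypothetical. The quiet-gate threshold is passed in because the
text does not give its formula. The sketch also assumes that
"half the cache" means half of the current capacity, and that the
per-event bound applies only to cold-tail trim, since slack
reclaim evicts no entries.

```python
MAX_ENTRIES_PER_SHRINK = 1024  # per-event bound stated in the text


def shrink_target(capacity: int, occupancy: int, warm_entries: int,
                  floor: int, window_hits: int, window_misses: int,
                  window_forced_evictions: int, target_hit_rate: float,
                  quiet_threshold: float) -> int:
    """Capacity after one shrink check; unchanged if either gate fails."""
    gets = window_hits + window_misses
    if gets == 0:
        return capacity
    quiet = window_forced_evictions < quiet_threshold
    at_target = window_hits / gets >= target_hit_rate
    if not (quiet and at_target):
        return capacity

    if capacity > occupancy:
        # Slack reclaim: drop unused headroom; no entries are evicted.
        new_capacity = occupancy
    else:
        # Cold-tail trim: shed untouched LRU entries toward the warm set,
        # bounded per event (assumption: half of the current capacity).
        max_evict = min(capacity // 2, MAX_ENTRIES_PER_SHRINK)
        new_capacity = max(warm_entries, capacity - max_evict)

    # Never go below the floor, and never grow from a shrink check.
    return min(capacity, max(new_capacity, floor))
```

Under these assumptions, a full cache of 4000 entries with 500
warm entries would drop to 2976 in one event (4000 minus 1024),
and later windows would continue the decay.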

ANTI-OSCILLATION

After any resize (grow or shrink) there is a cooldown — sized to
a full observation window of the new capacity — during which the
cache will not shrink again. This, plus the conservatism margin,
prevents the cache from flapping between sizes.

----------------------------------------------------------------
STATISTICS  (SHOW STATS, measurement "tsi1_cache")
----------------------------------------------------------------

  hit              cumulative cache hits
  miss             cumulative cache misses
  eviction         forced evictions (entries pushed out under
                   write pressure) — the "cache is under pressure"
                   signal that drives growth
  shrink_eviction  entries voluntarily released by the shrink
                   policy — the "cache is giving memory back"
                   signal
  size             current number of entries held
  capacity         current capacity (the live limit; with
                   adaptive sizing this moves between
                   series-id-set-cache-size and -max-size)

eviction and shrink_eviction are reported separately so you can
tell pressure-driven turnover apart from voluntary reclaim.

Capacity changes are also logged at INFO level: "tsi cache
capacity increased" / "tsi cache capacity decreased", with the
old and new capacity, the measured hit rate, the target, and how
many entries were evicted.
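
Because hit and miss are cumulative, a useful hit rate comes from
the difference between two samples. The Python sketch below shows
one way to compute it; the function name and the dict layout are
hypothetical, and it assumes you have already collected the two
samples of the counters by whatever means you use to read stats.

```python
def interval_hit_rate(prev: dict, curr: dict):
    """Hit rate between two samples of the cumulative hit/miss counters.

    Returns None when no Gets happened in the interval or a counter went
    backwards (for example, if the counters were reset between samples).
    """
    hits = curr["hit"] - prev["hit"]
    misses = curr["miss"] - prev["miss"]
    if hits < 0 or misses < 0 or hits + misses == 0:
        return None
    return hits / (hits + misses)
```

Comparing this interval hit rate with target-hit-rate, alongside
the capacity value, shows whether the cache is still growing toward
the target or has settled.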

----------------------------------------------------------------
PICKING VALUES
----------------------------------------------------------------

target-hit-rate: a value in the 0.85-0.95 range is a reasonable
starting point. 0.95 is the design point the windowing math is
tuned around. Trade-offs:
  - Too high (near 1.0): the cache almost never meets target, so
    it tends to grow straight to max-size and stay there. It also
    lengthens the observation windows (slower to react). 1.0 is
    rejected outright.
  - Too low: the cache rarely grows, so you give up the benefit.
No single value is optimal for every workload; the right target
depends on your query mix. Start around 0.9, then watch the
hit/miss and capacity stats and adjust.

max-size: set it to the largest cache you are willing to give
memory to. A few multiples of series-id-set-cache-size (for
example 5x-10x) is a sensible first try. The cache only reaches
max-size if your working set of tag predicates actually needs it;
growth is demand-driven.

size (the floor): keep the existing default (100) unless you
already know you need more; adaptive sizing will grow the cache
above this floor as needed and shrink it back when demand falls.

shrink-conservatism: leave at 2.5 unless you observe a specific
problem. Lower it (toward 0.0) if the cache holds memory longer
than you want after a workload spike subsides; 0.0 lets it shrink
as soon as it is performing at target. Raise it if you see the
cache oscillating on a skewed/bursty workload. The default is
conservative on purpose: the underlying statistical model assumes
independent cache misses, but real access is correlated (locality
and skew), so the true variation is wider than the model — the
extra margin trades a little memory for fewer false shrinks.
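
To make "measured in standard deviations" concrete, the Python
sketch below works through the independent-miss model. If each of
n Gets misses independently with probability p = 1 - target, the
miss count has mean n*p and standard deviation sqrt(n*p*(1-p)).
The illustrative_threshold function is an assumption made for
illustration only (mean minus k standard deviations); the text
does not state the formula the cache actually uses, and both
function names are hypothetical.

```python
import math


def binomial_miss_stats(gets: int, target_hit_rate: float):
    """Mean and standard deviation of misses if misses were independent."""
    p_miss = 1.0 - target_hit_rate
    mean = gets * p_miss
    std = math.sqrt(gets * p_miss * (1.0 - p_miss))
    return mean, std


def illustrative_threshold(gets: int, target_hit_rate: float,
                           conservatism: float) -> float:
    """Hypothetical gate: mean minus `conservatism` std devs, floored at 0."""
    mean, std = binomial_miss_stats(gets, target_hit_rate)
    return max(0.0, mean - conservatism * std)
```

With 10,000 Gets and a target of 0.9, the model gives a mean of
about 1000 misses and a standard deviation of about 30, so a
conservatism of 2.5 corresponds to a margin of about 75. A
conservatism of 0.0 leaves no margin, which matches the
description of shrinking as soon as the cache performs at target.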

WHAT TO EXPECT WHEN IT IS WORKING

With adaptive sizing on, capacity should settle somewhere between
your floor and ceiling that matches the active working set, the
hit rate should track toward your target, and heap used by this
cache should follow the current working set rather than its
historical peak. This trades higher memory use for fewer cache
misses on workloads whose set of tag predicates exceeds the fixed
series-id-set-cache-size.


##### Relevant URLs
- [Add stats for TagValueSeriesIDCache and adaptive cache growth](https://github.com/influxdata/influxdb/issues/27479#top)

InfluxDB v1.13.0 OSS & Enterprise


