ETA: InfluxDB 1.13.0, early August 2026
PR: feat: add statistics and adaptive growth to TagValueSeriesIDCache
==============================================
Adaptive sizing for the TSI tag-value series-ID cache
Commit e1f7dcc265 (PR #27480)
WHAT THIS CACHE IS
The TSI index keeps an in-memory LRU cache mapping a
{measurement, tag key, tag value} tuple to its set of series IDs.
It lets queries that repeatedly filter on the same tag predicates
skip merging many on-disk bitmaps. The cache lives per TSI index
partition. Before this change it had a single fixed capacity
(series-id-set-cache-size, default 100 entries).
This change adds two things:
- Statistics. The cache now reports hit/miss/eviction/size/
capacity counters so you can see whether it is helping.
- Adaptive sizing. The cache can grow and shrink itself
between an operator-set floor and ceiling, instead of
staying at one fixed size.
Default behavior is UNCHANGED. Adaptive sizing is off unless you
explicitly turn it on.
CONFIGURATION PARAMETERS ([data] section)
series-id-set-cache-size (existing; default 100)
The fixed capacity when adaptive sizing is off. When adaptive
sizing is on, this becomes the FLOOR: the starting capacity and
the smallest size the cache will ever shrink back to. 0 disables
the cache entirely. Must be > 0 to use adaptive sizing.
series-id-set-cache-max-size (new; default 0 = off)
The CEILING for adaptive growth. The cache will never grow past
this many entries. Must be > series-id-set-cache-size. 0 means
adaptive sizing is disabled.
series-id-set-cache-target-hit-rate (new; default 0.0 = off)
The hit rate you want the cache to achieve. When adaptive sizing
is on, this is a fraction in the open interval (0.0, 1.0). The
cache grows toward max-size while its measured hit rate is below
this target. 0.0 disables adaptive sizing. 1.0 is rejected as
unachievable: the cache would never stop trying to grow.
series-id-set-cache-shrink-conservatism (new; default 2.5)
How reluctant the shrink logic is to give memory back, measured
in standard deviations. Range [0.0, +Inf). Higher = more
conservative = holds onto memory longer and resists rapid
grow/shrink oscillation. Only consulted when adaptive sizing is
on. The default of 2.5 is deliberately conservative; see below.
ENABLING / DISABLING
Adaptive sizing turns on only when BOTH max-size and
target-hit-rate are set to non-zero values. Set both to 0 (the
default) to keep the old fixed-capacity behavior. Setting exactly
one of them is a configuration error and the server will refuse
to start.
Other validation, all checked at startup:
- max-size must be >= 0
- target-hit-rate must be in [0.0, 1.0)
- if adaptive: size must be > 0 and max-size > size
- shrink-conservatism must be a finite value >= 0.0
Note on robustness: if the cache constructor itself is handed
bad values, it logs each problem and falls back to a working
fixed-size cache rather than crashing. A bad config is still
caught earlier, at startup validation.
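The startup rules above can be sketched in Go as follows. This is a minimal illustration, not the InfluxDB source: the CacheConfig type, its field names, and the error strings are hypothetical and were chosen only to mirror the option names in this note.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// CacheConfig is a hypothetical struct for this sketch; the field names are
// illustrative and are not taken from the InfluxDB source.
type CacheConfig struct {
	Size               int     // series-id-set-cache-size
	MaxSize            int     // series-id-set-cache-max-size
	TargetHitRate      float64 // series-id-set-cache-target-hit-rate
	ShrinkConservatism float64 // series-id-set-cache-shrink-conservatism
}

// Adaptive reports whether both switches are set to non-zero values.
func (c CacheConfig) Adaptive() bool {
	return c.MaxSize != 0 && c.TargetHitRate != 0
}

// Validate applies the rules listed above and returns the first violation
// it finds, or nil if the configuration is acceptable.
func (c CacheConfig) Validate() error {
	if c.MaxSize < 0 {
		return errors.New("max-size must be >= 0")
	}
	if math.IsNaN(c.TargetHitRate) || c.TargetHitRate < 0 || c.TargetHitRate >= 1 {
		return errors.New("target-hit-rate must be in [0.0, 1.0)")
	}
	if math.IsNaN(c.ShrinkConservatism) || math.IsInf(c.ShrinkConservatism, 0) || c.ShrinkConservatism < 0 {
		return errors.New("shrink-conservatism must be a finite value >= 0.0")
	}
	// Setting exactly one of the two switches is a configuration error.
	if (c.MaxSize != 0) != (c.TargetHitRate != 0) {
		return errors.New("max-size and target-hit-rate must be set together")
	}
	if c.Adaptive() {
		if c.Size <= 0 {
			return errors.New("size must be > 0 when adaptive sizing is on")
		}
		if c.MaxSize <= c.Size {
			return errors.New("max-size must be > size")
		}
	}
	return nil
}

func main() {
	configs := []CacheConfig{
		{Size: 100, ShrinkConservatism: 2.5},                                    // fixed-size default
		{Size: 100, MaxSize: 1000, TargetHitRate: 0.9, ShrinkConservatism: 2.5}, // adaptive
		{Size: 100, MaxSize: 1000, ShrinkConservatism: 2.5},                     // only one switch set
	}
	for _, c := range configs {
		fmt.Printf("%+v -> adaptive=%v err=%v\n", c, c.Adaptive(), c.Validate())
	}
}
```

The first two configurations pass; the third is rejected because max-size is set without target-hit-rate.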
WHEN THE CACHE GROWS
Growth is driven by eviction pressure (the write/insert path).
- Every time the cache is full and a new entry forces out the
least-recently-used entry, that is a "forced eviction."
- After roughly one full turnover (about as many forced
evictions as the current capacity), the cache samples its hit
rate over that window.
- It DOUBLES its capacity (clamped to max-size) when ALL of:
  - the window saw at least 100 Gets (a noise floor, so a
    tiny or write-only window does not trigger growth),
  - the windowed hit rate is below target-hit-rate,
  - capacity is still below max-size.
So growth happens only when the cache is actively churning AND
not meeting your hit-rate target. A cache that is missing its
target but never evicting (because it is not full) will not grow.
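The growth check described above can be sketched in Go. This is an illustration under stated assumptions, not the actual implementation: the window type and the nextCapacity function are hypothetical names, and the sketch assumes it is called once per turnover window with the counters for that window.

```go
package main

import "fmt"

// minGetsForGrowth is the noise floor stated in the text.
const minGetsForGrowth = 100

// window holds the counters sampled over one turnover window.
// The type and its field names are hypothetical.
type window struct {
	Gets uint64
	Hits uint64
}

// nextCapacity returns the capacity after one growth check. It doubles the
// capacity, clamped to maxSize, only when all three conditions hold;
// otherwise it returns the capacity unchanged.
func nextCapacity(capacity, maxSize int, target float64, w window) int {
	if w.Gets < minGetsForGrowth {
		return capacity // too few reads to trust the windowed hit rate
	}
	hitRate := float64(w.Hits) / float64(w.Gets)
	if hitRate >= target || capacity >= maxSize {
		return capacity // meeting target, or already at the ceiling
	}
	doubled := capacity * 2
	if doubled > maxSize {
		doubled = maxSize
	}
	return doubled
}

func main() {
	capacity, maxSize := 100, 1000
	// A window with a 0.60 hit rate, below a 0.90 target.
	below := window{Gets: 500, Hits: 300}
	for {
		next := nextCapacity(capacity, maxSize, 0.9, below)
		if next == capacity {
			break
		}
		fmt.Printf("%d -> %d\n", capacity, next)
		capacity = next
	}
}
```

With a floor of 100 and a ceiling of 1000, a cache that keeps missing its target passes through 200, 400, and 800 before being clamped at 1000, so the ceiling is reached after four growth events.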
WHEN THE CACHE SHRINKS
Shrinking is driven by the read path (Gets), so it can fire even
when the cache is quiet and nothing is being evicted.
Shrinking is evaluated after a self-tuning observation window of
Gets. The window is long enough that an entry left untouched is
genuinely cold, not merely unsampled: roughly 3x the occupancy at
target 0.95, scaling with target. At the end of the window the
cache may shrink if BOTH gates pass:
- Quiet gate: forced evictions during the window are below a
statistical threshold derived from the target hit rate and
shrink-conservatism (it is performing at least as well as a
cache at target should).
- Hit-rate gate: the windowed hit rate is at or above target.
When both pass, it shrinks in one of two ways:
- Slack reclaim: if capacity is above current occupancy, it
simply drops the unused headroom down to occupancy.
- Cold-tail trim: if the cache is full, it trims toward the
observed working set ("warm" entries touched this window),
shedding only least-recently-used entries that went untouched.
Shrinking never goes below series-id-set-cache-size (the floor).
Each shrink event is bounded (at most half the cache, and at most
1024 entries per event) so the write lock is not held too long;
further decay continues over later windows.
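The two shrink modes and the per-event bounds can be sketched in Go. This is only an illustration: shrinkTarget and its parameters are hypothetical names, the sketch assumes both gates have already passed, and it assumes the "half the cache" and 1024-entry bounds apply to the number of entries evicted in a cold-tail trim (slack reclaim evicts nothing). The statistical threshold behind the quiet gate is not modeled here.

```go
package main

import "fmt"

// maxShrinkPerEvent is the per-event cap stated in the text.
const maxShrinkPerEvent = 1024

// shrinkTarget returns the capacity after one shrink event, assuming both
// gates have passed. warm is the number of entries touched during the
// observation window. All names here are hypothetical.
func shrinkTarget(capacity, occupancy, warm, floor int) int {
	var target int
	if occupancy < capacity {
		// Slack reclaim: drop unused headroom down to occupancy.
		target = occupancy
	} else {
		// Cold-tail trim: move toward the observed working set, but evict
		// at most half the cache and at most maxShrinkPerEvent entries.
		maxEvict := capacity / 2
		if maxEvict > maxShrinkPerEvent {
			maxEvict = maxShrinkPerEvent
		}
		evict := capacity - warm
		if evict > maxEvict {
			evict = maxEvict
		}
		target = capacity - evict
	}
	// Never shrink below the configured floor.
	if target < floor {
		target = floor
	}
	return target
}

func main() {
	fmt.Println(shrinkTarget(800, 300, 250, 100))   // slack reclaim
	fmt.Println(shrinkTarget(800, 800, 500, 100))   // cold-tail trim, unbounded
	fmt.Println(shrinkTarget(4000, 4000, 500, 100)) // cold-tail trim, capped at 1024
	fmt.Println(shrinkTarget(150, 150, 20, 100))    // clamped to the floor
}
```

Under these assumptions the four calls print 300, 500, 2976, and 100. The third case shows why decay continues over later windows: a single event can only shed 1024 entries even when far more are cold.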
ANTI-OSCILLATION
After any resize (grow or shrink) there is a cooldown — sized to
a full observation window of the new capacity — during which the
cache will not shrink again. This, plus the conservatism margin,
prevents the cache from flapping between sizes.
STATISTICS (SHOW STATS, measurement "tsi1_cache")
hit              cumulative cache hits
miss             cumulative cache misses
eviction         forced evictions (entries pushed out under
                 write pressure); the "cache is under pressure"
                 signal that drives growth
shrink_eviction  entries voluntarily released by the shrink
                 policy; the "cache is giving memory back"
                 signal
size             current number of entries held
capacity         current capacity (the live limit; with
                 adaptive sizing this moves between
                 series-id-set-cache-size and
                 series-id-set-cache-max-size)
eviction and shrink_eviction are reported separately so you can
tell pressure-driven turnover apart from voluntary reclaim.
Capacity changes are also logged at INFO level: "tsi cache
capacity increased" / "tsi cache capacity decreased", with the
old and new capacity, the measured hit rate, the target, and how
many entries were evicted.
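Because hit and miss are cumulative, a recent hit rate has to be computed from the difference between two readings. The Go sketch below shows one way to do that; the sample type and intervalHitRate function are hypothetical helpers, not part of InfluxDB, and how the readings are collected is left open.

```go
package main

import "fmt"

// sample is one reading of the cumulative hit and miss counters.
// The type is hypothetical and exists only for this sketch.
type sample struct {
	Hit  uint64
	Miss uint64
}

// intervalHitRate returns the hit rate between two readings. The boolean is
// false when the rate cannot be computed: either no Gets happened in the
// interval, or a counter went backwards (for example after a restart).
func intervalHitRate(prev, cur sample) (float64, bool) {
	if cur.Hit < prev.Hit || cur.Miss < prev.Miss {
		return 0, false
	}
	hits := cur.Hit - prev.Hit
	misses := cur.Miss - prev.Miss
	if hits+misses == 0 {
		return 0, false
	}
	return float64(hits) / float64(hits+misses), true
}

func main() {
	prev := sample{Hit: 1000, Miss: 400}
	cur := sample{Hit: 1900, Miss: 500}
	if rate, ok := intervalHitRate(prev, cur); ok {
		// 900 hits and 100 misses in the interval.
		fmt.Printf("interval hit rate: %.2f\n", rate)
	}
}
```

Comparing this interval rate with target-hit-rate, alongside the capacity value, indicates whether the cache is meeting its target at its current size.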
PICKING VALUES
target-hit-rate: a value in the 0.85-0.95 range is a reasonable
starting point. 0.95 is the design point the windowing math is
tuned around. Trade-offs:
- Too high (near 1.0): the cache almost never meets target, so
it tends to grow straight to max-size and stay there. It also
lengthens the observation windows (slower to react). 1.0 is
rejected outright.
- Too low: the cache rarely grows, so you give up the benefit.
No single value is optimal for every workload; it depends on your
query mix. Start around 0.9, then watch the hit/miss and capacity
stats and adjust.
max-size: set it to the largest cache you are willing to give
memory to. A few multiples of series-id-set-cache-size (for
example 5x-10x) is a sensible first try. The cache only reaches
max-size if your working set of tag predicates actually needs it;
growth is demand-driven.
size (the floor): keep the existing default (100) unless you
already know you need more; adaptive sizing will grow the cache as
needed and shrink it back to this floor when demand falls.
shrink-conservatism: leave at 2.5 unless you observe a specific
problem. Lower it (toward 0.0) if the cache holds memory longer
than you want after a workload spike subsides; 0.0 lets it shrink
as soon as it is performing at target. Raise it if you see the
cache oscillating on a skewed/bursty workload. The default is
conservative on purpose: the underlying statistical model assumes
independent cache misses, but real access is correlated (locality
and skew), so the true variation is wider than the model — the
extra margin trades a little memory for fewer false shrinks.
WHAT TO EXPECT WHEN IT IS WORKING
With adaptive sizing on, capacity should settle at a value between
your floor and ceiling that matches the active working set, the
hit rate should track toward your target, and heap used by this
cache should follow the current working set rather than its
historical peak. This trades higher memory use for fewer cache
misses on workloads whose set of tag predicates exceeds the fixed
series-id-set-cache-size.
Relevant URLs
InfluxDB v1.13.0 OSS & Enterprise