query_condition_cache adds a predicate cache to DuckDB. It caches which row groups match a filter, so repeated queries can skip work and run much faster. It fits workloads like metrics monitoring dashboards, log investigation, and other recurring read paths where the same filters come up again and again.
Instead of re-evaluating a predicate across the whole table every time, it remembers which row-group vectors matched before and lets DuckDB skip vectors that are already known to be empty for that predicate.
On the HDFS_v2 log analytics benchmark (~71M log lines), query_condition_cache sped up repeated queries by up to 314.3x. The biggest gains showed up on selective HDFS log investigation queries, where the same predicate is reused across runs.
Results depend on predicate selectivity, cache reuse, and the surrounding storage/cache state. Benchmark scripts and methodology live in [benchmark/README.md](benchmark/README.md) and [benchmark/run_hdfs_log_benchmark.py](benchmark/run_hdfs_log_benchmark.py).
FORCE INSTALL query_condition_cache FROM community;
LOAD 'query_condition_cache';
CREATE TABLE t AS
SELECT i AS id, i % 100 AS val, 'msg_' || (i % 10) AS msg
FROM range(500000) t(i);

You can either build the cache yourself for a known predicate, or let the optimizer do it on demand.
If you prefer not to manage cache entries manually, the optimizer path is enabled by default:
SET use_query_condition_cache = true;
-- Predicate cache is automatically populated.
SELECT count(*) FROM t WHERE val = 42;
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│         5000 │
└──────────────┘
SELECT * FROM condition_cache_info('t', 'val = 42');
┌───────────────────┬──────────────────┬────────────────────┬───────────────┐
│ cached_row_groups │ total_row_groups │ qualifying_vectors │ total_vectors │
│       int32       │      int32       │       int32        │     int32     │
├───────────────────┼──────────────────┼────────────────────┼───────────────┤
│                 5 │                5 │                245 │           245 │
└───────────────────┴──────────────────┴────────────────────┴───────────────┘

On a cache miss, the optimizer builds the cache inline for supported filtered table scans. On later queries, it injects a ROW_ID-backed filter so cached-empty vectors can be skipped. Setting use_query_condition_cache = false disables the optimizer path and clears cached entries.
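For instance, the payoff comes on the second and later runs of the same filter against the quickstart table. A minimal sketch (actual savings depend on how many vectors the cache can rule out):

```sql
-- The same filter now hits the cached entry, so vectors already known to be
-- empty for 'val = 42' are skipped instead of being re-evaluated.
SELECT count(*) FROM t WHERE val = 42;

-- Entries live in the database's object cache, so they can be inspected
-- again at any point between queries.
SELECT * FROM condition_cache_info('t', 'val = 42');
```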
If you already know the filter you care about, build a cache entry directly for that table/predicate pair:
-- Build a cache entry for the given table/predicate pair.
SELECT * FROM condition_cache_build('t', 'val = 42 AND msg LIKE ''%2''');
┌──────────────────────────────────────────────┐
│                    status                    │
│                   varchar                    │
├──────────────────────────────────────────────┤
│ Cache Built: 245/245 vectors, 5/5 row groups │
└──────────────────────────────────────────────┘
-- Query the cache entry.
SELECT * FROM condition_cache_info('t', 'val = 42 AND msg LIKE ''%2''');
┌───────────────────┬──────────────────┬────────────────────┬───────────────┐
│ cached_row_groups │ total_row_groups │ qualifying_vectors │ total_vectors │
│       int32       │      int32       │       int32        │     int32     │
├───────────────────┼──────────────────┼────────────────────┼───────────────┤
│                 5 │                5 │                245 │           245 │
└───────────────────┴──────────────────┴────────────────────┴───────────────┘

Under the hood, the extension does five things:
- Canonicalizes the predicate into a stable cache key
- Scans only referenced columns plus ROW_ID
- Records which vectors inside each row group contain qualifying rows
- Stores entries in DuckDB's per-database ObjectCache
- Treats uncached row groups as pass-through for correctness
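To illustrate the last point: rows appended after an entry is built land in row groups the entry has not seen, and those are simply scanned the normal way. A hedged sketch, assuming the insert does not invalidate the existing entry:

```sql
-- Append more rows after the 'val = 42 AND msg LIKE ''%2''' entry was built.
INSERT INTO t
SELECT i AS id, i % 100 AS val, 'msg_' || (i % 10) AS msg
FROM range(500000, 600000) t(i);

-- Uncached row groups fall back to regular predicate evaluation, while the
-- previously cached ones still skip their known-empty vectors. Coverage can
-- be checked, and the entry rebuilt, explicitly:
SELECT * FROM condition_cache_info('t', 'val = 42 AND msg LIKE ''%2''');
SELECT * FROM condition_cache_build('t', 'val = 42 AND msg LIKE ''%2''');
```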
The extension also includes Parquet predicate scan cache support.
# Build the extension
GEN=ninja make

# Run all tests
make test

# Run a single test file
./build/release/test/unittest --test-dir . "test/sql/condition_cache_auto.test"