Commit ff0e2d8

updated docs
1 parent 4c2e055 commit ff0e2d8

1 file changed

Lines changed: 42 additions & 0 deletions

.design_docs/CAPABILITY_MATCHING_DESIGN.md

@@ -92,6 +92,14 @@ compatibility rules:
  exactly one range per timestamp; you can't merge overlapping windows).
- **Spatial-only** (`data_range_ms = None`): any window is compatible.

`QueryRequirements` intentionally does **not** carry start/end timestamps. Capability matching is
about whether a config can serve a query's *shape* (how much historical data it needs), not *when*
that data was recorded. The actual timestamps are computed separately in
`calculate_query_timestamps_promql` / `calculate_query_timestamps_sql` and placed into
`StoreQueryParams` after the aggregation has been selected. This keeps the two concerns ("can
this config serve this query type?" and "fetch this time window from the store") cleanly
separated.

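The separation described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the struct fields other than `data_range_ms`, and the helpers `resolve_window` and `build_store_params`, are hypothetical stand-ins for the real `calculate_query_timestamps_*` functions and `StoreQueryParams` layout.

```rust
/// Simplified stand-in for the document's `QueryRequirements`.
/// Only `data_range_ms` comes from the text; everything else here is assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryRequirements {
    /// How much history the query needs; `None` means spatial-only.
    pub data_range_ms: Option<u64>,
}

/// Simplified stand-in for `StoreQueryParams`; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct StoreQueryParams {
    pub aggregation_id: u64,
    pub start_ms: u64,
    pub end_ms: u64,
}

/// Hypothetical helper playing the role of `calculate_query_timestamps_*`:
/// turns a query shape plus an evaluation time into a concrete window.
pub fn resolve_window(req: &QueryRequirements, eval_time_ms: u64) -> (u64, u64) {
    let range = req.data_range_ms.unwrap_or(0);
    (eval_time_ms.saturating_sub(range), eval_time_ms)
}

/// Runs only after an aggregation has been selected by capability matching.
pub fn build_store_params(
    aggregation_id: u64,
    req: &QueryRequirements,
    eval_time_ms: u64,
) -> StoreQueryParams {
    let (start_ms, end_ms) = resolve_window(req, eval_time_ms);
    StoreQueryParams { aggregation_id, start_ms, end_ms }
}
```

The same `QueryRequirements` value yields different `StoreQueryParams` at different evaluation times, which is why the routing decision can be made (and potentially cached) without knowing the timestamps.
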
### Q9: Where does the capability matching logic live?
**Decision**: `sketch_db_common` (the shared crate). Rationale: this logic is pure. It takes a
map of `AggregationConfig` values and a `QueryRequirements` and produces an `AggregationIdInfo`.
@@ -116,6 +124,40 @@ semantically incorrect results.

---

## Known Limitations

### Cleanup policy is not considered
`CleanupPolicy` (`CircularBuffer`, `ReadBased`, `NoCleanup`) and `num_aggregates_to_retain` live
on `InferenceConfig` / `AggregationReference`, not on `AggregationConfig`. Capability matching
only inspects `AggregationConfig`, so it has no visibility into how many historical windows a given
aggregation is actually retaining.

**Practical consequence**: if a `CircularBuffer` aggregation retains only N windows but a query
needs more than N, capability matching will still route to it. The failure surfaces at query
execution time (the store returns insufficient data), not at routing time.

The `query_configs` path handles this correctly because `num_aggregates_to_retain` is set
explicitly per query via `AggregationReference`, giving operators direct control. Capability
matching has no equivalent mechanism today.

**Future mitigation**: add a `data_range_ms` coverage check that verifies the store actually holds
at least `ceil(data_range_ms / window_size_ms)` recent windows for the selected aggregation before
committing to it.

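A rough sketch of what such a coverage check could look like is shown here. The function names `windows_needed` and `has_coverage` and the `retained_windows` parameter are hypothetical; how the real store would report its retained window count is not specified in this document. Treating spatial-only queries as always covered is also an assumption.

```rust
/// Number of windows required to cover `data_range_ms`, i.e.
/// ceil(data_range_ms / window_size_ms). Returns `None` for a zero window size.
pub fn windows_needed(data_range_ms: u64, window_size_ms: u64) -> Option<u64> {
    if window_size_ms == 0 {
        return None;
    }
    let whole = data_range_ms / window_size_ms;
    // Add one window when the range does not divide evenly.
    let partial = u64::from(data_range_ms % window_size_ms != 0);
    Some(whole + partial)
}

/// Hypothetical coverage check: does the store retain enough recent windows?
/// Assumption: spatial-only queries (`None`) need no history and always pass.
pub fn has_coverage(
    data_range_ms: Option<u64>,
    window_size_ms: u64,
    retained_windows: u64,
) -> bool {
    match data_range_ms {
        None => true,
        Some(range) => match windows_needed(range, window_size_ms) {
            Some(needed) => retained_windows >= needed,
            None => false,
        },
    }
}
```

For example, a 5-minute range over 1-minute windows needs 5 retained windows, so a `CircularBuffer` holding only 4 would be rejected at routing time instead of failing at execution time.
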
### Label compatibility is strictly exact
A config grouped by `{job, instance}` does **not** match a query grouping by `{job}` only, even
though label collapsing is mathematically valid for simple accumulators. This is conservative: for
sketch types (KLL, CountMin) label collapsing is not well-defined. See the TODO comment in
`labels_compatible` in `capability_matching.rs` for the planned relaxation.

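To make the difference concrete, the sketch below contrasts the exact rule with one possible relaxation. It is illustrative only: `labels_match_exact`, `labels_match_relaxed`, `label_set`, and the `collapsible` flag are hypothetical names, and the relaxed rule shown is not necessarily the one planned in the TODO.

```rust
use std::collections::BTreeSet;

/// Small helper to build a label set from string literals (illustrative).
pub fn label_set(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|s| s.to_string()).collect()
}

/// Current behaviour as described: the grouping labels must be identical.
pub fn labels_match_exact(config: &BTreeSet<String>, query: &BTreeSet<String>) -> bool {
    config == query
}

/// One possible relaxation (an assumption, not the crate's code): allow the
/// query's labels to be a subset of the config's labels, but only when the
/// aggregation type is known to support collapsing (e.g. simple accumulators).
pub fn labels_match_relaxed(
    config: &BTreeSet<String>,
    query: &BTreeSet<String>,
    collapsible: bool,
) -> bool {
    config == query || (collapsible && query.is_subset(config))
}
```

Under the exact rule, `{job, instance}` versus `{job}` is a mismatch. Under the relaxed sketch it matches only when `collapsible` is true, which keeps sketch types on the conservative path.
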
### No structured rejection reasons
When no match is found, `find_compatible_aggregation` returns `None` without explaining which
candidates were considered and why each was rejected. Debug-level logs record per-candidate
rejections, but there is no structured error type. This makes diagnosing misconfigurations
harder; see "Rich rejection errors" in the Rejected section below.

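For illustration, a structured alternative to returning `None` might look like the sketch below. Everything in it is hypothetical: the `RejectionReason` variants, the `Candidate` struct, and `find_with_reasons` are invented for this example, and the window rule used (the range must be a whole multiple of the window size) is a placeholder rather than the real compatibility rule.

```rust
/// Hypothetical reasons a candidate aggregation could be rejected.
#[derive(Debug, Clone, PartialEq)]
pub enum RejectionReason {
    LabelsMismatch,
    WindowIncompatible { window_size_ms: u64, data_range_ms: u64 },
}

/// Hypothetical, heavily simplified candidate; `labels_ok` stands in for the
/// result of the label comparison.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u64,
    pub labels_ok: bool,
    pub window_size_ms: u64,
}

/// Returns the first compatible candidate id, or every candidate's rejection
/// reason when none is compatible.
pub fn find_with_reasons(
    candidates: &[Candidate],
    data_range_ms: Option<u64>,
) -> Result<u64, Vec<(u64, RejectionReason)>> {
    let mut rejections = Vec::new();
    for c in candidates {
        if !c.labels_ok {
            rejections.push((c.id, RejectionReason::LabelsMismatch));
            continue;
        }
        if let Some(range) = data_range_ms {
            // Placeholder rule: range must be a whole multiple of the window.
            if c.window_size_ms == 0 || range % c.window_size_ms != 0 {
                rejections.push((
                    c.id,
                    RejectionReason::WindowIncompatible {
                        window_size_ms: c.window_size_ms,
                        data_range_ms: range,
                    },
                ));
                continue;
            }
        }
        return Ok(c.id);
    }
    Err(rejections)
}
```

The `Err` payload carries the same information the debug logs record today, but in a form a caller could surface to an operator.
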
---

## What Was Rejected

### "Translate PromQL to SQL and execute via DataFusion SQL engine"
