Summary
Extend SQL top-k support so SUM(col) … GROUP BY … ORDER BY <alias> DESC LIMIT k is detected and served by CountMinSketchWithHeap, using the same resolution paths as existing COUNT(col) top-k queries. Add capability-matching logic to distinguish count-weighted vs sum-weighted heap configs via count_events.
Background
SQL top-k is already supported for:
SELECT srcip, COUNT(pkt_len) AS transfer_events
FROM netflow_table
WHERE time BETWEEN … AND …
GROUP BY srcip
ORDER BY transfer_events DESC
LIMIT 10
This path:
- Detects the pattern via detect_sql_topk
- Promotes it to Statistic::Topk
- Routes to CountMinSketchWithHeap with count_events: true (unit weight per event)
- Resolves self-keyed when the inference query_config lists a single aggregation, or via the multi-population fallback (heap + SetAggregator/DeltaSetAggregator) when no query config is present
SUM + ORDER BY + LIMIT was not supported: detection rejected non-COUNT aggregates, and capability matching could not tell apart two heap configs on the same metric (count vs sum weighting).
Problem
- Detection gap: detect_sql_topk only accepted COUNT, so SUM(pkt_len) … ORDER BY total DESC LIMIT k fell through to the generic SQL path (post-processing sort/limit on a non-sketch aggregation).
- Capability-matching gap: two CountMinSketchWithHeap configs on the same metric (one count_events: true, one count_events: false) could not be disambiguated when falling back to capability matching.
- Parity: SUM top-k should follow the same resolution model as COUNT top-k (query-config self-keyed vs fallback separate-key), not introduce a new keying model.
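To make the detection gap concrete, here is a minimal sketch of how the check could accept both COUNT and SUM and record the weighting. The types ParsedQuery, AggFunc, TopkSpec and the function detect_topk_sketch are hypothetical stand-ins for illustration; they are not the real detect_sql_topk signature or the project's AST.

```rust
/// Hypothetical aggregate kinds; `Other` stands for anything that is not COUNT or SUM.
#[derive(Debug, Clone, PartialEq)]
pub enum AggFunc {
    Count,
    Sum,
    Other,
}

/// Hypothetical, heavily simplified view of a parsed SELECT statement.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedQuery {
    pub agg: AggFunc,
    pub agg_alias: String,
    pub group_by: Vec<String>,
    /// (column or alias, descending?)
    pub order_by: Option<(String, bool)>,
    pub limit: Option<usize>,
}

/// Hypothetical result of detection: k plus the weighting the heap must use.
#[derive(Debug, Clone, PartialEq)]
pub struct TopkSpec {
    pub k: usize,
    /// true = unit weight per event (COUNT), false = weight by column value (SUM)
    pub count_events: bool,
}

/// Returns Some(spec) only for `AGG(col) AS alias … GROUP BY … ORDER BY alias DESC LIMIT k`
/// where AGG is COUNT or SUM; every other shape falls through to the generic path.
pub fn detect_topk_sketch(q: &ParsedQuery) -> Option<TopkSpec> {
    let count_events = match q.agg {
        AggFunc::Count => true,
        AggFunc::Sum => false,
        AggFunc::Other => return None,
    };
    if q.group_by.is_empty() {
        return None;
    }
    // ORDER BY must reference the aggregate's alias and be descending.
    let (col, desc) = q.order_by.as_ref()?;
    if !*desc || col != &q.agg_alias {
        return None;
    }
    let k = q.limit?;
    Some(TopkSpec { k, count_events })
}
```

In this sketch the only difference between the COUNT and SUM cases is the count_events flag, which reflects the parity goal: one detection path and one keying model, with weighting carried as data.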
Acceptance criteria
- SUM(col) AS alias … GROUP BY key … ORDER BY alias DESC LIMIT k is detected as top-k and promoted to Statistic::Topk.
- COUNT top-k behavior is unchanged.
- Self-keyed SUM top-k works via a single-aggregation query_config (same as COUNT).
- Capability fallback selects the correct heap when both count- and sum-weighted configs exist on the same metric.
- SUM top-k does not match a count-only sketch (count_events: true).
- Unit tests pass: cargo test -p asap_types --lib capability_matching, cargo test -p query_engine_rust --lib topk.
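The two capability-fallback criteria can be illustrated with a small sketch. HeapConfig, Weighting and match_heap_sketch are hypothetical names used only here; the real capability-matching code in asap_types may be structured quite differently. The only behavior assumed from this issue is that count_events: true means count-weighted and count_events: false means sum-weighted.

```rust
/// Hypothetical, simplified CountMinSketchWithHeap config entry.
#[derive(Debug, Clone, PartialEq)]
pub struct HeapConfig {
    pub name: String,
    pub metric: String,
    /// true = count-weighted (unit weight per event), false = sum-weighted
    pub count_events: bool,
}

/// Weighting requested by the query: COUNT(col) or SUM(col).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Weighting {
    Count,
    Sum,
}

impl Weighting {
    /// The count_events value a heap must have to serve this weighting.
    pub fn required_count_events(self) -> bool {
        matches!(self, Weighting::Count)
    }
}

/// Picks the first heap on `metric` whose count_events flag matches the
/// requested weighting; returns None rather than falling back to a heap
/// with the wrong weighting.
pub fn match_heap_sketch<'a>(
    configs: &'a [HeapConfig],
    metric: &str,
    weighting: Weighting,
) -> Option<&'a HeapConfig> {
    configs
        .iter()
        .find(|c| c.metric == metric && c.count_events == weighting.required_count_events())
}
```

With one count-weighted and one sum-weighted config on pkt_len, a SUM query resolves to the sum-weighted heap and a COUNT query to the count-weighted one. If only the count-weighted config exists, the SUM lookup returns None, matching the criterion that SUM top-k must not match a count-only sketch.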