Skip to content

feat(query-engine): SQL Top-K via CountMinSketchWithHeap (COUNT + ORDER BY + LIMIT) #388

@akanksha-akkihal

Description

@akanksha-akkihal

Support SQL Top-K via CountMinSketchWithHeap (COUNT + ORDER BY + LIMIT)

Summary

Support SQL top-k heavy-hitter queries using CountMinSketchWithHeap, for example:

SELECT srcip, COUNT(pkt_len) AS transfer_events
FROM netflow_table
WHERE time BETWEEN DATEADD(s, -11, NOW()) AND DATEADD(s, -10, NOW())
GROUP BY srcip
ORDER BY transfer_events DESC
LIMIT 10

The execution path should mirror PromQL topk(k, ...):

  • Precompute with CountMinSketchWithHeap
  • Retrieve candidate keys from the sketch heap at query time
  • Estimate counts from the CMS
  • Sort descending by estimated count
  • Return the top-k results

Results are approximate and based on CMS + heap estimates rather than exact ClickHouse aggregation.

Motivation

Current SQL support for ORDER BY ... DESC LIMIT k operates only as a post-processing step and requires materializing all group-by keys. This prevents use of the sketch heap and defeats the purpose of approximate top-k precomputation.

Supporting SQL top-k enables:

  • Efficient heavy-hitter queries without full group enumeration
  • Reuse of existing PromQL top-k infrastructure
  • Correct COUNT semantics (count events, not sum values)

Current Gaps

No SQL top-k detection

build_query_execution_context_sql mapped COUNT(...) to Statistic::Count, never Statistic::Topk. As a result, queries containing ORDER BY <count_alias> DESC LIMIT k were treated as ordinary grouped aggregations followed by SqlPostProcessing. The engine had to materialize all group-by keys before sorting and truncating, preventing use of the sketch heap and defeating the purpose of approximate top-k precomputation.

Wrong precompute accumulator

accumulator_factory.rs routed CountMinSketchWithHeap to CmsAccumulatorUpdater, which only understands a plain Count-Min Sketch and ignores the heap component entirely. While count estimates could still be queried for known keys, get_keys() returned None, leaving the execution layer with no way to retrieve heavy-hitter candidates from the sketch itself.

COUNT vs SUM weighting

CmsAccumulatorUpdater used the incoming sample value as the CMS weight. This is appropriate for sum-like statistics but incorrect for event counts. For a query such as COUNT(pkt_len), each observation should contribute a weight of 1, regardless of the value of pkt_len. The existing behavior effectively ranked groups by accumulated packet length rather than by number of events.

Single enable_topk flag

A single enable_topk flag controlled two unrelated behaviors: (a) heap-based candidate enumeration and top-k truncation, and (b) PromQL-specific __name__ label prefixing. SQL top-k requires the former but not the latter. Coupling both behaviors behind the same flag made it impossible to reuse the existing top-k machinery cleanly for SQL queries.

Capability fallback vs heap-only design

The SQL top-k design uses the GROUP BY column as the sketch's aggregated dimension, resulting in empty grouping_labels and one sketch per window. However, capability matching relies on strict label-layout equality and treats CountMinSketchWithHeap as a multi-population value type. Without an explicit query configuration, fallback matching still expected a paired key aggregation (e.g. SetAggregator / DeltaSetAggregator) and therefore could not resolve to the intended heap-only execution path.

Success Criteria

  • SQL queries of the form:
SELECT g, COUNT(v)
FROM t
GROUP BY g
ORDER BY COUNT(v) DESC
LIMIT k

are recognized as top-k queries.

  • Planner generates CountMinSketchWithHeap precomputes with the requested k.

  • Query execution retrieves candidate keys from the heap rather than enumerating all groups.

  • COUNT uses weight 1 per event when updating the sketch.

  • Results are sorted by estimated count and truncated to k.

  • Existing PromQL top-k behavior remains unchanged.

  • Non-top-k SQL aggregations continue to use existing execution paths.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions