feat(query-engine): SQL Top-K via CountMinSketchWithHeap (COUNT + ORDER BY + LIMIT)

# Support SQL Top-K via CountMinSketchWithHeap (COUNT + ORDER BY + LIMIT)

## Summary

Support SQL top-k heavy-hitter queries using `CountMinSketchWithHeap`, for example:

```sql
SELECT srcip, COUNT(pkt_len) AS transfer_events
FROM netflow_table
WHERE time BETWEEN DATEADD(s, -11, NOW()) AND DATEADD(s, -10, NOW())
GROUP BY srcip
ORDER BY transfer_events DESC
LIMIT 10
```

The execution path should mirror PromQL `topk(k, ...)`:

* Precompute with `CountMinSketchWithHeap`
* Retrieve candidate keys from the sketch heap at query time
* Estimate counts from the CMS
* Sort descending by estimated count
* Return the top-k results

Results are approximate and based on CMS + heap estimates rather than exact ClickHouse aggregation.

## Motivation

Current SQL support for `ORDER BY ... DESC LIMIT k` operates only as a post-processing step and requires materializing all group-by keys. This prevents use of the sketch heap and defeats the purpose of approximate top-k precomputation.

Supporting SQL top-k enables:

* Efficient heavy-hitter queries without full group enumeration
* Reuse of existing PromQL top-k infrastructure
* Correct COUNT semantics (count events, not sum values)

## Current Gaps 

### No SQL top-k detection

`build_query_execution_context_sql` mapped `COUNT(...)` to `Statistic::Count`, never `Statistic::Topk`. As a result, queries containing `ORDER BY <count_alias> DESC LIMIT k` were treated as ordinary grouped aggregations followed by `SqlPostProcessing`. The engine had to materialize all group-by keys before sorting and truncating, preventing use of the sketch heap and defeating the purpose of approximate top-k precomputation.

### Wrong precompute accumulator

`accumulator_factory.rs` routed `CountMinSketchWithHeap` to `CmsAccumulatorUpdater`, which only understands a plain Count-Min Sketch and ignores the heap component entirely. While count estimates could still be queried for known keys, `get_keys()` returned `None`, leaving the execution layer with no way to retrieve heavy-hitter candidates from the sketch itself.

### COUNT vs SUM weighting

`CmsAccumulatorUpdater` used the incoming sample value as the CMS weight. This is appropriate for sum-like statistics but incorrect for event counts. For a query such as `COUNT(pkt_len)`, each observation should contribute a weight of `1`, regardless of the value of `pkt_len`. The existing behavior effectively ranked groups by accumulated packet length rather than by number of events.

### Single `enable_topk` flag

A single `enable_topk` flag controlled two unrelated behaviors: (a) heap-based candidate enumeration and top-k truncation, and (b) PromQL-specific `__name__` label prefixing. SQL top-k requires the former but not the latter. Coupling both behaviors behind the same flag made it impossible to reuse the existing top-k machinery cleanly for SQL queries.

### Capability fallback vs heap-only design

The SQL top-k design uses the GROUP BY column as the sketch's aggregated dimension, resulting in empty `grouping_labels` and one sketch per window. However, capability matching relies on strict label-layout equality and treats `CountMinSketchWithHeap` as a multi-population value type. Without an explicit query configuration, fallback matching still expected a paired key aggregation (e.g. `SetAggregator` / `DeltaSetAggregator`) and therefore could not resolve to the intended heap-only execution path.


## Success Criteria

* SQL queries of the form:

```sql
SELECT g, COUNT(v)
FROM t
GROUP BY g
ORDER BY COUNT(v) DESC
LIMIT k
```

are recognized as top-k queries.

* Planner generates `CountMinSketchWithHeap` precomputes with the requested `k`.

* Query execution retrieves candidate keys from the heap rather than enumerating all groups.

* COUNT uses weight `1` per event when updating the sketch.

* Results are sorted by estimated count and truncated to `k`.

* Existing PromQL top-k behavior remains unchanged.

* Non-top-k SQL aggregations continue to use existing execution paths.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(query-engine): SQL Top-K via CountMinSketchWithHeap (COUNT + ORDER BY + LIMIT) #388

Support SQL Top-K via CountMinSketchWithHeap (COUNT + ORDER BY + LIMIT)

Summary

Motivation

Current Gaps

No SQL top-k detection

Wrong precompute accumulator

COUNT vs SUM weighting

Single `enable_topk` flag

Capability fallback vs heap-only design

Success Criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

feat(query-engine): SQL Top-K via CountMinSketchWithHeap (COUNT + ORDER BY + LIMIT) #388

Description

Support SQL Top-K via CountMinSketchWithHeap (COUNT + ORDER BY + LIMIT)

Summary

Motivation

Current Gaps

No SQL top-k detection

Wrong precompute accumulator

COUNT vs SUM weighting

Single enable_topk flag

Capability fallback vs heap-only design

Success Criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Single `enable_topk` flag