feat(query-engine): Support SQL SUM … ORDER BY … LIMIT top-k via CountMinSketchWithHeap #392

@akanksha-akkihal

Summary

Extend SQL top-k support so SUM(col) … GROUP BY … ORDER BY <alias> DESC LIMIT k is detected and served by CountMinSketchWithHeap, using the same resolution paths as existing COUNT(col) top-k queries. Add capability-matching logic to distinguish count-weighted vs sum-weighted heap configs via count_events.

Background

SQL top-k is already supported for:

SELECT srcip, COUNT(pkt_len) AS transfer_events
FROM netflow_table
WHERE time BETWEEN … AND …
GROUP BY srcip
ORDER BY transfer_events DESC
LIMIT 10

This path:

  • Detects the pattern via detect_sql_topk
  • Promotes it to Statistic::Topk
  • Routes to CountMinSketchWithHeap with count_events: true (unit weight per event)
  • Resolves self-keyed when the inference query_config lists a single aggregation, or via the multi-population fallback (heap + SetAggregator/DeltaSetAggregator) when no query config is present
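To make the weighting distinction concrete, here is a minimal sketch of how `count_events` could change what each event contributes. It is not the project's implementation: an exact `HashMap` stands in for the count-min sketch and heap, and `Event`, `event_weight`, and `top_k` are hypothetical names introduced only for illustration.

```rust
use std::collections::HashMap;

/// One observed event: the group-by key and the aggregated column (e.g. pkt_len).
/// Hypothetical type, for illustration only.
pub struct Event<'a> {
    pub key: &'a str,
    pub value: u64,
}

/// Weight contributed by a single event.
/// `count_events: true` gives unit weight (COUNT semantics);
/// `count_events: false` uses the column value (SUM semantics, assumed here).
pub fn event_weight(count_events: bool, value: u64) -> u64 {
    if count_events {
        1
    } else {
        value
    }
}

/// Exact stand-in for the sketch + heap: accumulate weights per key, then
/// return the k heaviest keys in descending order (ties broken by key).
pub fn top_k(events: &[Event], count_events: bool, k: usize) -> Vec<(String, u64)> {
    let mut totals: HashMap<&str, u64> = HashMap::new();
    for e in events {
        *totals.entry(e.key).or_insert(0) += event_weight(count_events, e.value);
    }
    let mut ranked: Vec<(String, u64)> = totals
        .into_iter()
        .map(|(key, weight)| (key.to_string(), weight))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(k);
    ranked
}
```

With the same events, the two weightings can rank keys differently: a source with a few large packets wins under sum weighting, while a source with many small packets wins under count weighting.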

SUM + ORDER BY + LIMIT was not supported: detection rejected non-COUNT aggregates, and capability matching could not tell apart two heap configs on the same metric (count vs sum weighting).

Problem

  • Detection gap: detect_sql_topk only accepted COUNT, so SUM(pkt_len) … ORDER BY total DESC LIMIT k fell through to the generic SQL path (post-processing sort/limit on a non-sketch aggregation).
  • Capability-matching gap: Two CountMinSketchWithHeap configs on the same metric (one count_events: true, one count_events: false) could not be disambiguated when falling back to capability matching.
  • Parity: SUM top-k should follow the same resolution model as COUNT top-k (query-config self-keyed vs fallback separate-key), not introduce a new keying model.
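The detection gap can be illustrated with a simplified sketch. The real `detect_sql_topk` signature and AST types are not shown in this issue, so `ParsedQuery`, `Agg`, `TopkSpec`, and `detect_topk` below are hypothetical stand-ins. The idea is only that SUM is accepted alongside COUNT and that the aggregate kind is carried forward as a `count_events` flag.

```rust
/// Aggregate function in the SELECT list (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Agg {
    Count,
    Sum,
    Avg,
}

/// Already-parsed query shape (hypothetical, simplified).
pub struct ParsedQuery {
    pub agg: Agg,
    pub agg_alias: String,
    pub group_by: Vec<String>,
    /// (column or alias, descending?)
    pub order_by: Option<(String, bool)>,
    pub limit: Option<usize>,
}

/// Result of a successful top-k detection (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub struct TopkSpec {
    pub k: usize,
    /// true for COUNT (unit weight), false for SUM (value weight).
    pub count_events: bool,
}

/// Returns Some(spec) only for
/// `COUNT|SUM(col) AS alias … GROUP BY … ORDER BY alias DESC LIMIT k`.
pub fn detect_topk(q: &ParsedQuery) -> Option<TopkSpec> {
    let count_events = match q.agg {
        Agg::Count => true,
        Agg::Sum => false,
        // Any other aggregate falls through to the generic SQL path.
        _ => return None,
    };
    if q.group_by.is_empty() {
        return None;
    }
    // ORDER BY must reference the aggregate alias and be descending.
    let (col, desc) = q.order_by.as_ref()?;
    if !*desc || *col != q.agg_alias {
        return None;
    }
    let k = q.limit?;
    Some(TopkSpec { k, count_events })
}
```

Keeping the COUNT branch untouched and adding SUM as a second accepted aggregate is one way to preserve existing COUNT behavior while closing the gap.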

Acceptance criteria

  • SUM(col) AS alias … GROUP BY key … ORDER BY alias DESC LIMIT k is detected as top-k and promoted to Statistic::Topk.

  • COUNT top-k behavior is unchanged.

  • Self-keyed SUM top-k works via a single-aggregation query_config (same as COUNT).

  • Capability fallback selects the correct heap when both count- and sum-weighted configs exist on the same metric.

  • SUM top-k does not match a count-only sketch (count_events: true).

  • Unit tests pass: cargo test -p asap_types --lib capability_matching, cargo test -p query_engine_rust --lib topk.
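The capability-fallback criteria above could be exercised with something like the following sketch. The actual types in `asap_types` are not shown in this issue, so `HeapConfig`, `TopkWeighting`, and `select_heap` are hypothetical. The sketch also assumes strict matching in both directions (COUNT only matches `count_events: true`, SUM only matches `count_events: false`), which the issue states explicitly only for the SUM case.

```rust
/// Minimal view of a CountMinSketchWithHeap aggregation config (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HeapConfig {
    pub name: String,
    pub metric: String,
    pub count_events: bool,
}

/// Weighting a top-k query requires (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TopkWeighting {
    Count,
    Sum,
}

/// Pick the first heap config on `metric` whose `count_events` flag matches
/// the weighting the query needs. Returns None if no such config exists.
pub fn select_heap<'a>(
    configs: &'a [HeapConfig],
    metric: &str,
    weighting: TopkWeighting,
) -> Option<&'a HeapConfig> {
    let want_count_events = weighting == TopkWeighting::Count;
    configs
        .iter()
        .find(|c| c.metric == metric && c.count_events == want_count_events)
}
```

With one count-weighted and one sum-weighted config on the same metric, a SUM query would resolve to the sum-weighted config, and would resolve to nothing if only the count-weighted config were present.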
