Skip to content

feat(query-engine): Support for SUM Top-k queries using CountMinSketchWithHeap#393

Merged
milindsrivastava1997 merged 5 commits into
mainfrom
392-feat-query-engine-support-sql-sum-order-by-limit-top-k
Jun 8, 2026
Merged

feat(query-engine): Support for SUM Top-k queries using CountMinSketchWithHeap#393
milindsrivastava1997 merged 5 commits into
mainfrom
392-feat-query-engine-support-sql-sum-order-by-limit-top-k

Conversation

@akanksha-akkihal

Copy link
Copy Markdown
Collaborator

Summary

Extends SQL top-k support so SUM(col) … GROUP BY … ORDER BY <alias> DESC LIMIT k is detected and served by CountMinSketchWithHeap, using the same resolution model as existing COUNT(col) top-k queries.

Changes

asap-query-engine/src/engines/simple_engine/sql.rs

Main implementation file.

  • New types: TopkWeighting (Count | Sum) and SqlTopk { k, weighting } with count_events() helper.
  • detect_sql_topk: return type changed from Option<u64>Option<SqlTopk>. Now accepts both COUNT(col) and SUM(col) (same ORDER BY alias DESC + LIMIT rules). Rejects other aggregates. Documents non-negative SUM assumption.
  • Context building: topk_k replaced with topk; still promotes to Statistic::Topk and threads k into query_kwargs.
  • build_query_requirements_sql: signature takes Option<SqlTopk> instead of Option<u64>; sets topk_count_events from detected weighting for capability fallback.
  • detect_topk_tests:
    • COUNT test now asserts TopkWeighting::Count.
    • New sum_order_by_alias_desc_limit_is_topk.
    • sum_aggregate_is_not_topk replaced with min_aggregate_is_not_topk.
  • topk_pipeline_tests:
    • New sum_topk_query() helper.
    • New build_sum_topk_engine() — single-aggregation query config, count_events: false.
    • New sum_topk_resolves_self_keyed_heap — mirrors COUNT self-keyed path (key id == value id, no keys_query).

asap-common/dependencies/rs/asap_types/src/query_requirements.rs (+9 lines)

  • New field: topk_count_events: Option<bool> on QueryRequirements.
    • Some(true) → COUNT top-k sketch (count_events: true)
    • Some(false) → SUM top-k sketch (count_events: false)
    • None → no weighting constraint

asap-common/dependencies/rs/asap_types/src/capability_matching.rs

Capability-matching fallback for count vs sum top-k.

  • New helpers:
    • config_count_events() — reads count_events from heap config (default true).
    • topk_weighting_compatible() — filters Statistic::Topk candidates by requested weighting.
  • find_compatible_aggregation: adds topk_weighting_compatible(...) to the candidate filter. Top-k still uses the standard multi-population path (heap + key agg in fallback; self-keyed via query config).
  • Test helper req(): initializes topk_count_events: None.
  • New tests: topk_count_query_picks_count_events_sketch, topk_sum_query_picks_value_weighted_sketch, topk_sum_query_rejects_count_only_sketch, topk_count_query_matches_sketch_without_explicit_flag, topk_unconstrained_weighting_matches_any_sketch (all provision a DeltaSetAggregator for the fallback path).

asap-query-engine/src/engines/simple_engine/promql.rs

  • build_query_requirements_promql: sets topk_count_events: None so PromQL top-k does not constrain sketch weighting during capability matching.

@milindsrivastava1997 milindsrivastava1997 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. detect_sql_topk — topk detected from outer_data() only

In build_query_execution_context_sql, detect_sql_topk is called with query_data from match_result.outer_data(). For a OneTemporalOneSpatial pattern the temporal aggregate lives in inner_data. SUM top-k only makes sense as a flat OnlyTemporal query, but there's no guard here — if a subquery-form SUM ORDER BY LIMIT accidentally matches, it would silently detect top-k from the outer data and produce confusing behavior. The COUNT path had the same latent issue before this PR, but it's worth noting since SUM makes the shape more likely to appear in nested queries.

  1. Silent failure for negative SUM columns

sql.rs:150-156 documents the non-negative assumption but there's no runtime check or even a warn!() at query time. A user querying SUM(delta) where the column can go negative will get plausible-looking but wrong top-k results silently. Even a single warn! when the SUM top-k path is taken ("SUM top-k assumes non-negative values; results are undefined for columns with negative entries") would be better than nothing.

  1. Test gap: SUM capability-matching fallback path

topk_pipeline_tests has sum_topk_resolves_self_keyed_heap, which exercises the query_config path (single-aggregation reference). There's no integration-level test for the capability-matching fallback (when no query_config matches the SUM query). The unit tests in capability_matching.rs cover the logic, but the wiring from the engine into the matcher is untested for SUM. The COUNT pipeline tests have the same gap. Also, the fact that count_events is true as a default is also not tested.

@milindsrivastava1997 milindsrivastava1997 merged commit 84d03cc into main Jun 8, 2026
11 checks passed
@milindsrivastava1997 milindsrivastava1997 deleted the 392-feat-query-engine-support-sql-sum-order-by-limit-top-k branch June 8, 2026 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(query-engine) : Support SQL SUM … ORDER BY … LIMIT top-k via CountMinSketchWithHeap

2 participants