feat(query-engine): Support for SUM Top-k queries using CountMinSketchWithHeap#393
Conversation
milindsrivastava1997
left a comment
There was a problem hiding this comment.
- detect_sql_topk — topk detected from outer_data() only
In build_query_execution_context_sql, detect_sql_topk is called with query_data from match_result.outer_data(). For a OneTemporalOneSpatial pattern the temporal aggregate lives in inner_data. SUM top-k only makes sense as a flat OnlyTemporal query, but there's no guard here — if a subquery-form SUM ORDER BY LIMIT accidentally matches, it would silently detect top-k from the outer data and produce confusing behavior. The COUNT path had the same latent issue before this PR, but it's worth noting since SUM makes the shape more likely to appear in nested queries.
- Silent failure for negative SUM columns
sql.rs:150-156 documents the non-negative assumption but there's no runtime check or even a warn!() at query time. A user querying SUM(delta) where the column can go negative will get plausible-looking but wrong top-k results silently. Even a single warn! when the SUM top-k path is taken ("SUM top-k assumes non-negative values; results are undefined for columns with negative entries") would be better than nothing.
- Test gap: SUM capability-matching fallback path
topk_pipeline_tests has sum_topk_resolves_self_keyed_heap, which exercises the query_config path (single-aggregation reference). There's no integration-level test for the capability-matching fallback (when no query_config matches the SUM query). The unit tests in capability_matching.rs cover the logic, but the wiring from the engine into the matcher is untested for SUM. The COUNT pipeline tests have the same gap. Also, the fact that count_events is true as a default is also not tested.
Summary
Extends SQL top-k support so
SUM(col) … GROUP BY … ORDER BY <alias> DESC LIMIT kis detected and served byCountMinSketchWithHeap, using the same resolution model as existingCOUNT(col)top-k queries.Changes
asap-query-engine/src/engines/simple_engine/sql.rsMain implementation file.
TopkWeighting(Count|Sum) andSqlTopk { k, weighting }withcount_events()helper.detect_sql_topk: return type changed fromOption<u64>→Option<SqlTopk>. Now accepts bothCOUNT(col)andSUM(col)(same ORDER BY alias DESC + LIMIT rules). Rejects other aggregates. Documents non-negative SUM assumption.topk_kreplaced withtopk; still promotes toStatistic::Topkand threadskintoquery_kwargs.build_query_requirements_sql: signature takesOption<SqlTopk>instead ofOption<u64>; setstopk_count_eventsfrom detected weighting for capability fallback.detect_topk_tests:TopkWeighting::Count.sum_order_by_alias_desc_limit_is_topk.sum_aggregate_is_not_topkreplaced withmin_aggregate_is_not_topk.topk_pipeline_tests:sum_topk_query()helper.build_sum_topk_engine()— single-aggregation query config,count_events: false.sum_topk_resolves_self_keyed_heap— mirrors COUNT self-keyed path (key id == value id, nokeys_query).asap-common/dependencies/rs/asap_types/src/query_requirements.rs(+9 lines)topk_count_events: Option<bool>onQueryRequirements.Some(true)→ COUNT top-k sketch (count_events: true)Some(false)→ SUM top-k sketch (count_events: false)None→ no weighting constraintasap-common/dependencies/rs/asap_types/src/capability_matching.rsCapability-matching fallback for count vs sum top-k.
config_count_events()— readscount_eventsfrom heap config (defaulttrue).topk_weighting_compatible()— filtersStatistic::Topkcandidates by requested weighting.find_compatible_aggregation: addstopk_weighting_compatible(...)to the candidate filter. Top-k still uses the standard multi-population path (heap + key agg in fallback; self-keyed via query config).req(): initializestopk_count_events: None.topk_count_query_picks_count_events_sketch,topk_sum_query_picks_value_weighted_sketch,topk_sum_query_rejects_count_only_sketch,topk_count_query_matches_sketch_without_explicit_flag,topk_unconstrained_weighting_matches_any_sketch(all provision aDeltaSetAggregatorfor the fallback path).asap-query-engine/src/engines/simple_engine/promql.rsbuild_query_requirements_promql: setstopk_count_events: Noneso PromQL top-k does not constrain sketch weighting during capability matching.