Streamline query internal representation#1226
Merged
Merged
Conversation
This mirrors the actual query ast and makes the invalid state of a wildcard select and a Some graph_select unrepresentable. Also updates nomenclature and changes the name of the "graph crawl" concept to "hydration" to match other databases and ORMs
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This branch reshapes
fluree-db-query's intermediate representation to push more invariants into the type system, peel the catch-allQueryOptionsapart so each modifier lives where it's actually used, reorganize a couple of misnamed modules, and fix aSELECT DISTINCT *correctness regression along the way. Net change: ~75 files, +1.9k / -1.9k. The runtime executor is unchanged in behavior; the API surface aroundQuery,QueryOutput,ReasoningConfig, andSubqueryPatternshifts noticeably and is documented under "Migration notes" below.Why
The old
QueryIR mixed three different kinds of state in one place and let the type system express several illegal combinations:QueryOptionswas a 7-field grab-bag carrying LIMIT, OFFSET, ORDER BY, GROUP BY, aggregates, HAVING, post-binds, DISTINCT, and the rewriter's reasoning configuration. The first eight are surface modifiers; the last is rewriter input. They had no business riding together, and the bag made it possible to construct nonsense (e.g. an ASK query carrying GROUP BY).selectOneandselectDistinctwere both modeled as boolean flags on the parser surface, which let an internal caller construct "selectOne distinct" — meaningless becauseRestriction::Onechanges the output shape (bare row vs. one-element array) whileDistinctdoes not.Vec-typed fields (group_by,aggregates,having,post_binds) that had to be checked for joint validity at every consumer. "GROUP BY non-empty implies aggregates may be empty (dedup mode)", "HAVING requires aggregates", "post-binds require aggregates" were prose contracts.expressionthat also contained a re-exported submoduleeval— two names for the same thing.The cumulative effect was that operator-tree construction and dependency analysis spent significant prose in
if !x.is_empty() && y.is_some() && ...gates, and a single conceptual change ("does this query group?") required updating four parallel data structures plus their cross-checks.What changed
Type-level invariants
Restrictionenum (fluree_db_query::ir::Restriction) replaces theselectDistinct/selectOneboolean pair onQueryOutput::Select. VariantsDistinctandOneare mutually exclusive, expressed asOption<Restriction>.Restriction::Onecarries an explicit doc note distinguishing it fromquery.limit = Some(1)because they differ in output shape (bare row vs. one-element array).Groupingenum (fluree_db_query::ir::Grouping) replaces the parallel-Vec encoding of the aggregation phase. Variants:Implicit { aggregation, having }— single implicit group, always carries anAggregation.Explicit { group_by, aggregation, having }— partitioned byNonEmpty<VarId>;aggregationisOption<Aggregation>because GROUP BY without aggregates is a legal dedup-by-key form.Helpers:
Grouping::assemble(group_by, aggregates, binds, having) -> Option<Self>selects the right variant from loose lowering pieces;Grouping::aggregates(),Grouping::binds(),Grouping::having(),Grouping::aggregation(),Grouping::group_by_vars()give variant-agnostic iteration. The previous "is HAVING legal here?" / "isbinds.is_empty()ok?" checks at consumer sites collapse into pattern matches that the compiler exhausts.Aggregationsub-struct bundlesaggregates: NonEmpty<AggregateSpec>withbinds: Vec<(VarId, Expression)>so that post-aggregation binds can only exist where an aggregation stage is present. Eliminates the "GROUP BY without aggregates plus post-binds" representable-but-meaningless combination.NonEmpty<T>promoted tofluree-db-core(fluree_db_core::NonEmpty) with structural non-emptiness viahead: T, tail: Vec<T>. Used byGroupingforgroup_byand aggregate lists. Public field representation is intentional — supports destructure-rebuild inrevert.rs::RevertSource::Commits.Query field reshuffle
QueryOptions's modifiers were peeled off one axis at a time and lifted ontoQuerydirectly:The wrapper that remained — only two fields, both rewriter-side configuration — was renamed to express its actual purpose:
ExecutableQuery.options: QueryOptionsbecameExecutableQuery.reasoning: ReasoningConfigto match.Module reorganization
fluree-db-query/src/expression*→fluree-db-query/src/eval/. Theexpressionmodule was a kitchen sink containing both the AST (which is IR) and the evaluator (which is runtime); the IR types moved up tocrate::ir::expression, and the runtimeevaldirectory now contains just the dispatcher and operator implementations (arithmetic,cast,compare,datetime,string,numeric,logical,conditional,geo,hash,rdf,fluree,fulltext, etc.).fluree-db-query/src/ir/options.rs→fluree-db-query/src/ir/reasoning.rs. Module name now matches the struct it exports (ReasoningConfig).FilterValueenum removed in favor of the existingFlakeValue— the duplicate value-domain type is gone.Projectiontype hierarchy was reshaped to make invalid projections unrepresentable (Tuple/Scalar/Wildcardcarry their column shape directly).graph_selectwas moved off the top-level intoQueryOutput::Select, where it actually applies.Operator-tree cleanup
implicit_single_aggregate(query) -> Option<&AggregateSpec>helper replaces thelet Some(Grouping::Implicit { aggregation: Aggregation { aggregates, binds }, having: None }) = … else { return None }; if aggregates.len() != 1 || …gate that was duplicated across eleven fast-path detectors. Each detector is now ~5 lines shorter and shares the gate definition.build_operator_tree(query, stats, planning)lost itsoptions: &QueryOptionsparameter — the function (and its inner) never read it. Same drop applied to all 33detect_*private helpers andtry_build_count_plan.compute_variable_deps(query)is now single-arg for the same reason.Grouping::aggregates()drops aBox<dyn Iterator>allocation per call in favor ofimpl Iterator, matching the shape ofbinds().dependency.rs::compute_variable_depsreads post-aggregation binds via&[(VarId, Expression)]borrowed offAggregation::bindsinstead of materializing aVec<&(VarId, Expression)>.SPARQL lowering
lower_base_modifiers(modifiers) -> Result<BaseModifiers>replaces the prior(modifiers, &mut QueryOptions)mutation pattern. ReturnsBaseModifiers { limit, offset, ordering }for the caller to lift ontoQuerydirectly. All four call sites (SELECT, CONSTRUCT, DESCRIBE, ASK) destructure cleanly. The redundantlower_construct_modifierswrapper went away.limit: Some(1), offset: None, ordering: Vec::new(), grouping: Noneand discards the parsed solution modifiers (per SPARQL semantics — ASK answers a single boolean and the modifiers are inert).Grouping::assemble(...)is used uniformly across all three lowering paths (parse/lower.rs,sparql/lower/mod.rs,sparql/lower/select.rs) to construct the grouping phase from loose pieces, replacing three near-identical variant-selection ladders.SubqueryPatternfield namingSubqueryPattern.order_by→orderingandSubqueryPattern::with_order_by→with_ordering, mirroring the top-levelQuery.orderingrename. The remainingSubqueryPattern.distinct: boolis left as a bool intentionally —selectOnedoesn't apply to subqueries (subqueries always produce a row stream), so the binaryRestrictionaxis isn't meaningful there.Bugfix:
SELECT DISTINCT *regressionfluree-db-sparql/src/lower/mod.rswas silently downgradingSELECT DISTINCT *toSELECT *because the wildcard match arm dropped thedistinctflag. Added aQueryOutput::wildcard_distinct()constructor and split the match so(SelectVariables::Star, true)produces a wildcard withRestriction::Distinct.Drive-by cleanup
input_varsmethods after confirmingreferenced_varscovers every consumer.decode_lookup_errorlifted from a method into a free function to break a circular dep throughexpression.Debug/Clone/PartialEqimpls replaced by#[derive(...)].is_boundconsolidated to a single function across two consumers.operator_tree.rscollapsed eight separateuse crate::ir::*lines into one grouped import;dependency.rstest module dedup'd.QueryOptions/expressionframing — includingir.rs,ir/query.rs,ir/reasoning.rs, and theRestriction::Onedoc.Migration notes (for external consumers)
The public API of
fluree-db-queryshifts in several ways. None of these are subtle — the compiler will tell you about each one — but the rename targets are worth knowing in advance:query.options.limitquery.limitquery.options.offsetquery.offsetquery.options.order_byquery.orderingquery.options.reasoningquery.reasoning.modesquery.options.schema_bundlequery.reasoning.schema_bundleQueryOptionsReasoningConfigQueryOptions::with_limit(n)Query.limit = Some(n)directlyQueryOptions::with_offset(n)Query.offset = Some(n)directlyQueryOptions::with_order_by(specs)Query.ordering = specsdirectlyQueryOptions::with_reasoning(modes)ReasoningConfig::with_modes(modes)QueryOptions::has_modifiers()ExecutableQuery.optionsExecutableQuery.reasoningbuild_operator_tree(query, options, …)build_operator_tree(query, …)(3 args)compute_variable_deps(query, options)compute_variable_deps(query)(1 arg)SubqueryPattern.order_bySubqueryPattern.orderingSubqueryPattern::with_order_by(specs)SubqueryPattern::with_ordering(specs)crate::expression::*(eval runtime)crate::eval::*crate::ir::options::ReasoningConfigcrate::ir::reasoning::ReasoningConfigFilterValueFlakeValue(already existed)QueryOutput::Select { distinct: bool, … }QueryOutput::Select { restriction: Option<Restriction>, … }New IR types you'll see at use sites:
Restriction,Grouping,Aggregation,AggregateFn,AggregateSpec,ReasoningConfig,ReasoningModes(renamed),BaseModifiers(SPARQL lower-internal), andfluree_db_core::NonEmpty<T>.Queryliterals carry more fields than before (every modifier moved up). Test fixtures andwith_patterns-style copies were updated accordingly.Known limitations / follow-ups
limit=1, ordering=[], grouping=Noneto honor the boolean-output semantics. The JSON-LD lowering path (parse/lower.rs::lower_query) doesn't yet apply the same scrub — a JSON-LD ASK withgroupByororderBywill carry those fields through to execution where they execute and produce wasted work (the format layer ignores all but "any non-empty batch?"). The semantics are preserved; only efficiency is affected. Parser-side cleanup is planned as a follow-up.SubqueryPatterncarries its ownselect,patterns,limit,offset,distinct: bool,ordering,groupingfields rather than embedding aQuery. Convergence into a shared "QueryBody" struct was discussed but not pursued — the two surfaces have different invariants (subqueries are always row streams, never ASK; their projection is aVec<VarId>rather than the richerQueryOutput).lower/select.rs::lower_subselect); pre-existing limitation, not introduced by this branch.build_operator_tree's top-of-functionaggregates_vec/group_by_vec/post_binds_vecextraction: still materializes looseVecs for downstream operator constructors that consume owned data. Acceptable trade-off for one-shot setup; flagged for a future round when the operator constructors gainArc<[T]>-shaped APIs.subquery.rs::execute_subquery_for_row: still rebuilds the operator stack each parent row. Real fix requires hoisting grouping/aggregation setup intoSubqueryOperatorconstruction with shared ownership. Out of scope for this branch.