
feat: scalar expressions in JSON-LD select (parity with SPARQL) #1227

Merged: bplatz merged 2 commits into main from feature/expand-jldq-select on May 8, 2026

Conversation

@bplatz bplatz commented May 8, 2026

SELECT (expr AS ?alias) is now expressible in JSON-LD queries via
(as <expr> ?alias). The two languages share the same query IR, so this is
purely a JSON-LD parse/lower change — no SPARQL or executor changes.

{
  "select": ["?p", "(as (coalesce ?titleFr ?titleEn \"untitled\") ?title)"]
}

Previously rejected with Unknown aggregate function: coalesce, because the
JSON-LD select clause assumed every S-expression item was an aggregate. Now:

  • (<aggregate> ...) — existing auto-aliased aggregate
  • (as (<aggregate> ...) ?alias) — existing aliased aggregate
  • (as <expr> ?alias) — new: scalar expression desugared to Pattern::Bind

A bare scalar without an alias errors with a clear pointer to the (as ...) form.
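
The three accepted forms and the missing-alias error can be pictured with a small sketch. This is not the parser's code: `SelectColumn`, `classify`, and the `AGGREGATES` list are hypothetical names chosen for illustration, and the input is assumed to be already split into the head function name and an optional alias.

```rust
/// Illustrative subset only; the real aggregate list lives in the parser.
const AGGREGATES: &[&str] = &["count", "sum", "avg", "min", "max"];

/// Hypothetical result of classifying one S-expression select item.
#[derive(Debug, PartialEq)]
enum SelectColumn {
    /// `(<aggregate> ...)` or `(as (<aggregate> ...) ?alias)`
    Aggregate { func: String, alias: Option<String> },
    /// `(as <expr> ?alias)` with a non-aggregate head
    Expr { func: String, alias: String },
}

/// `func` is the head symbol of the (inner) expression; `alias` is `Some`
/// only when the item was wrapped in `(as ... ?alias)`.
fn classify(func: &str, alias: Option<&str>) -> Result<SelectColumn, String> {
    let is_aggregate = AGGREGATES.contains(&func);
    match (is_aggregate, alias) {
        // Aggregates may be auto-aliased or explicitly aliased.
        (true, alias) => Ok(SelectColumn::Aggregate {
            func: func.to_string(),
            alias: alias.map(str::to_string),
        }),
        // Scalars are accepted only with an explicit alias.
        (false, Some(alias)) => Ok(SelectColumn::Expr {
            func: func.to_string(),
            alias: alias.to_string(),
        }),
        // A bare scalar is rejected with a pointer to the (as ...) form.
        (false, None) => Err(format!(
            "scalar expression '({func} ...)' needs an alias: use (as ({func} ...) ?alias)"
        )),
    }
}
```

Under these assumptions, `classify("coalesce", Some("?title"))` yields an `Expr` column, while `classify("coalesce", None)` returns the error that points at the `(as ...)` form.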

Function parity

Every SPARQL FunctionName variant has a corresponding name in JSON-LD's
function lookup, including the XSD cast constructors (xsd:boolean,
xsd:integer, xsd:float, xsd:double, xsd:decimal, xsd:string, in
both compact and full-IRI forms). The only deliberate exception is
in/not-in inside select expressions; the bracketed-list syntax they
require can't ride the SexprToken channel.

Bugs fixed in the same change

  • Quoted vs unquoted token collapse. SexprToken::Atom previously held
    both false and "false", so a select expression argument lost its
    literal type. Split into Atom (unquoted: variables, numbers, booleans,
    symbols) and String (quoted literal). Strict accessors (expect_atom)
    reject the new String variant; new as_str() returns inner text from
    either. (coalesce ?v "false") now correctly yields the string "false".

  • Chained post-aggregate binds. A select expression that referenced
    another post-aggregate bind (e.g. (as (+ ?adjusted 1) ?again), where
    ?adjusted itself was a post-bind) was routed as a pre-aggregation
    Pattern::Bind and evaluated against an unbound variable. Lowering now
    tracks post-bind aliases as it walks select columns and routes chained
    references to options.post_binds in source order.
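
The routing rule for chained post-aggregate binds can be sketched as a single pass that grows the set of "post" variables as it goes. The `ExprColumn` type and the `route` function below are hypothetical simplifications (an alias plus the variables its expression reads), not the types used in lower.rs.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified select column: an alias plus the variables
/// its expression reads.
struct ExprColumn {
    alias: &'static str,
    refs: Vec<&'static str>,
}

/// Returns (pre_binds, post_binds) as alias lists, each in source order.
fn route(
    columns: &[ExprColumn],
    aggregate_outputs: &[&'static str],
) -> (Vec<&'static str>, Vec<&'static str>) {
    // Start with the aggregate outputs; every post-bind alias is added as
    // it is seen, so later columns that reference it are also routed post.
    let mut post_vars: HashSet<&str> = aggregate_outputs.iter().copied().collect();
    let mut pre = Vec::new();
    let mut post = Vec::new();
    for col in columns {
        if col.refs.iter().any(|v| post_vars.contains(v)) {
            post_vars.insert(col.alias);
            post.push(col.alias);
        } else {
            pre.push(col.alias);
        }
    }
    (pre, post)
}
```

With `?cnt` as the only aggregate output, a column `?adjusted` reading `?cnt` and a column `?again` reading `?adjusted` both land in the post list, in that order. Without the alias tracking, `?again` would be treated as a pre-aggregation bind, which is the bug described above.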

Where the lowering happens

  • fluree-db-query/src/parse/ast.rs — added UnresolvedColumn::Expr.
  • fluree-db-query/src/parse/sexpr_tokenize.rs — split Atom / String.
  • fluree-db-query/src/parse/filter_sexpr.rs — expr_from_sexpr_token keeps
    literal type for String tokens.
  • fluree-db-query/src/parse/mod.rs — parse_select_string dispatches
    aggregate vs scalar after a single tokenize; removed dead helpers; added
    XSD cast names.
  • fluree-db-query/src/parse/lower.rs — lower_query desugars Expr columns
    to Pattern::Bind (or options.post_binds if the expr references an
    aggregate output or earlier post-bind alias). Mirrors SPARQL's
    lower_select_expression_binds.

Subqueries support pre-aggregation expression binds; post-aggregation refs
inside subqueries are explicitly rejected because SubqueryPattern has no
post_binds field (separate enhancement).

Tests

fluree-db-query unit tests:

  • expression-column AST shape, lowering to Pattern::Bind
  • post-aggregate routing for (+ ?cnt 1)
  • chained post-binds (?cnt → ?adjusted → ?again) all in post_binds
  • quoted "false" / "42" / "?notavar" stay string literals
  • XSD cast lowers to Function::XsdInteger / Function::XsdString
  • missing-alias error
  • IF function in select
  • token-level distinction between String("false") and Atom("false")
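
The token-level distinction these tests cover can be illustrated with a reduced model of the tokenizer output. The variant names `Atom` and `String` and the accessors `expect_atom` / `as_str` come from the description above; the signatures, the `Arg` enum, and `classify_arg` are assumptions made for this sketch.

```rust
/// Simplified stand-in for the tokenizer's output type.
#[derive(Debug, Clone, PartialEq)]
enum SexprToken {
    /// Unquoted: variables, numbers, booleans, symbols.
    Atom(String),
    /// Quoted literal; always a string, whatever its text looks like.
    String(String),
}

impl SexprToken {
    /// Strict accessor: rejects quoted strings.
    fn expect_atom(&self) -> Result<&str, String> {
        match self {
            SexprToken::Atom(s) => Ok(s.as_str()),
            SexprToken::String(s) => Err(format!("expected an atom, found string {s:?}")),
        }
    }

    /// Lenient accessor: inner text from either variant.
    fn as_str(&self) -> &str {
        match self {
            SexprToken::Atom(s) | SexprToken::String(s) => s.as_str(),
        }
    }
}

/// Hypothetical typed argument produced from a token.
#[derive(Debug, PartialEq)]
enum Arg {
    Var(String),
    Bool(bool),
    Long(i64),
    Str(String),
    Symbol(String),
}

fn classify_arg(tok: &SexprToken) -> Arg {
    match tok {
        // Quoted text keeps its string type, even "false", "42", "?notavar".
        SexprToken::String(s) => Arg::Str(s.clone()),
        SexprToken::Atom(s) => match s.as_str() {
            "true" => Arg::Bool(true),
            "false" => Arg::Bool(false),
            v if v.starts_with('?') => Arg::Var(v.to_string()),
            v => match v.parse::<i64>() {
                Ok(n) => Arg::Long(n),
                Err(_) => Arg::Symbol(v.to_string()),
            },
        },
    }
}
```

In this model `String("false")` classifies as a string literal while `Atom("false")` classifies as a boolean, which is the behaviour `(coalesce ?v "false")` relies on.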

fluree-db-api integration tests:

  • COALESCE with constant fallback over OPTIONAL
  • COALESCE + COUNT in one SELECT with GROUP BY (the user-reported case)
  • IF in select producing per-row labels

Docs

docs/query/jsonld-query.md:

  • New "S-expression columns" subsection in select covering both aggregate
    and scalar forms
  • Fixed three stale ["count", "?var"] examples that the parser actually
    rejects
  • Expanded the aggregation function list (count-distinct, median, variance,
    stddev, groupconcat with separator)

@bplatz bplatz requested review from aaj3f and zonotope May 8, 2026 14:38

@zonotope zonotope left a comment

looks good besides two small things.

Comment thread fluree-db-query/src/parse/ast.rs Outdated
/// `Column::Var(alias)` projection plus a `Pattern::Bind { var: alias, expr }`
/// injected into the patterns list — or into `options.post_binds` when
/// the expression references an aggregate output variable.
Expr {

Not a big deal if it's hard to change, but naming the type the same thing as one of its fields can be confusing. Perhaps Computation would be better for the variant name, or code for the field name would disambiguate things.

Comment on lines +1119 to +1138
for column in columns {
    if let UnresolvedColumn::Expr { expr, alias } = column {
        // Subqueries do not currently expose post-aggregation binds, so
        // disallow expressions that would need to run after aggregation.
        // This mirrors the limitation of `SubqueryPattern` (no `post_binds`).
        if aggregate_output_vars.contains(alias)
            || expression_references_any(expr, &aggregate_output_vars)
        {
            return Err(ParseError::InvalidSelect(format!(
                "select expression '{alias}' references an aggregate output; \
                 post-aggregation BINDs are not supported inside subqueries"
            )));
        }
        let alias_var = vars.get_or_insert(alias);
        let lowered_expr = lower_filter_expr_with_encoder(expr, vars, encoder, pp_counter)?;
        patterns.push(Pattern::Bind {
            var: alias_var,
            expr: lowered_expr,
        });
    }

This structure is very close to the one in lower_query. I think you could have a helper that processes one column at a time, returning a pre- or post-bind (you'd also have to define an enum so you can differentiate the two cases).

Then, the lower_query loop calls that helper and accumulates both lists, and this lower_subquery loop also calls the same helper, but returns an error if the helper ever returns a post-bind.

That seems like a lot of ceremony for two private functions, but it will make the behavior easier to change in the future, and make any bugs that might exist in the logic easier to fix.
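
One possible shape for the suggested helper, as a sketch only. `LoweredBind`, `lower_expr_column`, `lower_query_columns`, and `lower_subquery_columns` are hypothetical names, and columns are reduced to an alias plus the variables the expression reads; the real code works on `UnresolvedColumn::Expr` and lowered expressions, and the change actually made in f7350f3 may look different.

```rust
use std::collections::HashSet;

/// Hypothetical enum distinguishing the two cases the helper can return.
#[derive(Debug, PartialEq)]
enum LoweredBind {
    Pre(String),
    Post(String),
}

/// Processes one column: decides pre vs post and records post aliases.
fn lower_expr_column(alias: &str, refs: &[&str], post_vars: &mut HashSet<String>) -> LoweredBind {
    if post_vars.contains(alias) || refs.iter().any(|v| post_vars.contains(*v)) {
        post_vars.insert(alias.to_string());
        LoweredBind::Post(alias.to_string())
    } else {
        LoweredBind::Pre(alias.to_string())
    }
}

/// Top-level query: accumulate both lists.
fn lower_query_columns(
    cols: &[(&str, Vec<&str>)],
    aggregate_outputs: &[&str],
) -> (Vec<String>, Vec<String>) {
    let mut post_vars: HashSet<String> =
        aggregate_outputs.iter().map(|v| v.to_string()).collect();
    let (mut pre, mut post) = (Vec::new(), Vec::new());
    for (alias, refs) in cols {
        match lower_expr_column(alias, refs, &mut post_vars) {
            LoweredBind::Pre(a) => pre.push(a),
            LoweredBind::Post(a) => post.push(a),
        }
    }
    (pre, post)
}

/// Subquery: same helper, but any post-bind is an error.
fn lower_subquery_columns(
    cols: &[(&str, Vec<&str>)],
    aggregate_outputs: &[&str],
) -> Result<Vec<String>, String> {
    let mut post_vars: HashSet<String> =
        aggregate_outputs.iter().map(|v| v.to_string()).collect();
    let mut pre = Vec::new();
    for (alias, refs) in cols {
        match lower_expr_column(alias, refs, &mut post_vars) {
            LoweredBind::Pre(a) => pre.push(a),
            LoweredBind::Post(a) => {
                return Err(format!(
                    "select expression '{a}' references an aggregate output; \
                     post-aggregation BINDs are not supported inside subqueries"
                ))
            }
        }
    }
    Ok(pre)
}
```

The pre/post decision then lives in one place, and the two callers differ only in what they do with a `Post` result.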

bplatz commented May 8, 2026

@zonotope addressed feedback in f7350f3

@bplatz bplatz merged commit ae9b251 into main May 8, 2026
13 checks passed
@bplatz bplatz deleted the feature/expand-jldq-select branch May 8, 2026 21:07