[SPARK-57727][SQL] Fix incorrect query result due to inferAdditionalConstraints incorrectly substituting attributes with non-binary-stable collations #56836

Draft: jiwen624 wants to merge 1 commit into apache:master from jiwen624:SPARK-constraint-collation

@jiwen624

Contributor

What changes were proposed in this pull request?

Guard the attribute-substitution cases in QueryPlanConstraints.inferAdditionalConstraints with isBinaryStable, so the transitive inference is skipped when the equated attributes have a non-binary-stable type (e.g. a non-UTF8_BINARY collation). Binary-stable types are unaffected.

This mirrors the existing guard in the ConstantPropagation rule (SPARK-55647).
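
For illustration only, the shape of the guard can be sketched in a small Python toy model. This is not the Catalyst code: `Attr`, `is_binary_stable`, and `infer_additional` are hypothetical names, predicates are plain string templates, and the model assumes that only UTF8_BINARY counts as binary stable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a column attribute with a collation."""
    name: str
    collation: str = "UTF8_BINARY"


def is_binary_stable(attr: Attr) -> bool:
    # Simplifying assumption: only UTF8_BINARY is treated as binary stable.
    return attr.collation == "UTF8_BINARY"


def infer_additional(equalities, predicates):
    """Substitute one side of each equality into predicates on the other side.

    equalities: list of (Attr, Attr) pairs known to be equal.
    predicates: list of (Attr, template) pairs, e.g. (a, "{} = 'hello'").
    """
    inferred = []
    for left, right in equalities:
        # The guard: skip substitution unless both sides are binary stable.
        if not (is_binary_stable(left) and is_binary_stable(right)):
            continue
        for attr, template in predicates:
            if attr == left:
                inferred.append((right, template))
            elif attr == right:
                inferred.append((left, template))
    return inferred


a_bin, b_bin = Attr("a"), Attr("b")
a_lc, b_lc = Attr("a", "UTF8_LCASE"), Attr("b", "UTF8_LCASE")

binary_case = infer_additional([(a_bin, b_bin)], [(a_bin, "{} = 'hello'")])
lcase_case = infer_additional([(a_lc, b_lc)], [(a_lc, "{} = 'hello'")])
```

In this sketch `binary_case` contains the substituted predicate on `b`, while `lcase_case` is empty because the guard skips the UTF8_LCASE pair.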

Why are the changes needed?

inferAdditionalConstraints infers a predicate on b from a = b and a predicate on a by substituting a with b. This is only valid when a and b are byte-for-byte interchangeable. Under a non-binary-stable collation, a = b is a collation equality (e.g. 'hello' = 'HELLO' under UTF8_LCASE), not byte equality, so substituting into a comparison evaluated in a different collation produces a wrong constraint and silently drops rows:

CREATE TABLE t (a STRING COLLATE UTF8_LCASE, b STRING COLLATE UTF8_LCASE);
INSERT INTO t VALUES ('hello', 'HELLO');
SELECT a, b FROM t WHERE a = b AND a = 'hello' COLLATE UTF8_BINARY;
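
The failure can be reproduced outside Spark with a rough Python model of the two comparisons. This is only a sketch: `eq_lcase` approximates UTF8_LCASE with `str.lower()`, `eq_binary` stands in for UTF8_BINARY, and all function names are made up for this example.

```python
def eq_binary(x: str, y: str) -> bool:
    """Exact equality, standing in for a UTF8_BINARY comparison."""
    return x == y


def eq_lcase(x: str, y: str) -> bool:
    """Case-insensitive equality, a rough stand-in for UTF8_LCASE."""
    return x.lower() == y.lower()


def original_filter(a: str, b: str) -> bool:
    # WHERE a = b AND a = 'hello' COLLATE UTF8_BINARY
    return eq_lcase(a, b) and eq_binary(a, "hello")


def filter_with_inferred(a: str, b: str) -> bool:
    # Same filter plus the constraint obtained by substituting a with b:
    # b = 'hello' COLLATE UTF8_BINARY
    return original_filter(a, b) and eq_binary(b, "hello")


row = ("hello", "HELLO")
kept_by_original = original_filter(*row)
kept_with_inferred = filter_with_inferred(*row)
print(kept_by_original, kept_with_inferred)  # True False
```

The row satisfies the query as written, but the inferred binary comparison on `b` rejects it, which is the dropped row described above.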

Does this PR introduce any user-facing change?

Yes. It fixes an incorrect query result.

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

Yes. Claude Code

…uting attributes with non-binary-stable collations
@jiwen624

Contributor Author

While this fix should be correct, it also blocks pushdown in the same-collation case, e.g.:

-- a, b : UTF8_LCASE ; everything same collation
... t1.a = t2.b   AND   t1.a = 'x'      -- no COLLATE override
infer: t2.b = 'x'   (all UTF8_LCASE)
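
As a quick sanity check of why the same-collation inference is safe, here is a small Python sketch. It assumes UTF8_LCASE can be approximated by `str.lower()` equality; `eq_lcase` and the sample values are invented for this example. Because every comparison uses the same equivalence relation, transitivity holds and no counterexample turns up.

```python
from itertools import product


def eq_lcase(x: str, y: str) -> bool:
    """Case-insensitive equality, a rough stand-in for UTF8_LCASE."""
    return x.lower() == y.lower()


samples = ["x", "X", "y", "Y", "xx"]

# Look for (a, b) where a = b and a = 'x' hold under UTF8_LCASE,
# but the inferred b = 'x' (also UTF8_LCASE) does not.
counterexamples = [
    (a, b)
    for a, b in product(samples, repeat=2)
    if eq_lcase(a, b) and eq_lcase(a, "x") and not eq_lcase(b, "x")
]
print(counterexamples)  # []
```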

I'm thinking about a second approach, probably a bit more complex though.
