Enhance subschema checking, Draft-07 support, and test coverage #41

Open: MrLYC wants to merge 8 commits into IBM:master from MrLYC:master

MrLYC (Contributor) commented Jun 26, 2026

Summary

This PR extends jsonsubschema with several correctness, compatibility, and
usability improvements:

  • Add failure-reason support for subschema checks.
  • Add schemaDiff API and CLI support for schema compatibility analysis.
  • Support Draft-07 if/then/else and contains keywords.
  • Implement enum canonicalization for array and object values.
  • Improve negation handling for numeric multipleOf constraints.
  • Support negation of array/object schemas with size constraints.
  • Add LRU caching for repeated greenery regex parsing.
  • Expand test coverage across CLI behavior, explanations, edge cases, numeric
    constraints, pattern properties, Draft-07 behavior, and performance cases.

Changes

  • Updated core checking/canonicalization logic.
  • Added explanation and schema diff APIs.
  • Added CLI diff support.
  • Added comprehensive regression and feature tests.

Tests

  • Not run in this PR description draft.

MrLYC added 8 commits June 26, 2026 21:53
Signed-off-by: MrLYC <imyikong@gmail.com>
Signed-off-by: MrLYC <imyikong@gmail.com>
Previously, JSONTypeInteger.neg() and JSONTypeNumber.neg() ignored
multipleOf constraints entirely (marked as TODO). This caused incorrect
results when negating schemas like
{type: integer, minimum: 0, maximum: 10, multipleOf: 3}.

For bounded ranges, the negation now correctly computes gap ranges between
consecutive multiples, producing precise complement representations.
For unbounded ranges with multipleOf, the limitation is documented.

Adds 6 test cases covering bounded multipleOf negation for both integer
and number types, including acceptance and rejection scenarios.

Signed-off-by: MrLYC <imyikong@gmail.com>
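
To illustrate the gap-range idea from the multipleOf negation commit above, here is a minimal sketch for the bounded integer case. The function names `integer_multiple_gaps` and `negate_bounded_multiple` are hypothetical and not taken from the PR, and the result is the complement within the integer type only (non-integer types are left out for brevity).

```python
from typing import Dict, List, Tuple


def integer_multiple_gaps(minimum: int, maximum: int, multiple_of: int) -> List[Tuple[int, int]]:
    """Inclusive integer ranges in [minimum, maximum] that contain no multiple of multiple_of."""
    if multiple_of <= 0:
        raise ValueError("multiple_of must be positive")
    gaps = []
    start = minimum  # lowest value not yet assigned to a gap or a multiple
    first = -(-minimum // multiple_of) * multiple_of  # smallest multiple >= minimum
    for m in range(first, maximum + 1, multiple_of):
        if start <= m - 1:
            gaps.append((start, m - 1))
        start = m + 1
    if start <= maximum:
        gaps.append((start, maximum))
    return gaps


def negate_bounded_multiple(minimum: int, maximum: int, multiple_of: int) -> Dict:
    """Hypothetical helper: integers NOT matching the bounded multipleOf schema."""
    branches = [
        {"type": "integer", "maximum": minimum - 1},  # below the range
        {"type": "integer", "minimum": maximum + 1},  # above the range
    ]
    branches += [
        {"type": "integer", "minimum": lo, "maximum": hi}
        for lo, hi in integer_multiple_gaps(minimum, maximum, multiple_of)
    ]
    return {"anyOf": branches}
```

For the example schema in the commit message (range 0 to 10, multipleOf 3), the multiples are 0, 3, 6, and 9, so the gaps are 1..2, 4..5, 7..8, and 10..10.
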
Previously, array and object type enums raised UnsupportedEnumCanonicalization.
Now they are decomposed into anyOf of precise schemas per the paper's approach:
- Array enum values become schemas with exact items, minItems=maxItems
- Object enum values become schemas with exact properties, required, additionalProperties=false
- Inner enum values are recursively canonicalized to reach primitive types

Updates tests that previously expected exceptions to verify correct behavior.

Signed-off-by: MrLYC <imyikong@gmail.com>
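
The decomposition described in the enum canonicalization commit above can be sketched as follows. This is an assumption-laden illustration, not the PR's code: `enum_value_to_schema` and `canonicalize_enum` are hypothetical names, tuple-style `items` is used for arrays, and primitive values are simply left as single-value enums, whereas the library may canonicalize them further.

```python
from typing import Any, Dict, List


def enum_value_to_schema(value: Any) -> Dict:
    """Turn one enum value into a schema that accepts exactly that value."""
    if isinstance(value, list):
        # Exact length plus one schema per position.
        schema = {"type": "array", "minItems": len(value), "maxItems": len(value)}
        if value:
            schema["items"] = [enum_value_to_schema(v) for v in value]
        return schema
    if isinstance(value, dict):
        # Exact property set: all keys required, nothing else allowed.
        return {
            "type": "object",
            "properties": {k: enum_value_to_schema(v) for k, v in value.items()},
            "required": sorted(value),
            "additionalProperties": False,
        }
    # Primitive value: keep as a single-value enum (assumption for this sketch).
    return {"enum": [value]}


def canonicalize_enum(values: List[Any]) -> Dict:
    """Rewrite {enum: values} as an anyOf with one precise schema per value."""
    return {"anyOf": [enum_value_to_schema(v) for v in values]}
```
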
Wrap greenery's parse() with functools.lru_cache(maxsize=1024) to avoid
redundant FSM construction for repeated patterns. This benefits schemas
with overlapping string patterns and recursive subtype checks where the
same regex is parsed multiple times.

All regex utility functions (regex_meet, regex_isSubset,
regex_matches_string, complement_of_string_pattern) and one call site
in _checkers.py now use the cached variant.

Signed-off-by: MrLYC <imyikong@gmail.com>
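
The caching pattern from the commit above looks roughly like this. To keep the sketch self-contained, `slow_parse` is a hypothetical stand-in for an expensive regex-to-FSM parse; it is not greenery's API, and `cached_parse` is likewise an illustrative name.

```python
import functools

parse_calls = 0  # counts how often the expensive parse actually runs


def slow_parse(pattern: str):
    """Stand-in for an expensive parse step (hypothetical, not greenery)."""
    global parse_calls
    parse_calls += 1
    return ("parsed", pattern)


@functools.lru_cache(maxsize=1024)
def cached_parse(pattern: str):
    """Parse each distinct pattern string at most once while it stays in the cache."""
    return slow_parse(pattern)


cached_parse("^a+$")
cached_parse("^a+$")  # served from the cache, slow_parse is not called again
cached_parse("^b*$")
```

Because `lru_cache` keys on the argument values, this only helps when the pattern is passed as a hashable value such as a string, and callers share the same returned object, so it should be treated as read-only.
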
Previously, any array or object schema with constraints raised
UnsupportedNegatedArray/Object. Now schemas with only minItems/maxItems
(array) or minProperties/maxProperties (object) can be negated by
computing complement ranges.

For example, not({array, minItems:3}) produces anyOf([non-array types,
{array, maxItems:2}]).

Schemas with items, additionalItems, uniqueItems, properties, etc.
still raise exceptions as these require more complex complement logic.

Signed-off-by: MrLYC <imyikong@gmail.com>
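
A minimal sketch of the complement-range computation for size-only array schemas, following the example in the commit above. `negate_sized_array` is a hypothetical name, and the list of non-array types is an assumption of this sketch ("integer" is omitted because it is covered by "number").

```python
from typing import Dict, Optional

# All JSON types other than array; "integer" is subsumed by "number".
NON_ARRAY_TYPES = ["null", "boolean", "number", "string", "object"]


def negate_sized_array(min_items: int = 0, max_items: Optional[int] = None) -> Dict:
    """Complement of {type: array, minItems, maxItems} as an anyOf."""
    branches = [{"type": t} for t in NON_ARRAY_TYPES]
    if min_items > 0:
        # Arrays that are too short.
        branches.append({"type": "array", "maxItems": min_items - 1})
    if max_items is not None:
        # Arrays that are too long.
        branches.append({"type": "array", "minItems": max_items + 1})
    return {"anyOf": branches}
```

The same shape would apply to objects with minProperties/maxProperties by swapping the keyword names and the excluded type.
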
New schemaDiff(s1, s2) function returns the compatibility relationship
between two schemas: 'equivalent', 'backward_compatible',
'forward_compatible', 'breaking', or 'unknown'.

This enables the schema evolution bug detection scenario described in
the paper (Section 5.2.1, Snowplow). Also adds --diff CLI flag.

Includes 8 test cases covering all result types plus the Washington Post
API evolution example from the paper.

Signed-off-by: MrLYC <imyikong@gmail.com>
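
One plausible way to derive the five result labels from two subschema checks is sketched below. This is an assumption, not the PR's implementation: `classify_relationship` is a hypothetical helper, the inputs are the outcomes of the two directional checks (with `None` meaning the check could not be decided), and the choice of which direction counts as "backward" versus "forward" compatible is a convention that the PR may define differently.

```python
from typing import Optional


def classify_relationship(old_in_new: Optional[bool], new_in_old: Optional[bool]) -> str:
    """Map two directional subschema results to a compatibility label.

    old_in_new: is the old schema a subschema of the new one?
    new_in_old: is the new schema a subschema of the old one?
    None means the corresponding check raised or was otherwise undecidable.
    """
    if old_in_new is None or new_in_old is None:
        return "unknown"
    if old_in_new and new_in_old:
        return "equivalent"
    if old_in_new:
        # Assumed convention: everything valid before is still valid.
        return "backward_compatible"
    if new_in_old:
        # Assumed convention: everything valid now was already valid before.
        return "forward_compatible"
    return "breaking"
```
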
if/then/else is canonicalized to equivalent anyOf+allOf form:
  {if:C, then:T, else:E} → {anyOf: [{allOf:[C,T]}, {allOf:[{not:C},E]}]}

contains is added as a new array keyword with subtype checking:
  If RHS has contains, LHS must also guarantee the constraint.

Adds 9 test cases covering both features including edge cases
(missing else, combined with type constraints, tuple items).

Signed-off-by: MrLYC <imyikong@gmail.com>
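
The if/then/else rewrite from the commit above can be sketched as a small dictionary transformation. `rewrite_conditional` is a hypothetical name; this sketch handles only the top level of a schema, treats a missing `then` or `else` as the empty schema `{}` (which accepts everything, matching Draft-07 semantics), and keeps any sibling keywords by wrapping them in an outer `allOf`.

```python
from typing import Dict


def rewrite_conditional(schema: Dict) -> Dict:
    """Rewrite {if: C, then: T, else: E} as anyOf[allOf[C, T], allOf[not C, E]]."""
    if "if" not in schema:
        # Without "if", Draft-07 ignores "then" and "else"; leave the schema alone.
        return schema
    cond = schema["if"]
    then_branch = schema.get("then", {})  # missing then: no extra constraint
    else_branch = schema.get("else", {})  # missing else: no extra constraint
    rewritten = {
        "anyOf": [
            {"allOf": [cond, then_branch]},
            {"allOf": [{"not": cond}, else_branch]},
        ]
    }
    # Sibling keywords such as "type" still apply alongside the conditional.
    rest = {k: v for k, v in schema.items() if k not in ("if", "then", "else")}
    return {"allOf": [rest, rewritten]} if rest else rewritten
```
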