Canonical RDF ontology, graph-native v2 bundles & SHACL validation #707
Collapse three drifted RDF dialects for the same concept into one canonical vocabulary (RS topology + XBRL vocabulary), per local/docs/specs/rdf-ontology.md.

Ontology (frameworks/ontology/v1/):
- context.jsonld: published canonical @context (superset of every seed term)
- ontology.ttl: RDFS/OWL class + property declarations
- shapes.ttl: SHACL positive shapes + negative shapes banning the retired dialects (xbrli:contextRef, arcFrom, summationOf, …)

Vocabulary (arelle/context.py):
- balance/periodType -> xbrli:; bind link/xlink/xbrldi/iso4217
- structural arcs reified: from/to (xlink), arcrole/role (xlink), weight/order (link), associationType (rs); direct summationOf/parent/generalOf/dimensionOf/hypercubeOf RETIRED
- equivalence stays direct owl:equivalentClass (symmetric, no arc metadata)
- absorb all domain terms (drules, rules, traits, style) so it is the superset

extractor.py: emit reified rs:Association (weight/order/preferredLabel from Arelle) + xbrli concept attrs; deterministic content-hashed association IRIs.
serializer.py: compact predicate keys via the context (readable seeds).
loader.py: read the single canonical reified form + xbrli attrs; structural direct predicates dropped (equivalence + drules kept).

Seeds: all 18 frameworks/**/taxonomy.jsonld regenerated to canonical form (semantics-preserving: identical element/association counts).
Deps: + pyshacl.
Verified: tests/arelle + tests/taxonomy green (211); all 18 seeds SHACL-conform; ruff + format + basedpyright clean. Runtime reseed + demo round-trip next.
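The extractor's "deterministic content-hashed association IRIs" and the serializer's context-driven key compaction can be sketched roughly as follows. This is a minimal illustration, not the PR's extractor.py/serializer.py: `association_iri`, `association_node`, `compact`, and the `PREFIXES` map are hypothetical, and the namespace IRIs (especially `rs`) are placeholders rather than the values published in context.jsonld.

```python
from __future__ import annotations

import hashlib
import json

# Placeholder prefix map standing in for the canonical @context.
# These IRIs are assumptions for illustration; "rs" in particular is made up.
PREFIXES = {
    "rs": "https://example.org/rs#",
    "xlink": "http://www.w3.org/1999/xlink#",
    "link": "http://www.xbrl.org/2003/linkbase#",
}


def compact(iri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Rewrite a full IRI as prefix:local, choosing the longest matching namespace."""
    matches = [(p, ns) for p, ns in prefixes.items() if iri.startswith(ns)]
    if not matches:
        return iri  # no binding: leave the IRI expanded
    prefix, ns = max(matches, key=lambda m: len(m[1]))
    return f"{prefix}:{iri[len(ns):]}"


def association_iri(base: str, source: str, target: str, arcrole: str,
                    weight: float | None = None, order: float | None = None) -> str:
    """Mint an IRI from the association's content (hypothetical helper)."""
    # Canonical JSON (sorted keys, no whitespace) so identical content always hashes the same.
    payload = json.dumps(
        {"from": source, "to": target, "arcrole": arcrole, "weight": weight, "order": order},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{base}association/{digest}"


def association_node(base, source, target, arcrole, weight=None, order=None) -> dict:
    """Build a reified rs:Association node whose predicate keys are compacted via PREFIXES."""
    node = {
        "@id": association_iri(base, source, target, arcrole, weight, order),
        "@type": "rs:Association",
        compact(PREFIXES["xlink"] + "from"): {"@id": source},
        compact(PREFIXES["xlink"] + "to"): {"@id": target},
        compact(PREFIXES["xlink"] + "arcrole"): arcrole,
    }
    # Arc metadata only appears when present, mirroring optional weight/order on arcs.
    if weight is not None:
        node[compact(PREFIXES["link"] + "weight")] = weight
    if order is not None:
        node[compact(PREFIXES["link"] + "order")] = order
    return node
```

Hashing a canonical JSON payload means re-running the extractor over the same taxonomy yields the same IRIs, so regenerated seeds diff cleanly. One caveat of this sketch: `1` and `1.0` serialize differently, so numeric weights would need normalizing before hashing.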
…nto facts

Phase B of the canonical RDF ontology migration: the export StatementBundle becomes graph-native (RS topology + XBRL vocabulary), mirroring the LadybugDB reporting graph instead of re-encoding an XBRL instance.

bundle.py: BundlePeriod nodes replace BundleContext; facts carry period_ref / unit_ref / entity_ref directly (the FACT_HAS_* edges); _mint_periods replaces _mint_contexts. The XBRL context is no longer stored on the bundle.
rdf/jsonld.py: rewritten to emit rs:Fact with direct element/entity/period/unit edges, rs:Element (xbrli:balance/periodType), reified rs:Association under rs:Structure, and rs:Period/rs:Unit aspect nodes, using the canonical CANONICAL_CONTEXT. serializationVersion → 2.0. validate_graph now runs SHACL (frameworks/ontology/v1/shapes.ttl), so the same shapes that gate the seeds gate the export, including the negative shapes that ban xbrli:contextRef.
xbrl/xbrl_21.py: _derive_contexts reconstructs <xbrli:context> from the period nodes + entity at emit time (XBRL 2.1 requires shared contexts), so the emitted instance.xml is unchanged and stays Arelle-valid.

Verified: 95 serialization tests green (incl. cross-encoder fact-set equivalence and a negative-shape rejection of a re-introduced contextRef); 10,306-test unit suite green. Both Seattle Method demos + the RoboLedger demo re-run on fresh graphs emit serializationVersion 2.0 JSON-LD (no contextRef) with Arelle-valid XBRL and unchanged reconcile figures; sample_output refreshed accordingly.
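At emit time the XBRL 2.1 encoder has to turn per-fact period/entity references back into shared <xbrli:context> elements. A minimal sketch of that regrouping, assuming facts carry plain period_ref/entity_ref keys and ignoring dimensional segment/scenario content; `Period`, `Fact`, `derive_contexts`, and `context_element` are illustrative names, not the PR's _derive_contexts.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

XBRLI = "http://www.xbrl.org/2003/instance"
ET.register_namespace("xbrli", XBRLI)


@dataclass(frozen=True)
class Period:
    end: str                  # instant date, or end of a duration
    start: str | None = None  # None marks an instant period


@dataclass(frozen=True)
class Fact:
    concept: str
    value: str
    period_ref: str
    entity_ref: str


def derive_contexts(facts, periods):
    """Group facts by (entity, period) and hand out one shared context id per group.

    Returns ({context_id: (entity_ref, start, end)}, [context_id per fact]).
    """
    ids: dict[tuple, str] = {}
    refs: list[str] = []
    for fact in facts:
        period = periods[fact.period_ref]
        key = (fact.entity_ref, period.start, period.end)
        if key not in ids:
            ids[key] = f"c{len(ids) + 1}"  # first-seen order keeps ids deterministic
        refs.append(ids[key])
    return {ctx_id: key for key, ctx_id in ids.items()}, refs


def context_element(ctx_id, key, entities):
    """Render one <xbrli:context>; entities maps entity_ref -> (scheme, identifier)."""
    entity_ref, start, end = key
    scheme, identifier = entities[entity_ref]
    ctx = ET.Element(f"{{{XBRLI}}}context", {"id": ctx_id})
    entity = ET.SubElement(ctx, f"{{{XBRLI}}}entity")
    ET.SubElement(entity, f"{{{XBRLI}}}identifier", {"scheme": scheme}).text = identifier
    period = ET.SubElement(ctx, f"{{{XBRLI}}}period")
    if start is None:
        ET.SubElement(period, f"{{{XBRLI}}}instant").text = end
    else:
        ET.SubElement(period, f"{{{XBRLI}}}startDate").text = start
        ET.SubElement(period, f"{{{XBRLI}}}endDate").text = end
    return ctx
```

Because ids are assigned in first-seen order, the same fact order always produces the same instance document, which keeps the emitted instance.xml stable across runs.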
…rence

Adds the rendered-statement reconcile discussed early on: diffs the four-statement Report's seven anchor totals against Charlie Hoffman's published XBRL reference instance (mini/ref-num/instance.xml, the source of his index2.html), complementing the GL-pivot reconcile (which validates ingestion vs SummaryOfTransactions.csv).

Our values are read straight from the v2 graph-native bundle (rs:Fact nodes): the export artifact is the reconciliation source, which is the payoff of the ontology reshape. A mini→rs-gaap anchor map bridges the vocabularies; matching is by period position (Charlie's reference is labelled FY2022/EUR, ours spans 2023→2028, and the amounts tie regardless).

Result: 7/7 anchors tie to the penny, current + prior: Assets, Liabilities & Equity, Net Income, Receivables, PP&E, Long-term Debt, and the −€648K Cash (which is in Charlie's reference report too, not an ingestion error). Wired as demo step 11 + `just demo-world-online-statement-reconcile`.
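Position-based matching is what lets a FY2022-labelled reference tie against a 2023→2028 run. A hedged sketch of that comparison follows; the two-entry `ANCHOR_MAP`, the `{concept: {period_label: Decimal}}` input shape, and the assumption that period labels sort chronologically as strings are all illustrative, not taken from statement_reconcile.py.

```python
from decimal import Decimal

CENT = Decimal("0.01")

# Hypothetical two-entry anchor map; the PR's real map covers seven anchors.
ANCHOR_MAP = {
    "mini:Assets": "rs-gaap:Assets",
    "mini:NetIncomeLoss": "rs-gaap:NetIncomeLoss",
}


def reconcile(reference, actual, anchor_map):
    """Compare anchor totals by period position (0 = current, 1 = prior, ...).

    reference/actual: {concept: {period_label: Decimal}}. Assumes labels sort
    chronologically as strings (e.g. ISO dates or "FY2022"), so positions line
    up even when the two sides use different years.
    """
    rows = []
    for ref_concept, our_concept in anchor_map.items():
        ref_periods = sorted(reference[ref_concept], reverse=True)  # newest first
        our_periods = sorted(actual[our_concept], reverse=True)
        for position, (ref_p, our_p) in enumerate(zip(ref_periods, our_periods)):
            # Quantize both sides to cents: "ties to the penny" means equal at 0.01.
            expected = reference[ref_concept][ref_p].quantize(CENT)
            got = actual[our_concept][our_p].quantize(CENT)
            rows.append({
                "anchor": ref_concept,
                "position": position,
                "expected": expected,
                "actual": got,
                "ties": expected == got,
            })
    return rows
```

Using Decimal and quantizing both sides to cents avoids float rounding noise when deciding whether an anchor ties.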
The graph-native bundle is the first *published* bundle ontology: the XBRL-aligned draft never shipped beyond a one-day demo, so there is no released predecessor to supersede. Stamp it accordingly:
- SERIALIZATION_VERSION "2.0" → "1.0" (the value on every bundle's root)
- IB-envelope datatype IRI /datatype/v2/ → /datatype/v1/
- docstrings/comments describing the artifact as "v2.0" → "v1.0"
- refreshed sample bundles carry serializationVersion "1.0"

The design history (XBRL-aligned → graph-native) lives in the specs; the published artifact and the ontology dir (frameworks/ontology/v1/) are both v1.
…utputs

Each demo now validates the artifacts it just downloaded: on the host, against the on-disk output/ files, with the stack down (no API, no DB, no container).
- examples/_common/validate.py: one shared validator. JSON-LD → pyshacl vs frameworks/ontology/v1/shapes.ttl (semantic conformance) and XBRL zip → Arelle vs the XBRL 2.1 spec (structural conformance). Writes a markdown evidence report per projection.
- Wired as a single `validate` step in all three demos (Seattle Method, World Online, RoboLedger), reading the downloaded .jsonld/.zip. Replaces the old seattle xbrl_validate.py, which re-fetched from the API + queried the DB (a container dependency); it is removed. World Online and RoboLedger gain XBRL/Arelle validation they didn't have.
- tests/operations/serialization/test_sample_bundles_shacl.py: pytest that SHACL-validates every committed demo sample bundle against the ontology, so a non-conformant sample can't land.

Evidence committed: all three demos' sample_output now carries both a *-shacl-validation.md (conforms, 0 violations) and a *-xbrl-validation.md (valid XBRL 2.1, 0 errors).
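The host-side flow (discover the downloaded projections under output/, then write one markdown evidence file per projection) might look like the sketch below. It deliberately leaves out the validators themselves: pyshacl and Arelle outcomes are passed in as an `Evidence` record, and `discover`, `render`, and `write_report` are hypothetical helpers whose file naming only mirrors the *-shacl-validation.md / *-xbrl-validation.md pattern described in the commit.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Evidence:
    artifact: str    # file name of the downloaded projection
    kind: str        # "shacl" or "xbrl" (hypothetical labels)
    conforms: bool
    violations: int


def discover(output_dir: Path) -> dict[str, list[Path]]:
    """Find the downloaded projections on disk; no API or DB involved."""
    return {
        "shacl": sorted(output_dir.glob("*.jsonld")),
        "xbrl": sorted(output_dir.glob("*.zip")),
    }


def render(ev: Evidence) -> str:
    """Format one validation outcome as a small markdown evidence report."""
    status = "conforms" if ev.conforms else "does NOT conform"
    return "\n".join([
        f"# {ev.kind.upper()} validation: {ev.artifact}",
        "",
        f"- Result: {status}",
        f"- Violations: {ev.violations}",
        "",
    ])


def write_report(ev: Evidence, output_dir: Path) -> Path:
    """Write <stem>-<kind>-validation.md next to the artifact it describes."""
    path = output_dir / f"{Path(ev.artifact).stem}-{ev.kind}-validation.md"
    path.write_text(render(ev), encoding="utf-8")
    return path
```

Keeping discovery purely file-based is what lets the step run with the stack down.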
…the Report

Resolves the publish-path latency concern: SHACL validation of the report bundle is now opt-in, and when it runs, its result is persisted with the Report.
- jsonld.py: decouple validation from serialization. serialize_to_jsonld no longer auto-validates (serialization shouldn't block). Add shacl_report(graph) -> ShaclResult (non-raising: ran/conforms/violations/shapes_checked/report + as_dict() for storage); validate_graph stays as the raising/strict wrapper.
- config: REPORT_BUNDLE_SHACL_VALIDATION = off | warn | strict (default off, so the publish path stays fast; the standalone validator + the SHACL regression test cover demos/CI).
- reports.py: _record_bundle_validation in the publish hook. When not off, it SHACL-checks the bundle and records the structured outcome on Report.metadata['bundle_validation'] (audit trail); strict also raises on non-conformance to block the publish. Uses the existing JSONB metadata column, so no migration.

Tests: shacl_report on conforming + violating graphs, as_dict bounding, and the off/warn/strict hook behaviors (7 cases). Full serialization + roboledger suites green (649).
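The off/warn/strict contract can be illustrated with a stdlib-only sketch. `ValidationMode`, `ShaclResult`, `BundleValidationError`, and `record_bundle_validation` are stand-ins (the real hook is reports.py's _record_bundle_validation, and the real checker wraps pyshacl); the 4,000-character report bound and the `{"ran": False, "error": ...}` record for infrastructure failures are assumptions. The sketch also folds in the warn-mode exception guard from the follow-up review commit.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass
from typing import Any, Callable, Optional


class ValidationMode(str, enum.Enum):
    OFF = "off"
    WARN = "warn"
    STRICT = "strict"


@dataclass
class ShaclResult:
    ran: bool
    conforms: bool
    violations: int = 0
    shapes_checked: int = 0
    report: str = ""

    def as_dict(self, max_report_chars: int = 4000) -> dict[str, Any]:
        # Bound the free-text report so the stored metadata blob stays small.
        report = self.report
        if len(report) > max_report_chars:
            report = report[:max_report_chars] + "...[truncated]"
        return {
            "ran": self.ran,
            "conforms": self.conforms,
            "violations": self.violations,
            "shapes_checked": self.shapes_checked,
            "report": report,
        }


class BundleValidationError(RuntimeError):
    """Raised in strict mode when the bundle does not conform."""


def record_bundle_validation(
    metadata: dict[str, Any],
    bundle: Any,
    mode: str,
    check: Callable[[Any], ShaclResult],
) -> Optional[ShaclResult]:
    mode_ = ValidationMode(mode)
    if mode_ is ValidationMode.OFF:
        return None  # default: keep the publish path fast
    try:
        result = check(bundle)
    except Exception as exc:
        if mode_ is ValidationMode.STRICT:
            raise  # strict: infrastructure failures also block the publish
        metadata["bundle_validation"] = {"ran": False, "error": repr(exc)}
        return None  # warn: record the failure, never break the publish
    metadata["bundle_validation"] = result.as_dict()
    if mode_ is ValidationMode.STRICT and not result.conforms:
        raise BundleValidationError(f"{result.violations} SHACL violation(s)")
    return result
```

Passing the checker in as a callable keeps the hook testable without pyshacl, which is roughly how the off/warn/strict cases can be exercised in isolation.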
@claude please review this PR
Claude finished @jfrench9's task in 4m 1s

PR #707 Review: Canonical RDF Ontology, v2 Bundles & SHACL Validation
Overview
This is a large, well-motivated PR with a coherent architecture. The core ideas: collapsing XBRL contexts onto Fact nodes, reifying arcs as …

Findings
1. …
- reports.py: guard _record_bundle_validation against validation-infra exceptions so a pyshacl/build_graph failure can't break a warn-mode publish (only strict re-raises); type bundle as StatementBundle.
- jsonld.py: derive shapes_version from _SHAPES_PATH instead of a duplicated literal; close the _SHAPES_CACHE write race with a double-checked lock.
- statement_reconcile.py: tolerate a bare (non-@graph) compacted bundle root in _load_actual.
- _common/validate.py: resolve the XBRL entry point from the zip namelist instead of hard-coding instance.xml.
- Make implicit string concatenations explicit (CodeQL) in validate.py and statement_reconcile.py.
- Add tests: strict+conforming no-raise, warn swallows a validation exception, strict re-raises it.
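Two of these fixes, resolving the XBRL entry point from the zip listing and guarding the shapes cache with a double-checked lock, are sketched below. The linkbase-suffix exclusion list and the .xbrl-first tie-break are assumed heuristics, not the rule in _common/validate.py, and `load_shapes` takes an arbitrary loader callable where the real code would parse shapes.ttl.

```python
from __future__ import annotations

import threading
import zipfile
from pathlib import PurePosixPath
from typing import Callable

# Linkbases share the .xml extension with instances; this exclusion list is
# an illustrative assumption, not the PR's actual rule.
_NON_INSTANCE_SUFFIXES = ("_cal.xml", "_def.xml", "_lab.xml", "_pre.xml")


def find_xbrl_entry_point(zf: zipfile.ZipFile) -> str:
    """Pick the instance document from the archive listing instead of assuming instance.xml."""
    candidates = []
    for name in zf.namelist():
        path = PurePosixPath(name)
        if name.endswith("/") or "META-INF" in path.parts:
            continue  # skip directory entries and package metadata
        if path.suffix not in (".xbrl", ".xml") or name.endswith(_NON_INSTANCE_SUFFIXES):
            continue  # schemas (.xsd) and linkbases are not entry points
        candidates.append(path)
    if not candidates:
        raise ValueError("no XBRL instance found in archive")
    # Prefer .xbrl, then the shallowest path, then alphabetical for determinism.
    best = min(candidates, key=lambda p: (p.suffix != ".xbrl", len(p.parts), str(p)))
    return str(best)


_SHAPES_CACHE: object | None = None
_SHAPES_LOCK = threading.Lock()


def load_shapes(loader: Callable[[], object]) -> object:
    """Build the shapes graph once, even when several threads race on first use."""
    global _SHAPES_CACHE
    cached = _SHAPES_CACHE
    if cached is None:  # fast path: no lock once the cache is populated
        with _SHAPES_LOCK:
            if _SHAPES_CACHE is None:  # re-check: another thread may have filled it
                _SHAPES_CACHE = loader()
            cached = _SHAPES_CACHE
    return cached
```

The unlocked first read keeps the common, already-populated case lock-free; the second check under the lock is what closes the race, since two threads can both observe an empty cache before either acquires the lock.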
Summary
This PR introduces a canonical RDF ontology for the taxonomy, migrates the export bundle to a graph-native serialization (stamped serializationVersion "1.0" as the first published bundle format), and adds opt-in SHACL validation at publish time. It also consolidates validation tooling across all demo examples and adds statement-level reconciliation against a published XBRL reference instance.
Key Accomplishments
Canonical RDF Ontology (frameworks/ontology/v1/)
- ontology.ttl: RDFS/OWL class and property declarations, including the reified arc (rs:Association) and the xbrli concept attributes.
- shapes.ttl: SHACL shapes graph for validating bundle conformance against the ontology.
- context.jsonld: Shared JSON-LD context enabling compact, human-readable serialization aligned with the ontology.

Graph-Native v2 Bundle Serialization
SHACL Validation at Publish
- … Report object.

Consolidated Validation Tooling
- The examples/_common/validate.py module provides container-free SHACL and Arelle (XBRL 2.1) validation, replacing the per-demo xbrl_validate.py script.

Statement-Level Reconciliation (World Online)
- statement_reconcile.py reconciles the rendered statements' seven anchor totals (current and prior period) against Charlie Hoffman's published XBRL reference instance.
- … world-online-statement-reconciliation.md).

Breaking Changes
- … .jsonld output will see a different graph shape: a fact's period, entity, and unit are now direct edges from the rs:Fact node rather than being resolved through a separate context object (xbrli:contextRef is banned by a negative SHACL shape). Any downstream tooling that relies on the earlier XBRL-aligned context/fact indirection will need to be updated.
- @context prefix xbrl → link renamed: CANONICAL_CONTEXT now binds the XBRL linkbase namespace (http://www.xbrl.org/2003/linkbase#) to the prefix link (was xbrl). All in-repo seeds and bundles are regenerated against the new context, so the repo is internally consistent, but any external consumer, cached snapshot, or tool that references the old xbrl: prefix in JSON-LD will silently produce unresolvable IRIs and must update to link:.
- xbrl_validate.py removed: the per-demo validation script in seattle_method_demo has been deleted in favor of the shared common module.

Testing
- test_publish_validation.py: verifies that opt-in SHACL validation fires during publish and that results propagate to the Report.
- test_sample_bundles_shacl.py: validates all sample bundle outputs against the SHACL shapes graph.

Infrastructure Considerations
- … uv.lock/pyproject.toml changes).

🤖 Generated with Claude Code
Branch Info:
feature/ontology-refactor → main

Co-Authored-By: Claude noreply@anthropic.com