Skip to content

Clean up XBRL serialization output and add bundle support#708

Merged
jfrench9 merged 2 commits into
mainfrom
feature/serialization-cleanup
May 30, 2026
Merged

Clean up XBRL serialization output and add bundle support#708
jfrench9 merged 2 commits into
mainfrom
feature/serialization-cleanup

Conversation

@jfrench9
Copy link
Copy Markdown
Member

@jfrench9 jfrench9 commented May 30, 2026

Summary

Follow-on to #707, addressing the XBRL-validity issues Charlie Hoffman flagged on the first shipped serialization (validated against XBRL Cloud + Instar). All fixes are framework-independent — rs-gaap stays the report framework. The canonical RDF ontology is the interop layer; XBRL 2.1 is the bridge (mini-as-reporting-framework is explicitly out of scope).

Changes

All four changed source/test files are modifications to existing modules (no new module or runtime dependency):

  • xbrl/xbrl_21.py — fact dedup; label linkbase (report-lab.xml); conditional emission of every linkbase file (presentation/calc/def/label all gated on having content); NS_XML constant.
  • serialization/bundle.py_source_calculation_arcs: sources calc arcs onto the rendered Network's ELR and emits a subtotal only when every child's stored weight sign is XBRL-legal for the endpoints' balance types (§5.1.1.2 Table 6).
  • rdf/jsonld.py — scope the Association IRI by link group so a presentation arc and a sourced calc arc at the same index under one Structure don't collapse onto one node (which failed AssociationShape).
  • tests/test_xbrl_emitter.py (dedup, labels, conditional linkbases) + new test_calc_sourcing.py (the balance-legality heuristic).

What it fixes (Charlie's review)

  • Duplicate facts — dedupe the instance on the full (concept, contextRef, unitRef, value, decimals) tuple. The same value legitimately flows through several FactSets (NetIncomeLoss via IS/CF/SE); deduping on the full tuple removes the redundant copies while still surfacing a genuine inconsistency (same aspects, different value) to the validator.
  • No labels — emit report-lab.xml from the element display name (standard label role, xml:lang=en); concepts whose only text echoes the QName local part stay unlabelled.
  • No calculation relations — calc arcs lived on separate rs-gaap-calculation Structures and never reached the bundle. The legality filter drops the cash-flow rollups (indirect-method children mix debit/credit under a cash subtotal — the canonical case real filers omit) and keeps the well-behaved BS/IS summations.
  • Empty linkbase noise — suppress empty calc/def linkbase files (the linkbaseRefs were already gated).

Validation

End-to-end on all three demos (lemonade, World Online, roboledger): 0 duplicate facts, 62 calculation arcs, 75 labels, SHACL conforms, Arelle valid XBRL 2.1. sample_output/ refreshed. Full serialization suite (122 tests) green; ruff/format/basedpyright clean.

Out of scope (deliberate)

  • Mini-as-reporting-framework output — would regress the CoA→rs-gaap projection story.
  • Luca/DFRNT Excel/JSON import format — proprietary vendor ingestion shape; XBRL 2.1 is already the interop layer.

Addresses the XBRL-validity issues Charlie Hoffman flagged on the first
shipped serialization (XBRL Cloud + Instar): duplicate facts, missing
labels, empty/absent calculation linkbase. All framework-independent
(rs-gaap output unchanged in framework; mini-as-reporting-framework is
out of scope).

- Dedupe facts in the instance: the same concept value flows through
  several FactSets (NetIncomeLoss via IS/CF/SE), yielding identical
  <concept contextRef unitRef>value tuples. Collapse on the full tuple
  incl. value, so a genuine inconsistency (same aspects, different
  value) is still surfaced to the validator.
- Emit a label linkbase (report-lab.xml) from the element display name
  (standard label role, xml:lang=en); skip concepts whose only text
  echoes the QName local part.
- Source the calculation linkbase: calc arcs live on separate
  rs-gaap-calculation Structures and never reached the bundle. Pull calc
  arcs among the report's concepts and host each under the rendered
  Network (shared presentation/calculation ELR). Emit a subtotal's
  children only when every child's stored weight sign is XBRL-legal for
  the endpoints' balance types (§5.1.1.2 Table 6) — drops the cash-flow
  rollups whose indirect-method children mix debit/credit (the canonical
  case filers omit), keeps the well-behaved BS/IS summations.
- Suppress empty calc/def linkbase files (an empty <link:linkbase/>
  reads as a missing-relations bug); the linkbaseRef was already gated.
- Scope the JSON-LD Association IRI by link group so a presentation arc
  and a sourced calc arc at the same index under one Structure don't
  collapse onto one node (which would fail AssociationShape).

Validated end-to-end on all three demos (lemonade, World Online,
roboledger): no duplicate facts, 62 calculation arcs, 75 labels, SHACL
conforms, Arelle valid XBRL 2.1. sample_output refreshed.
@jfrench9
Copy link
Copy Markdown
Member Author

@claude please review this PR

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @jfrench9's task in 3m 5s —— View job


PR Review: Clean up XBRL serialization output and add bundle support

Tasks

  • Gather context and understand the request
  • Read core serialization files (bundle.py, xbrl_21.py, jsonld.py)
  • Review test changes (test_xbrl_emitter.py)
  • Spot-check sample output changes
  • Post comprehensive review feedback

Overall this is a solid, well-motivated cleanup. The fact deduplication, conditional linkbase emission, association URI scoping fix, and label linkbase are all correct and address real XBRL conformance problems. A few issues to look at before merging:


Bug: report-pre.xml always emitted, inconsistent with other linkbases

xbrl_21.py:86

zf.writestr("report-pre.xml", _serialize_xml(_build_presentation_linkbase(bundle)))
# vs line 347-348:
if bundle.linkbases.presentation_links:
    _append_linkbase_ref(appinfo, "report-pre.xml", "presentationLinkbaseRef")

When there are no presentation arcs: the zip contains report-pre.xml (an empty linkbase) but the schema has no linkbaseRef pointing to it. The other three linkbases (-cal, -def, -lab) are all gated on having content. An unreferenced extra file in the zip is noise; worse, it's inconsistent with the comment immediately following ("Only emit linkbase files that actually carry arcs/labels"). Should follow the same pattern:

if bundle.linkbases.presentation_links:
    zf.writestr("report-pre.xml", _serialize_xml(_build_presentation_linkbase(bundle)))

The existing test_no_linkbase_ref_when_no_arcs_emitted test only checks the schema ref — it won't catch the file still landing in the zip. Fix this →


Missing test: _source_calculation_arcs balance-legality heuristic

bundle.py:679-792

The new _source_calculation_arcs is the most complex piece in this PR — it silently drops cash-flow calculation arcs while keeping IS/BS summations. That filtering logic (the balance-direction legality check, the host-network selection, the per-subtotal grouping) is pure computation with no database dependency, so it's unit-testable with SimpleNamespace fixtures. Without a test, the next person who touches it has no safety net.

What to cover at minimum:

  • A two-level BS/IS summation (parent=debit, child=debit, weight=+1) → included
  • An indirect-CF subtotal (mixed debit/credit children) → dropped
  • An arc where neither endpoint is in a rendered network → not included (host=None case)

Fix this →


Style: NS_XML constant defined mid-file

xbrl_21.py:557

# Standard XBRL label-linkbase URIs (XBRL 2.1 §5.2.2).
NS_XML = "http://www.w3.org/XML/1998/namespace"

All other NS_* constants (NS_XBRLI, NS_LINK, NS_XLINK, etc.) are grouped at the top of the file (lines 42–57). Defining NS_XML mid-file makes it easy to miss. Minor, but can confuse readers who grep for namespace constants.


Minor: redundant str() wrapping on typed field

xbrl_21.py:154

str(fact.unit_ref),   # unit_ref is typed str in BundleFact

BundleFact.unit_ref: str — the str() wrapping is a no-op. Nit only.


Observations (no action needed)

  • Fact dedup key design (xbrl_21.py:146-158): Deduplicating on the full (element_qname, context_ref, unit_ref, value, decimals) tuple is correct — same-aspects-different-value is preserved for the validator. The TestFactDedup tests confirm both paths.

  • Association URI scoping fix (jsonld.py:310): The removesuffix("Link") → group-segment fix is exactly right. Before this, a presentation+calculation share of the same structure_id would mint the same IRI for index-0 arcs in both groups, merging their xlink:from triples and failing AssociationShape. Clean one-liner fix.

  • _source_calculation_arcs session safety (bundle.py:515-517): The associations.extend(result) pattern is safe because _source_calculation_arcs returns a materialized list before .extend() runs. No generator aliasing risk.

  • Sample outputs: The JSON-LD line count growth (~744 lines per example) is consistent with the new calculation arcs and label nodes being emitted. The ZIP size jump (7KB→12KB) lines up with the addition of report-lab.xml and richer report-cal.xml. Looks credible.

  • Tests overall: The new TestLabelLinkbase, TestFactDedup, and parametrized test_format_value are well-designed and cover meaningful edge cases. The regression tests for xbrli:balance suppression on abstract/non-monetary concepts (test_abstract_concept_does_not_carry_balance, test_non_monetary_concept_does_not_carry_balance) are good Arelle conformance guards.


- Gate report-pre.xml on presentation_links so the file matches its
  (already-gated) linkbaseRef — no unreferenced empty linkbase in the
  zip; extend the no-arcs test to assert the file is absent too.
- Add unit tests for _source_calculation_arcs (the balance-legality
  heuristic): same-balance summation kept, mixed-balance CF subtotal
  dropped, per-subtotal drop when any child is illegal, no-host arc
  excluded, negative-weight legal for opposite balances, empty inputs.
- Move NS_XML up with the other NS_* constants.
- Drop redundant str() on the typed unit_ref/decimals dedup-key fields.
@jfrench9 jfrench9 merged commit 29d4753 into main May 30, 2026
7 checks passed
@jfrench9 jfrench9 deleted the feature/serialization-cleanup branch May 30, 2026 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant