Clean up XBRL serialization output and add bundle support #708
Addresses the XBRL-validity issues Charlie Hoffman flagged on the first shipped serialization (XBRL Cloud + Instar): duplicate facts, missing labels, empty/absent calculation linkbase. All fixes are framework-independent (rs-gaap stays the report framework; mini-as-reporting-framework is out of scope).
- Dedupe facts in the instance: the same concept value flows through several FactSets (NetIncomeLoss via IS/CF/SE), yielding identical <concept contextRef unitRef>value tuples. Collapse on the full tuple incl. value, so a genuine inconsistency (same aspects, different value) is still surfaced to the validator.
- Emit a label linkbase (report-lab.xml) from the element display name (standard label role, xml:lang=en); skip concepts whose only text echoes the QName local part.
- Source the calculation linkbase: calc arcs live on separate rs-gaap-calculation Structures and never reached the bundle. Pull calc arcs among the report's concepts and host each under the rendered Network (shared presentation/calculation ELR). Emit a subtotal's children only when every child's stored weight sign is XBRL-legal for the endpoints' balance types (§5.1.1.2 Table 6). This drops the cash-flow rollups whose indirect-method children mix debit/credit (the canonical case filers omit) and keeps the well-behaved BS/IS summations.
- Suppress empty calc/def linkbase files (an empty <link:linkbase/> reads as a missing-relations bug); the linkbaseRef was already gated.
- Scope the JSON-LD Association IRI by link group so a presentation arc and a sourced calc arc at the same index under one Structure don't collapse onto one node (which would fail AssociationShape).
Validated end-to-end on all three demos (lemonade, World Online, roboledger): no duplicate facts, 62 calculation arcs, 75 labels, SHACL conforms, Arelle valid XBRL 2.1. sample_output refreshed.
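A minimal sketch of the full-tuple dedup described above, assuming a hypothetical Fact record. The field names and the dedupe_facts helper are illustrative only and are not the emitter's actual API in xbrl/xbrl_21.py. Because the value is part of the key, two facts with the same aspects but different values both survive and stay visible to the validator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    # Hypothetical stand-in for the emitter's fact record (names are assumptions).
    concept: str
    context_ref: str
    unit_ref: Optional[str]
    value: str
    decimals: Optional[str]


def dedupe_facts(facts: List[Fact]) -> List[Fact]:
    """Keep the first occurrence of each full tuple, preserving input order."""
    seen = set()
    unique = []
    for fact in facts:
        # The value is part of the key, so conflicting values are NOT collapsed.
        key = (fact.concept, fact.context_ref, fact.unit_ref, fact.value, fact.decimals)
        if key in seen:
            continue
        seen.add(key)
        unique.append(fact)
    return unique


facts = [
    Fact("rs-gaap:NetIncomeLoss", "c1", "USD", "1200", "0"),  # reached via IS
    Fact("rs-gaap:NetIncomeLoss", "c1", "USD", "1200", "0"),  # same fact via CF
    Fact("rs-gaap:NetIncomeLoss", "c1", "USD", "1250", "0"),  # genuine inconsistency
]
result = dedupe_facts(facts)
```

Here result holds two facts: the redundant copy is dropped, while the conflicting 1250 value is kept for the validator to report.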
@claude please review this PR
Claude finished @jfrench9's task in 3m 5s
PR Review: Clean up XBRL serialization output and add bundle support
Overall this is a solid, well-motivated cleanup. The fact deduplication, conditional linkbase emission, association URI scoping fix, and label linkbase are all correct and address real XBRL conformance problems. A few issues to look at before merging: Bug:
- Gate report-pre.xml on presentation_links so the file matches its (already-gated) linkbaseRef, leaving no unreferenced empty linkbase in the zip; extend the no-arcs test to assert the file is absent too.
- Add unit tests for _source_calculation_arcs (the balance-legality heuristic): same-balance summation kept, mixed-balance CF subtotal dropped, per-subtotal drop when any child is illegal, no-host arc excluded, negative-weight legal for opposite balances, empty inputs.
- Move NS_XML up with the other NS_* constants.
- Drop redundant str() on the typed unit_ref/decimals dedup-key fields.
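To illustrate the balance-legality heuristic these tests cover, here is a sketch under stated assumptions. weight_is_legal and legal_subtotals are hypothetical names, the concept names are invented, and this is not the body of _source_calculation_arcs. The rule encoded is the XBRL 2.1 constraint cited in this PR (§5.1.1.2 Table 6): same balance requires a positive weight, opposite balances require a negative weight. Treating concepts without a balance as unconstrained is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Arc = Tuple[str, str, float]  # (parent concept, child concept, weight)


def weight_is_legal(parent_balance: Optional[str],
                    child_balance: Optional[str],
                    weight: float) -> bool:
    """Check a weight sign against the two endpoints' balance types."""
    if weight == 0:
        return False  # a calculation arc weight must be non-zero
    if parent_balance is None or child_balance is None:
        return True  # assumption: no balance on an endpoint means no constraint
    if parent_balance == child_balance:
        return weight > 0  # debit/debit or credit/credit
    return weight < 0      # debit/credit or credit/debit


def legal_subtotals(arcs: Iterable[Arc],
                    balances: Dict[str, Optional[str]]) -> List[Arc]:
    """Keep a subtotal's arcs only if every one of its children is legal."""
    by_parent = defaultdict(list)
    for parent, child, weight in arcs:
        by_parent[parent].append((child, weight))
    kept: List[Arc] = []
    for parent, children in by_parent.items():
        if all(weight_is_legal(balances.get(parent), balances.get(child), w)
               for child, w in children):
            kept.extend((parent, child, w) for child, w in children)
    return kept


balances = {
    "Assets": "debit", "Cash": "debit", "Receivables": "debit",
    "GrossProfit": "credit", "Revenue": "credit", "CostOfSales": "debit",
    "CashFlowSubtotal": "debit", "NetIncome": "credit", "Depreciation": "debit",
}
arcs = [
    ("Assets", "Cash", 1.0),
    ("Assets", "Receivables", 1.0),
    ("GrossProfit", "Revenue", 1.0),
    ("GrossProfit", "CostOfSales", -1.0),      # opposite balances, negative weight
    ("CashFlowSubtotal", "NetIncome", 1.0),    # opposite balances, positive weight
    ("CashFlowSubtotal", "Depreciation", 1.0),
]
kept = legal_subtotals(arcs, balances)
```

In this example the Assets and GrossProfit summations are kept, while the whole CashFlowSubtotal rollup is dropped because one child is illegal, mirroring the per-subtotal drop the tests describe.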
Summary
Follow-on to #707, addressing the XBRL-validity issues Charlie Hoffman flagged on the first shipped serialization (validated against XBRL Cloud + Instar). All fixes are framework-independent — rs-gaap stays the report framework. The canonical RDF ontology is the interop layer; XBRL 2.1 is the bridge (mini-as-reporting-framework is explicitly out of scope).
Changes
All four changed source/test files are modifications to existing modules (no new module or runtime dependency):
- xbrl/xbrl_21.py: fact dedup; label linkbase (report-lab.xml); conditional emission of every linkbase file (presentation/calc/def/label all gated on having content); NS_XML constant.
- serialization/bundle.py: _source_calculation_arcs sources calc arcs onto the rendered Network's ELR and emits a subtotal only when every child's stored weight sign is XBRL-legal for the endpoints' balance types (§5.1.1.2 Table 6).
- rdf/jsonld.py: scope the Association IRI by link group so a presentation arc and a sourced calc arc at the same index under one Structure don't collapse onto one node (which failed AssociationShape).
- tests/: test_xbrl_emitter.py (dedup, labels, conditional linkbases) + new test_calc_sourcing.py (the balance-legality heuristic).
What it fixes (Charlie's review)
(concept, contextRef, unitRef, value, decimals) tuple. The same value legitimately flows through several FactSets (NetIncomeLoss via IS/CF/SE); deduping on the full tuple removes the redundant copies while still surfacing a genuine inconsistency (same aspects, different value) to the validator.
report-lab.xml from the element display name (standard label role, xml:lang=en); concepts whose only text echoes the QName local part stay unlabelled.
Validation
End-to-end on all three demos (lemonade, World Online, roboledger): 0 duplicate facts, 62 calculation arcs, 75 labels, SHACL conforms, Arelle valid XBRL 2.1.
sample_output/ refreshed. Full serialization suite (122 tests) green; ruff/format/basedpyright clean.
Out of scope (deliberate)