Linkage state machine + set_rust_owned for orphan ownership#195
Merged
Conversation
Replace the `_Node.unlinked: bool` field with a 3-variant `Linkage` enum
(`Linked` / `Unlinked` / `RustOwned`) that drives `_Node::Drop`. The old
"free if unlinked" heuristic conflated tree-detachment with Rust
ownership and could free memory the source document still considered
live (corrupting the dictionary on the next `xmlCopyNode` against a
sibling).
The new Drop rules:
* `Linked` — source document owns; never free.
* `Unlinked` — `xmlUnlinkNode` ran but `node->doc` is set; free only
if `node->doc == NULL` (true C-level orphan).
* `RustOwned` — caller transferred ownership via `set_rust_owned`;
always free.
Public API additions: `Node::set_rust_owned`, `Node::is_rust_owned`.
`set_rust_owned` debug-asserts on `Linked`-input. `set_linked` (crate
internal, called from `add_child` etc.) debug-asserts on `RustOwned`-
input to catch re-link-after-takeover misuse. `debug_assert!` is fully
elided in release builds, so hot paths keep their prior cost.
`unlink_node`'s rustdoc spells out the three valid post-unlink
dispositions: re-attach, copy out via `Document::dup_node_into_new_doc`,
or call `set_rust_owned` to free. An unlinked node that does none of
these leaks until process exit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `Document::dup_node_into_new_doc(&Node) -> Result<Document, ()>`: build a fresh document whose root is a deep copy of the given node's subtree. Unlike `import_node`, it does not require the source to be unlinked and does not mutate the source wrapper's state. This is the canonical path for repeatedly extracting subtrees from a single source document — `xmlDocCopyNode(src, dst, 1)` returns NULL on the second sibling within one source doc; this method works around that by copying into a fresh doc per call. Drop the legacy `node.set_linked()` workaround inside `dup_node_into_new_doc`. With the new `_Node::Drop` rules, a detached subtree whose `node->doc` still points at the source is treated as doc-owned and not freed by the wrapper; the workaround is unneeded. Harden `Document::import_node` to reject `RustOwned` source nodes. Without the early return, a release build would silently no-op on the internal `set_linked` transition (debug-assert is compiled out) and set up a later double-free. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`RoNode` is a `Copy` borrow into a document tree owned by a parent `Document`. It has no `Drop` and cannot take ownership of the C allocation, so `is_rust_owned` is unconditionally `false`. Mirrors `Node::is_rust_owned` so callers can write code generic over the two node types without special-casing `RoNode`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new test files exercising the wrapper's lifetime model: * tests/dup_node_into_new_doc_tests.rs (9 tests) — basic, multi-sibling extraction, source-dropped-first, post-unlink chain, post-XPath mutation, repeated-namespaces, large-doc stress (uses the new resources/large_doc.xml fixture), and mixed-XPath at scale. * tests/xml_copy_invariant_tests.rs (10 tests) — direct libxml2 FFI invariants. Verify that source-doc state remains intact across xmlCopyDoc / xmlCopyNode round-trips, that consecutive xmlCopyNode calls against the same source don't corrupt the dictionary, and that the split-extract pattern lifecycles cleanly. * tests/rust_owned_tests.rs (11 tests) — `set_rust_owned` happy paths (default state, idempotent, sticky-across-clones, post-dup, drop-before-source-doc, mixed reattach, descendant-wrappers-drop- before-parent, 16-section stress), plus misuse-mode debug-assert guards (`#[should_panic]` tests for `set_rust_owned` on Linked input and re-link via add_child) and the `import_node` rejection test. All 49 tests pass; the rust_owned suite is leak-clean under `-Z sanitizer=address`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Patch-version bump to bindgen 0.72.1 for an upstream codegen fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`cargo clippy --fix --all-targets` plus a few manual touch-ups: * `bool_assert_comparison` — auto-fixed in tests/c14n.rs, tests/tree_tests.rs, tests/xml_copy_invariant_tests.rs, tests/xpath_tests.rs. * `assertions_on_constants` — replaced `assert!(false)` with `unreachable!` in tests/tree_tests.rs and dropped a no-op `assert!(true, ...)` in tests/xpath_tests.rs. * `manual_c_str_literals` — switched to `c"..."` literals in tests/xml_copy_invariant_tests.rs. * `doc_lazy_continuation` — added paragraph-break blank `///` lines after embedded bullet/numbered lists in src/tree/node.rs and tests/rust_owned_tests.rs rustdoc. Remaining warnings (13) all originate in the bindgen-generated `bindings.rs` under target/; they regenerate on every build and are not editable from this crate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the
_Node.unlinked: boolfield with a 3-variantLinkageenum(
Linked/Unlinked/RustOwned) that drives_Node::Drop. The old"free if unlinked" heuristic conflated tree-detachment with Rust ownership
and could fire
xmlFreeNodeagainst memory the source document stillconsidered live — corrupting the dictionary on the next
xmlCopyNodeagainst a sibling, with no last-error.
The new Drop rules:
Linked— source document owns; never free.Unlinked—xmlUnlinkNoderan butnode->docis still set; freeonly if
node->doc == NULL(true C-level orphan).RustOwned— caller transferred ownership viaset_rust_owned;always free.
New API
Node::set_rust_owned(&self)— transfer the C allocation's lifetime tothe wrapper. After this, the last
Nodeclone'sDropcallsxmlFreeNode, regardless ofnode->doc. Sticky across clones.Node::is_rust_owned(&self) -> bool(also onRoNode, alwaysfalse).Document::dup_node_into_new_doc(&Node) -> Result<Document, ()>— deepcopy a subtree into a fresh, independent document. Works around
xmlDocCopyNodereturning NULL on the second call within one sourcedoc.
Misuse-mode guards
set_rust_owneddebug-asserts onLinkedinput.set_linked(crate-internal, called fromadd_child/add_*_sibling)debug-asserts on
RustOwnedinput to catch re-link-after-takeover.Document::import_nodereturnsErr(())onRustOwnedsource —release-mode guard since the
set_linkeddebug-assert is compiled out.debug_assert!is fully elided in release builds, so hot paths keeptheir prior cost.
Test plan
cargo test --tests— 98 tests across 11 files, all passset_rust_ownedhappy paths: default state, idempotent, stickyacross clones, post-
dup_node_into_new_doc, drop-before-source-doc,mixed reattach, descendant-wrappers-drop-before-parent, 16-section
stress
#[should_panic]tests in debug buildsimport_noderejectsRustOwnedsource (release-mode regressiontest)
-Z sanitizer=address): zero leaks across allrust_owned_*tests; the pre-existing
node_can_unbindleak (379 B / 6 allocs) ispreserved as the documented "unlinked-and-forgotten" baseline that
set_rust_ownedis the cure forNote on documented constraints
Three lifetime constraints are documented but not enforced in release:
RustOwnedorphan must drop before its source document is freed(libxml2's
xmlFreeNodewalksnode->doc->dict).set_rust_ownedon aLinkednode.RustOwnednode viaadd_child/add_*_sibling/import_node.Each is debug-asserted; misuse in release is silent UB. This matches the
prior shape of the crate's safety contracts.
Motivation
The latexml-oxide post-processing pipeline implements
--splitby walkingthe source document and, for each chapter or section, unlinking the
subtree, deep-copying it into a fresh sub-document, serializing it as a
standalone page, and discarding the source-side detached node. This
exercises every edge in the new state machine: the
Unlinked → RustOwnedtransition replaces the prior UAF (the second
xmlCopyNodereturned NULLbecause the first wrapper drop had freed memory the source dict still
referenced),
dup_node_into_new_docis the per-page extraction primitive,and
set_rust_ownedreclaims each extracted node so the per-documentallocation cost stays bounded across hundreds of split pages.