
Preserve child wrapper identity in the native AST facade#391

Open
adamziel wants to merge 8 commits into codex/native-lazy-ast-facade from adamziel/ast-child-identity

Conversation


@adamziel adamziel commented Apr 30, 2026

Stacked on #381.

#381's review surfaced a real semantics regression in the lazy native AST facade: every accessor on WP_MySQL_Native_Parser_Node calls into Rust and returns a freshly constructed PHP wrapper, so get_first_child_node() returns a different object every time. WP_Parser_Node has always given callers stable child identity — attach state to a child once, walk past it, walk back, the state is still there — and the lazy native facade is meant to keep that contract intact. This restores it.

A per-AST WP_MySQL_Native_AST_Cache is created lazily on the root and shared by reference with every wrapper interned through it. Each accessor looks up the returned wrapper's native_node_index in the cache and either returns the canonical instance or registers the new one. materialize_native_children() pulls children through the same cache, so any mutation a caller made through get_first_child_node() before the parent went through append_child() survives into $this->children — same instance, same mutations.
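The interning contract described above can be modeled in a few lines. This is an illustrative Python sketch, not the actual PHP implementation; `Node` and `AstCache` are stand-ins for `WP_MySQL_Native_Parser_Node` and `WP_MySQL_Native_AST_Cache`:

```python
class Node:
    """Stand-in for WP_MySQL_Native_Parser_Node (hypothetical)."""
    def __init__(self, native_node_index):
        self.native_node_index = native_node_index

class AstCache:
    """One instance per AST root, shared by reference with every wrapper."""
    def __init__(self):
        self.nodes = {}  # native node index -> canonical wrapper

    def intern(self, wrapper):
        canonical = self.nodes.get(wrapper.native_node_index)
        if canonical is not None:
            return canonical              # discard the freshly built wrapper
        self.nodes[wrapper.native_node_index] = wrapper
        return wrapper

cache = AstCache()
first = cache.intern(Node(7))             # accessor builds a fresh wrapper
first.visited = True                      # caller attaches state to the child
second = cache.intern(Node(7))            # a second read of the same node
assert second is first                    # stable identity
assert second.visited is True             # attached state survives re-reads
```

A hit discards the freshly constructed wrapper and hands back the canonical one, which is what makes caller-attached state durable across reads.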

Tokens are unchanged. The public token API has no mutators and no callers in this repo rely on WP_MySQL_Token identity; if that becomes a need we can extend the cache.

What's in here

  • class-wp-mysql-native-ast-cache.php — small holder, one per AST.
  • Native node accessors run results through intern() / intern_all() / intern_nodes().
  • materialize_native_children() reuses the interned wrappers so prior mutations don't get lost.
  • Regression test covering same-instance reads, descendant/child identity sharing, and the mutate-then-materialize edge case. Skips when the native extension isn't loaded.
  • run-native-ast-walk-benchmark.php parses the MySQL server suite, walks each AST, and reports parsed, walked_nodes, duration, peak_mem, and an identity-stability flag.
  • Native AST Walk Perf workflow runs the benchmark on this PR and on the PR base (codex/native-lazy-ast-facade) on the same runner, so the identity-cache cost is measured apples-to-apples on every push.

Performance

Benchmarked on CI against the no-cache baseline, same runner, same corpus (69,567 queries, 4.8M walked nodes). Full numbers in PR comments — links below.

| scenario | baseline (no cache) | this PR | delta |
| --- | ---: | ---: | --- |
| native parse only | 1.28s | 1.28s | 0% |
| native walk duration | 3.52s | 3.33s | +5% (+0.19s) |
| native walk qps | 19,766 | 20,884 | −5% |
| native walk peak memory | 50.0MB | 60.5MB | +10.5MB (+21%) |
| native walk identity stable | FALSE | true | regression fixed |

Hot-path optimization in intern_all() (cache reference hoisted out of the loop, per-item logic inlined) plus a typed intern_nodes() fast path for accessors that return only nodes brought the time penalty down from an initial +17% to +5%. Memory delta is structural — one retained PHP wrapper per visited node — and is the price of stable identity.
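The shape of that optimization can be sketched as follows. This is a hypothetical Python model of the two loops, not the PHP source: `intern_all` handles mixed results and must type-check each item, while `intern_nodes` is the typed fast path for accessors whose bridge returns only nodes; in both, the cache dict is passed in once (hoisted) rather than re-resolved per item:

```python
class Node:
    """Stand-in for a native-backed node wrapper (hypothetical)."""
    def __init__(self, native_node_index):
        self.native_node_index = native_node_index

def intern_all(cache_nodes, items):
    # mixed node/token results: each item must be type-checked
    out = []
    for item in items:
        if isinstance(item, Node):
            idx = item.native_node_index
            hit = cache_nodes.get(idx)
            if hit is not None:
                item = hit                     # canonical wrapper wins
            else:
                cache_nodes[idx] = item
        out.append(item)
    return out

def intern_nodes(cache_nodes, nodes):
    # typed fast path: the bridge guarantees node-only results,
    # so the per-item isinstance check is skipped entirely
    out = []
    for node in nodes:
        idx = node.native_node_index
        hit = cache_nodes.get(idx)
        if hit is not None:
            node = hit
        else:
            cache_nodes[idx] = node
        out.append(node)
    return out
```

Over millions of walked nodes, dropping one method call and one type check per item is exactly the kind of saving that compounds.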

Native walk is still ~5× faster than the pure-PHP path while now preserving WP_Parser_Node semantics.

Test plan

  • PHPUnit suite passes on the pure-PHP path.
  • PHPUnit suite passes with the native extension loaded; the new identity tests run instead of skipping.
  • Native AST Walk Perf workflow reports the cache cost vs. the PR base on every push.

## Summary
- reuse one `WP_MySQL_Parser` instance inside the SQLite driver and reset its token stream per query
- add `reset_tokens()` to the PHP parser polyfill and the Rust native parser
- restore native parser-node accessor fast paths in `WP_MySQL_Native_Parser_Node`, while keeping PHP child materialization for mutation
- fix the local native extension build helper for Nix/libclang bindgen by undefining `__SSE2__` during binding generation

## Stack
This is the top PR in the native MySQL lexer/parser stack. The stack is
split so each GitHub diff shows one reviewable concern:

1. [#384 Extract MySQL lexer and parser polyfills](#384)
   - `trunk` -> `codex/native-parser-php-facade`
   - extraction-only PHP refactor
   - moves the existing PHP lexer/parser implementations into polyfill classes
   - keeps public `WP_MySQL_Lexer` and `WP_MySQL_Parser` as thin PHP subclasses

2. [#385 Add optional native parser routing](#385)
   - `codex/native-parser-php-facade` -> `codex/native-parser-class-routing`
   - adds fallback `WP_MySQL_Native_*` PHP classes
   - routes the public lexer/parser classes through native classes when the Rust extension provides them
   - adds the minimal PHP grammar-export bridge for the native parser

3. [#386 Add lazy native parser node facade](#386)
   - `codex/native-parser-class-routing` -> `codex/native-parser-node-facade`
   - keeps `WP_Parser_Node` as the plain PHP tree node
   - adds `WP_MySQL_Native_Parser_Node extends WP_Parser_Node` for native-backed lazy AST nodes
   - keeps native AST handles and native accessor delegation out of the base node class

4. [#381 Add lazy native AST facade](#381)
   - `codex/native-parser-node-facade` -> `codex/native-lazy-ast-facade`
   - implements the Rust lexer/parser extension and lazy native AST facade
   - makes the Rust extension instantiate `WP_MySQL_Native_Parser_Node`
   - adds native-extension CI coverage for the SQLite driver and WordPress PHPUnit tests
   - includes the local SQLite facade smoke benchmark

5. [#387 Cache native grammar on parser grammar object](#387)
   - `codex/native-lazy-ast-facade` -> `codex/native-parser-object-grammar-cache`
   - restores the object-attached native grammar cache
   - adds only `WP_Parser_Grammar::$native_grammar` on the PHP side
   - removes the Rust content-hash cache that walked the whole exported grammar on every parser construction

6. This PR, [#388 Speed up native AST materialization](#388)
   - `codex/native-parser-object-grammar-cache` -> `codex/native-parser-bulk-materialization`
   - optimizes native-to-PHP AST access after the grammar-cache performance restoration
   - reuses the SQLite driver's parser instance instead of constructing it per query

## Why
The native lexer/parser itself is fast, but the PHP-facing path can lose
that benefit if each query repeatedly rebuilds native parser state or
forces full PHP AST materialization. On the current stack, #387 already
removes the large grammar export/hash cost. This PR removes the
remaining per-query parser construction churn and restores the native
AST accessor path for descendant-heavy SQLite driver workloads.

## Measurements
Environment: local PHP 8.2 via the native build helper, release Rust
extension, current top of this PR.

Focused constructor/reset benchmark over 5000 unique SELECT queries:

| Phase | Time |
| --- | ---: |
| native tokenize | 22.62 us/query |
| fresh native parser constructor only | 2.31 us/query |
| reusable parser `reset_tokens()` only | 0.32 us/query |
| reusable parser reset + parse + `get_descendants()` | 157.06 us/query |
| constructor/reset ratio | 7.3x |

The previously reported ~622 us/query constructor cost does not
reproduce on this stack because #387 already caches the native grammar
on the PHP grammar object. Parser reuse still removes most of the
remaining constructor overhead.

SQLite facade smoke workload:

Command:

```bash
TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh
```

| Workload | PHP fallback | Native extension | Speedup |
| --- | ---: | ---: | ---: |
| 250 generated queries, including 1 x 2000-row insert | 4.060s | 0.525s | 7.73x |

## Testing
- `cargo fmt --check`
- `git diff --check`
- `composer run check-cs`
- `composer run test` from `packages/mysql-on-sqlite`
- `php -d extension=packages/mysql-on-sqlite/ext/wp-mysql-parser/target/release/libwp_mysql_parser.so packages/mysql-on-sqlite/vendor/bin/phpunit -c packages/mysql-on-sqlite/phpunit.xml.dist`
- `TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh`

The native parser extension constructs a fresh WP_MySQL_Native_Parser_Node
on every accessor call, so two reads of the same logical node returned
different PHP objects and any state a caller attached to the first wrapper
was invisible through the second. WP_Parser_Node has always given callers
stable child identity, and the lazy native facade is meant to keep that
contract intact — this restores it.

A per-AST identity map is created lazily on the root and shared by every
interned wrapper. Each accessor that returns a node looks the index up
in the map and returns the canonical instance, discarding the freshly
constructed one. Materialization pulls children through the same map so
mutations a caller made through get_first_child_node() before triggering
append_child() survive into $this->children.

Adds a regression test that exercises identity across child, descendant,
and post-materialization reads, plus a walk benchmark and a CI workflow
that reports parse + walk time and peak memory for the PHP and native
paths so the cache cost is measurable on every PR.
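The contract that regression test pins down looks roughly like this. These are toy classes (the real test exercises `WP_MySQL_Native_Parser_Node` with the native extension loaded); the child-index scheme is invented for the sketch:

```python
class Cache(dict):
    def intern(self, node):
        # setdefault returns the canonical instance if one is registered
        return self.setdefault(node.index, node)

class Node:
    def __init__(self, index, cache):
        self.index = index
        self.cache = cache
        self.children = None                      # materialized lazily

    def get_first_child_node(self):
        # accessors construct a fresh wrapper, then intern it
        return self.cache.intern(Node(self.index * 10 + 1, self.cache))

    def materialize_children(self):
        # materialization pulls children through the same cache
        self.children = [self.cache.intern(Node(self.index * 10 + 1, self.cache))]

cache = Cache()
root = Node(1, cache)

a = root.get_first_child_node()
b = root.get_first_child_node()
assert a is b                                     # same-instance reads

a.marker = "seen"                                 # mutate before materializing
root.materialize_children()
assert root.children[0] is a                      # same instance survives
assert root.children[0].marker == "seen"          # mutations survive
```

The mutate-then-materialize case is the one the original facade silently broke: without the shared cache, `children` would hold a fresh wrapper with no `marker`.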
@adamziel (Collaborator, Author)

Perf measurement (CI run 25179776990)

Benchmarked on the same runner: parse the full MySQL server-suite corpus (69,567 queries) and walk every AST exhaustively via get_descendants() + a get_first_child_node() re-read, with and without the identity cache. The baseline is codex/native-lazy-ast-facade (PR base, no cache) built and run on the same hardware.

| scenario | duration | qps | peak mem | walked nodes | identity stable |
| --- | ---: | ---: | ---: | ---: | --- |
| pure-PHP, parse only | 10.71s | 6,495 | 68.0MB | | true |
| pure-PHP, parse + walk | 12.84s | 5,419 | 70.0MB | 4,786,316 | true |
| native, parse only | 0.98s | 70,631 | 30.0MB | | true |
| native, parse + walk (baseline, no cache) | 2.72s | 25,620 | 50.0MB | 4,786,316 | FALSE |
| native, parse + walk (this PR, with cache) | 3.18s | 21,867 | 62.5MB | 4,786,316 | true |

Identity-cache cost on the walk path

| metric | baseline | this PR | delta |
| --- | ---: | ---: | --- |
| duration | 2.72s | 3.18s | +0.47s (+17%) |
| qps | 25,620 | 21,867 | −15% |
| peak memory | 50.0MB | 62.5MB | +12.5MB (+25%) |

The baseline identity check fails (identity_ok=FALSE) — that's the regression #381's review flagged, reproduced. With the cache it passes. The cost is one PHP wrapper retained per walked node, which on this corpus is ~4.8M nodes resolved into ~12.5MB of additional peak memory, plus the per-call hashmap lookup that adds 17% to the walk time. Parse-only timings are within run-to-run noise (0.98 vs 0.99s), confirming the cost only shows up when callers actually traverse — which is exactly the contract being preserved.

Net: the native walk is still faster than pure-PHP (3.18s vs 12.84s) and uses less memory, so the cache fits comfortably under the perf budget #381 set out to deliver while restoring WP_Parser_Node identity semantics.

intern_all hoists the cache lookup out of the loop and inlines what was
a per-item method call to intern(). For accessors whose Rust bridge
returns only nodes — get_child_nodes / get_descendant_nodes — a typed
intern_nodes() variant skips the instanceof check entirely. The walk
benchmark exercises the descendants accessors over ~4.8M nodes per run,
so even small per-item savings add up.
@adamziel (Collaborator, Author)

Perf after the hot-path optimization (CI run 25181172304, commit 6c94572)

Same corpus (69,567 queries, 4.8M walked nodes), same workflow, baseline rebuilt from codex/native-lazy-ast-facade on the same runner. Note this runner was a bit slower than the previous run overall (PHP walk drifted from 12.84s → 16.38s), so the right comparison is within-run baseline vs PR.

| scenario | duration | qps | peak mem | identity stable |
| --- | ---: | ---: | ---: | --- |
| native, parse only (baseline) | 1.28s | 54,142 | 30.0MB | true |
| native, parse only (this PR) | 1.28s | 54,352 | 30.0MB | true |
| native, parse + walk (baseline, no cache) | 3.52s | 19,766 | 50.0MB | FALSE |
| native, parse + walk (this PR, optimized cache) | 3.33s | 20,884 | 60.5MB | true |

Identity-cache cost — before vs after the hot-path fix

| metric | before optimization | after optimization |
| --- | ---: | ---: |
| walk duration delta | +17% (+0.47s) | +5% (+0.19s)* |
| walk qps delta | −15% | −5% |
| peak memory delta | +12.5MB (+25%) | +10.5MB (+21%) |

*on a different (slower) runner; both numbers are vs the same-run baseline.

What changed

intern_all() previously called intern() per item, which re-resolved the cache and re-checked types on every one of the ~4.8M nodes walked. Now the cache reference and &$cache->nodes are hoisted out of the loop, the per-item logic is inlined, and a typed intern_nodes() fast path is used by get_child_nodes() / get_descendant_nodes() to skip the instanceof check entirely (the Rust bridge guarantees node-only results for those accessors).

Net: most of the cache cost on time is gone — the walk is now within ~5% of the no-identity baseline while still preserving WP_Parser_Node semantics. Native walk is now ~5× faster than pure-PHP (3.33s vs 16.38s) on this runner. Memory delta is unchanged in shape (one PHP wrapper retained per visited node), as expected — the cache is the memory.

The walk benchmark we already had is cache-miss heavy (one walk per AST,
every node visited once), so the identity cache shows up there as a
small overhead rather than a win. The cache is supposed to pay back in
hit-heavy patterns: re-walks of the same tree, repeated child reads at
the root, and translator-style passes that re-enter visited subtrees.

Adds three modes (--mode=rewalk|reread|subtree) and runs each on both
the PR and the baseline so the comparison is apples-to-apples on the
same runner, same corpus.
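The three access patterns those modes exercise can be sketched over a toy tree. The mode names come from the commit; the tree and walk bodies below are illustrative stand-ins, not the benchmark script:

```python
class Node:
    """Toy tree node for illustrating the benchmark access patterns."""
    def __init__(self, children=()):
        self.children = list(children)

    def first_child(self):
        return self.children[0] if self.children else None

    def descendants(self):
        for c in self.children:
            yield c
            yield from c.descendants()

def rewalk(root, passes=10):
    # re-traverse the same tree: every visit after pass 1 is a cache hit
    return sum(1 for _ in range(passes) for _ in root.descendants())

def reread(root, reads=20):
    # repeated child reads at the root: near-100% hit rate
    return [root.first_child() for _ in range(reads)]

def subtree(root, reads_per_node=5):
    # translator-style: walk once, then re-enter each visited subtree
    nodes = list(root.descendants())
    return sum(1 for n in nodes for _ in range(reads_per_node)
               if n.first_child() is not None)
```

Running each mode on both the PR and its base makes the cache's hit-path behavior directly comparable, which is what surfaces the subtree-mode finding below.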
@adamziel (Collaborator, Author)

Hit-heavy benchmarks (CI run 25189505049, commit 2c07dac)

Same corpus (69,567 queries), same runner, baseline = codex/native-lazy-ast-facade (no cache). All four scenarios run through the same script.

| scenario | baseline | this PR | time delta | mem delta | identity (baseline → PR) |
| --- | ---: | ---: | ---: | ---: | --- |
| walk (single full pass, 4.8M nodes) | 3.62s / 50.0MB | 3.52s / 60.5MB | −3% | +10.5MB | FALSE → true |
| rewalk x10 (10 full passes, 47.9M visits) | 22.16s / 70.0MB | 18.37s / 88.5MB | −17% | +18.5MB | true → true |
| reread x20 (root child × 20, 1.4M reads) | 2.04s / 30.0MB | 2.59s / 38.0MB | +27% | +8.0MB | FALSE → true |
| subtree x5 (4.8M descendants × 5 child reads, 15.4M calls) | 11.68s / 50.0MB | 15.32s / 64.5MB | +31% | +14.5MB | true → true |

Reading the numbers

The cache wins where it's supposed to: rewalk saves 17% wall time across 10 full re-traversals because the second-and-onward passes return cached wrappers instead of reallocating ~4.8M of them.

The cache loses on subtree — and this is the load-bearing finding. Subtree mode populates the cache to 4.8M entries during the first walk, then does 24M get_first_child_node() calls. Every one of those is a WP_MySQL_Native_AST_Cache->nodes lookup against a 4.8M-entry PHP hashmap. PHP's zend_array probe cost on a table that size, multiplied by 24M, dominates — and the per-call work the baseline avoids (just allocate, no bookkeeping) ends up cheaper than probing a fat hashmap. reread shows the same effect at smaller scale: per-call overhead beats raw allocation when the workload is tiny.

This is exactly the workload shape a translator/rewriter pass produces — walk once to find candidates, then re-enter subtrees by index. So while the correctness fix is right, the PHP-side cache adds 27–31% to the realistic re-entry workload.
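The effect is easy to reproduce in miniature. Python's dict is not zend_array, so this sketch only shows the shape of the measurement — probing a large populated table versus plain allocation — not the PHP magnitudes; table size and stride are arbitrary:

```python
import timeit

class Wrapper:
    """Toy stand-in for a retained node wrapper."""
    __slots__ = ("index",)
    def __init__(self, index):
        self.index = index

# a populated identity cache, scaled down from the 4.8M-entry real one
table = {i: Wrapper(i) for i in range(500_000)}
keys = list(range(0, 500_000, 37))

probe = timeit.timeit(lambda: [table[k] for k in keys], number=50)
alloc = timeit.timeit(lambda: [Wrapper(k) for k in keys], number=50)
print(f"probe {probe:.3f}s  alloc {alloc:.3f}s")
```

Whichever side wins on a given runtime, the point stands: once the table is large, a per-call probe is real work that the no-cache baseline simply never does.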

Would a Rust-side cache fix this?

Yes, materially. The two cost centers a Rust cache eliminates that PHP can't:

  1. zend_array probe replaced by `std::collections::HashMap` probe. Rust's hashmap probe on a 4.8M-entry table is roughly an order of magnitude cheaper per call than PHP's zend_hash_index_find — no zval boxing, no internal value type checks, better cache locality. On 24M probes (subtree mode), this is the difference between a multi-second tax and a sub-second one.
  2. Cache hit skips ClassEntry::new() + 4 zend_update_property calls. Today even with the cache, Rust still allocates a wrapper before PHP gets to discard it. Moving the cache check into Rust means a hit returns a refcount-bumped pointer to the existing wrapper — no allocation, no property writes.

Estimated impact, holding everything else constant:

  • subtree x5: +31% → likely break even or single-digit % vs baseline.
  • rewalk x10: −17% → likely −25 to −30% (the win extends).
  • reread x20: +27% → likely break even (per-call overhead drops below alloc cost).
  • walk (miss-heavy): +5% → ~similar (misses still pay alloc + insert).

Memory delta is structural and unchanged — one retained wrapper per visited node, regardless of where the cache lives.

Recommendation

The PHP-side cache is correctness-first and ships today, but it has a real performance regression on translator-style re-entry workloads (the same workloads this codebase actually runs). The Rust-side cache is the right architecture; the numbers above quantify the headroom (10–30% recoverable on hit-heavy scenarios).

Two paths:
