
Preserve child wrapper identity in the native AST facade#391

Open
adamziel wants to merge 8 commits into codex/native-lazy-ast-facade from adamziel/ast-child-identity

Conversation


@adamziel adamziel commented Apr 30, 2026

Stacked on #381.

#381's review surfaced a real semantics regression in the lazy native AST facade: every accessor on WP_MySQL_Native_Parser_Node calls into Rust and returns a freshly constructed PHP wrapper, so get_first_child_node() returns a different object every time. WP_Parser_Node has always given callers stable child identity — attach state to a child once, walk past it, walk back, the state is still there — and the lazy native facade is meant to keep that contract intact. This restores it.

A per-AST WP_MySQL_Native_AST_Cache is created lazily on the root and shared by reference with every wrapper interned through it. Each accessor looks up the returned wrapper's native_node_index in the cache and either returns the canonical instance or registers the new one. materialize_native_children() pulls children through the same cache, so any mutation a caller made through get_first_child_node() before the parent went through append_child() survives into $this->children — same instance, same mutations.
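The interning contract described above can be modeled in a few lines. This is an illustrative Python sketch, not the actual PHP implementation; `Node` and `AstCache` are stand-ins for `WP_MySQL_Native_Parser_Node` and `WP_MySQL_Native_AST_Cache`:

```python
class Node:
    """Stand-in for WP_MySQL_Native_Parser_Node (hypothetical)."""
    def __init__(self, native_node_index):
        self.native_node_index = native_node_index

class AstCache:
    """One instance per AST root, shared by reference with every wrapper."""
    def __init__(self):
        self.nodes = {}  # native node index -> canonical wrapper

    def intern(self, wrapper):
        canonical = self.nodes.get(wrapper.native_node_index)
        if canonical is not None:
            return canonical              # discard the freshly built wrapper
        self.nodes[wrapper.native_node_index] = wrapper
        return wrapper

cache = AstCache()
first = cache.intern(Node(7))             # accessor builds a fresh wrapper
first.visited = True                      # caller attaches state to the child
second = cache.intern(Node(7))            # a second read of the same node
assert second is first                    # stable identity
assert second.visited is True             # attached state survives re-reads
```

A hit discards the freshly constructed wrapper and hands back the canonical one, which is what makes caller-attached state durable across reads.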

Tokens are unchanged. The public token API has no mutators and no callers in this repo rely on WP_MySQL_Token identity; if that becomes a need we can extend the cache.

What's in here

  • class-wp-mysql-native-ast-cache.php — small holder, one per AST.
  • Native node accessors run results through intern() / intern_all() / intern_nodes().
  • materialize_native_children() reuses the interned wrappers so prior mutations don't get lost.
  • Regression test covering same-instance reads, descendant/child identity sharing, and the mutate-then-materialize edge case. Skips when the native extension isn't loaded.
  • run-native-ast-walk-benchmark.php parses the MySQL server suite, walks each AST, and reports parsed, walked_nodes, duration, peak_mem, and an identity-stability flag.
  • Native AST Walk Perf workflow runs the benchmark on this PR and on the PR base (codex/native-lazy-ast-facade) on the same runner, so the identity-cache cost is measured apples-to-apples on every push.

Performance

Benchmarked on CI against the no-cache baseline, same runner, same corpus (69,567 queries, 4.8M walked nodes). Full numbers in PR comments — links below.

| scenario | baseline (no cache) | this PR | delta |
| --- | ---: | ---: | --- |
| native parse only | 1.28s | 1.28s | 0% |
| native walk duration | 3.52s | 3.33s | +5% (+0.19s) |
| native walk qps | 19,766 | 20,884 | −5% |
| native walk peak memory | 50.0MB | 60.5MB | +10.5MB (+21%) |
| native walk identity stable | FALSE | true | regression fixed |

Hot-path optimization in intern_all() (cache reference hoisted out of the loop, per-item logic inlined) plus a typed intern_nodes() fast path for accessors that return only nodes brought the time penalty down from an initial +17% to +5%. Memory delta is structural — one retained PHP wrapper per visited node — and is the price of stable identity.
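The shape of that optimization can be sketched as follows. This is a hypothetical Python model of the two loops, not the PHP source: `intern_all` handles mixed results and must type-check each item, while `intern_nodes` is the typed fast path for accessors whose bridge returns only nodes; in both, the cache dict is passed in once (hoisted) rather than re-resolved per item:

```python
class Node:
    """Stand-in for a native-backed node wrapper (hypothetical)."""
    def __init__(self, native_node_index):
        self.native_node_index = native_node_index

def intern_all(cache_nodes, items):
    # mixed node/token results: each item must be type-checked
    out = []
    for item in items:
        if isinstance(item, Node):
            idx = item.native_node_index
            hit = cache_nodes.get(idx)
            if hit is not None:
                item = hit                     # canonical wrapper wins
            else:
                cache_nodes[idx] = item
        out.append(item)
    return out

def intern_nodes(cache_nodes, nodes):
    # typed fast path: the bridge guarantees node-only results,
    # so the per-item isinstance check is skipped entirely
    out = []
    for node in nodes:
        idx = node.native_node_index
        hit = cache_nodes.get(idx)
        if hit is not None:
            node = hit
        else:
            cache_nodes[idx] = node
        out.append(node)
    return out
```

Over millions of walked nodes, dropping one method call and one type check per item is exactly the kind of saving that compounds.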

Native walk is still ~5× faster than the pure-PHP path while now preserving WP_Parser_Node semantics.

Test plan

  • PHPUnit suite passes on the pure-PHP path.
  • PHPUnit suite passes with the native extension loaded; the new identity tests run instead of skipping.
  • Native AST Walk Perf workflow reports the cache cost vs. the PR base on every push.

## Summary
- reuse one `WP_MySQL_Parser` instance inside the SQLite driver and reset its token stream per query
- add `reset_tokens()` to the PHP parser polyfill and the Rust native parser
- restore native parser-node accessor fast paths in `WP_MySQL_Native_Parser_Node`, while keeping PHP child materialization for mutation
- fix the local native extension build helper for Nix/libclang bindgen by undefining `__SSE2__` during binding generation

## Stack
This is the top PR in the native MySQL lexer/parser stack. The stack is
split so each GitHub diff shows one reviewable concern:

1. [#384 Extract MySQL lexer and parser polyfills](#384)
   - `trunk` -> `codex/native-parser-php-facade`
   - extraction-only PHP refactor
   - moves the existing PHP lexer/parser implementations into polyfill classes
   - keeps public `WP_MySQL_Lexer` and `WP_MySQL_Parser` as thin PHP subclasses

2. [#385 Add optional native parser routing](#385)
   - `codex/native-parser-php-facade` -> `codex/native-parser-class-routing`
   - adds fallback `WP_MySQL_Native_*` PHP classes
   - routes the public lexer/parser classes through native classes when the Rust extension provides them
   - adds the minimal PHP grammar-export bridge for the native parser

3. [#386 Add lazy native parser node facade](#386)
   - `codex/native-parser-class-routing` -> `codex/native-parser-node-facade`
   - keeps `WP_Parser_Node` as the plain PHP tree node
   - adds `WP_MySQL_Native_Parser_Node extends WP_Parser_Node` for native-backed lazy AST nodes
   - keeps native AST handles and native accessor delegation out of the base node class

4. [#381 Add lazy native AST facade](#381)
   - `codex/native-parser-node-facade` -> `codex/native-lazy-ast-facade`
   - implements the Rust lexer/parser extension and lazy native AST facade
   - makes the Rust extension instantiate `WP_MySQL_Native_Parser_Node`
   - adds native-extension CI coverage for the SQLite driver and WordPress PHPUnit tests
   - includes the local SQLite facade smoke benchmark

5. [#387 Cache native grammar on parser grammar object](#387)
   - `codex/native-lazy-ast-facade` -> `codex/native-parser-object-grammar-cache`
   - restores the object-attached native grammar cache
   - adds only `WP_Parser_Grammar::$native_grammar` on the PHP side
   - removes the Rust content-hash cache that walked the whole exported grammar on every parser construction

6. This PR, [#388 Speed up native AST materialization](#388)
   - `codex/native-parser-object-grammar-cache` -> `codex/native-parser-bulk-materialization`
   - optimizes native-to-PHP AST access after the grammar-cache performance restoration
   - reuses the SQLite driver's parser instance instead of constructing it per query

## Why
The native lexer/parser itself is fast, but the PHP-facing path can lose
that benefit if each query repeatedly rebuilds native parser state or
forces full PHP AST materialization. On the current stack, #387 already
removes the large grammar export/hash cost. This PR removes the
remaining per-query parser construction churn and restores the native
AST accessor path for descendant-heavy SQLite driver workloads.

## Measurements
Environment: local PHP 8.2 via the native build helper, release Rust
extension, current top of this PR.

Focused constructor/reset benchmark over 5000 unique SELECT queries:

| Phase | Time |
| --- | ---: |
| native tokenize | 22.62 us/query |
| fresh native parser constructor only | 2.31 us/query |
| reusable parser `reset_tokens()` only | 0.32 us/query |
| reusable parser reset + parse + `get_descendants()` | 157.06 us/query |
| constructor/reset ratio | 7.3x |

The previously reported ~622 us/query constructor cost does not
reproduce on this stack because #387 already caches the native grammar
on the PHP grammar object. Parser reuse still removes most of the
remaining constructor overhead.

SQLite facade smoke workload:

Command:

```bash
TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh
```

| Workload | PHP fallback | Native extension | Speedup |
| --- | ---: | ---: | ---: |
| 250 generated queries, including 1 x 2000-row insert | 4.060s | 0.525s | 7.73x |

## Testing
- `cargo fmt --check`
- `git diff --check`
- `composer run check-cs`
- `composer run test` from `packages/mysql-on-sqlite`
- `php -d extension=packages/mysql-on-sqlite/ext/wp-mysql-parser/target/release/libwp_mysql_parser.so packages/mysql-on-sqlite/vendor/bin/phpunit -c packages/mysql-on-sqlite/phpunit.xml.dist`
- `TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh`

The native parser extension constructs a fresh WP_MySQL_Native_Parser_Node
on every accessor call, so two reads of the same logical node returned
different PHP objects and any state a caller attached to the first wrapper
was invisible through the second. WP_Parser_Node has always given callers
stable child identity, and the lazy native facade is meant to keep that
contract intact — this restores it.

A per-AST identity map is created lazily on the root and shared by every
interned wrapper. Each accessor that returns a node looks the index up
in the map and returns the canonical instance, discarding the freshly
constructed one. Materialization pulls children through the same map so
mutations a caller made through get_first_child_node() before triggering
append_child() survive into $this->children.

Adds a regression test that exercises identity across child, descendant,
and post-materialization reads, plus a walk benchmark and a CI workflow
that reports parse + walk time and peak memory for the PHP and native
paths so the cache cost is measurable on every PR.
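The contract that regression test pins down looks roughly like this. These are toy classes (the real test exercises `WP_MySQL_Native_Parser_Node` with the native extension loaded); the child-index scheme is invented for the sketch:

```python
class Cache(dict):
    def intern(self, node):
        # setdefault returns the canonical instance if one is registered
        return self.setdefault(node.index, node)

class Node:
    def __init__(self, index, cache):
        self.index = index
        self.cache = cache
        self.children = None                      # materialized lazily

    def get_first_child_node(self):
        # accessors construct a fresh wrapper, then intern it
        return self.cache.intern(Node(self.index * 10 + 1, self.cache))

    def materialize_children(self):
        # materialization pulls children through the same cache
        self.children = [self.cache.intern(Node(self.index * 10 + 1, self.cache))]

cache = Cache()
root = Node(1, cache)

a = root.get_first_child_node()
b = root.get_first_child_node()
assert a is b                                     # same-instance reads

a.marker = "seen"                                 # mutate before materializing
root.materialize_children()
assert root.children[0] is a                      # same instance survives
assert root.children[0].marker == "seen"          # mutations survive
```

The mutate-then-materialize case is the one the original facade silently broke: without the shared cache, `children` would hold a fresh wrapper with no `marker`.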
@adamziel (Collaborator, Author)

Perf measurement (CI run 25179776990)

Benchmarked on the same runner: parse the full MySQL server-suite corpus (69,567 queries) and walk every AST exhaustively via get_descendants() + a get_first_child_node() re-read, with and without the identity cache. The baseline is codex/native-lazy-ast-facade (PR base, no cache) built and run on the same hardware.

| scenario | duration | qps | peak mem | walked nodes | identity stable |
| --- | ---: | ---: | ---: | ---: | --- |
| pure-PHP, parse only | 10.71s | 6,495 | 68.0MB | | true |
| pure-PHP, parse + walk | 12.84s | 5,419 | 70.0MB | 4,786,316 | true |
| native, parse only | 0.98s | 70,631 | 30.0MB | | true |
| native, parse + walk (baseline, no cache) | 2.72s | 25,620 | 50.0MB | 4,786,316 | FALSE |
| native, parse + walk (this PR, with cache) | 3.18s | 21,867 | 62.5MB | 4,786,316 | true |

Identity-cache cost on the walk path

| metric | baseline | this PR | delta |
| --- | ---: | ---: | --- |
| duration | 2.72s | 3.18s | +0.47s (+17%) |
| qps | 25,620 | 21,867 | −15% |
| peak memory | 50.0MB | 62.5MB | +12.5MB (+25%) |

The baseline identity check fails (identity_ok=FALSE) — that's the regression #381's review flagged, reproduced. With the cache it passes. The cost is one PHP wrapper retained per walked node, which on this corpus is ~4.8M nodes resolved into ~12.5MB of additional peak memory, plus the per-call hashmap lookup that adds 17% to the walk time. Parse-only timings are within run-to-run noise (0.98 vs 0.99s), confirming the cost only shows up when callers actually traverse — which is exactly the contract being preserved.

Net: the native walk is still faster than pure-PHP (3.18s vs 12.84s) and uses less memory, so the cache fits comfortably under the perf budget #381 set out to deliver while restoring WP_Parser_Node identity semantics.

intern_all hoists the cache lookup out of the loop and inlines what was
a per-item method call to intern(). For accessors whose Rust bridge
returns only nodes — get_child_nodes / get_descendant_nodes — a typed
intern_nodes() variant skips the instanceof check entirely. The walk
benchmark exercises the descendants accessors over ~4.8M nodes per run,
so even small per-item savings add up.
@adamziel (Collaborator, Author)

Perf after the hot-path optimization (CI run 25181172304, commit 6c94572)

Same corpus (69,567 queries, 4.8M walked nodes), same workflow, baseline rebuilt from codex/native-lazy-ast-facade on the same runner. Note this runner was a bit slower than the previous run overall (PHP walk drifted from 12.84s → 16.38s), so the right comparison is within-run baseline vs PR.

| scenario | duration | qps | peak mem | identity stable |
| --- | ---: | ---: | ---: | --- |
| native, parse only (baseline) | 1.28s | 54,142 | 30.0MB | true |
| native, parse only (this PR) | 1.28s | 54,352 | 30.0MB | true |
| native, parse + walk (baseline, no cache) | 3.52s | 19,766 | 50.0MB | FALSE |
| native, parse + walk (this PR, optimized cache) | 3.33s | 20,884 | 60.5MB | true |

Identity-cache cost — before vs after the hot-path fix

| metric | before optimization | after optimization |
| --- | ---: | ---: |
| walk duration delta | +17% (+0.47s) | +5% (+0.19s)* |
| walk qps delta | −15% | −5% |
| peak memory delta | +12.5MB (+25%) | +10.5MB (+21%) |

*on a different (slower) runner; both numbers are vs the same-run baseline.

What changed

intern_all() previously called intern() per item, which re-resolved the cache and re-checked types on every one of the ~4.8M nodes walked. Now the cache reference and &$cache->nodes are hoisted out of the loop, the per-item logic is inlined, and a typed intern_nodes() fast path is used by get_child_nodes() / get_descendant_nodes() to skip the instanceof check entirely (the Rust bridge guarantees node-only results for those accessors).

Net: most of the cache cost on time is gone — the walk is now within ~5% of the no-identity baseline while still preserving WP_Parser_Node semantics. Native walk is now ~5× faster than pure-PHP (3.33s vs 16.38s) on this runner. Memory delta is unchanged in shape (one PHP wrapper retained per visited node), as expected — the cache is the memory.

The walk benchmark we already had is cache-miss heavy (one walk per AST,
every node visited once), so the identity cache shows up there as a
small overhead rather than a win. The cache is supposed to pay back in
hit-heavy patterns: re-walks of the same tree, repeated child reads at
the root, and translator-style passes that re-enter visited subtrees.

Adds three modes (--mode=rewalk|reread|subtree) and runs each on both
the PR and the baseline so the comparison is apples-to-apples on the
same runner, same corpus.
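The three access patterns those modes exercise can be sketched over a toy tree. The mode names come from the commit; the tree and walk bodies below are illustrative stand-ins, not the benchmark script:

```python
class Node:
    """Toy tree node for illustrating the benchmark access patterns."""
    def __init__(self, children=()):
        self.children = list(children)

    def first_child(self):
        return self.children[0] if self.children else None

    def descendants(self):
        for c in self.children:
            yield c
            yield from c.descendants()

def rewalk(root, passes=10):
    # re-traverse the same tree: every visit after pass 1 is a cache hit
    return sum(1 for _ in range(passes) for _ in root.descendants())

def reread(root, reads=20):
    # repeated child reads at the root: near-100% hit rate
    return [root.first_child() for _ in range(reads)]

def subtree(root, reads_per_node=5):
    # translator-style: walk once, then re-enter each visited subtree
    nodes = list(root.descendants())
    return sum(1 for n in nodes for _ in range(reads_per_node)
               if n.first_child() is not None)
```

Running each mode on both the PR and its base makes the cache's hit-path behavior directly comparable, which is what surfaces the subtree-mode finding below.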
@adamziel (Collaborator, Author)

Hit-heavy benchmarks (CI run 25189505049, commit 2c07dac)

Same corpus (69,567 queries), same runner, baseline = codex/native-lazy-ast-facade (no cache). All four scenarios run through the same script.

| scenario | baseline | this PR | time delta | mem delta | identity (baseline → PR) |
| --- | ---: | ---: | ---: | ---: | --- |
| walk (single full pass, 4.8M nodes) | 3.62s / 50.0MB | 3.52s / 60.5MB | −3% | +10.5MB | FALSE → true |
| rewalk x10 (10 full passes, 47.9M visits) | 22.16s / 70.0MB | 18.37s / 88.5MB | −17% | +18.5MB | true → true |
| reread x20 (root child × 20, 1.4M reads) | 2.04s / 30.0MB | 2.59s / 38.0MB | +27% | +8.0MB | FALSE → true |
| subtree x5 (4.8M descendants × 5 child reads, 15.4M calls) | 11.68s / 50.0MB | 15.32s / 64.5MB | +31% | +14.5MB | true → true |

Reading the numbers

The cache wins where it's supposed to: rewalk saves 17% wall time across 10 full re-traversals because the second-and-onward passes return cached wrappers instead of reallocating ~4.8M of them.

The cache loses on subtree — and this is the load-bearing finding. Subtree mode populates the cache to 4.8M entries during the first walk, then does 24M get_first_child_node() calls. Every one of those is a WP_MySQL_Native_AST_Cache->nodes lookup against a 4.8M-entry PHP hashmap. PHP's zend_array probe cost on a table that size, multiplied by 24M, dominates — and the per-call work the baseline avoids (just allocate, no bookkeeping) ends up cheaper than probing a fat hashmap. reread shows the same effect at smaller scale: per-call overhead beats raw allocation when the workload is tiny.

This is exactly the workload shape a translator/rewriter pass produces — walk once to find candidates, then re-enter subtrees by index. So while the correctness fix is right, the PHP-side cache adds 27–31% to the realistic re-entry workload.
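The effect is easy to reproduce in miniature. Python's dict is not zend_array, so this sketch only shows the shape of the measurement — probing a large populated table versus plain allocation — not the PHP magnitudes; table size and stride are arbitrary:

```python
import timeit

class Wrapper:
    """Toy stand-in for a retained node wrapper."""
    __slots__ = ("index",)
    def __init__(self, index):
        self.index = index

# a populated identity cache, scaled down from the 4.8M-entry real one
table = {i: Wrapper(i) for i in range(500_000)}
keys = list(range(0, 500_000, 37))

probe = timeit.timeit(lambda: [table[k] for k in keys], number=50)
alloc = timeit.timeit(lambda: [Wrapper(k) for k in keys], number=50)
print(f"probe {probe:.3f}s  alloc {alloc:.3f}s")
```

Whichever side wins on a given runtime, the point stands: once the table is large, a per-call probe is real work that the no-cache baseline simply never does.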

Would a Rust-side cache fix this?

Yes, materially. The two cost centers a Rust cache eliminates that PHP can't:

  1. zend_array probe replaced by `std::collections::HashMap` probe. Rust's hashmap probe on a 4.8M-entry table is roughly an order of magnitude cheaper per call than PHP's zend_hash_index_find — no zval boxing, no internal value type checks, better cache locality. On 24M probes (subtree mode), this is the difference between a multi-second tax and a sub-second one.
  2. Cache hit skips ClassEntry::new() + 4 zend_update_property calls. Today even with the cache, Rust still allocates a wrapper before PHP gets to discard it. Moving the cache check into Rust means a hit returns a refcount-bumped pointer to the existing wrapper — no allocation, no property writes.

Estimated impact, holding everything else constant:

  • subtree x5: +31% → likely break even or single-digit % vs baseline.
  • rewalk x10: −17% → likely −25 to −30% (the win extends).
  • reread x20: +27% → likely break even (per-call overhead drops below alloc cost).
  • walk (miss-heavy): +5% → ~similar (misses still pay alloc + insert).

Memory delta is structural and unchanged — one retained wrapper per visited node, regardless of where the cache lives.

Recommendation

The PHP-side cache is correctness-first and ships today, but it has a real performance regression on translator-style re-entry workloads (the same workloads this codebase actually runs). The Rust-side cache is the right architecture; the numbers above quantify the headroom (10–30% recoverable on hit-heavy scenarios).

Two paths:
