Add native Rust-based MySQL parser extension #381

Open
adamziel wants to merge 22 commits into trunk from codex/native-lazy-ast-facade

Collaborator

@adamziel adamziel commented Apr 28, 2026

Summary

Adds an optional Rust-based PHP extension that takes over MySQL lexing and parsing when loaded. The pure-PHP lexer and parser stay in place as the default; when the extension is present, it pre-declares WP_MySQL_Native_Lexer and WP_MySQL_Native_Parser and the load.php hook from #384 picks them up automatically. No public API changes — WP_MySQL_Lexer and WP_MySQL_Parser look the same to callers either way.

The extension lives at packages/php-ext-wp-mysql-parser/ (its own top-level package, not nested under mysql-on-sqlite) and exposes:

  • A native lexer that reuses the existing PHP lexer logic but yields tokens directly from Rust into a WP_MySQL_Native_Token_Stream.
  • A native parser that builds an AST in a Rust-owned buffer.
  • A WP_MySQL_Native_Parser_Node (introduced in Add lazy native parser node facade #386) whose read methods delegate into that buffer so children are never copied into PHP unless a caller actually walks the tree. Mutation triggers materialization.
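The lazy node behaviour described in the last bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the extension's actual code: `AstBuffer`, `LazyNode`, and all of their fields and methods are hypothetical names, and child nodes are represented as plain indices into a flat buffer.

```rust
use std::rc::Rc;

/// Hypothetical flat AST storage owned by the native side.
/// Each node has a rule name and a list of child node indices.
pub struct AstBuffer {
    pub rule_names: Vec<String>,
    pub children: Vec<Vec<usize>>,
}

/// Hypothetical lazy node: reads go to the shared buffer until a
/// mutation forces the child list to be copied into the node itself.
pub struct LazyNode {
    buffer: Rc<AstBuffer>,
    index: usize,
    materialized: Option<Vec<usize>>,
}

impl LazyNode {
    pub fn new(buffer: Rc<AstBuffer>, index: usize) -> Self {
        LazyNode { buffer, index, materialized: None }
    }

    /// Read path: delegates to the buffer, nothing is copied.
    pub fn rule_name(&self) -> &str {
        &self.buffer.rule_names[self.index]
    }

    /// Read path: uses the node's own copy only if one already exists.
    pub fn child_indices(&self) -> &[usize] {
        match &self.materialized {
            Some(own) => own.as_slice(),
            None => self.buffer.children[self.index].as_slice(),
        }
    }

    pub fn is_materialized(&self) -> bool {
        self.materialized.is_some()
    }

    /// Write path: copy the children out of the buffer once, then
    /// mutate the copy. The shared buffer itself is never modified.
    pub fn append_child(&mut self, child: usize) {
        let buffer = &self.buffer;
        let index = self.index;
        self.materialized
            .get_or_insert_with(|| buffer.children[index].clone())
            .push(child);
    }
}
```

In this sketch a caller that only reads `rule_name()` or `child_indices()` never triggers a copy; the first `append_child()` call is what pays for materialization.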

The SQLite driver opts into the native path with an instanceof WP_MySQL_Native_Lexer check in create_parser() — when the extension is loaded, it streams tokens through the native lexer; otherwise it stays on the PHP path.

CI

  • New "MySQL Parser Extension Tests" workflow builds the Rust extension, runs the parser smoke check, and runs the mysql-on-sqlite PHPUnit suite with the extension loaded.
  • Existing "WordPress PHPUnit Tests" workflow grows a job that runs against a WordPress test container with the extension installed, to catch regressions in the SQLite driver's native code path.

Testing

  • Local: cargo build in the extension dir, then run the mysql-on-sqlite PHPUnit suite with -d extension=...libwp_mysql_parser.so. Tests pass identically with and without the extension.
  • CI runs both PHP-only and extension-loaded variants of the suites.

@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from bf50f10 to c2da5e4 Compare April 28, 2026 14:22
@adamziel adamziel changed the base branch from codex/native-token-stream-parser to trunk April 28, 2026 14:22
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from c219b31 to 2476729 Compare April 28, 2026 15:11
@adamziel adamziel changed the base branch from trunk to codex/native-parser-php-facade April 28, 2026 15:11
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch 2 times, most recently from f06ecf6 to 48db7c5 Compare April 28, 2026 15:22
@adamziel adamziel changed the base branch from codex/native-parser-php-facade to codex/native-parser-node-facade April 28, 2026 15:22
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 5fc4ca2 to e41fdaf Compare April 29, 2026 09:15
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from b20499f to 77a45df Compare April 30, 2026 11:52
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from e41fdaf to 9595995 Compare April 30, 2026 11:52
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from 77a45df to 830a9b2 Compare April 30, 2026 11:59
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 9595995 to 039eb69 Compare April 30, 2026 12:00
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from 830a9b2 to e66bab3 Compare April 30, 2026 12:16
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 039eb69 to 07a7777 Compare April 30, 2026 12:16
@adamziel adamziel changed the base branch from codex/native-parser-node-facade to trunk April 30, 2026 12:22
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 07a7777 to 9153c2e Compare April 30, 2026 12:24
@adamziel adamziel changed the base branch from trunk to codex/native-parser-node-facade April 30, 2026 12:24
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from bb3b8e6 to 89bc6a4 Compare April 30, 2026 12:37
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 9153c2e to d636f96 Compare April 30, 2026 12:37
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from 89bc6a4 to cd22199 Compare April 30, 2026 12:40
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from d636f96 to 6403031 Compare April 30, 2026 12:40
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from cd22199 to 23b1c02 Compare April 30, 2026 12:43
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 6403031 to c8f5b10 Compare April 30, 2026 12:43
When a native parser is in use, expose query results through a node
class that defers child materialization until callers actually walk the
tree. The base WP_Parser_Node::$children visibility is loosened to
protected so the facade can populate it on demand.
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from c8f5b10 to 7b2099a Compare April 30, 2026 12:51
@adamziel adamziel changed the base branch from codex/native-parser-node-facade to trunk April 30, 2026 12:51
When the native extension is loaded, WP_MySQL_Lexer extends
WP_MySQL_Native_Lexer, so an instanceof check is more direct than
method_exists() — and it gives the IDE/static analyzer something to
work with.
Local-only build/smoke scripts that don't belong in the merged history.
Drop the tmp-test-native path filter and smoke-script step (the dir was
removed), and switch the inline native-availability checks from
method_exists() to instanceof WP_MySQL_Native_Lexer.
Lift it out of mysql-on-sqlite/ext/ to its own top-level package so it
sits alongside other packages and the path reads as what it is — a
standalone PHP extension.
@adamziel adamziel marked this pull request as ready for review April 30, 2026 13:40
@adamziel adamziel changed the title from "[codex] Add lazy native AST facade" to "Add native Rust-based MySQL parser extension" Apr 30, 2026
## Summary
- add one explicit `WP_Parser_Grammar::$native_grammar` cache slot
- store the compiled Rust grammar on the PHP grammar object instead of
in a content-hash cache
- remove the full exported-grammar hash walk from native parser
construction

## Why
The previous Rust-only content-key cache preserved a smaller PHP diff,
but every parser construction still exported and recursively hashed the
entire grammar before it could hit cache. In the SQLite smoke benchmark
that dropped the native path back to roughly 2x faster than PHP.

This restores the object-attached cache path we had before, but keeps
the PHP diff explicit and minimal: one new public cache property on
`WP_Parser_Grammar`.
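
As a rough illustration of the object-attached cache idea (not the code in this PR), the sketch below keeps the compiled grammar in a slot on the grammar object, so constructing a parser never re-exports or re-hashes the grammar. `Grammar`, `CompiledGrammar`, `Parser`, and the `compile_calls` counter are hypothetical stand-ins; the counter exists only to make the behaviour observable.

```rust
use std::cell::{Cell, OnceCell};
use std::rc::Rc;

/// Hypothetical stand-in for the compiled native grammar.
pub struct CompiledGrammar {
    pub rule_count: usize,
}

/// Hypothetical stand-in for the grammar object with one cache slot.
pub struct Grammar {
    pub rules: Vec<String>,
    native_grammar: OnceCell<Rc<CompiledGrammar>>,
    compile_calls: Cell<usize>, // instrumentation for this example only
}

impl Grammar {
    pub fn new(rules: Vec<String>) -> Self {
        Grammar {
            rules,
            native_grammar: OnceCell::new(),
            compile_calls: Cell::new(0),
        }
    }

    /// Compile at most once, then hand out the cached value.
    /// Changes to `rules` after the first call are not picked up.
    pub fn native(&self) -> Rc<CompiledGrammar> {
        let compiled = self.native_grammar.get_or_init(|| {
            self.compile_calls.set(self.compile_calls.get() + 1);
            Rc::new(CompiledGrammar { rule_count: self.rules.len() })
        });
        Rc::clone(compiled)
    }

    pub fn compile_calls(&self) -> usize {
        self.compile_calls.get()
    }
}

/// Hypothetical parser: construction is a cache lookup plus a pointer copy.
pub struct Parser {
    pub grammar: Rc<CompiledGrammar>,
}

impl Parser {
    pub fn new(grammar: &Grammar) -> Self {
        Parser { grammar: grammar.native() }
    }
}
```

The cost of this design is that the cached value is tied to the grammar object's identity rather than its content, so the sketch silently ignores later edits to `rules`.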

## Measurements
Command:

```bash
TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh
```

| Run | PHP parser | Rust parser | Speedup |
| ---: | ---: | ---: | ---: |
| 1 | 3.088s | 0.389s | 7.94x |
| 2 | 3.126s | 0.386s | 8.10x |
| 3 | 2.927s | 0.348s | 8.41x |

Default 2000-query smoke workload:

| Workload | PHP parser | Rust parser | Speedup |
| --- | ---: | ---: | ---: |
| 2000 generated queries, including 8 x 2000-row inserts | 24.082s | 3.008s | 8.01x |

## Testing
- `cargo fmt --check`
- `php -l packages/mysql-on-sqlite/src/parser/class-wp-parser-grammar.php`
- `git diff --check`
- `TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh`
- `./tmp-test-native/run.sh`

## Notes
This assumes `WP_Parser_Grammar` is immutable after construction for
native parsing purposes. That matches current use, and the tradeoff is
isolated in this PR so it is visible in review.
adamziel added a commit that referenced this pull request Apr 30, 2026
## Summary
- reuse one `WP_MySQL_Parser` instance inside the SQLite driver and
reset its token stream per query
- add `reset_tokens()` to the PHP parser polyfill and the Rust native
parser
- restore native parser-node accessor fast paths in
`WP_MySQL_Native_Parser_Node`, while keeping PHP child materialization
for mutation
- fix the local native extension build helper for Nix/libclang bindgen
by undefining `__SSE2__` during binding generation

## Stack
This is the top PR in the native MySQL lexer/parser stack. The stack is
split so each GitHub diff shows one reviewable concern:

1. [#384 Extract MySQL lexer and parser polyfills](#384)
   - `trunk` -> `codex/native-parser-php-facade`
   - extraction-only PHP refactor
   - moves the existing PHP lexer/parser implementations into polyfill classes
   - keeps public `WP_MySQL_Lexer` and `WP_MySQL_Parser` as thin PHP subclasses

2. [#385 Add optional native parser routing](#385)
   - `codex/native-parser-php-facade` -> `codex/native-parser-class-routing`
   - adds fallback `WP_MySQL_Native_*` PHP classes
   - routes the public lexer/parser classes through native classes when the Rust extension provides them
   - adds the minimal PHP grammar-export bridge for the native parser

3. [#386 Add lazy native parser node facade](#386)
   - `codex/native-parser-class-routing` -> `codex/native-parser-node-facade`
   - keeps `WP_Parser_Node` as the plain PHP tree node
   - adds `WP_MySQL_Native_Parser_Node extends WP_Parser_Node` for native-backed lazy AST nodes
   - keeps native AST handles and native accessor delegation out of the base node class

4. [#381 Add lazy native AST facade](#381)
   - `codex/native-parser-node-facade` -> `codex/native-lazy-ast-facade`
   - implements the Rust lexer/parser extension and lazy native AST facade
   - makes the Rust extension instantiate `WP_MySQL_Native_Parser_Node`
   - adds native-extension CI coverage for the SQLite driver and WordPress PHPUnit tests
   - includes the local SQLite facade smoke benchmark

5. [#387 Cache native grammar on parser grammar object](#387)
   - `codex/native-lazy-ast-facade` -> `codex/native-parser-object-grammar-cache`
   - restores the object-attached native grammar cache
   - adds only `WP_Parser_Grammar::$native_grammar` on the PHP side
   - removes the Rust content-hash cache that walked the whole exported grammar on every parser construction

6. This PR, [#388 Speed up native AST materialization](#388)
   - `codex/native-parser-object-grammar-cache` -> `codex/native-parser-bulk-materialization`
   - optimizes native-to-PHP AST access after the grammar-cache performance restoration
   - reuses the SQLite driver's parser instance instead of constructing it per query

## Why
The native lexer/parser itself is fast, but the PHP-facing path can lose
that benefit if each query repeatedly rebuilds native parser state or
forces full PHP AST materialization. On the current stack, #387 already
removes the large grammar export/hash cost. This PR removes the
remaining per-query parser construction churn and restores the native
AST accessor path for descendant-heavy SQLite driver workloads.
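
The parser-reuse pattern can be sketched as below. This is an illustrative toy under assumptions, not the driver's code: `ReusableParser`, `tokenize`, and `parse_all` are hypothetical, the tokenizer just splits on whitespace, and "parsing" only counts tokens. The point is the shape of the loop: one long-lived parser, one `reset_tokens()` call per query.

```rust
/// Hypothetical parser that keeps its setup state across queries.
pub struct ReusableParser {
    grammar_rules: Vec<String>, // stands in for state that is costly to rebuild
    tokens: Vec<String>,
    position: usize,
}

impl ReusableParser {
    pub fn new(grammar_rules: Vec<String>) -> Self {
        ReusableParser { grammar_rules, tokens: Vec::new(), position: 0 }
    }

    pub fn rule_count(&self) -> usize {
        self.grammar_rules.len()
    }

    /// Swap in a new token stream and rewind; the grammar state is untouched.
    pub fn reset_tokens(&mut self, tokens: Vec<String>) {
        self.tokens = tokens;
        self.position = 0;
    }

    /// Toy "parse": consume the remaining tokens and report how many were read.
    pub fn parse(&mut self) -> usize {
        let consumed = self.tokens.len() - self.position;
        self.position = self.tokens.len();
        consumed
    }
}

/// Toy tokenizer used only for this example.
pub fn tokenize(sql: &str) -> Vec<String> {
    sql.split_whitespace().map(str::to_string).collect()
}

/// Parse many queries with one parser instance instead of one per query.
pub fn parse_all(parser: &mut ReusableParser, queries: &[&str]) -> Vec<usize> {
    queries
        .iter()
        .map(|sql| {
            parser.reset_tokens(tokenize(sql));
            parser.parse()
        })
        .collect()
}
```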

## Measurements
Environment: local PHP 8.2 via the native build helper, release Rust
extension, current top of this PR.

Focused constructor/reset benchmark over 5000 unique SELECT queries:

| Phase | Time |
| --- | ---: |
| native tokenize | 22.62 us/query |
| fresh native parser constructor only | 2.31 us/query |
| reusable parser `reset_tokens()` only | 0.32 us/query |
| reusable parser reset + parse + `get_descendants()` | 157.06 us/query |
| constructor/reset ratio | 7.3x |

The previously reported ~622 us/query constructor cost does not
reproduce on this stack because #387 already caches the native grammar
on the PHP grammar object. Parser reuse still removes most of the
remaining constructor overhead.

SQLite facade smoke workload:

Command:

```bash
TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh
```

| Workload | PHP fallback | Native extension | Speedup |
| --- | ---: | ---: | ---: |
| 250 generated queries, including 1 x 2000-row insert | 4.060s | 0.525s | 7.73x |

## Testing
- `cargo fmt --check`
- `git diff --check`
- `composer run check-cs`
- `composer run test` from `packages/mysql-on-sqlite`
- `php -d extension=packages/mysql-on-sqlite/ext/wp-mysql-parser/target/release/libwp_mysql_parser.so packages/mysql-on-sqlite/vendor/bin/phpunit -c packages/mysql-on-sqlite/phpunit.xml.dist`
- `TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh`

#[php_class]
#[php(name = "WP_MySQL_Native_Parser")]
pub struct WpMySqlNativeParser {
Member

In native mode, new WP_MySQL_Parser(...) instanceof WP_Parser will return false. Is there a way to preserve the existing instanceof WP_Parser?

run: cargo build
working-directory: packages/php-ext-wp-mysql-parser

- name: Verify SQLite driver selects the native parser path
Member

This job only builds the extension and runs a reflection smoke check, so native-parser regressions in the SQLite driver's translation/emulation path can still pass CI. The PR description says the mysql-on-sqlite PHPUnit suite runs with the extension loaded, but there is no php -d extension=... ./vendor/bin/phpunit or equivalent step here.

run: bash .github/workflows/wp-tests-phpunit-native-extension-setup.sh

- name: Verify WordPress uses parser extension
run: cd wordpress && node tools/local-env/scripts/docker.js run --rm php php /var/www/native-verify-extension.php
Member

After loading the extension, this native WordPress job only runs native-verify-extension.php and then cleans up; it never invokes the WordPress PHPUnit runner. That means failures that occur only while WordPress tests use the native parser path will not be caught by the job named for that coverage.

pull_request:
  paths:
    - '.github/workflows/mysql-parser-extension-tests.yml'
    - 'packages/mysql-on-sqlite/**'
Member

With these path filters, a future change that touches only packages/php-ext-wp-mysql-parser/** will not run the parser extension workflow on push or pull request. Since the Rust crate lives there, extension build/test regressions can be merged without this CI running unless another watched path also changes.

.ok_or_else(|| php_error("Native AST node index is out of range"))
}

fn child_to_zval(&self, native_ast_zval: &Zval, child: NativeAstChild) -> PhpResult<Zval> {
Member

The native AST accessors appear to create fresh PHP wrappers for child nodes/tokens on each read. That changes WP_Parser_Node semantics: child object identity is no longer stable, and mutations to a child returned by get_first_child_node() / get_children() are not visible when traversing from the parent again. Since WP_Parser_Node exposes public mutators and this PR aims to keep the public parser API unchanged, can we either cache/materialize child wrappers consistently or explicitly account for this compatibility change?
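
To make the identity concern concrete, here is a small sketch contrasting fresh wrappers with cached ones. It is only an illustration of the "cache child wrappers" option: `NativeAst`, `NodeWrapper`, `child_fresh`, and `child_cached` are hypothetical names and do not correspond to functions in the extension.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical mutable wrapper handed out to callers.
pub struct NodeWrapper {
    pub index: usize,
    pub label: String,
}

/// Hypothetical native AST with an optional per-index wrapper cache.
pub struct NativeAst {
    labels: Vec<String>,
    wrappers: RefCell<HashMap<usize, Rc<RefCell<NodeWrapper>>>>,
}

impl NativeAst {
    pub fn new(labels: Vec<String>) -> Self {
        NativeAst { labels, wrappers: RefCell::new(HashMap::new()) }
    }

    /// Uncached: every read builds a new wrapper, so edits made through
    /// one wrapper are invisible to the next read.
    pub fn child_fresh(&self, index: usize) -> Option<Rc<RefCell<NodeWrapper>>> {
        let label = self.labels.get(index)?.clone();
        Some(Rc::new(RefCell::new(NodeWrapper { index, label })))
    }

    /// Cached: the first read creates the wrapper, later reads return the
    /// same object, so identity and mutations are stable.
    pub fn child_cached(&self, index: usize) -> Option<Rc<RefCell<NodeWrapper>>> {
        if index >= self.labels.len() {
            return None;
        }
        let mut wrappers = self.wrappers.borrow_mut();
        let wrapper = wrappers.entry(index).or_insert_with(|| {
            Rc::new(RefCell::new(NodeWrapper {
                index,
                label: self.labels[index].clone(),
            }))
        });
        Some(Rc::clone(wrapper))
    }
}
```

With `child_fresh`, two reads of the same child are different objects and an edit to the first is lost; with `child_cached`, both reads return the same wrapper and the edit is visible, at the cost of keeping every touched wrapper alive for the lifetime of the AST.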

2 participants