Add native Rust-based MySQL parser extension #381
When a native parser is in use, expose query results through a node class that defers child materialization until callers actually walk the tree. The base WP_Parser_Node::$children visibility is loosened to protected so the facade can populate it on demand.
The previous helper name (has_unmaterialized_native_ast) implied a runtime check for native-extension presence. It's actually a per-instance state flag tracking whether this node's children have been copied into PHP. was_mutated() reads that intent more directly.
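The deferred materialization and the per-instance state flag described in the commits above can be sketched in Rust roughly as follows. This is a minimal illustration, not the extension's code: `NativeAst`, `LazyNode`, `materialize`, `append_child`, and `sample_ast` are hypothetical names, and only `was_mutated` is borrowed from the commit message.

```rust
/// Hypothetical stand-in for a compact native AST: a flat arena where each
/// node has a name and a list of child indices.
struct NativeAst {
    names: Vec<&'static str>,
    children: Vec<Vec<usize>>,
}

/// Hypothetical facade node. `children` stays `None` until a mutation forces
/// the children to be copied out of the arena.
struct LazyNode<'a> {
    ast: &'a NativeAst,
    index: usize,
    children: Option<Vec<String>>,
}

impl<'a> LazyNode<'a> {
    fn new(ast: &'a NativeAst, index: usize) -> Self {
        LazyNode { ast, index, children: None }
    }

    /// Per-instance state flag: true once this node's children were copied.
    fn was_mutated(&self) -> bool {
        self.children.is_some()
    }

    /// Read path: answered from the arena unless a materialized copy exists.
    fn child_count(&self) -> usize {
        match &self.children {
            Some(copied) => copied.len(),
            None => self.ast.children[self.index].len(),
        }
    }

    /// Copy the children out of the arena the first time they are needed.
    fn materialize(&mut self) -> &mut Vec<String> {
        let ast = self.ast;
        let index = self.index;
        self.children.get_or_insert_with(|| {
            ast.children[index]
                .iter()
                .map(|&child| ast.names[child].to_string())
                .collect()
        })
    }

    /// Mutation path: materializes first, then edits the copied children.
    fn append_child(&mut self, name: &str) {
        self.materialize().push(name.to_string());
    }
}

/// Small fixture: a root node with two leaf children.
fn sample_ast() -> NativeAst {
    NativeAst {
        names: vec!["query", "selectStatement", "EOF"],
        children: vec![vec![1, 2], vec![], vec![]],
    }
}
```

In this sketch, reads never flip the flag; only `append_child` does, which is the behavior the renamed helper is meant to express.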
When the native extension is loaded, WP_MySQL_Lexer extends WP_MySQL_Native_Lexer, so an instanceof check is more direct than method_exists() — and it gives the IDE/static analyzer something to work with.
Local-only build/smoke scripts that don't belong in the merged history.
Drop the tmp-test-native path filter and smoke-script step (the dir was removed), and switch the inline native-availability checks from method_exists() to instanceof WP_MySQL_Native_Lexer.
Lift it out of mysql-on-sqlite/ext/ to its own top-level package so it sits alongside other packages and the path reads as what it is — a standalone PHP extension.
## Summary

- add one explicit `WP_Parser_Grammar::$native_grammar` cache slot
- store the compiled Rust grammar on the PHP grammar object instead of in a content-hash cache
- remove the full exported-grammar hash walk from native parser construction

## Why

The previous Rust-only content-key cache preserved a smaller PHP diff, but every parser construction still exported and recursively hashed the entire grammar before it could hit cache. In the SQLite smoke benchmark that dropped the native path back to roughly 2x faster than PHP.

This restores the object-attached cache path we had before, but keeps the PHP diff explicit and minimal: one new public cache property on `WP_Parser_Grammar`.

## Measurements

Command:

```bash
TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh
```

| Run | PHP parser | Rust parser | Speedup |
| ---: | ---: | ---: | ---: |
| 1 | 3.088s | 0.389s | 7.94x |
| 2 | 3.126s | 0.386s | 8.10x |
| 3 | 2.927s | 0.348s | 8.41x |

Default 2000-query smoke workload:

| Workload | PHP parser | Rust parser | Speedup |
| --- | ---: | ---: | ---: |
| 2000 generated queries, including 8 x 2000-row inserts | 24.082s | 3.008s | 8.01x |

## Testing

- `cargo fmt --check`
- `php -l packages/mysql-on-sqlite/src/parser/class-wp-parser-grammar.php`
- `git diff --check`
- `TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh`
- `./tmp-test-native/run.sh`

## Notes

This assumes `WP_Parser_Grammar` is immutable after construction for native parsing purposes. That matches current use, and the tradeoff is isolated in this PR so it is visible in review.
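To illustrate why an object-attached cache slot avoids the per-construction hash walk, here is a minimal Rust sketch. It is not the extension's implementation: `Grammar`, `CompiledGrammar`, `Parser`, `compiled`, and the `compile_calls` counter are hypothetical, and the "compile" step is a placeholder. Like the PR, it assumes the grammar is immutable after construction.

```rust
use std::cell::{Cell, OnceCell};
use std::rc::Rc;

/// Hypothetical compiled form of a grammar.
struct CompiledGrammar {
    rule_count: usize,
}

/// Hypothetical stand-in for the grammar object, carrying one cache slot.
struct Grammar {
    rules: Vec<Vec<u32>>,
    native_grammar: OnceCell<Rc<CompiledGrammar>>,
    compile_calls: Cell<usize>, // instrumentation for the example only
}

impl Grammar {
    fn new(rules: Vec<Vec<u32>>) -> Self {
        Grammar {
            rules,
            native_grammar: OnceCell::new(),
            compile_calls: Cell::new(0),
        }
    }

    /// Returns the cached compiled grammar, compiling only on first use.
    /// A cache hit is a slot lookup; nothing walks or hashes `rules`.
    fn compiled(&self) -> Rc<CompiledGrammar> {
        Rc::clone(self.native_grammar.get_or_init(|| {
            self.compile_calls.set(self.compile_calls.get() + 1);
            Rc::new(CompiledGrammar { rule_count: self.rules.len() })
        }))
    }
}

/// Hypothetical parser that shares the compiled grammar.
struct Parser {
    grammar: Rc<CompiledGrammar>,
}

impl Parser {
    fn new(grammar: &Grammar) -> Self {
        Parser { grammar: grammar.compiled() }
    }
}
```

A content-hash cache would have to visit every rule to compute its key before each lookup, whereas here a second `Parser::new` only clones an `Rc`. The tradeoff is the immutability assumption stated in the Notes.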
## Summary

- reuse one `WP_MySQL_Parser` instance inside the SQLite driver and reset its token stream per query
- add `reset_tokens()` to the PHP parser polyfill and the Rust native parser
- restore native parser-node accessor fast paths in `WP_MySQL_Native_Parser_Node`, while keeping PHP child materialization for mutation
- fix the local native extension build helper for Nix/libclang bindgen by undefining `__SSE2__` during binding generation

## Stack

This is the top PR in the native MySQL lexer/parser stack. The stack is split so each GitHub diff shows one reviewable concern:

1. [#384 Extract MySQL lexer and parser polyfills](#384)
   - `trunk` -> `codex/native-parser-php-facade`
   - extraction-only PHP refactor
   - moves the existing PHP lexer/parser implementations into polyfill classes
   - keeps public `WP_MySQL_Lexer` and `WP_MySQL_Parser` as thin PHP subclasses
2. [#385 Add optional native parser routing](#385)
   - `codex/native-parser-php-facade` -> `codex/native-parser-class-routing`
   - adds fallback `WP_MySQL_Native_*` PHP classes
   - routes the public lexer/parser classes through native classes when the Rust extension provides them
   - adds the minimal PHP grammar-export bridge for the native parser
3. [#386 Add lazy native parser node facade](#386)
   - `codex/native-parser-class-routing` -> `codex/native-parser-node-facade`
   - keeps `WP_Parser_Node` as the plain PHP tree node
   - adds `WP_MySQL_Native_Parser_Node extends WP_Parser_Node` for native-backed lazy AST nodes
   - keeps native AST handles and native accessor delegation out of the base node class
4. [#381 Add lazy native AST facade](#381)
   - `codex/native-parser-node-facade` -> `codex/native-lazy-ast-facade`
   - implements the Rust lexer/parser extension and lazy native AST facade
   - makes the Rust extension instantiate `WP_MySQL_Native_Parser_Node`
   - adds native-extension CI coverage for the SQLite driver and WordPress PHPUnit tests
   - includes the local SQLite facade smoke benchmark
5. [#387 Cache native grammar on parser grammar object](#387)
   - `codex/native-lazy-ast-facade` -> `codex/native-parser-object-grammar-cache`
   - restores the object-attached native grammar cache
   - adds only `WP_Parser_Grammar::$native_grammar` on the PHP side
   - removes the Rust content-hash cache that walked the whole exported grammar on every parser construction
6. This PR, [#388 Speed up native AST materialization](#388)
   - `codex/native-parser-object-grammar-cache` -> `codex/native-parser-bulk-materialization`
   - optimizes native-to-PHP AST access after the grammar-cache performance restoration
   - reuses the SQLite driver's parser instance instead of constructing it per query

## Why

The native lexer/parser itself is fast, but the PHP-facing path can lose that benefit if each query repeatedly rebuilds native parser state or forces full PHP AST materialization. On the current stack, #387 already removes the large grammar export/hash cost. This PR removes the remaining per-query parser construction churn and restores the native AST accessor path for descendant-heavy SQLite driver workloads.

## Measurements

Environment: local PHP 8.2 via the native build helper, release Rust extension, current top of this PR.

Focused constructor/reset benchmark over 5000 unique SELECT queries:

| Phase | Time |
| --- | ---: |
| native tokenize | 22.62 us/query |
| fresh native parser constructor only | 2.31 us/query |
| reusable parser `reset_tokens()` only | 0.32 us/query |
| reusable parser reset + parse + `get_descendants()` | 157.06 us/query |
| constructor/reset ratio | 7.3x |

The previously reported ~622 us/query constructor cost does not reproduce on this stack because #387 already caches the native grammar on the PHP grammar object. Parser reuse still removes most of the remaining constructor overhead.

SQLite facade smoke workload:

Command:

```bash
TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh
```

| Workload | PHP fallback | Native extension | Speedup |
| --- | ---: | ---: | ---: |
| 250 generated queries, including 1 x 2000-row insert | 4.060s | 0.525s | 7.73x |

## Testing

- `cargo fmt --check`
- `git diff --check`
- `composer run check-cs`
- `composer run test` from `packages/mysql-on-sqlite`
- `php -d extension=packages/mysql-on-sqlite/ext/wp-mysql-parser/target/release/libwp_mysql_parser.so packages/mysql-on-sqlite/vendor/bin/phpunit -c packages/mysql-on-sqlite/phpunit.xml.dist`
- `TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh`
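The parser-reuse pattern from this description (one long-lived parser whose token stream is reset per query) might look like the following Rust sketch. Only the name `reset_tokens` comes from the PR; `ReusableParser`, `tokenize`, `parse_all`, and the toy "parse" that merely counts tokens are assumptions made for illustration.

```rust
/// Hypothetical parser that owns a token buffer and a read cursor.
struct ReusableParser {
    tokens: Vec<String>,
    position: usize,
}

impl ReusableParser {
    fn new() -> Self {
        ReusableParser { tokens: Vec::new(), position: 0 }
    }

    /// Swap in the next query's tokens and rewind the cursor, keeping the
    /// parser (and anything expensive it holds) alive between queries.
    fn reset_tokens(&mut self, tokens: Vec<String>) {
        self.tokens = tokens;
        self.position = 0;
    }

    /// Toy "parse": consume all remaining tokens and report how many.
    fn parse(&mut self) -> usize {
        let consumed = self.tokens.len() - self.position;
        self.position = self.tokens.len();
        consumed
    }
}

/// Toy whitespace tokenizer standing in for the real lexer.
fn tokenize(query: &str) -> Vec<String> {
    query.split_whitespace().map(str::to_string).collect()
}

/// Driver loop: one parser instance serves every query.
fn parse_all(queries: &[&str]) -> Vec<usize> {
    let mut parser = ReusableParser::new();
    queries
        .iter()
        .map(|query| {
            parser.reset_tokens(tokenize(query));
            parser.parse()
        })
        .collect()
}
```

The design choice is that construction cost is paid once in `parse_all`, while each query pays only for the cheaper reset, which is the gap the constructor/reset row in the benchmark table measures.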
    #[php_class]
    #[php(name = "WP_MySQL_Native_Parser")]
    pub struct WpMySqlNativeParser {
In native mode, new WP_MySQL_Parser(...) instanceof WP_Parser will return false. Is there a way to preserve the existing instanceof WP_Parser?
    run: cargo build
    working-directory: packages/php-ext-wp-mysql-parser

    - name: Verify SQLite driver selects the native parser path
This job only builds the extension and runs a reflection smoke check, so native-parser regressions in the SQLite driver's translation/emulation path can still pass CI. The PR description says the mysql-on-sqlite PHPUnit suite runs with the extension loaded, but there is no php -d extension=... ./vendor/bin/phpunit or equivalent step here.
    run: bash .github/workflows/wp-tests-phpunit-native-extension-setup.sh

    - name: Verify WordPress uses parser extension
    run: cd wordpress && node tools/local-env/scripts/docker.js run --rm php php /var/www/native-verify-extension.php
After loading the extension, this native WordPress job only runs native-verify-extension.php and then cleans up; it never invokes the WordPress PHPUnit runner. That means failures that occur only while WordPress tests use the native parser path will not be caught by the job named for that coverage.
    pull_request:
    paths:
    - '.github/workflows/mysql-parser-extension-tests.yml'
    - 'packages/mysql-on-sqlite/**'
With these path filters, a future change that touches only packages/php-ext-wp-mysql-parser/** will not run the parser extension workflow on push or pull request. Since the Rust crate lives there, extension build/test regressions can be merged without this CI running unless another watched path also changes.
    .ok_or_else(|| php_error("Native AST node index is out of range"))
    }

    fn child_to_zval(&self, native_ast_zval: &Zval, child: NativeAstChild) -> PhpResult<Zval> {
The native AST accessors appear to create fresh PHP wrappers for child nodes/tokens on each read. That changes WP_Parser_Node semantics: child object identity is no longer stable, and mutations to a child returned by get_first_child_node() / get_children() are not visible when traversing from the parent again. Since WP_Parser_Node exposes public mutators and this PR aims to keep the public parser API unchanged, can we either cache/materialize child wrappers consistently or explicitly account for this compatibility change?
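The identity concern raised in this comment can be reproduced in a small Rust sketch that contrasts a fresh wrapper per read with a cached wrapper. All names here (`ChildWrapper`, `ParentNode`, `first_child_fresh`, `first_child_cached`) are hypothetical and do not reflect the extension's actual accessors; the cache is just one possible way to keep wrappers stable.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical wrapper handed to callers; `label` is its mutable state.
struct ChildWrapper {
    label: RefCell<String>,
}

impl ChildWrapper {
    fn new(index: usize) -> Self {
        ChildWrapper { label: RefCell::new(format!("node#{index}")) }
    }
}

/// Hypothetical parent holding native child indices plus a wrapper cache.
struct ParentNode {
    child_indices: Vec<usize>,
    cache: RefCell<HashMap<usize, Rc<ChildWrapper>>>,
}

impl ParentNode {
    fn new(child_indices: Vec<usize>) -> Self {
        ParentNode { child_indices, cache: RefCell::new(HashMap::new()) }
    }

    /// Fresh wrapper on every read: identity is unstable, and a mutation
    /// made through one wrapper is invisible through the next.
    fn first_child_fresh(&self) -> Option<Rc<ChildWrapper>> {
        self.child_indices
            .first()
            .map(|&index| Rc::new(ChildWrapper::new(index)))
    }

    /// Cached wrapper: every read returns the same allocation, so
    /// mutations persist across traversals from the parent.
    fn first_child_cached(&self) -> Option<Rc<ChildWrapper>> {
        let index = *self.child_indices.first()?;
        let mut cache = self.cache.borrow_mut();
        let wrapper = cache
            .entry(index)
            .or_insert_with(|| Rc::new(ChildWrapper::new(index)));
        Some(Rc::clone(wrapper))
    }
}
```

With the fresh variant, two reads return distinct objects and an edit to the first is lost; with the cached variant, `Rc::ptr_eq` holds across reads and the edit is visible on the next traversal.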
Summary
Adds an optional Rust-based PHP extension that takes over MySQL lexing and parsing when loaded. The pure-PHP lexer and parser stay in place as the default; when the extension is present, it pre-declares `WP_MySQL_Native_Lexer` and `WP_MySQL_Native_Parser`, and the `load.php` hook from #384 picks them up automatically. No public API changes: `WP_MySQL_Lexer` and `WP_MySQL_Parser` look the same to callers either way.

The extension lives at `packages/php-ext-wp-mysql-parser/` (its own top-level package, not nested under mysql-on-sqlite) and exposes:

- `WP_MySQL_Native_Token_Stream`.
- `WP_MySQL_Native_Parser_Node` (introduced in Add lazy native parser node facade #386), whose read methods delegate into that buffer so children are never copied into PHP unless a caller actually walks the tree. Mutation triggers materialization.

The SQLite driver opts into the native path with an `instanceof WP_MySQL_Native_Lexer` check in `create_parser()`: when the extension is loaded, it streams tokens through the native lexer; otherwise it stays on the PHP path.

CI

Testing

`cargo build` in the extension dir, then run the mysql-on-sqlite PHPUnit suite with `-d extension=...libwp_mysql_parser.so`. Tests pass identically with and without the extension.