[codex] Speed up native MySQL parser materialization #379
Closed
adamziel wants to merge 1 commit into codex/rust-mysql-parser-extension from
What changed
This is stacked on #377.
- Builds WP_MySQL_Token and WP_Parser_Node objects directly through Zend property writes instead of calling PHP bridge helper functions for every token/node.
- … append_child() once per child.
- … next_query() and materializes the PHP AST lazily on get_query_ast(), caching that PHP object for repeated reads.
- Replaces HashMap/HashSet lookups with dense rule entries and sorted lookahead vectors.

Why
The first Rust extension PR kept the exact public PHP token and AST shape, but it still spent a large part of runtime crossing back into PHP for object construction and child appends. This PR keeps the public objects the same while moving that materialization work into the extension and reducing generic grammar lookup overhead.
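The lazy materialization mentioned in the change list can be pictured with a minimal sketch. It is not the extension's code: `NativeNode`, `HostNode`, `ParsedQuery`, and the `materializations` counter are hypothetical names, and plain Rust structs stand in for the PHP objects that the real extension writes through Zend. The idea shown is only that the expensive conversion runs on the first read and the result is cached for later reads.

```rust
use std::cell::{Cell, OnceCell};

// Hypothetical compact tree kept on the native side after parsing.
struct NativeNode {
    rule: &'static str,
    children: Vec<NativeNode>,
}

// Hypothetical stand-in for the PHP-visible node (WP_Parser_Node in the PR).
#[derive(Debug, PartialEq)]
struct HostNode {
    rule_name: String,
    children: Vec<HostNode>,
}

// One parsed statement: the native tree is always present,
// the host-side tree is only built when somebody asks for it.
struct ParsedQuery {
    native: NativeNode,
    host_ast: OnceCell<HostNode>,
    // Demonstration-only counter of how often materialization ran.
    materializations: Cell<u32>,
}

impl ParsedQuery {
    fn new(native: NativeNode) -> Self {
        ParsedQuery {
            native,
            host_ast: OnceCell::new(),
            materializations: Cell::new(0),
        }
    }

    // Builds the host tree on first call and returns the cached tree afterwards.
    fn query_ast(&self) -> &HostNode {
        self.host_ast.get_or_init(|| {
            self.materializations.set(self.materializations.get() + 1);
            materialize(&self.native)
        })
    }
}

// Recursively converts the native tree into the host-side shape.
fn materialize(node: &NativeNode) -> HostNode {
    HostNode {
        rule_name: node.rule.to_string(),
        children: node.children.iter().map(materialize).collect(),
    }
}
```

Calling `query_ast()` twice on the same `ParsedQuery` returns the same reference both times, and the counter stays at one. A statement whose AST is never requested pays no conversion cost at all.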
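The "dense rule entries and sorted lookahead vectors" item could look roughly like the sketch below. All of it is assumption: the `TokenId` and `RuleId` types, the `RuleEntry` layout, and the `branch_for` method are invented for illustration, and the actual extension may organize its tables differently. It shows one way to replace hash lookups with direct indexing by rule id plus a binary search over a sorted vector.

```rust
// Hypothetical identifier types; the real grammar uses its own.
type TokenId = u16;
type RuleId = usize;

// One entry per grammar rule, stored densely in a Vec.
struct RuleEntry {
    // (lookahead token, branch index), sorted ascending by token id.
    // Assumes each token appears at most once per rule.
    lookahead: Vec<(TokenId, u16)>,
}

struct Grammar {
    // Indexed directly by RuleId, so no hashing is needed to find a rule.
    rules: Vec<RuleEntry>,
}

impl Grammar {
    // Sorts each rule's lookahead list once, at grammar load time.
    fn from_unsorted(rules: Vec<Vec<(TokenId, u16)>>) -> Self {
        let rules = rules
            .into_iter()
            .map(|mut lookahead| {
                lookahead.sort_by_key(|&(token, _)| token);
                RuleEntry { lookahead }
            })
            .collect();
        Grammar { rules }
    }

    // Returns the branch to take for `token` inside `rule`, if any.
    fn branch_for(&self, rule: RuleId, token: TokenId) -> Option<u16> {
        let entry = self.rules.get(rule)?;
        entry
            .lookahead
            .binary_search_by_key(&token, |&(tok, _)| tok)
            .ok()
            .map(|index| entry.lookahead[index].1)
    }
}
```

Lookahead sets per rule tend to be small, so a binary search over contiguous memory can be cheaper than hashing the key, and the dense `Vec` avoids hashing for the rule lookup entirely.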
Performance
Measured locally on PHP 8.2.29 CLI, Linux 6.12, AMD Ryzen 5 7430U, 2 visible cores, 7.2 GiB RAM. The Rust extension was built with cargo build --release. Each payload is a generated 50 MiB SQL file with unique statements and varied identifiers/literals: long multi-row inserts, CTE/window/subquery-heavy selects, and mixed DDL/DML. Timings exclude file_get_contents() and include tokenization, parser construction, next_query(), and get_query_ast() for every statement, so PHP-compatible AST materialization is included.
Compared with the initial Rust implementation measured in #377, the native extension total time improved from 67.616s to 27.873s for long inserts, 87.148s to 28.755s for edge selects, and 88.876s to 22.323s for mixed DDL/DML.
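As a quick check on the totals quoted above, the sketch below derives speedup factors and throughput from those timings and the stated 50 MiB payload size. The function and constant names are made up for this example; only the six timings and the payload size come from the description. The ratios work out to roughly 2.4x, 3.0x, and 4.0x.

```rust
// Payload size stated in the PR description.
const PAYLOAD_MIB: f64 = 50.0;

// (workload, seconds in #377, seconds in this PR), as quoted in the description.
const TIMINGS: [(&str, f64, f64); 3] = [
    ("long inserts", 67.616, 27.873),
    ("edge selects", 87.148, 28.755),
    ("mixed DDL/DML", 88.876, 22.323),
];

// How many times faster the new total time is.
fn speedup(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

// Parsed MiB per second for one payload.
fn throughput_mib_per_s(seconds: f64) -> f64 {
    PAYLOAD_MIB / seconds
}

// Formats one summary line per workload.
fn report() -> Vec<String> {
    TIMINGS
        .iter()
        .map(|&(name, before, after)| {
            format!(
                "{name}: {:.2}x faster, {:.2} MiB/s",
                speedup(before, after),
                throughput_mib_per_s(after)
            )
        })
        .collect()
}
```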
Validation
- cargo fmt --check
- cargo build
- cargo build --release
- OK (183 tests, 1421183 assertions)
- Tests: 667, Assertions: 1427673, Skipped: 2, Incomplete: 2