[codex] Speed up native MySQL parser materialization#379

Closed
adamziel wants to merge 1 commit into codex/rust-mysql-parser-extension from codex/native-parser-fast-path

@adamziel
Collaborator

What changed

This is stacked on #377.

  • Creates PHP WP_MySQL_Token and WP_Parser_Node objects directly through Zend property writes instead of calling PHP bridge helper functions for every token/node.
  • Builds parser node children as a packed array in one step instead of calling append_child() once per child.
  • Keeps parsed query ASTs in native Rust form after next_query() and materializes the PHP AST lazily on get_query_ast(), caching that PHP object for repeated reads.
  • Replaces hot parser grammar HashMap/HashSet lookups with dense rule entries and sorted lookahead vectors.
  • Removes rule-name cloning and string comparisons from recursive parser matching.
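
The lazy materialization and one-step child construction described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `NativeNode`, `PhpNode`, `ParsedQuery`, and the `conversions` counter are hypothetical names, plain Rust structs stand in for the Zend objects the extension actually writes, and the rule names in `main` are made up.

```rust
use std::cell::{Cell, OnceCell};

// All names here are hypothetical stand-ins; the real extension builds
// Zend objects, which this sketch models with plain Rust structs.

/// Parser output kept in native form after `next_query()`.
struct NativeNode {
    rule_name: &'static str,
    children: Vec<NativeNode>,
}

/// Stand-in for the PHP-visible WP_Parser_Node object.
#[derive(Debug, PartialEq)]
struct PhpNode {
    rule_name: String,
    children: Vec<PhpNode>,
}

/// Convert a native subtree, building each child list in one step
/// instead of appending children one at a time.
fn materialize(node: &NativeNode) -> PhpNode {
    let children: Vec<PhpNode> = node.children.iter().map(materialize).collect();
    PhpNode {
        rule_name: node.rule_name.to_string(),
        children,
    }
}

struct ParsedQuery {
    native: NativeNode,
    /// Filled on the first `get_query_ast()` call, reused afterwards.
    php_ast: OnceCell<PhpNode>,
    /// Counts conversions so the caching behaviour is observable.
    conversions: Cell<u32>,
}

impl ParsedQuery {
    fn new(native: NativeNode) -> Self {
        ParsedQuery {
            native,
            php_ast: OnceCell::new(),
            conversions: Cell::new(0),
        }
    }

    /// Materialize lazily and cache the result for repeated reads.
    fn get_query_ast(&self) -> &PhpNode {
        self.php_ast.get_or_init(|| {
            self.conversions.set(self.conversions.get() + 1);
            materialize(&self.native)
        })
    }
}

fn main() {
    // Illustrative rule names only.
    let query = ParsedQuery::new(NativeNode {
        rule_name: "selectStatement",
        children: vec![NativeNode { rule_name: "selectItemList", children: vec![] }],
    });
    // Nothing is converted until the AST is requested.
    assert_eq!(query.conversions.get(), 0);
    // Two reads return the same cached object.
    assert!(std::ptr::eq(query.get_query_ast(), query.get_query_ast()));
    assert_eq!(query.conversions.get(), 1);
    println!("materialized {} time(s)", query.conversions.get());
}
```

A statement whose AST is never read would skip the conversion entirely in this model, and repeated reads pay for it only once.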

Why

The first Rust extension PR kept the exact public PHP token and AST shape, but it still spent a large part of runtime crossing back into PHP for object construction and child appends. This PR keeps the public objects the same while moving that materialization work into the extension and reducing generic grammar lookup overhead.
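
To illustrate what reducing generic grammar lookup overhead can look like, here is a small sketch of a dense rule table with sorted lookahead vectors. It is an assumption-laden model rather than the extension's actual data layout: `RuleEntry`, `Grammar`, `can_start`, and the numeric rule and token ids are all hypothetical.

```rust
// Hypothetical sketch: none of these type or field names come from the PR.

/// One grammar rule, stored at index `rule_id` in a dense Vec.
struct RuleEntry {
    /// Token ids that may start this rule, kept sorted for binary search.
    lookahead: Vec<u16>,
}

struct Grammar {
    /// Indexed directly by rule id, so a lookup is an array access
    /// rather than a hash computation.
    rules: Vec<RuleEntry>,
}

impl Grammar {
    /// Sort and deduplicate every lookahead set once, at construction time.
    fn new(mut rules: Vec<RuleEntry>) -> Self {
        for rule in &mut rules {
            rule.lookahead.sort_unstable();
            rule.lookahead.dedup();
        }
        Grammar { rules }
    }

    /// Returns true if `token_id` can begin the rule with id `rule_id`.
    /// Unknown rule ids simply report false.
    fn can_start(&self, rule_id: usize, token_id: u16) -> bool {
        match self.rules.get(rule_id) {
            Some(rule) => rule.lookahead.binary_search(&token_id).is_ok(),
            None => false,
        }
    }
}

fn main() {
    let grammar = Grammar::new(vec![
        RuleEntry { lookahead: vec![42, 7, 7, 19] },
        RuleEntry { lookahead: vec![3] },
    ]);
    assert!(grammar.can_start(0, 19));
    assert!(!grammar.can_start(0, 3));
    assert!(!grammar.can_start(5, 3));
    println!("lookahead checks passed");
}
```

Because rules are addressed by integer id in this model, recursive matching never needs to clone or compare rule-name strings.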

Performance

Measured locally on PHP 8.2.29 CLI, Linux 6.12, AMD Ryzen 5 7430U, 2 visible cores, 7.2 GiB RAM. The Rust extension was built with cargo build --release. Each payload is a generated 50 MiB SQL file with unique statements and varied identifiers/literals: long multi-row inserts, CTE/window/subquery-heavy selects, and mixed DDL/DML. Timings exclude file_get_contents() and include tokenization, parser construction, next_query(), and get_query_ast() for every statement, so PHP-compatible AST materialization is included.

| Payload | Size MiB | Statements | Tokens | PHP lex s | Rust lex s | Lex speedup | PHP parse s | Rust parse s | Parse speedup | PHP total s | Rust total s | Total speedup | Peak MB PHP/Rust | Failures PHP/Rust |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| long inserts | 50.00 | 4,890 | 9,623,521 | 11.714 | 4.814 | 2.43x | 107.580 | 23.059 | 4.67x | 119.294 | 27.873 | 4.28x | 1696.0 / 1672.0 | 0 / 0 |
| edge selects | 50.00 | 31,313 | 11,147,429 | 16.935 | 5.875 | 2.88x | 130.798 | 22.880 | 5.72x | 147.732 | 28.755 | 5.14x | 1890.0 / 1854.0 | 0 / 0 |
| mixed ddl dml | 50.00 | 21,482 | 9,410,229 | 11.247 | 4.464 | 2.52x | 100.169 | 17.859 | 5.61x | 111.416 | 22.323 | 4.99x | 1666.0 / 1644.0 | 0 / 0 |

Compared with the initial Rust implementation measured in #377, the native extension total time improved from 67.616s to 27.873s for long inserts, 87.148s to 28.755s for edge selects, and 88.876s to 22.323s for mixed DDL/DML.
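
For readers who want the relative gain over #377 as a ratio, the totals quoted above can be divided directly. The helper below is only a convenience sketch; `speedup` is a hypothetical name and the inputs are the before and after totals from the preceding sentence.

```rust
/// Ratio of the earlier total time to the newer total time.
fn speedup(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

fn main() {
    // (payload, total seconds in #377, total seconds in this PR)
    let runs = [
        ("long inserts", 67.616, 27.873),
        ("edge selects", 87.148, 28.755),
        ("mixed DDL/DML", 88.876, 22.323),
    ];
    for (name, before, after) in runs {
        println!("{name}: {:.2}x", speedup(before, after));
    }
}
```

This works out to roughly 2.4x, 3.0x, and 4.0x respectively.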

Validation

  • cargo fmt --check
  • cargo build
  • cargo build --release
  • Extension-loaded focused parser/lexer suite: OK (183 tests, 1421183 assertions)
  • Extension-loaded full mysql-on-sqlite PHPUnit suite: Tests: 667, Assertions: 1427673, Skipped: 2, Incomplete: 2

@adamziel force-pushed the codex/rust-mysql-parser-extension branch 2 times, most recently from c613670 to 9a21181, on April 29, 2026 00:19
@adamziel force-pushed the codex/native-parser-fast-path branch from 8bffd34 to dbf7072 on April 29, 2026 00:20
@adamziel closed this on Apr 30, 2026