[codex] Add native token stream parser fast path #380

Closed
adamziel wants to merge 1 commit into codex/native-parser-fast-path from codex/native-token-stream-parser

@adamziel (Collaborator)

What changed

This stacks on #379.

  • Adds WP_MySQL_Native_Token_Stream, a compact Rust token stream returned by WP_MySQL_Native_Lexer::native_token_stream().
  • Teaches WP_MySQL_Native_Parser to accept either the existing PHP token array or the native token stream.
  • Switches the SQLite driver parser factory to use the native stream when the extension is loaded, with the existing PHP token-array path as the fallback.
  • Moves recursive stack-growth checks out of every grammar/node recursion and into the top-level parse/materialization entry points.
  • Updates the parser benchmark helper to exercise the stream path when available.
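As a rough illustration of the "either input, with fallback" shape described above, here is a minimal Rust sketch. It is not the extension's code: `TokenInput`, `make_input`, and `count_keyword` are hypothetical names, the whitespace "lexer" is a toy, and the `native_available` flag merely stands in for "the extension is loaded".

```rust
/// Hypothetical input type: the parser accepts either representation.
pub enum TokenInput<'a> {
    /// Stand-in for the existing PHP token array: one owned value per token.
    Array(Vec<String>),
    /// Stand-in for the native stream: byte ranges into the original SQL.
    Stream { sql: &'a str, spans: Vec<(usize, usize)> },
}

impl<'a> TokenInput<'a> {
    /// Number of tokens, regardless of representation.
    pub fn len(&self) -> usize {
        match self {
            TokenInput::Array(v) => v.len(),
            TokenInput::Stream { spans, .. } => spans.len(),
        }
    }

    /// Text of the i-th token, regardless of representation.
    pub fn text(&self, i: usize) -> Option<&str> {
        match self {
            TokenInput::Array(v) => v.get(i).map(|s| s.as_str()),
            TokenInput::Stream { sql, spans } => {
                let &(start, end) = spans.get(i)?;
                Some(&sql[start..end])
            }
        }
    }
}

/// Hypothetical factory: prefer the compact stream, fall back to the array.
pub fn make_input(sql: &str, native_available: bool) -> TokenInput<'_> {
    if native_available {
        let base = sql.as_ptr() as usize;
        let spans = sql
            .split_whitespace()
            .map(|w| {
                // Each word is a subslice of `sql`, so its offset is a pointer difference.
                let start = w.as_ptr() as usize - base;
                (start, start + w.len())
            })
            .collect();
        TokenInput::Stream { sql, spans }
    } else {
        TokenInput::Array(sql.split_whitespace().map(String::from).collect())
    }
}

/// A consumer written once against the common interface.
pub fn count_keyword(input: &TokenInput, keyword: &str) -> usize {
    (0..input.len())
        .filter(|&i| input.text(i) == Some(keyword))
        .count()
}
```

Both branches of `make_input` feed the same consumer, which is the property the fallback relies on: callers should see identical results whichever path is taken.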

Why

#379 made AST materialization faster, but the driver still created millions of PHP token objects before Rust parsing started. This branch lets Rust parse compact token metadata directly and only materialize PHP-compatible token objects when an AST is requested.
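To make "compact token metadata" concrete, here is a hedged sketch of what such a layout could look like. The struct names, field widths, and the `materialize` helper are assumptions for illustration only; the real WP_MySQL_Native_Token_Stream layout is not shown in this PR description.

```rust
/// Hypothetical compact record: fixed-size metadata that points into the SQL text.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CompactToken {
    pub id: u16,    // token type id (assumed to be numeric)
    pub start: u32, // byte offset into the SQL source
    pub len: u32,   // byte length of the token
}

/// Stand-in for a PHP-visible token object, which owns a copy of its text.
#[derive(Debug, PartialEq)]
pub struct MaterializedToken {
    pub id: u16,
    pub text: String,
}

pub struct TokenStream<'a> {
    source: &'a str,
    tokens: Vec<CompactToken>,
}

impl<'a> TokenStream<'a> {
    /// Toy lexer: splits on ASCII whitespace and gives every token id 0.
    pub fn lex(source: &'a str) -> Self {
        let mut tokens = Vec::new();
        let mut start: Option<usize> = None;
        for (i, b) in source.bytes().enumerate() {
            if b.is_ascii_whitespace() {
                if let Some(s) = start.take() {
                    tokens.push(CompactToken { id: 0, start: s as u32, len: (i - s) as u32 });
                }
            } else if start.is_none() {
                start = Some(i);
            }
        }
        if let Some(s) = start {
            let len = (source.len() - s) as u32;
            tokens.push(CompactToken { id: 0, start: s as u32, len });
        }
        TokenStream { source, tokens }
    }

    pub fn len(&self) -> usize {
        self.tokens.len()
    }

    /// Allocates an owned token only when a caller actually asks for one.
    pub fn materialize(&self, index: usize) -> Option<MaterializedToken> {
        let t = self.tokens.get(index)?;
        let start = t.start as usize;
        let end = start + t.len as usize;
        Some(MaterializedToken { id: t.id, text: self.source[start..end].to_string() })
    }
}
```

In this sketch, parsing can walk `tokens` without any per-token allocation, and the owned copies are created only on the `materialize` path, which mirrors the "only when an AST is requested" behavior described above.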

The stack-growth guard was also on the recursive hot path. Keeping the guard at the parse/materialization boundary preserves the stack-growth behavior while removing a large per-rule overhead.
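The sketch below shows the general shape of "guard once at the boundary, recurse freely inside". It uses only the standard library and a different mechanism from whatever the extension uses: the stack is provisioned once by running the recursion on a thread with a large stack, whereas the real code may grow the stack in place. `depth`, `parse_depth`, and the 64 MiB figure are hypothetical.

```rust
use std::thread;

/// Hypothetical recursive routine: nesting depth of parentheses.
/// There is deliberately no per-call stack check inside the recursion.
fn depth(bytes: &[u8], pos: &mut usize) -> usize {
    let mut max = 0;
    while *pos < bytes.len() {
        match bytes[*pos] {
            b'(' => {
                *pos += 1;
                max = max.max(1 + depth(bytes, pos));
            }
            b')' => {
                *pos += 1;
                return max;
            }
            _ => *pos += 1,
        }
    }
    max
}

/// Top-level entry point: the only place that deals with stack capacity.
pub fn parse_depth(input: &str) -> usize {
    // Assumed budget for this sketch; a real value would need tuning.
    const STACK_BYTES: usize = 64 * 1024 * 1024;
    let owned = input.as_bytes().to_vec();
    thread::Builder::new()
        .stack_size(STACK_BYTES)
        .spawn(move || {
            let mut pos = 0;
            depth(&owned, &mut pos)
        })
        .expect("failed to spawn parser thread")
        .join()
        .expect("parser thread panicked")
}
```

The trade-off is the same one the PR makes: the cost of the guard is paid once per parse instead of once per grammar rule, at the price of having to choose the boundary carefully.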

Benchmark

Methodology:

  • PHP 8.2 via Nix, release Rust extension build.
  • Three generated SQL files, each about 50 MiB, with unique statements.
  • Benchmark reads the full file, lexes, parses all statements with next_query(), and calls get_query_ast() for every query.
  • PHP baseline is the userland lexer/parser with no extension loaded.
  • “Base Rust” is #379 ([codex] Speed up native MySQL parser materialization), rebuilt and rerun on the same machine.
  • “This PR” is this branch with native token streams plus top-level stack checks.

| Dataset | Size | Statements | Tokens | PHP total | Base Rust total | This PR total | vs PHP | vs Base Rust | PHP peak | Base Rust peak | This PR peak |
|---|---|---|---|---|---|---|---|---|---|---|---|
| long_inserts | 52,431,363 B | 4,890 | 9,623,521 | 157.896s | 26.023s | 23.104s | 6.83x | 1.13x | 1696.0 MiB | 1672.0 MiB | 112.0 MiB |
| edge_selects | 52,429,066 B | 31,313 | 11,147,429 | 149.077s | 27.870s | 23.526s | 6.34x | 1.18x | 1890.0 MiB | 1854.0 MiB | 106.0 MiB |
| mixed_ddl_dml | 52,430,929 B | 21,482 | 9,410,229 | 110.326s | 23.008s | 18.460s | 5.98x | 1.25x | 1666.0 MiB | 1644.0 MiB | 108.0 MiB |
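
For anyone rechecking the table, the "vs PHP" and "vs Base Rust" columns are the baseline total divided by this PR's total. A small sketch of that arithmetic follows; the helper names are hypothetical and are not part of the benchmark script.

```rust
/// How many times faster `candidate_secs` is than `baseline_secs`.
pub fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

/// Formats a ratio the way the table does, with two decimals and an "x" suffix.
pub fn as_ratio(x: f64) -> String {
    format!("{:.2}x", x)
}

/// Recomputes the two ratio columns for the long_inserts row.
pub fn long_inserts_ratios() -> (String, String) {
    let (php, base_rust, this_pr) = (157.896, 26.023, 23.104);
    (as_ratio(speedup(php, this_pr)), as_ratio(speedup(base_rust, this_pr)))
}
```

The same division applied to the peak-memory columns puts the long_inserts reduction from 1696.0 MiB to 112.0 MiB at roughly 15x.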

Validation

  • cargo fmt --check
  • cargo build --release
  • composer run check-cs
  • php -d extension=/tmp/mysql-parser-perf/target-stream-release2/release/libwp_mysql_parser.so -d memory_limit=1024M ./vendor/bin/phpunit -c ./phpunit.xml.dist --filter 'MySQL|Parser|Lexer'
  • php -d extension=/tmp/mysql-parser-perf/target-stream-release2/release/libwp_mysql_parser.so -d memory_limit=1024M ./vendor/bin/phpunit -c ./phpunit.xml.dist
  • php -d memory_limit=1024M ./vendor/bin/phpunit -c ./phpunit.xml.dist --filter 'WP_PDO_MySQL_On_SQLite_PDO_API_Tests::test_query$'

@adamziel force-pushed the codex/native-token-stream-parser branch from 5a9067a to 56c4280 on April 29, 2026 00:21
@adamziel closed this on Apr 30, 2026