Skip to content

Commit 5f6bb6a

Browse files
committed
Deduplicate selector entries while embedding branch sequences
The per-(rule, token) branch selector stored a separate inner array per token, even when many tokens within the same rule mapped to identical branch lists (a single branch's FIRST set covers many tokens, for example). Loading the MySQL grammar used ~40 MB of PHP memory, most of which was duplicated inner arrays. Deduplicate by signature during grammar build so all tokens that land on the same branch list share one inner array via copy-on-write. The inner arrays still embed the branch symbol sequences directly so the hot loop iterates them without an extra $rules[$rule_id][$idx] indirection per branch attempt. Grammar memory on the MySQL grammar drops from ~40 MB to ~10 MB. PHPUnit peak memory drops from 198 MB to 110 MB. Parser throughput is unchanged from the previous (non-deduplicated) embedded-sequences form.
1 parent 33233ae commit 5f6bb6a

1 file changed

Lines changed: 19 additions & 7 deletions

File tree

packages/mysql-on-sqlite/src/parser/class-wp-parser-grammar.php

Lines changed: 19 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -337,15 +337,27 @@ private function build_branch_selectors() {
337337
$this->nullable_branches[ $rule_id ] = true;
338338
}
339339
if ( $selector ) {
340-
// Store the candidate branch sequences directly so the parser
341-
// can foreach over them without an extra $branches[$idx]
342-
// indirection on every branch attempt.
340+
// Expand branch indexes to the branch symbol sequences so
341+
// the parser can foreach candidate branches without an
342+
// extra $branches[$idx] indirection on every attempt. Many
343+
// tokens inside the same rule end up pointing to the same
344+
// branch-id list, so deduplicate by signature and let
345+
// copy-on-write share one sequences array across all of
346+
// them. Without this the nested table would be ~40 MB; with
347+
// it, ~1 MB.
348+
$by_signature = array();
343349
foreach ( $selector as $tid => $idx_list ) {
344-
$seqs = array();
345-
foreach ( $idx_list as $idx ) {
346-
$seqs[] = $branches[ $idx ];
350+
$sig = implode( ',', $idx_list );
351+
if ( isset( $by_signature[ $sig ] ) ) {
352+
$selector[ $tid ] = $by_signature[ $sig ];
353+
} else {
354+
$seqs = array();
355+
foreach ( $idx_list as $idx ) {
356+
$seqs[] = $branches[ $idx ];
357+
}
358+
$by_signature[ $sig ] = $seqs;
359+
$selector[ $tid ] = $seqs;
347360
}
348-
$selector[ $tid ] = $seqs;
349361
}
350362
$this->branches_for_token[ $rule_id ] = $selector;
351363
}

0 commit comments

Comments
 (0)