Commit 5f6bb6a
Deduplicate selector entries while embedding branch sequences
The per-(rule, token) branch selector stored a separate inner array per
token, even when many tokens within the same rule mapped to identical
branch lists (a single branch's FIRST set covers many tokens, for
example). Loading the MySQL grammar used ~40 MB of PHP memory, most of
which was duplicated inner arrays.
Deduplicate by signature during grammar build so all tokens that land
on the same branch list share one inner array via copy-on-write. The
inner arrays still embed the branch symbol sequences directly so the
hot loop iterates them without an extra $rules[$rule_id][$idx]
indirection per branch attempt.
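
The deduplication described above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `build_rule_selector`, the `$branch_indexes_by_token` input shape, and the use of `implode()` to form the signature are all assumptions made for the example.

```php
<?php
/**
 * Hypothetical sketch: build the selector for one rule.
 *
 * @param array $rule_branches           branch index => symbol sequence (list of ints)
 * @param array $branch_indexes_by_token token id => list of candidate branch indexes
 * @return array token id => list of embedded branch symbol sequences
 */
function build_rule_selector( array $rule_branches, array $branch_indexes_by_token ): array {
	$selector = array();
	$shared   = array(); // signature => inner array with embedded sequences

	foreach ( $branch_indexes_by_token as $token_id => $indexes ) {
		// Tokens with the same ordered branch list get the same signature.
		$signature = implode( ',', $indexes );

		if ( ! isset( $shared[ $signature ] ) ) {
			// Embed the symbol sequences so the parser loop does not need
			// a second lookup into the rule's branch table.
			$embedded = array();
			foreach ( $indexes as $idx ) {
				$embedded[] = $rule_branches[ $idx ];
			}
			$shared[ $signature ] = $embedded;
		}

		// Plain assignment does not copy the array; PHP shares the
		// underlying storage until one of the holders writes to it.
		$selector[ $token_id ] = $shared[ $signature ];
	}

	return $selector;
}

// Illustrative data: tokens 1 and 2 map to the same branch list.
$rule_branches = array(
	array( 10, 11 ), // branch 0
	array( 12 ),     // branch 1
);
$by_token = array(
	1 => array( 0 ),
	2 => array( 0 ),
	3 => array( 0, 1 ),
);
$selector = build_rule_selector( $rule_branches, $by_token );
```

In this sketch the entries for tokens 1 and 2 refer to one inner array, so the saving grows with the number of tokens that share a branch list. The sharing holds only as long as the selector is treated as read-only after the build.
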
Memory for the loaded MySQL grammar drops from ~40 MB to ~10 MB.
PHPUnit peak memory drops from 198 MB to 110 MB. Parser throughput is
unchanged from the previous (non-deduplicated) embedded-sequences form.
1 parent 33233ae
1 file changed
Lines changed: 19 additions & 7 deletions
| Original file lines | New file lines | Change |
|---|---|---|
| 337-339 | 337-339 | unchanged context |
| 340-342 | 340-348 | 3 lines replaced by 9 |
| 343 | 349 | unchanged context |
| 344-346 | 350-359 | 3 lines replaced by 10 |
| 347 | 360 | unchanged context |
| 348 | | 1 line removed |
| 349-351 | 361-363 | unchanged context |