Re-enable Joop Leo right-recursion optimization (fixes #397) by erezsh · Pull Request #1602 · lark-parser/lark

erezsh · 2026-06-02T13:59:33Z

Leo was disabled when predict_and_complete was refactored into a method, breaking is_quasi_complete's access to start_symbol. This caused incorrect behavior with self-referential start symbols and multiple start symbols.

Restore TransitiveItem in earley_common.py
Add start_symbol parameter to predict_and_complete so is_quasi_complete can correctly guard against self-referential cycles
Move held_completions update before the Leo/regular branch so empty items populate it even when Leo fires (fixes nullable symbol parsing)
Guard create_leo_transitives with NULLABLE check: Leo must not shortcut nullable right-recursive symbols, as that bypasses the intermediate SPPF nodes where priority/disambiguation lives
Fix load_paths for multi-hop chains: use reduction.rule.origin instead of next_titem.s (which carries the outer rule's origin, not the intermediate symbol), and merge paths sharing the same intermediate symbol into one canonical SymbolNode to preserve SPPF sharing
Pass start_symbol through in xearley.py
Add tests for right-recursive Leo and multi-start-symbol correctness

Leo was disabled when predict_and_complete was refactored into a method, breaking is_quasi_complete's access to start_symbol. This caused incorrect behavior with self-referential start symbols and multiple start symbols. - Restore TransitiveItem in earley_common.py - Add start_symbol parameter to predict_and_complete so is_quasi_complete can correctly guard against self-referential cycles - Move held_completions update before the Leo/regular branch so empty items populate it even when Leo fires (fixes nullable symbol parsing) - Guard create_leo_transitives with NULLABLE check: Leo must not shortcut nullable right-recursive symbols, as that bypasses the intermediate SPPF nodes where priority/disambiguation lives - Fix load_paths for multi-hop chains: use reduction.rule.origin instead of next_titem.s (which carries the outer rule's origin, not the intermediate symbol), and merge paths sharing the same intermediate symbol into one canonical SymbolNode to preserve SPPF sharing - Pass start_symbol through in xearley.py - Add tests for right-recursive Leo and multi-start-symbol correctness

codecov · 2026-06-02T14:01:16Z

Codecov Report

❌ Patch coverage is 96.36364% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.46%. Comparing base (eb015d1) to head (583bac0).
⚠️ Report is 6 commits behind head on master.

Files with missing lines	Patch %	Lines
lark/parsers/earley_common.py	80.00%	3 Missing ⚠️
lark/parsers/earley.py	98.11%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1602      +/-   ##
==========================================
+ Coverage   90.06%   90.46%   +0.40%     
==========================================
  Files          52       52              
  Lines        8089     8169      +80     
==========================================
+ Hits         7285     7390     +105     
+ Misses        804      779      -25

Flag	Coverage Δ
unittests	`90.46% <96.36%> (+0.40%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

erezsh · 2026-06-02T14:03:50Z

@night199uk Does this seem right? It was vibe-coded, so I don't trust it much. But it does pass the tests.

chanicpanic · 2026-06-07T05:45:30Z

+                if quasi.expect not in self.NULLABLE:
+                    return False
+                if quasi.rule.origin == start_symbol and quasi.expect == start_symbol:
+                    return False


This statement isn't covered by the test cases. So, either we are missing a test case that exercises this branch, or start_symbol isn't actually needed here.

(There are a couple other lines in the Joop Leo code that aren't covered, but this one is the most interesting)

Even though one of the new test cases executes this branch, all the tests pass if this branch is removed, so I'm still not convinced it is necessary.

Thanks. I guess I'll have to go a bit deeper into it myself, so I don't keep generating slop :)

chanicpanic · 2026-06-07T05:46:12Z

+                quasi = quasi.advance()
+            return True
+
+        def create_leo_transitives(origin, start):


The original implementation of this function contains code comments that would probably be helpful to retain.

chanicpanic · 2026-06-07T05:52:06Z

While we are touching the transitives code: it would be less confusing for item.rule.origin to be used here instead of item.s. The existing code is correct given that item.s == item.rule.origin when item.is_complete, but unexpected given the preceding if item.rule.origin in transitives[item.start] condition.

chanicpanic · 2026-06-07T05:59:42Z

                        items.append(new_item)
                ###R Regular Earley completer
                else:
-                    # Empty has 0 length. If we complete an empty symbol in a particular


Note: None of the tests actually fail if this is not moved up

chanicpanic · 2026-06-07T06:15:30Z

+        # When a node already IS the intermediate symbol (from a direct Leo completion),
+        # use it as the canonical node so all other paths to the same symbol merge into it,
+        # preserving SPPF sharing and rule-order-based disambiguation.
+        canonical_vn = {}


This function contains the largest amount of new code and, unsurprisingly, is also the most questionable.

After playing around with it a while, I believe the logic is valid, but the implementation definitely reeks of LLM-generation.

I came up with a version that effectively does the same thing, but more concisely and in-line with the patterns of the existing earley code:

node_cache = {(node.s, node.start, node.end): node for (_, node) in self.paths} for transitive, node in self.paths: next_node = node if transitive.next_titem is not None: label = (transitive.next_titem.reduction.rule.origin, transitive.next_titem.reduction.start, self.end) is_node_the_next_node = label == (node.s, node.start, node.end) if not is_node_the_next_node: next_node = node_cache[label] if label in node_cache else node_cache.setdefault(label, type(self)(*label)) next_node.add_path(transitive.next_titem, node) self.add_family(transitive.reduction.rule.origin, transitive.reduction.rule, transitive.reduction.start, transitive.reduction.node, next_node) self.paths_loaded = True

That does look better, thanks! I updated the code accordingly.

@chanicpanic

As suggested by @chanicpanic, with small updates. Collapses the two-pass canonical_vn dance into one loop keyed on the full (s, start, end) SPPF identity. Paths whose node matches the expected intermediate label are reused as the canonical node; later paths to the same label merge via add_path and skip the redundant add_family.

1) Add Joop Leo regression tests for chain-extension corner cases Two new tests under the Joop Leo section in TestFullEarley: - Chain stops at non-nullable suffix: is_quasi_complete returns False when the outer rule has a non-nullable symbol after the recursive position. - Chain extends through nullable suffix: is_quasi_complete traverses trailing nullable non-terminals via the NULLABLE check + advance loop. These cover the previously-untested NULLABLE-suffix branch of is_quasi_complete and the break-at-non-nullable-suffix path of create_leo_transitives. 2) Add Joop Leo test for nullable-start self-reference guard Covers the is_quasi_complete branch that returns False when the chain encounters a state in start_symbol's rule expecting start_symbol itself as a nullable non-terminal. Without this guard, the Leo shortcut would treat the start rule as quasi-complete and produce an incorrect chain.

- is_quasi_complete's complete-item fast-path: caller in create_leo_transitives only passes originators with expect != None. - root_transitive's else branch: every titem is registered at transitives[col][previous], so the lookup always succeeds (the root titem self-resolves through it). - Leo + terminal scan branch: is_quasi_complete requires the post- recursion suffix to be all-nullable, so new_item.expect is never a terminal. The first two are Python-level defensive code; the third is residue from a more paper-faithful structure that the current titem representation makes redundant. Cycle detection (visited) and the empty-FIRST guard are retained as cheap protection against unusual grammars.

- Move held_completions update back into the regular completer branch. Empty (nullable) items can never trigger Leo (create_leo_transitives breaks at the NULLABLE check, so transitives never contains nullable origins as keys), so the unconditional placement was unnecessary. Reviewer's observation that no test failed without the move is correct. - Use item.rule.origin instead of item.s when looking up the transitive, matching the preceding `in transitives[item.start]` check. Both are equivalent for complete items, but item.rule.origin is less surprising. - Restore the original walk-backwards/walk-forwards comments in create_leo_transitives so the two phases are immediately legible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

erezsh · 2026-06-07T14:11:11Z

Updated the PR based on the review. (simplified, improved coverage)

erezsh mentioned this pull request Jun 3, 2026

Fix empty SPPF node from xearley ignore carry-over (fixes #1598) #1601

Merged

chanicpanic reviewed Jun 7, 2026

View reviewed changes

erezsh and others added 4 commits June 7, 2026 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Re-enable Joop Leo right-recursion optimization (fixes #397)#1602

Re-enable Joop Leo right-recursion optimization (fixes #397)#1602
erezsh wants to merge 5 commits into
masterfrom
fix_joop_leo

erezsh commented Jun 2, 2026

Uh oh!

codecov Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

erezsh commented Jun 2, 2026

Uh oh!

chanicpanic Jun 7, 2026

Uh oh!

chanicpanic Jun 9, 2026

Uh oh!

erezsh Jun 9, 2026

Uh oh!

chanicpanic Jun 7, 2026

Uh oh!

chanicpanic Jun 7, 2026

Uh oh!

chanicpanic Jun 7, 2026

Uh oh!

chanicpanic Jun 7, 2026

Uh oh!

erezsh Jun 7, 2026

Uh oh!

erezsh commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Uh oh!

Conversation

erezsh commented Jun 2, 2026

Uh oh!

codecov Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

erezsh commented Jun 2, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezsh commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Jun 2, 2026 •

edited

Loading