Reject cyclic grammars instead of looping forever (#1585)#1607
Reject cyclic grammars instead of looping forever (#1585)#1607sarathfrancis90 wants to merge 1 commit into
Conversation
|
Thanks, I will take a look when I have the time. Quick note: |
A rule that can derive itself without consuming any input -- e.g. `a: a b*` where `b*` may be empty, which expands to a unit production `a: a` -- has no base case to terminate on. The LALR and CYK parsers build a degenerate table for such a grammar and then loop forever at parse time; Earley tolerates it. Detect this at construction time: a grammar is cyclic when some non-terminal A derives A directly or through other symbols that are all nullable. Build the "derives in one step to a single non-nullable symbol" relation over the rules and look for a cycle; if one exists, raise a GrammarError naming the rule. The check runs for the LALR and CYK analyzers (the ones that hang). Earley is left as-is since it handles these grammars without looping. Fixes lark-parser#1585
aa2f436 to
77e5042
Compare
|
Good call to check — I tried moving it into On the cost: LALR already reuses the No rush on the review. |
Fixes #1585.
As we discussed, a rule that can reach itself without consuming input — e.g.
a: a b*whereb*may be empty, which expands to a unit productiona: a— has no base case, so LALR and CYK build a degenerate table and then spin forever at parse time. Earley tolerates it.This catches it at construction instead: I build the "derives to a single symbol with everything else nullable" relation over the rules and look for a cycle; if a non-terminal can reach itself, a
GrammarErroris raised naming the rule. As you pointed out, this isn't specific tostart— it fires for any rule (yourx: "a" | x x*example included), and also for indirect cycles likea: b/b: a.I scoped the check to the LALR and CYK analyzers — the ones that actually hang — and left Earley untouched, since it currently returns a parse for these grammars and I didn't want to change its behaviour without your call. If you'd rather prevent these grammars from compiling everywhere (your "even if they pass grammar detection" point), it's a small move to run the check in
GrammarAnalyzeritself so Earley rejects them too — happy to do that here.Left recursion, optional repetition (
x*), and nullable chains are not cyclic and still build fine. Full suite passes (1107) and mypy is clean; added a regression test intest_grammar.py.