Fail fast on structurally malformed LLM kernel responses#138
Open
jiannanWang wants to merge 1 commit into
Open
Fail fast on structurally malformed LLM kernel responses#138jiannanWang wants to merge 1 commit into
jiannanWang wants to merge 1 commit into
Conversation
When the LLM returns code that fails Python syntax or omits the required ``kernel_function`` definition, the worker would still write the file, run the test subprocess (~5s), get a syntax error, then re-prompt the LLM up to 3 more times for a total of ~20s wasted per dead candidate. Add an ``ast.parse`` precheck (``_validate_kernel_candidate``) that short-circuits with a clear error message when: * the extracted text is empty / whitespace-only, * Python parse fails (with line / message), or * no top-level ``kernel_function`` is defined. Hook the validator into both ``_refine_kernel`` (so refinement aborts immediately when the LLM produces unparseable text) and the entry points of ``verify`` / ``_refine_until_pass`` (so a malformed initial kernel doesn't get a 3-round retry budget). Behavior preserved for valid kernels: every existing happy path is unchanged. Saves ~20s per dead candidate, which scales linearly with fanout — meaningful at multi-LLM × multi-bottleneck × samples > 1 where dead-candidate rate is non-trivial. Test plan: - Unit test asserts a ``def kernel_function(...)``-less candidate returns a clear malformed-reason instead of looping. - Existing ``pytest tests/`` suite passes.
ee9a530 to
d7591ed
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When the LLM returns a response that fails Python syntax or omits the required
kernel_functiondefinition, the worker still writes the file, runs the test subprocess (~5s), gets a syntax error, then re-prompts the LLM up to 3 more times — burning ~20s and 3 extra LLM API calls per dead candidate.This produces a long iteration tail: each kernel-authoring round can stall for the entire 3-round retry budget waiting on a doomed candidate. At low fanout it's annoying; at multi-LLM × multi-bottleneck × samples > 1 the dead-candidate rate is non-trivial and the wasted budget scales linearly with worker count.
A typical trigger is an LLM response truncated mid–code-block.
Fix
Add an
ast.parseprecheck (_validate_kernel_candidate) that short-circuits with a clear error message when:kernel_functionis defined.Hooked in at two points:
_refine_kernel, so refinement aborts immediately when the LLM produces unparseable text.verify/_refine_until_pass, so a malformed initial kernel doesn't consume the 3-round retry budget.