Reduce EnvGenAgent spurious failures #782

Open
qianl-nv wants to merge 4 commits into main from qianl/dev/agentic_retry
qianl-nv (Collaborator) commented Jun 11, 2026

Summary

Reduce the failure rate of EnvGenAgent.

Detailed description

  • What was the reason for the change?
    A single EnvGenAgent call fails about 20% of the time, mostly because of server-side issues. We need to reduce this for a better user experience.
  • What has been changed?
    -- Doubled the max token length to reduce the chance of truncation (which occasionally happens with long reasoning text)
    -- Added retries to recover from spurious network issues or timeouts
  • What is the impact of this change?
    Reduces the failure rate to <1% with the default prompt
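
The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `call_with_retries`, `attempt_fn`, and `flaky_attempt` are hypothetical names, and the real agent would call the model and parse its response where `attempt_fn` is invoked.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(attempt_fn: Callable[[], T], max_retries: int = 3) -> T:
    """Run attempt_fn up to 1 + max_retries times and return the first success."""
    last_exc: Optional[Exception] = None
    total_attempts = 1 + max_retries
    for _ in range(total_attempts):
        try:
            # Assumption: one attempt covers both the API call and the parsing.
            return attempt_fn()
        except Exception as exc:  # deliberately broad: network, timeout, parse errors
            last_exc = exc
    raise RuntimeError(f"failed after {total_attempts} attempts") from last_exc


# Hypothetical flaky call: fails once with a network error, then succeeds.
calls = {"count": 0}


def flaky_attempt() -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated network blip")
    return "ok"


result = call_with_retries(flaky_attempt)
```

Chaining the final `RuntimeError` to the last exception keeps the original failure visible in the traceback once all attempts are used up.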

greptile-apps (Bot, Contributor) commented Jun 11, 2026

Greptile Summary

This PR reduces EnvGenAgent failure rates by roughly doubling max_tokens from 2000 to 4096 and wrapping the entire API call + response-parsing pipeline in a retry loop (3 retries by default, so up to 4 attempts in total). The previous code only caught parse/validation errors; the new structure catches all exceptions, including network errors and assertion failures.

  • Retry loop (max_retries=3 default) now covers API-level failures (network errors, timeouts, empty responses, malformed JSON) — the whole client.chat.completions.create call plus response parsing is inside the try block.
  • max_tokens raised from 2000 to 4096 to prevent response truncation on long reasoning outputs.
  • Tests added for the retry path: one that succeeds on the second attempt after a ConnectionError, and one that exhausts retries and raises RuntimeError.
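
The two retry tests mentioned above might look roughly like the sketch below. It is an illustration under assumptions, not the PR's test file: `generate_with_retries` is a hypothetical stand-in for the agent's retry path, and `client.create` is a mocked call rather than the real OpenAI client.

```python
from unittest import mock


def generate_with_retries(client, max_retries=3):
    """Hypothetical stand-in for the agent's retry path (not the PR's code)."""
    last_exc = None
    for _ in range(1 + max_retries):
        try:
            return client.create()
        except Exception as exc:
            last_exc = exc
    raise RuntimeError("generation failed") from last_exc


def test_succeeds_on_second_attempt():
    client = mock.Mock()
    # With an iterable side_effect, exception members are raised, others returned.
    client.create.side_effect = [ConnectionError("simulated"), '{"ok": true}']
    assert generate_with_retries(client) == '{"ok": true}'
    assert client.create.call_count == 2


def test_exhausts_retries():
    client = mock.Mock()
    # A single exception as side_effect is raised on every call.
    client.create.side_effect = ConnectionError("simulated")
    try:
        generate_with_retries(client, max_retries=2)
    except RuntimeError as exc:
        assert isinstance(exc.__cause__, ConnectionError)
    else:
        raise AssertionError("expected RuntimeError")
    # One initial attempt plus two retries.
    assert client.create.call_count == 3
```

Asserting on `call_count` checks how many attempts were made as well as the final outcome.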

Confidence Score: 5/5

Safe to merge — the retry wrapper correctly covers the entire API call and response-parsing pipeline, and the new tests validate both the success-after-retry and the exhaust-retries paths.

The change is small and targeted: a retry loop around an existing API call and a token-limit bump. The retry loop now correctly covers all failure modes (network errors, empty responses, parse failures) that it previously missed. New unit tests verify the retry behavior end-to-end.

No files require special attention; both changed files are straightforward and well-tested.

Important Files Changed

| Filename | Overview |
| --- | --- |
| isaaclab_arena/agentic_environment_generation/environment_generation_agent.py | Retry loop correctly wraps the full API call and response parsing; max_tokens doubled; minor: no inter-retry delay. |
| isaaclab_arena/tests/test_environment_generation_agent.py | New unit tests cover the retry-then-succeed and exhaust-retries paths; existing no-choices test updated to assert retry count; stale @pytest.mark.flaky TODO not removed despite its trigger condition now being met. |
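
The overview notes that there is no delay between retries. One common way to add one is capped exponential backoff with optional jitter, sketched below. The helper `backoff_delays` and its `base` and `cap` defaults are assumptions for illustration and are not part of the PR.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int,
    base: float = 0.5,
    cap: float = 8.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait in seconds before each retry: base * 2**i, capped at cap.

    If rng is given, each wait is drawn uniformly from [0, nominal wait]
    ("full jitter") so that concurrent clients do not retry in lockstep.
    """
    delays = []
    for i in range(max_retries):
        delay = min(cap, base * (2 ** i))
        if rng is not None:
            delay = rng.uniform(0.0, delay)
        delays.append(delay)
    return delays


# Without jitter the schedule is deterministic.
schedule = backoff_delays(3)  # [0.5, 1.0, 2.0]
```

A retry loop would then call `time.sleep(delay)` before each retry. Whether a delay is worthwhile here depends on how the server-side failures behave, which the PR does not describe.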

Sequence Diagram

sequenceDiagram
    participant Caller
    participant generate_spec
    participant OpenAI_API as OpenAI API

    Caller->>generate_spec: "generate_spec(prompt, max_retries=3)"
    loop attempt in range(1 + max_retries)
        generate_spec->>OpenAI_API: chat.completions.create(...)
        alt Success
            OpenAI_API-->>generate_spec: resp with choices
            generate_spec->>generate_spec: extract_response_text()
            generate_spec->>generate_spec: json.loads() + model_validate()
            generate_spec-->>Caller: (EnvironmentIntentSpec, raw_text)
        else Any Exception
            OpenAI_API-->>generate_spec: raises Exception
            generate_spec->>generate_spec: store last_exc, continue
        end
    end
    generate_spec-->>Caller: raise RuntimeError("failed after N attempts")
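
The success branch of the diagram (extract the text, decode the JSON, validate it) can be sketched as below. This is not the PR's implementation: the body of `extract_response_text` is a guess at what a helper of that name might do, `REQUIRED_KEYS` is a made-up schema standing in for the real `model_validate()` step, and `fake_response` builds a stand-in object shaped like a chat completion response.

```python
import json
from types import SimpleNamespace

REQUIRED_KEYS = {"scene", "task"}  # hypothetical schema fields, for illustration only


def extract_response_text(resp) -> str:
    """Return the first choice's message text, raising if the response is empty."""
    if not getattr(resp, "choices", None):
        raise ValueError("response has no choices")
    text = resp.choices[0].message.content
    if not text:
        raise ValueError("response text is empty")
    return text


def parse_spec(resp) -> dict:
    """Decode the response text as JSON and check the hypothetical required keys."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    spec = json.loads(extract_response_text(resp))
    if not isinstance(spec, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return spec


def fake_response(content):
    """Build a stand-in object with the choices[0].message.content shape."""
    message = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


spec = parse_spec(fake_response('{"scene": "kitchen", "task": "pick"}'))
```

Every failure here surfaces as an exception, so a retry loop that catches `Exception` handles empty responses, malformed JSON, and validation errors the same way.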

Reviews (2): Last reviewed commit: "Fix exception handling"

Comment thread isaaclab_arena/agentic_environment_generation/environment_generation_agent.py Outdated
Comment thread isaaclab_arena/agentic_environment_generation/environment_generation_agent.py Outdated