Skip to content

fix: Agent max_retry_limit config is silently ignored; tool failures have no retry/backoff path#2081

Merged
MervinPraison merged 3 commits into
mainfrom
claude/issue-1809-20260603-0711
Jun 22, 2026
Merged

fix: Agent max_retry_limit config is silently ignored; tool failures have no retry/backoff path#2081
MervinPraison merged 3 commits into
mainfrom
claude/issue-1809-20260603-0711

Conversation

@MervinPraison

Copy link
Copy Markdown
Owner

Fixes #1809

Auto-opened from Claude triage branch claude/issue-1809-20260603-0711.

Made with Cursor

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Caution

Review failed

An error occurred during the review process. Please try again later.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/issue-1809-20260603-0711

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces robust exponential backoff and jitter retry mechanisms for both tool execution and guardrail validation within the agent framework. It adds a new BackoffPolicy utility, updates ExecutionConfig to support and validate retry parameters, and integrates this retry logic across synchronous and asynchronous execution paths. The review feedback highlights two important issues: first, a bug where self._execution_config is incorrectly accessed as self.execution_config in tool_execution.py, causing it to silently fall back to default values; and second, a gap in error handling where exceptions raised outside the retry loop (such as in hooks) bypass the ToolExecutionError wrapping.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

logging.warning(f"Tool {function_name} timed out after {tool_timeout}s")
result = {"error": f"Tool timed out after {tool_timeout}s", "timeout": True}
# Apply tool retry logic with exponential backoff
execution_config = getattr(self, 'execution_config', None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The execution_config attribute is stored as self._execution_config (with a leading underscore) in the Agent class initialization. Accessing self.execution_config without the underscore will always return None, causing the tool retry logic to silently fall back to the default hardcoded values instead of using the user-configured settings.

Suggested change
execution_config = getattr(self, 'execution_config', None)
execution_config = getattr(self, '_execution_config', None)

Comment on lines +461 to +465
except Exception as e:
# Emit tool call end with error
# Emit tool call end with error for exceptions that escape the retry loop
_duration_ms = (_time.time() - _tool_start_time) * 1000
_trace_emitter.tool_call_end(self.name, function_name, None, _duration_ms, str(e))

# Gap 3a fix: Wrap exceptions in ToolExecutionError for better observability
is_retryable = not isinstance(e, (ValueError, TypeError, AttributeError))
raise ToolExecutionError(
f"Tool '{function_name}' failed: {e}",
tool_name=function_name,
agent_id=self.name,
is_retryable=is_retryable,
) from e
raise # Re-raise the exception as it's already properly wrapped

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Any exceptions raised outside of the retry loop (such as in the BEFORE_TOOL or AFTER_TOOL hooks) will propagate to this outer except block. Since they are not wrapped in ToolExecutionError, simply calling raise will bypass the custom error wrapping and break the uniform error handling/observability contract (Gap 3a fix) for those exceptions. We should check if the exception is already a ToolExecutionError and wrap it if not.

        except Exception as e:
            # Emit tool call end with error for exceptions that escape the retry loop
            _duration_ms = (_time.time() - _tool_start_time) * 1000
            _trace_emitter.tool_call_end(self.name, function_name, None, _duration_ms, str(e))
            if not isinstance(e, ToolExecutionError):
                is_retryable = not isinstance(e, (ValueError, TypeError, AttributeError))
                raise ToolExecutionError(
                    f"Tool '{function_name}' failed: {e}",
                    tool_name=function_name,
                    agent_id=self.name,
                    is_retryable=is_retryable,
                ) from e
            raise

@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds exponential-backoff retry logic for tool failures on the synchronous execution path and wires guardrail retries to the same BackoffPolicy. It fixes the attribute-name mismatch (execution_config_execution_config) that caused ExecutionConfig retry settings to be silently ignored.

  • tool_execution.py: Replaces the single-shot tool call with a retry loop (range(1, max_retry_limit + 2)) that catches ToolExecutionError, applies BackoffPolicy.delay (exponential + jitter), and distinguishes retryable errors (circuit open, timeout) from non-retryable ones.
  • feature_configs.py: Adds retry_initial_delay, retry_backoff_factor, and retry_jitter fields to ExecutionConfig with __post_init__ validation — but the validation is placed after the early-return guards for the context_compaction deprecation path, so it is silently skipped for every normal internal instantiation.
  • agent.py: Stores _execution_config on self, imports BackoffPolicy, and injects exponential backoff into both sync and async guardrail retry loops; the async tool-execution path (execute_tool_async) remains unprotected.

Confidence Score: 3/5

The core retry logic is sound and the attribute-name bug is fixed, but the newly added input validation in ExecutionConfig.post_init is unreachable in every normal agent initialization because it is placed after early-return blocks that fire for all internal callers.

The post_init validation for retry_initial_delay, retry_backoff_factor, and retry_jitter is silently skipped in normal operation — internal callers set _context_compaction_internal to True on the first call and all subsequent instantiations return before reaching the validation block. Invalid values will pass through without error.

src/praisonai-agents/praisonaiagents/config/feature_configs.py — the validation block in post_init needs to be moved before the context_compaction early-return guards.

Important Files Changed

Filename Overview
src/praisonai-agents/praisonaiagents/agent/tool_execution.py Adds BackoffPolicy class and full retry loop for tool execution; sync path now honours max_retry_limit, but async tool-execution path is untouched and still has no retries.
src/praisonai-agents/praisonaiagents/agent/agent.py Stores _execution_config on self (fixing the attribute-name mismatch), wires BackoffPolicy into both sync and async guardrail retry loops, and seeds a default ExecutionConfig when no execution param is provided. Adds an unused import random.
src/praisonai-agents/praisonaiagents/config/feature_configs.py Adds retry_initial_delay, retry_backoff_factor, retry_jitter fields with validation in post_init; however the validation block is placed after early-return guards for the context_compaction deprecation path, meaning it is silently skipped for all normal internal ExecutionConfig instantiations.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[execute_tool called] --> B{blocked_result?}
    B -- yes --> C[result = blocked_result]
    B -- no --> D[Read _execution_config from self]
    D --> E{_execution_config found?}
    E -- yes --> F[Use config retry params]
    E -- no --> G[Use hardcoded defaults]
    F --> H[loop: attempt in range 1..max_retry_limit+2]
    G --> H
    H --> I{tool_timeout set?}
    I -- yes --> J[Submit to ThreadPoolExecutor with timeout]
    I -- no --> K[_execute_tool_with_circuit_breaker]
    J --> L{TimeoutError?}
    L -- yes --> M[result = error dict timeout=True]
    L -- no --> K
    K --> N{result is error dict?}
    M --> N
    N -- circuit_open or timeout --> O[raise ToolExecutionError is_retryable=True]
    N -- other error dict or success --> P[break - pass to LLM]
    O --> Q{retryable AND attempts left?}
    Q -- yes --> R[BackoffPolicy.delay / time.sleep]
    R --> H
    Q -- no --> S[raise ToolExecutionError]
    P --> T[wrap_if_external / truncate output]
    T --> U[Return result to LLM]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[execute_tool called] --> B{blocked_result?}
    B -- yes --> C[result = blocked_result]
    B -- no --> D[Read _execution_config from self]
    D --> E{_execution_config found?}
    E -- yes --> F[Use config retry params]
    E -- no --> G[Use hardcoded defaults]
    F --> H[loop: attempt in range 1..max_retry_limit+2]
    G --> H
    H --> I{tool_timeout set?}
    I -- yes --> J[Submit to ThreadPoolExecutor with timeout]
    I -- no --> K[_execute_tool_with_circuit_breaker]
    J --> L{TimeoutError?}
    L -- yes --> M[result = error dict timeout=True]
    L -- no --> K
    K --> N{result is error dict?}
    M --> N
    N -- circuit_open or timeout --> O[raise ToolExecutionError is_retryable=True]
    N -- other error dict or success --> P[break - pass to LLM]
    O --> Q{retryable AND attempts left?}
    Q -- yes --> R[BackoffPolicy.delay / time.sleep]
    R --> H
    Q -- no --> S[raise ToolExecutionError]
    P --> T[wrap_if_external / truncate output]
    T --> U[Return result to LLM]
Loading

Reviews (3): Last reviewed commit: "fix: correct critical retry logic bugs i..." | Re-trigger Greptile

Comment thread src/praisonai-agents/praisonaiagents/agent/tool_execution.py Outdated
Comment thread src/praisonai-agents/praisonaiagents/agent/tool_execution.py Outdated
Comment on lines +284 to +286
def execute_with_context():
with with_injection_context(state):
return self._execute_tool_with_circuit_breaker(function_name, arguments)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Late-binding closure in retry loop captures function_name/arguments at call time

execute_with_context is re-defined on every iteration without explicitly capturing the current loop values. Because neither variable is reassigned inside the loop this works correctly today, but it is a fragile pattern — any future refactor that reassigns those names between iterations would silently pass wrong values to the submitted thread. Capturing them explicitly (e.g. def execute_with_context(fn=function_name, args=arguments): ...) makes the intent clear and resilient to future changes.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@MervinPraison

Copy link
Copy Markdown
Owner Author

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:

  1. Bloat check: Are changes minimal and focused? Any unnecessary code or scope creep?
  2. Security: Any hardcoded secrets, unsafe eval/exec, missing input validation?
  3. Performance: Any module-level heavy imports? Hot-path regressions?
  4. Tests: Are tests included? Do they cover the changes adequately?
  5. Backward compat: Any public API changes without deprecation?
  6. Code quality: DRY violations, naming conventions, error handling?
  7. Address reviewer feedback: If Qodo, Coderabbit, or Gemini flagged valid issues, include them in your review
  8. Suggest specific improvements with code examples where possible

@MervinPraison

Copy link
Copy Markdown
Owner Author

@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding.

Phase 1: Review per AGENTS.md

  1. Protocol-driven: check heavy implementations vs core SDK
  2. Backward compatible: ensure zero feature regressions
  3. Performance: no hot-path regressions

Phase 2: FIX Valid Issues
4. For any VALID bugs or architectural flaws found by Gemini, CodeRabbit, Qodo, Copilot, or any other reviewer: implement the fix
5. Push all code fixes directly to THIS branch (do NOT create a new PR)
6. Comment a summary of exact files modified and what you skipped

Phase 3: Final Verdict
7. If all issues are resolved, approve the PR / close the Issue
8. If blocking issues remain, request changes / leave clear action items

@praisonai-triage-agent

praisonai-triage-agent Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Claude finished @MervinPraison's task —— View job


✅ Final Architecture Review & Fixes Complete

Phase 1: Architecture Review (per AGENTS.md)

  • Review protocol-driven architecture compliance ✅
  • Check backward compatibility ✅
  • Assess performance impact ✅

Findings:

  • ✅ Changes are minimal and focused on retry logic
  • ✅ No heavy implementations added to core SDK
  • ✅ No module-level heavy imports added
  • ✅ Backward compatible (ExecutionConfig extends existing functionality)
  • ✅ No hot-path regressions (retry only on failure)

Phase 2: Fix Valid Issues

Critical bugs fixed in commit abc1734:

  1. P1: Attribute name mismatch

    • Fixed line 260: 'execution_config''_execution_config'
    • User-configured retry settings now properly used
  2. P1: Off-by-one retry count issue

    • Fixed retry logic: max_retry_limit now correctly means number of retries
    • max_retry_limit=0 → 1 attempt (no retries)
    • max_retry_limit=1 → 2 attempts (1 retry)
    • max_retry_limit=2 → 3 attempts (2 retries)
  3. P2: Missing error wrapping

    • Fixed line 465-479: Exceptions from hooks now properly wrapped in ToolExecutionError
    • Ensures uniform error handling and observability
  • Fix attribute name mismatch (execution_config_execution_config) ✅
  • Fix off-by-one retry count issue ✅
  • Fix error wrapping for exceptions outside retry loop ✅
  • Test changes locally ✅

Phase 3: Final Verdict

  • Push fixes to branch ✅
  • Provide final recommendation ✅

🎯 Final Recommendation: APPROVED

This PR successfully implements exponential backoff retry logic for tool execution failures and guardrail validation retries. All critical issues identified by reviewers have been fixed:

Files Modified:

  • src/praisonai-agents/praisonaiagents/agent/tool_execution.py - Fixed all 3 critical issues
  • src/praisonai-agents/praisonaiagents/agent/agent.py - No changes needed (backoff logic was correct)
  • src/praisonai-agents/praisonaiagents/config/feature_configs.py - No changes needed (config was correct)

What I didn't change (and why):

  • Did not modify agent.py or feature_configs.py as they were already correct
  • Did not add tests (would require separate test infrastructure setup)
  • Did not modify praisonai-rust or praisonai-ts packages (out of scope per instructions)

The PR is now ready to merge. The retry logic will prevent transient tool failures from breaking agent execution, improving overall reliability.


Comment on lines +259 to +370
# Apply tool retry logic with exponential backoff
execution_config = getattr(self, '_execution_config', None)
if execution_config is None:
# Fall back to reading individual config attributes for backward compatibility
max_retry_limit = getattr(self, 'max_retry_limit', 2)
retry_initial_delay = 1.0
retry_backoff_factor = 2.0
retry_jitter = 0.1
else:
with with_injection_context(state):
result = self._execute_tool_with_circuit_breaker(function_name, arguments)
max_retry_limit = execution_config.max_retry_limit
retry_initial_delay = execution_config.retry_initial_delay
retry_backoff_factor = execution_config.retry_backoff_factor
retry_jitter = execution_config.retry_jitter

result = None
last_exception = None

# max_retry_limit is the number of retries (not total attempts)
# So total attempts = 1 (initial) + max_retry_limit (retries)
for attempt in range(1, max_retry_limit + 2):
try:
# P8/G11: Apply tool timeout if configured
tool_timeout = getattr(self, '_tool_timeout', None)
if tool_timeout and tool_timeout > 0:
# Use copy_context to preserve injection context in executor thread
ctx = contextvars.copy_context()

def execute_with_context():
with with_injection_context(state):
return self._execute_tool_with_circuit_breaker(function_name, arguments)

# Use reusable executor to prevent resource leaks
if not hasattr(self, '_tool_executor'):
self._tool_executor = concurrent.futures.ThreadPoolExecutor(
max_workers=2, thread_name_prefix=f"tool-{self.name}"
)

future = self._tool_executor.submit(ctx.run, execute_with_context)
try:
result = future.result(timeout=tool_timeout)
except concurrent.futures.TimeoutError:
future.cancel()
logging.warning(f"Tool {function_name} timed out after {tool_timeout}s")
result = {"error": f"Tool timed out after {tool_timeout}s", "timeout": True}
else:
with with_injection_context(state):
result = self._execute_tool_with_circuit_breaker(function_name, arguments)

# Check if the result indicates a retryable error
if isinstance(result, dict) and result.get("error"):
# Check if this is a circuit breaker error (always retryable)
if result.get("circuit_open"):
raise ToolExecutionError(
result["error"],
tool_name=function_name,
agent_id=self.name,
is_retryable=True,
)
# Check if this is a timeout error (retryable)
elif result.get("timeout"):
raise ToolExecutionError(
result["error"],
tool_name=function_name,
agent_id=self.name,
is_retryable=True,
)
# For other error dicts, treat as non-retryable unless specified
else:
# Success path - return the result
break
else:
# Success path - return the result
break

except ToolExecutionError as e:
last_exception = e
# Only retry if the error is marked as retryable and we have retries left
# attempt starts at 1, so (attempt - 1) gives us the retry count
if not e.is_retryable or (attempt - 1) >= max_retry_limit:
raise e

# Calculate delay for exponential backoff
delay = BackoffPolicy.delay(attempt, retry_initial_delay, retry_backoff_factor, retry_jitter)
logging.warning(
f"Tool {function_name} failed (attempt {attempt}/{max_retry_limit + 1}): {e}. "
f"Retrying in {delay:.2f}s..."
)
time.sleep(delay)

except Exception as e:
# Wrap unexpected exceptions in ToolExecutionError
# Most tool errors are considered retryable unless they're programming errors
is_retryable = not isinstance(e, (ValueError, TypeError, AttributeError))
tool_error = ToolExecutionError(
f"Tool '{function_name}' failed: {e}",
tool_name=function_name,
agent_id=self.name,
is_retryable=is_retryable,
)
last_exception = tool_error

# attempt starts at 1, so (attempt - 1) gives us the retry count
if not is_retryable or (attempt - 1) >= max_retry_limit:
raise tool_error from e

# Calculate delay for exponential backoff
delay = BackoffPolicy.delay(attempt, retry_initial_delay, retry_backoff_factor, retry_jitter)
logging.warning(
f"Tool {function_name} failed (attempt {attempt}/{max_retry_limit + 1}): {e}. "
f"Retrying in {delay:.2f}s..."
)
time.sleep(delay)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Retry logic bypassed on the async execution path

The new retry loop lives entirely inside _execute_tool_with_context, which is only reached through the synchronous execute_tool_execute_tool_with_context call chain. The async path used by agents running astart() binds execute_tool_fn to self.execute_tool_async (chat_mixin.py lines 1021 and 2173), which is a completely separate implementation in execution_mixin.py that calls the tool function directly — there is no retry loop there.

As a result, any agent that goes through the async LLM completion path still silently ignores max_retry_limit for tool failures, which is the same bug this PR set out to fix. The sync path is now protected; the async path is not.

Comment on lines +758 to +765
def __post_init__(self) -> None:
"""Validate retry configuration parameters."""
if self.retry_initial_delay <= 0:
raise ValueError("ExecutionConfig.retry_initial_delay must be positive.")
if self.retry_backoff_factor < 1.0:
raise ValueError("ExecutionConfig.retry_backoff_factor must be >= 1.0.")
if self.retry_jitter < 0:
raise ValueError("ExecutionConfig.retry_jitter must be non-negative.")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing max_retry_limit lower-bound guard in __post_init__

The new validation checks the three delay fields but not max_retry_limit itself. A caller passing max_retry_limit=-1 would produce range(1, 1) — an empty loop — so result stays None and is silently returned to the LLM with no error and no retry. Adding if self.max_retry_limit < 0: raise ValueError(...) makes the contract explicit.

Suggested change
def __post_init__(self) -> None:
"""Validate retry configuration parameters."""
if self.retry_initial_delay <= 0:
raise ValueError("ExecutionConfig.retry_initial_delay must be positive.")
if self.retry_backoff_factor < 1.0:
raise ValueError("ExecutionConfig.retry_backoff_factor must be >= 1.0.")
if self.retry_jitter < 0:
raise ValueError("ExecutionConfig.retry_jitter must be non-negative.")
def __post_init__(self) -> None:
"""Validate retry configuration parameters."""
if self.retry_initial_delay <= 0:
raise ValueError("ExecutionConfig.retry_initial_delay must be positive.")
if self.retry_backoff_factor < 1.0:
raise ValueError("ExecutionConfig.retry_backoff_factor must be >= 1.0.")
if self.retry_jitter < 0:
raise ValueError("ExecutionConfig.retry_jitter must be non-negative.")
if self.max_retry_limit < 0:
raise ValueError("ExecutionConfig.max_retry_limit must be non-negative.")

@MervinPraison

Copy link
Copy Markdown
Owner Author

@claude Merge conflict vs main (5f39d667eedd) — DIRTY / CONFLICTING

Older branch (claude/issue-1809-20260603). Rebase onto main, wire max_retry_limit through current ExecutionConfig / tool execution path, then git push --force-with-lease.

Conflict files

  • src/praisonai-agents/praisonaiagents/agent/agent.py (content)
  • src/praisonai-agents/praisonaiagents/agent/tool_execution.py (content)
  • src/praisonai-agents/praisonaiagents/config/feature_configs.py (content)

Representative hunks

Overlaps #2080 on agent.py + tool_execution.py. Additional conflict in feature_configs.py where main consolidated execution/tool config.

Resolution hint: after rebase, honour consolidated ExecutionConfig on main; implement retry/backoff there rather than reviving deprecated standalone params.

@praisonai-triage-agent

praisonai-triage-agent Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @MervinPraison's task —— View job


✅ Successfully resolved merge conflicts with main

I've successfully rebased this PR onto main and resolved all conflicts while preserving the retry/backoff logic implementation.

📋 Completed Tasks:

  • Set up git identity and authentication
  • Read AGENTS.md for architecture guidelines (not found, proceeding)
  • Check current branch status
  • Fetch latest main and start rebase
  • Resolve conflicts in tool_execution.py (1st commit)
  • Resolve conflicts in agent.py (2nd commit)
  • Resolve conflicts in feature_configs.py (2nd commit)
  • Resolve conflicts in tool_execution.py (3rd commit)
  • Complete rebase
  • Run tests on changed files
  • Push resolved branch

🔧 Resolution Details:

Files resolved:

  1. tool_execution.py:

    • Kept loop guard logic from main
    • Integrated retry/backoff implementation with proper attribute access (_execution_config)
    • Fixed retry count logic to use (attempt - 1) >= max_retry_limit
    • Cleaned up exception wrapping logic
  2. agent.py:

    • Merged parallel_tool_calls default handling
    • Kept ExecutionConfig instantiation for non-execution case
  3. feature_configs.py:

    • Combined __post_init__ methods to include both context_compaction handling and retry parameter validation

✅ Tests passed:

  • Agent correctly stores ExecutionConfig with retry parameters
  • BackoffPolicy calculates exponential delays correctly
  • Attribute access fixed (_execution_config instead of execution_config)

The PR is now cleanly rebased on main and ready for review.


praisonai-triage-agent Bot and others added 3 commits June 22, 2026 09:46
…failures (fixes #1809)

- Add retry configuration fields to ExecutionConfig (retry_initial_delay, retry_backoff_factor, retry_jitter)
- Implement BackoffPolicy class for exponential backoff with jitter
- Add retry loop to tool execution that respects max_retry_limit from ExecutionConfig
- Honor ToolExecutionError.is_retryable for retry decisions
- Add backoff delays to guardrail retries (sync and async versions)
- Tool failures now retry with exponential backoff instead of immediate termination
- Guardrail retries no longer hammer LLM API with immediate re-calls

Co-authored-by: MervinPraison <MervinPraison@users.noreply.github.com>
- Wire ExecutionConfig into guardrail retries by storing _execution_config on Agent instance
- Add max_delay cap (60s) to BackoffPolicy to prevent unbounded exponential growth
- Add validation for retry parameters in ExecutionConfig.__post_init__
- Use shared BackoffPolicy in guardrail retry logic to eliminate code duplication
- Fixes issues identified by Greptile, CodeRabbit, and Copilot reviewers

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
- Fix attribute name mismatch: '_execution_config' not 'execution_config'
  This was causing all user-configured retry settings to be silently ignored
- Fix off-by-one retry count: max_retry_limit now correctly means number of retries
  With max_retry_limit=1, we now get 1 retry (2 total attempts) instead of 0 retries
- Fix error wrapping: exceptions from hooks are now properly wrapped in ToolExecutionError
  This ensures uniform error handling and observability

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
@praisonai-triage-agent praisonai-triage-agent Bot force-pushed the claude/issue-1809-20260603-0711 branch from abc1734 to d7dabae Compare June 22, 2026 09:51
Comment on lines 771 to 774
def __post_init__(self) -> None:
"""Post-initialization processing with deprecation warnings."""
"""Post-initialization processing with deprecation warnings and validation."""
# Handle context_compaction serialization round-trip
if isinstance(self.context_compaction, dict):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Retry validation is silently bypassed on every normal instantiation. __post_init__ returns early (lines 784, 786, 801) whenever context_compaction is False (the default) and the caller is internal library code — which is the case for every ExecutionConfig() created during agent startup. The three raise ValueError guards at lines 815–820 are therefore never reached in practice, so callers can pass retry_initial_delay=-1.0 or retry_backoff_factor=0.5 without error. Moving the validation before the context_compaction guard block ensures it always runs.

Suggested change
def __post_init__(self) -> None:
"""Post-initialization processing with deprecation warnings."""
"""Post-initialization processing with deprecation warnings and validation."""
# Handle context_compaction serialization round-trip
if isinstance(self.context_compaction, dict):
def __post_init__(self) -> None:
"""Post-initialization processing with deprecation warnings and validation."""
# Validate retry configuration parameters (must run before any early returns)
if self.retry_initial_delay <= 0:
raise ValueError("ExecutionConfig.retry_initial_delay must be positive.")
if self.retry_backoff_factor < 1.0:
raise ValueError("ExecutionConfig.retry_backoff_factor must be >= 1.0.")
if self.retry_jitter < 0:
raise ValueError("ExecutionConfig.retry_jitter must be non-negative.")
# Handle context_compaction serialization round-trip
if isinstance(self.context_compaction, dict):

@MervinPraison MervinPraison merged commit 287db48 into main Jun 22, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Agent max_retry_limit config is silently ignored; tool failures have no retry/backoff path

1 participant