-
Notifications
You must be signed in to change notification settings - Fork 603
feat(agent): add configurable retry_strategy for model calls #1424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Refactored hardcoded retry logic in event_loop into a flexible, hook-based retry system that allows users to customize retry behavior.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
…igh-Level constructs In doing api bar raising for strands-agents/sdk-python/pull/1424, we determined that HookProvider is a too-low-level interface for exposing directly to integrators. This captures that decision & reasoning in log format and sets us up to record future decisions in a similar way going forward. See DECISIONS.md on the decision & the format
…igh-Level constructs In doing api bar raising for strands-agents/sdk-python/pull/1424, we determined that HookProvider is a too-low-level interface for exposing directly to integrators. This captures that decision & reasoning in log format and sets us up to record future decisions in a similar way going forward. See DECISIONS.md on the decision & the format
Enforces that Agent only accepts ModelRetryStrategy instances (not subclasses) for the retry_strategy parameter to prevent API confusion before a base RetryStrategy class is introduced.
# Conflicts: # src/strands/agent/agent.py
…igh-Level constructs (#420) In doing api bar raising for strands-agents/sdk-python/pull/1424, we determined that HookProvider is a too-low-level interface for exposing directly to integrators. This captures that decision & reasoning in log format and sets us up to record future decisions in a similar way going forward. See DECISIONS.md on the decision & the format Co-authored-by: Mackenzie Zastrow <[email protected]>
strands-agent
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type check for retry_strategy parameter is too restrictive and will break for subclasses:
if retry_strategy and type(retry_strategy) is not ModelRetryStrategy:
raise ValueError("retry_strategy must be an instance of ModelRetryStrategy")This uses type() with is not, which fails for subclasses. Consider:
class MyCustomRetry(ModelRetryStrategy):
# Custom retry logic
pass
agent = Agent(retry_strategy=MyCustomRetry()) # ❌ Raises ValueError!Recommendation: Use isinstance() check instead:
if retry_strategy is not None and not isinstance(retry_strategy, HookProvider):
raise TypeError(f"retry_strategy must implement HookProvider, got {type(retry_strategy).__name__}")This allows:
- Subclasses of ModelRetryStrategy ✅
- Custom HookProvider implementations ✅
- Better error message with actual type ✅
- Type safety with proper inheritance check ✅
🤖 This is an experimental AI agent response from the Strands team, powered by Strands Agents. We're exploring how AI agents can help with community support and development. Your feedback helps us improve! If you'd prefer human assistance, please let us know.
Description
The current retry logic for handling ModelThrottledException is hardcoded in event_loop.py with fixed values (6 attempts, exponential backoff starting at 4s). This makes it impossible for users to customize retry behavior for their specific use cases, such as:
This PR refactors the hardcoded retry logic into a
ModelRetryStrategyclass so that folks can customize the parameters.Not Included:
The PR does not introduce a
RetryStrategybase class. I started to do so, but am deferring it because:ModelRetryStrategyprovides enough benefit to allow folks to customize the agent loop without blocking on a more extensible designPublic API Changes
Added a new
retry_strategyparameter toAgent.__init__():The
retry_strategyparameter only acceptsModelRetryStrategy, no derived classes. We've been discussing aRetryStrategybase class that is more abstract and supports additional exception types, but I'm punting on that as it requires additional design work whereas this provides immediate benefit to callers attempting to custimize the current agent-loop retry behavior.For now, alternative retry strategies can be implemented by creating a hook provider that sets
event.retry = Trueon theAfterModelCallEventwhen a retry should occur.Backwards Compatibility
Retry delay
The general default behavior is unchanged — agents still retry up to 5 times (6 attempts in total) with the same exponential backoff. The
EventLoopThrottleEventandForceStopEventare still emitted during retries, maintaining backwards compatibility with existing hooks that listen for this event.The exact delay times have changed!. Because of a bug in the original logic, the initial delay was actually doubled the first time it executed (see
test_agent_events.pyfor the test changes to accomidate this). Previous to these changes, the delay(s) were:Afer these changes, the delays are:
I think this are okay changes to make, however.
Default retry behavior
The default retry behavior also reads from
event_loop.MAX_ATTEMPTSetc so that anyone who was previously modifying those constants will continue to do soImplementation Decisions
EventLoopThrottleEventandForceStopEventevents as we used to.ForceStopEventwhenever an exception bubbles out of the model invocationretry_strategyso that as hooks are expanded to allow retrying tools, we can also enable tool retry strategiesModelRetryStrategysince it's only focused on model retries - in the future we might vend other strategies, but we can add a new strategy rather than attempting to fit it all into this one.Related Issues
Documentation PR
TODO once we align on this approach
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepareChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.