
Conversation

@kevmyung kevmyung commented Jan 8, 2026

Summary

  • Add CacheConfig with strategy="auto" for automatic prompt caching in BedrockModel
  • Cache points are injected at the end of the last assistant message before each model call
  • Supports all Claude models on Bedrock that have prompt caching capability

Usage

from strands import Agent
from strands.models import BedrockModel, CacheConfig

model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    cache_config=CacheConfig(strategy="auto")
)
agent = Agent(model=model)

Test plan

  • Unit tests for cache point injection logic
  • Integration test with Claude models on Bedrock confirming cache hits

Closes #1432

Add CacheConfig with strategy="auto" for BedrockModel to automatically
inject cache points at the end of assistant messages in multi-turn
conversations.

- Add CacheConfig dataclass in model.py with strategy field
- Add supports_caching property to check Claude model compatibility
- Implement _inject_cache_point() for automatic cache point management
- Export CacheConfig from models/__init__.py

Closes strands-agents#1432
standardized way to configure and process requests for different AI model providers.
Attributes:
cache_config: Optional configuration for prompt caching.
Member

I don't think we should add this to the base model class just yet. Not every model provider will have caching support, so this may end up being a per-provider feature.

self.update_config(**model_config)

# Set cache_config on base Model class for Agent to detect
self.cache_config = self.config.get("cache_config")
Member

We don't need to set this on the base class if it's already in self.config.

Returns True for Claude models on Bedrock.
"""
model_id = self.config.get("model_id", "").lower()
return "claude" in model_id or "anthropic" in model_id
Member

It looks like not every Claude model supports caching, and some Nova models do: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-models

A couple of questions:

  • Does the cache strategy change for different models? I also see docs mentioning simplified cache strategies for Claude models; does the approach in this PR still make sense?
  • If we add cachePoints for models that aren't in this list, what happens? If Bedrock just ignores the cache points, do we really need this check?

Author

The supports_caching check is intentional. Nova models accept cachePoints but don't provide intelligent cache matching like Claude, so you'd pay for cache writes without getting cache hits. This guard prevents unexpected cost increases for non-Claude models.

Member

Let's add a warning here for when cache_config is enabled but we don't cache because the model_id doesn't match.
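
A minimal sketch of that guard-plus-warning; the helper name and signature are invented for illustration (the merged change keeps this logic on BedrockModel itself, per the commit notes at the end of this thread):

import logging

logger = logging.getLogger(__name__)

def should_auto_cache(model_id: str, cache_config_enabled: bool) -> bool:
    """Hypothetical helper: decide whether to inject automatic cache points."""
    supports_caching = "claude" in model_id.lower() or "anthropic" in model_id.lower()
    if cache_config_enabled and not supports_caching:
        # Caching was requested, but this model is not known to benefit from it,
        # so skip injection and tell the user why.
        logger.warning(
            "cache_config strategy='auto' is enabled, but model_id=<%s> does not "
            "support prompt caching; skipping automatic cache point injection",
            model_id,
        )
        return False
    return cache_config_enabled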

Comment on lines 318 to 319
This enables prompt caching for multi-turn conversations by placing a single
cache point that covers system prompt, tools, and conversation history.
Member

Is this true? Tools and system prompts both have their own cache points defined in the ConverseStream model. I thought this was just adding automatic caching for the messages array.

Author (@kevmyung, Jan 9, 2026)

You're right that tools and system prompts have their own cache point options. The key insight is that Anthropic sends prompts in this order: tools → system → messages. (Link) When a cachePoint is placed at the end of the last assistant message, the cached prefix automatically includes everything before it (system prompt + tools + prior conversation). So a single cachePoint in messages effectively caches the entire context without needing separate cache points for system/tools.

That said, you can still place explicit cache points on system/tools as a fallback for cases like sliding-window truncation, where message history changes might cause cache misses.

Member

Got it. Your explanation and the FAQ on this page helped me understand this: https://platform.claude.com/docs/en/build-with-claude/prompt-caching#faq

That being said, this docstring comment is a bit misleading, and I would instead make it a bit more general here.

Additionally, we will need to update our documentation to reflect the new caching behavior here. Would you be interested in making that update too? https://github.com/strands-agents/docs/blob/main/docs/user-guide/concepts/model-providers/amazon-bedrock.md?plain=1#L418

Author

For the docs update, I'll create a separate PR after this one is merged.

This enables prompt caching for multi-turn conversations by placing a single
cache point that covers system prompt, tools, and conversation history.
The cache point is automatically moved to the latest assistant message on each
Member

Mentioned in a previous comment, but how does this compare to the simplified caching mentioned in the docs here: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-simplified

Author

This implementation actually relies on that simplified caching. With simplified caching, we only need to move the cachePoint to the end of the last assistant message on each turn. Anthropic automatically matches the overlapping prefix between the previous cachePoint position and the new one. That's also why we're limiting this strategy to Claude models until other providers support it.
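
To make that concrete, here is a small illustration of where the cache point sits on two consecutive requests (message shapes follow the Bedrock Converse API; the text content is invented):

# Request for turn 2: the cache point is injected at the end of the turn-1
# assistant message, so tools + system prompt + turn 1 become the cached prefix.
turn_2_request = [
    {"role": "user", "content": [{"text": "question 1"}]},
    {"role": "assistant", "content": [{"text": "answer 1"}, {"cachePoint": {"type": "default"}}]},
    {"role": "user", "content": [{"text": "question 2"}]},
]

# Request for turn 3: the cache point has moved to the turn-2 assistant message.
# Simplified caching matches the overlap with the previous position (everything
# through "answer 1") as a cache read, then writes the longer prefix for turn 4.
turn_3_request = [
    {"role": "user", "content": [{"text": "question 1"}]},
    {"role": "assistant", "content": [{"text": "answer 1"}]},
    {"role": "user", "content": [{"text": "question 2"}]},
    {"role": "assistant", "content": [{"text": "answer 2"}, {"cachePoint": {"type": "default"}}]},
    {"role": "user", "content": [{"text": "question 3"}]},
]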

Member

Awesome, thanks for the explanation! I would love for this cache_config to be applicable to both Anthropic and Nova models, with the strategy changing based on the model_id. That's out of scope for this PR, but just something to keep in mind!

Author

Agreed! Nova models could also be supported, though the cache point implementation would be more complex than Claude's.

cache_tools: Cache point type for tools
cache_prompt: Cache point type for the system prompt (deprecated, use cache_config)
cache_config: Configuration for prompt caching. Use CacheConfig(strategy="auto") for automatic caching.
cache_tools: Cache point type for tools (deprecated, use cache_config)
Member

cache_prompt is deprecated, but I don't think we should deprecate cache_tools unless cache_config replaces its functionality. For now I would keep cache_tools undeprecated.

Author

Got it, I'll keep cache_tools undeprecated. Out of curiosity, why was only cache_prompt (system prompt) deprecated but not cache_tools?

Member

I think this PR goes into more detail (ref), but we originally represented the system prompt as just a string, so you couldn't inject cache points easily. cache_prompt was our workaround for that until the PR I referenced actually fixed it.
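
For context, a before/after sketch of the configuration surface being discussed; the cache_prompt value shown is assumed from the existing docs, not taken from this diff:

from strands.models import BedrockModel, CacheConfig

# Previously: system-prompt caching via the (now deprecated) cache_prompt field.
old_style = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    cache_prompt="default",
)

# With this PR: automatic caching via cache_config; cache_tools stays available
# for explicit tool caching.
new_style = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    cache_config=CacheConfig(strategy="auto"),
)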

Comment on lines 340 to 341
if not isinstance(content, list):
continue
Member

nit: Not sure we really need this check. content should always be a list.

Author

Agreed, will remove

continue

for block_idx, block in enumerate(content):
if isinstance(block, dict) and "cachePoint" in block:
Member

nit: We don't need to re-check objects that are already typed; content blocks are always dicts.

Author

Agreed, will remove

if isinstance(block, dict) and "cachePoint" in block:
cache_point_positions.append((msg_idx, block_idx))

# Step 2: If no assistant message yet, nothing to cache
Member

If there are cache points in the messages array already, they won't get removed since we exit early.

I might refactor this function a bit to:

  1. Loop through the entire messages array backwards
  2. Once we encounter the first assistant message, add a cache point there
  3. Remove all other cache points along the way
    a. Let's add a warning when we remove customer-added cache points

This seems to accomplish the same thing as the proposed approach but is a bit easier to follow (rough sketch below). What do you think?
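
A rough sketch of that backward pass, using the Converse-style message shape shown elsewhere in this thread; the function name and logging are illustrative, not the merged code:

import logging

logger = logging.getLogger(__name__)

def inject_cache_point(messages: list[dict]) -> None:
    """Single backward pass: keep exactly one cache point, at the end of the
    most recent assistant message, and drop any others along the way."""
    placed = False
    for message in reversed(messages):
        content = message["content"]
        if not placed and message["role"] == "assistant":
            # Re-home the cache point here, even if one already exists mid-content.
            content[:] = [block for block in content if "cachePoint" not in block]
            content.append({"cachePoint": {"type": "default"}})
            placed = True
            continue
        stale = [block for block in content if "cachePoint" in block]
        if stale:
            # Warn when we strip cache points the customer (or a prior turn) added.
            logger.warning("removing %d cache point(s) found earlier in the conversation", len(stale))
            content[:] = [block for block in content if "cachePoint" not in block]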

Author

Good catch. I'll refactor to loop backwards as you suggested.

assert formatted_messages[2]["content"][1]["guardContent"]["image"]["format"] == "png"


def test_cache_config_auto_sets_model_attribute(bedrock_client):
Member

Let's add some more comprehensive unit tests here. We should make sure that cache points are inserted and deleted as intended. I would also like to see an end-to-end test (in the /tests/strands/agent/test_agent.py file) that inserts the cache points but does not edit the actual agent.messages parameter.
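
A sketch of one such unit test, reusing the hypothetical inject_cache_point helper from the earlier sketch (the merged tests exercise the real BedrockModel method instead):

def test_auto_cache_point_moves_to_latest_assistant_message():
    messages = [
        {"role": "user", "content": [{"text": "hi"}]},
        {"role": "assistant", "content": [{"text": "hello"}, {"cachePoint": {"type": "default"}}]},
        {"role": "user", "content": [{"text": "more"}]},
        {"role": "assistant", "content": [{"text": "sure"}]},
    ]

    inject_cache_point(messages)

    # The stale cache point from the earlier turn is removed.
    assert all("cachePoint" not in block for block in messages[1]["content"])
    # Exactly one cache point remains, at the end of the latest assistant message.
    assert messages[3]["content"][-1] == {"cachePoint": {"type": "default"}}
    assert sum("cachePoint" in block for msg in messages for block in msg["content"]) == 1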

Author

I'll add more unit tests for the cache point insertion/deletion logic, and an e2e test.

azaylamba commented Jan 12, 2026

This is an important change, thanks for this.


@strands-agent
Contributor

🎯 Review - Automatic Prompt Caching

Excellent implementation of automatic prompt caching for Bedrock! This addresses #1432 nicely and will provide significant performance and cost benefits for multi-turn conversations.

What I Really Like ✅

  1. Smart Cache Point Strategy: The _inject_cache_point() logic is elegant - automatically moving the cache point to the last assistant message ensures optimal cache utilization without manual management.

  2. Comprehensive Tests: 155 lines of new tests in test_bedrock.py covering edge cases like:

    • Cache point injection
    • Cache point movement across turns
    • Cleanup of stale cache points
  3. Backward Compatibility: Deprecating cache_prompt while still supporting it shows good API stewardship.

  4. Documentation: Clear docstrings and usage examples in the PR description.

Minor Suggestions 💡

1. Cache Point Detection Could Be More Explicit

In _inject_cache_point() around line 344, the logic for detecting an existing cache point at the right position relies on the loop continuing. Consider making this more explicit:

# Check if cache point was already found at the right position
last_assistant_content = messages[last_assistant_idx]["content"]
if last_assistant_content and "cachePoint" in last_assistant_content[-1]:
    logger.debug(f"Cache point already exists at end of last assistant message {last_assistant_idx}")
    return

# Add cache point at the end of the last assistant message
last_assistant_content.append({"cachePoint": {"type": "default"}})
logger.debug(f"Added cache point at end of assistant message {last_assistant_idx}")

2. Model Support Detection

The supports_caching property checks for "claude" or "anthropic" in the model ID. Consider if future Bedrock models might support caching:

@property
def supports_caching(self) -> bool:
    """Whether this model supports prompt caching.
    
    Returns True for Claude models on Bedrock that support caching.
    Add other models as they become available.
    """
    model_id = self.config.get("model_id", "").lower()
    # Not every Claude model supports caching, and some Nova models do as well;
    # see the AWS prompt caching docs for the current model list.
    return "claude" in model_id or "anthropic" in model_id

3. Integration Test Clarity

The integration tests are great! One suggestion - add a comment explaining the cache hit verification:

# After second call, verify cache hit (cache_read_input_tokens > 0)
# This confirms the cache point strategy is working
assert result.metadata["converse_metrics"]["cache_read_input_tokens"] > 0

Questions for Discussion 🤔

  1. Cache Invalidation: What happens if the system prompt or tools change between calls? Does the cache automatically invalidate, or should there be explicit cache busting?

  2. Multiple Cache Points: The current strategy uses a single cache point. Are there scenarios where multiple cache points (e.g., one for system prompt, one for conversation) would be beneficial?

  3. Performance Metrics: Would it be valuable to expose cache hit/miss metrics in the AgentResult.metadata for users to monitor cache effectiveness?

CI Status

I see CI is still pending. Once it passes, this looks ready for maintainer review!

Overall Assessment

This is a high-quality PR that will be valuable for the community. The automatic cache management removes complexity from users while providing real performance benefits. Great work, @kevmyung! 🎉

🦆


🤖 This is an experimental AI agent response from the Strands team, powered by Strands Agents. We're exploring how AI agents can help with community support and development. Your feedback helps us improve! If you'd prefer human assistance, please let us know.

- Add warning when cache_config enabled but model doesn't support caching
- Make supports_caching private (_supports_caching)
- Fix log formatting to follow style guide
- Clean up tests and imports
Linked issue: [FEATURE] Add cache_strategy="auto" for automatic prompt caching