Migrating to Cache Strategy API and Persisted Memory Injection
Overview
Two changes shipped together that affect how prompt caching is configured and how memory auto-injection works.
- Prompt caching is now opt-in and provider-specific. The old LLMConfig.caching: 'auto' field and its companions have been removed. You now choose a provider-specific strategy helper and set it on LLMConfig.cache.
- Memory auto-injection no longer injects an ephemeral SYSTEM message on every step. Memories are now retrieved once per turn and persisted as a hidden user message in the transcript (deduped by memory ID across turns).
1. Prompt Caching
What changed
| Old (removed) | New |
|---|---|
| LLMConfig.caching: 'auto' \| false | LLMConfig.cache: CacheStrategy \| CacheStrategy[] |
| (TTL was not separately configurable) | anthropicCache({ ttl: '1h' }) |
| applyCacheBreakpoints(messages) | applyCacheStrategies(cache, request) |
| detectCacheProvider(model) | No replacement; pick the helper for your provider |
| CacheBreakpointResult type | AppliedCacheResult type |
Why it changed
The old caching: 'auto' implementation tried to detect the provider from the model object and apply generic breakpoints. This was fragile and produced incorrect results for non-Anthropic providers. The new design makes caching explicit and opt-in: you pass a strategy function that knows how to annotate messages and tools for your specific provider. When cache is not set, no caching occurs — there is no silent default.
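The idea of a strategy as a function that annotates a request can be sketched as follows. This is only an illustration under stated assumptions: `LLMRequest`, `Strategy`, `markSystem`, `convHeader`, and `applyAll` are hypothetical names invented here, and the real `CacheStrategy` type and `applyCacheStrategies` signature may look quite different.

```typescript
// Hypothetical request and strategy shapes, for illustration only.
interface LLMRequest {
  messages: { role: string; content: string; cacheControl?: { ttl: string } }[];
  headers: Record<string, string>;
}
type Strategy = (req: LLMRequest) => LLMRequest;

// A toy marker-based strategy: flag the system message as cacheable.
const markSystem = (ttl: string): Strategy => (req) => ({
  ...req,
  messages: req.messages.map((m) =>
    m.role === 'system' ? { ...m, cacheControl: { ttl } } : m,
  ),
});

// A toy header-based strategy: pin requests to a conversation ID.
const convHeader = (sessionId: string): Strategy => (req) => ({
  ...req,
  headers: { ...req.headers, 'x-conv-id': sessionId },
});

// With no strategy configured, the request passes through untouched.
function applyAll(cache: Strategy | Strategy[] | undefined, req: LLMRequest): LLMRequest {
  if (!cache) return req;
  const list = Array.isArray(cache) ? cache : [cache];
  return list.reduce((acc, strategy) => strategy(acc), req);
}

const base: LLMRequest = {
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'hi' },
  ],
  headers: {},
};
const untouched = applyAll(undefined, base);
const annotated = applyAll([markSystem('5m'), convHeader('s-1')], base);
```

Here `untouched` is the original request object, which mirrors the "no silent default" rule, while `annotated` carries both the cache marker and the header because the two toy strategies ran in order.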
Migration steps
Before (v0.27 and earlier):
import { defineAgent } from '@helix-agents/core';
import { anthropic } from '@ai-sdk/anthropic';
const agent = defineAgent({
name: 'my-agent',
llmConfig: {
model: anthropic('claude-3-5-sonnet-20241022'),
caching: 'auto', // removed
},
// ...
});
After (v0.28+):
import { defineAgent, anthropicCache } from '@helix-agents/core';
import { anthropic } from '@ai-sdk/anthropic';
const agent = defineAgent({
name: 'my-agent',
llmConfig: {
model: anthropic('claude-3-5-sonnet-20241022'),
cache: anthropicCache({ ttl: '5m' }), // opt-in, provider-specific
},
// ...
});
Provider-specific helpers
Pick the helper that matches your model:
import { anthropicCache, openaiCache, xaiCache } from '@helix-agents/core';
// Anthropic (Claude) — places cache_control markers on system, tools, and
// rolling conversation breakpoints. Default ttl: '1h'.
cache: anthropicCache();
cache: anthropicCache({ ttl: '5m' });
// OpenAI — sets promptCacheKey from sessionId for cache affinity.
cache: openaiCache();
// xAI (Grok) — sets the x-grok-conv-id header from sessionId.
cache: xaiCache();
// Multiple strategies (applied in order):
cache: [anthropicCache(), openaiCache()];
Google / Gemini does not need a helper: it uses implicit prefix caching server-side. No googleCache() exists; leave cache unset.
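If the provider is chosen at runtime, the selection that detectCacheProvider used to make implicitly can be written out explicitly. The sketch below uses local stand-ins for the helpers so that it runs on its own; the `CacheStrategy` shape, the `Provider` union, and `cacheFor` are assumptions made for illustration, not part of `@helix-agents/core`.

```typescript
// Hypothetical stand-ins so the sketch runs without @helix-agents/core.
// The real CacheStrategy type and helper signatures may differ.
type CacheStrategy = { provider: string; ttl?: string };

const anthropicCache = (opts: { ttl?: '5m' | '1h' } = {}): CacheStrategy => ({
  provider: 'anthropic',
  ttl: opts.ttl ?? '1h', // default as stated in the helper comment above
});
const openaiCache = (): CacheStrategy => ({ provider: 'openai' });
const xaiCache = (): CacheStrategy => ({ provider: 'xai' });

type Provider = 'anthropic' | 'openai' | 'xai' | 'google';

// Returns the value to set on LLMConfig.cache, or undefined to leave it unset.
function cacheFor(provider: Provider): CacheStrategy | undefined {
  switch (provider) {
    case 'anthropic':
      return anthropicCache();
    case 'openai':
      return openaiCache();
    case 'xai':
      return xaiCache();
    case 'google':
      return undefined; // implicit server-side prefix caching, no helper needed
  }
}
```

Keeping the mapping in one explicit function makes the choice reviewable, and the exhaustive switch means that adding a provider to the union forces a decision about its caching.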
Removed exports
These identifiers are gone; none has a drop-in replacement:
- applyCacheBreakpoints: replaced by applyCacheStrategies (different signature; used by custom runtimes only)
- detectCacheProvider: removed; pick the correct helper for your provider explicitly
- CacheBreakpointResult: replaced by AppliedCacheResult
The LLMConfig.caching field is removed from the type. TypeScript will flag any remaining uses.
2. Memory Auto-Injection
What changed
Previously, when autoInject was enabled the memory system injected relevant memories as an ephemeral SYSTEM message that was prepended fresh on every LLM call but not persisted to state.messages. This meant:
- The system prompt was different every step (breaking prompt-cache affinity).
- The same memories could appear twice if the conversation was resumed from a checkpoint.
Now, memories are persisted as a hidden user message (role 'user', metadata MEMORY_INJECTION: true, HIDDEN: true) appended to state.messages once per turn. The message is never mutated — fresh memories on later turns produce a new message. The stable prefix (system prompt + tools + conversation history) is never invalidated by memory changes, making the transcript cache-friendly.
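The append-only, deduplicated behaviour described above can be modelled in a few lines. This is a simplified sketch, not the framework's implementation: the `Message` and `Memory` shapes, the `memoryIds` metadata field, and the `injectedIds` / `injectForTurn` functions are all hypothetical, and the real runtime may record memory IDs differently.

```typescript
// Hypothetical shapes; the real message and memory types may differ.
interface Memory { id: string; text: string }
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
  metadata?: { MEMORY_INJECTION?: boolean; HIDDEN?: boolean; memoryIds?: string[] };
}

// Gather the IDs of memories that earlier injection messages already carry.
function injectedIds(messages: Message[]): Set<string> {
  const ids = new Set<string>();
  for (const m of messages) {
    if (m.metadata?.MEMORY_INJECTION) {
      for (const id of m.metadata?.memoryIds ?? []) ids.add(id);
    }
  }
  return ids;
}

// Append one hidden user message containing only unseen memories.
// Existing messages are never mutated; the same array is returned if nothing is new.
function injectForTurn(messages: Message[], retrieved: Memory[]): Message[] {
  const seen = injectedIds(messages);
  const fresh = retrieved.filter((mem) => !seen.has(mem.id));
  if (fresh.length === 0) return messages;
  const injection: Message = {
    role: 'user',
    content: fresh.map((mem) => `- ${mem.text}`).join('\n'),
    metadata: { MEMORY_INJECTION: true, HIDDEN: true, memoryIds: fresh.map((mem) => mem.id) },
  };
  return [...messages, injection];
}

const turn1 = injectForTurn(
  [{ role: 'user', content: 'hi' }],
  [{ id: 'm1', text: 'likes tea' }],
);
const turn2 = injectForTurn(turn1, [
  { id: 'm1', text: 'likes tea' },
  { id: 'm2', text: 'lives in Oslo' },
]);
```

On the second turn only `m2` is new, so a second hidden message is appended and the first one is left exactly as it was, which is what keeps the earlier transcript prefix stable for caching.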
Impact on applications
For most applications no code change is required. The autoInject config, search_memory/save_memory tools, and generation config are all unchanged.
Behavioural differences to be aware of:
| Before | After |
|---|---|
| Memory injected into SYSTEM message, ephemeral (not stored) | Memory injected as hidden user message, persisted to transcript |
| Same memories re-injected every LLM call within a turn | Deduped by memory ID: each unique memory injected at most once per session |
| Memory injection invalidates cached system-prompt prefix | Append-only hidden message preserves cached prefix |
| state.messages never contains memory injection messages | state.messages contains hidden user messages with MEMORY_INJECTION metadata |
Filtering memory messages from UI / queries
If your application reads state.messages or calls stateStore.getMessages() to display the conversation, you should filter out hidden memory messages:
import { COMMON_METADATA_KEYS } from '@helix-agents/core';
const visibleMessages = messages.filter((m) => m.metadata?.[COMMON_METADATA_KEYS.HIDDEN] !== true);
Alternatively, filter out only the memory-injection messages:
const nonMemoryMessages = messages.filter(
(m) => m.metadata?.[COMMON_METADATA_KEYS.MEMORY_INJECTION] !== true
);
The HIDDEN key is the broader filter (also covers other framework-internal messages). The MEMORY_INJECTION key is specific to auto-injected memory messages.
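To see how the two filters differ, the sketch below runs both over a small sample transcript. `KEYS` is a local stand-in for COMMON_METADATA_KEYS, and its string values, the `Msg` shape, and the sample messages are assumptions chosen for illustration.

```typescript
// Stand-in for COMMON_METADATA_KEYS; the real key strings are an assumption here.
const KEYS = { HIDDEN: 'hidden', MEMORY_INJECTION: 'memoryInjection' } as const;

interface Msg { role: string; content: string; metadata?: Record<string, unknown> }

const transcript: Msg[] = [
  { role: 'user', content: 'What should I drink?' },
  {
    role: 'user',
    content: 'Memory: likes tea',
    metadata: { [KEYS.HIDDEN]: true, [KEYS.MEMORY_INJECTION]: true },
  },
  // A hidden framework-internal message that is not a memory injection.
  { role: 'user', content: '(internal note)', metadata: { [KEYS.HIDDEN]: true } },
  { role: 'assistant', content: 'Tea, probably.' },
];

// Broad filter: drops every hidden message.
const visible = transcript.filter((m) => m.metadata?.[KEYS.HIDDEN] !== true);

// Narrow filter: drops only memory injections, keeps other hidden messages.
const nonMemory = transcript.filter((m) => m.metadata?.[KEYS.MEMORY_INJECTION] !== true);
```

In this sample `visible` keeps two messages and `nonMemory` keeps three, so a UI that should show only what the user and assistant actually said would normally want the HIDDEN filter.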
New core exports (custom runtimes only)
If you maintain a custom runtime that manually calls memory injection, use the new exports:
import { collectInjectedMemoryIds, COMMON_METADATA_KEYS } from '@helix-agents/core';
// MemoryManager.buildInjectionMessage is on the memory package:
import { MemoryManager } from '@helix-agents/memory';
// Collect memory IDs already in the transcript (for dedup):
const excludeIds = collectInjectedMemoryIds(state.messages);
// Build a new injection message (returns null if no fresh memories):
const memoryMessage = await memoryManager.buildInjectionMessage({
query: userMessage,
context,
excludeMemoryIds: excludeIds,
});
if (memoryMessage) {
await stateStore.appendMessages(sessionId, [memoryMessage]);
}
collectInjectedMemoryIds and COMMON_METADATA_KEYS.MEMORY_INJECTION are exported from @helix-agents/core. MemoryManager.buildInjectionMessage is on @helix-agents/memory.