Migrating to Cache Strategy API and Persisted Memory Injection

Overview

Two changes shipped together. One affects how prompt caching is configured; the other changes how memory auto-injection works.

- Prompt caching is now opt-in and provider-specific. The old LLMConfig.caching: 'auto' field and its companion exports (applyCacheBreakpoints, detectCacheProvider, CacheBreakpointResult) have been removed. You now choose a provider-specific strategy helper and set it on LLMConfig.cache.
- Memory auto-injection no longer injects an ephemeral SYSTEM message on every step. Memories are now retrieved once per turn and persisted as a hidden user message in the transcript, deduped by memory ID across turns.

1. Prompt Caching

What changed

| Old (removed) | New |
| --- | --- |
| `LLMConfig.caching: 'auto' \| false` | `LLMConfig.cache: CacheStrategy \| CacheStrategy[]` |
| (TTL was not separately configurable) | `anthropicCache({ ttl: '1h' })` |
| `applyCacheBreakpoints(messages)` | `applyCacheStrategies(cache, request)` |
| `detectCacheProvider(model)` | No replacement; pick the helper for your provider |
| `CacheBreakpointResult` type | `AppliedCacheResult` type |

Why it changed

The old caching: 'auto' implementation tried to detect the provider from the model object and apply generic breakpoints. This was fragile and produced incorrect results for non-Anthropic providers. The new design makes caching explicit and opt-in: you pass a strategy function that knows how to annotate messages and tools for your specific provider. When cache is not set, no caching occurs — there is no silent default.
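
To make the opt-in model concrete, here is a minimal, self-contained sketch of how a single strategy or a list of strategies might be applied to a request. The `SketchRequest` and `SketchStrategy` types, the `applyAll` function, and the `x-example-conv-id` header are hypothetical stand-ins for illustration; they are not the real `CacheStrategy` or `applyCacheStrategies` signatures, which this guide does not spell out.

```typescript
// Hypothetical stand-ins: the real CacheStrategy and request types in
// @helix-agents/core may look different.
interface SketchRequest {
  headers: Record<string, string>;
  providerOptions: Record<string, unknown>;
}

type SketchStrategy = (request: SketchRequest) => SketchRequest;

// Apply zero, one, or many strategies in order. With no strategy configured
// the request is returned untouched: there is no silent default.
function applyAll(
  cache: SketchStrategy | SketchStrategy[] | undefined,
  request: SketchRequest
): SketchRequest {
  if (cache === undefined) return request;
  const strategies = Array.isArray(cache) ? cache : [cache];
  return strategies.reduce((current, strategy) => strategy(current), request);
}

// Illustrative strategy factory: tags the request with a conversation header.
const headerStrategy =
  (conversationId: string): SketchStrategy =>
  (request) => ({
    ...request,
    headers: { ...request.headers, 'x-example-conv-id': conversationId },
  });

const base: SketchRequest = { headers: {}, providerOptions: {} };
const untouched = applyAll(undefined, base); // cache unset: same object back
const tagged = applyAll([headerStrategy('session-1')], base);
```

Each strategy in this sketch returns a new request rather than mutating its input, so the order of an array of strategies is the only thing that determines the result.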

Migration steps

Before (v0.27 and earlier):

```typescript
import { defineAgent } from '@helix-agents/core';
import { anthropic } from '@ai-sdk/anthropic';

const agent = defineAgent({
  name: 'my-agent',
  llmConfig: {
    model: anthropic('claude-3-5-sonnet-20241022'),
    caching: 'auto', // removed
  },
  // ...
});
```

After (v0.28+):

```typescript
import { defineAgent, anthropicCache } from '@helix-agents/core';
import { anthropic } from '@ai-sdk/anthropic';

const agent = defineAgent({
  name: 'my-agent',
  llmConfig: {
    model: anthropic('claude-3-5-sonnet-20241022'),
    cache: anthropicCache({ ttl: '5m' }), // opt-in, provider-specific
  },
  // ...
});
```

Provider-specific helpers

Pick the helper that matches your model:

```typescript
import { anthropicCache, openaiCache, xaiCache } from '@helix-agents/core';

// Anthropic (Claude): places cache_control markers on system, tools, and
// rolling conversation breakpoints. Default ttl: '1h'.
cache: anthropicCache();
cache: anthropicCache({ ttl: '5m' });

// OpenAI: sets promptCacheKey from sessionId for cache affinity.
cache: openaiCache();

// xAI (Grok): sets the x-grok-conv-id header from sessionId.
cache: xaiCache();

// Multiple strategies (applied in order):
cache: [anthropicCache(), openaiCache()];
```

Google / Gemini does not need a helper — it uses implicit prefix caching server-side. No googleCache() exists; leave cache unset.
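
Because detectCacheProvider is gone, an application that supports several providers needs its own explicit mapping. The following is one possible sketch of such a lookup; the `Provider` union, the `HELPER_BY_PROVIDER` table, and `helperNameFor` are hypothetical names for illustration, and the table records helper names as strings so the example runs without the package installed.

```typescript
// Hypothetical provider identifiers; use whatever your application already has.
type Provider = 'anthropic' | 'openai' | 'xai' | 'google';
type HelperName = 'anthropicCache' | 'openaiCache' | 'xaiCache';

// Explicit mapping from provider to the helper named in this guide.
// null means "leave cache unset" (Gemini caches prefixes implicitly).
const HELPER_BY_PROVIDER: Record<Provider, HelperName | null> = {
  anthropic: 'anthropicCache',
  openai: 'openaiCache',
  xai: 'xaiCache',
  google: null,
};

function helperNameFor(provider: Provider): HelperName | null {
  return HELPER_BY_PROVIDER[provider];
}

const forClaude = helperNameFor('anthropic');
const forGemini = helperNameFor('google');
```

In real code the table values would be the helper calls themselves (for example `anthropicCache()`), with `undefined` for Google so that LLMConfig.cache stays unset.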

Removed exports

These identifiers have been removed. Two have a renamed successor; one has no replacement:

- applyCacheBreakpoints: replaced by applyCacheStrategies (different signature; used by custom runtimes only)
- detectCacheProvider: removed with no replacement; pick the correct helper for your provider explicitly
- CacheBreakpointResult: replaced by AppliedCacheResult

The LLMConfig.caching field is removed from the type. TypeScript will flag any remaining uses.
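
If configuration objects are built dynamically (for example, loaded from stored settings), the compiler cannot flag them, and a small conversion step may help. The sketch below is an assumption-laden illustration: `LegacyLLMConfig`, `MigratedLLMConfig`, and `migrateConfig` are hypothetical local types and names, not part of @helix-agents/core.

```typescript
// Hypothetical local shapes mirroring the old and new config fields.
interface LegacyLLMConfig<M> {
  model: M;
  caching?: 'auto' | false;
}

interface MigratedLLMConfig<M, S> {
  model: M;
  cache?: S;
}

// Convert a legacy config: 'auto' becomes the explicit strategy you pass in,
// while false or an absent field leaves cache unset (no caching).
function migrateConfig<M, S>(
  legacy: LegacyLLMConfig<M>,
  strategy: S
): MigratedLLMConfig<M, S> {
  const { caching, ...rest } = legacy;
  return caching === 'auto' ? { ...rest, cache: strategy } : rest;
}

const wasAuto = migrateConfig({ model: 'model-a', caching: 'auto' }, 'my-strategy');
const wasOff = migrateConfig({ model: 'model-b', caching: false }, 'my-strategy');
```

The strategy is passed in by the caller on purpose: the conversion never guesses a provider, which keeps it in line with the explicit design described above.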

2. Memory Auto-Injection

What changed

Previously, when autoInject was enabled the memory system injected relevant memories as an ephemeral SYSTEM message that was prepended fresh on every LLM call but not persisted to state.messages. This meant:

- The system prompt was different every step (breaking prompt-cache affinity).
- The same memories could appear twice if the conversation was resumed from a checkpoint.

Now, memories are persisted as a hidden user message (role 'user', metadata MEMORY_INJECTION: true, HIDDEN: true) appended to state.messages once per turn. The message is never mutated — fresh memories on later turns produce a new message. The stable prefix (system prompt + tools + conversation history) is never invalidated by memory changes, making the transcript cache-friendly.
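
The append-only, deduplicated behaviour described above can be modelled in a few lines. This is a simplified sketch, not the framework's implementation: the `SketchMessage` shape, the `hidden`, `memoryInjection`, and `memoryIds` metadata fields, and both function names are assumptions made for illustration.

```typescript
// Hypothetical message and memory shapes; real metadata keys come from
// COMMON_METADATA_KEYS and may store memory IDs differently.
interface SketchMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  metadata?: { hidden?: boolean; memoryInjection?: boolean; memoryIds?: string[] };
}

interface SketchMemory {
  id: string;
  text: string;
}

// Gather the IDs of every memory already injected into the transcript.
function injectedIds(messages: readonly SketchMessage[]): Set<string> {
  const ids = new Set<string>();
  for (const message of messages) {
    if (message.metadata?.memoryInjection) {
      for (const id of message.metadata?.memoryIds ?? []) ids.add(id);
    }
  }
  return ids;
}

// Once per turn: append a new hidden user message holding only unseen
// memories. Existing messages are never edited, so the prefix stays stable.
function appendFreshMemories(
  messages: readonly SketchMessage[],
  retrieved: readonly SketchMemory[]
): SketchMessage[] {
  const seen = injectedIds(messages);
  const fresh = retrieved.filter((memory) => !seen.has(memory.id));
  if (fresh.length === 0) return [...messages];
  const injection: SketchMessage = {
    role: 'user',
    content: fresh.map((memory) => `- ${memory.text}`).join('\n'),
    metadata: { hidden: true, memoryInjection: true, memoryIds: fresh.map((m) => m.id) },
  };
  return [...messages, injection];
}

const tea: SketchMemory = { id: 'a', text: 'likes tea' };
const oslo: SketchMemory = { id: 'b', text: 'lives in Oslo' };
const start: SketchMessage[] = [{ role: 'user', content: 'hi' }];
const turn1 = appendFreshMemories(start, [tea]);
const turn2 = appendFreshMemories(turn1, [tea, oslo]); // only 'b' is new
const turn3 = appendFreshMemories(turn2, [tea]); // nothing new, nothing appended
```

Because earlier messages are carried over by reference and never rewritten, everything before the newest injection stays byte-identical from turn to turn, which is what keeps the cached prefix valid.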

Impact on applications

For most applications no code change is required. The autoInject config, search_memory/save_memory tools, and generation config are all unchanged.

Behavioural differences to be aware of:

| Before | After |
| --- | --- |
| Memory injected into SYSTEM message, ephemeral (not stored) | Memory injected as hidden `user` message, persisted to transcript |
| Same memories re-injected every LLM call within a turn | Deduped by memory ID: each unique memory injected at most once per session |
| Memory injection invalidates cached system-prompt prefix | Append-only hidden message preserves cached prefix |
| `state.messages` never contains memory injection messages | `state.messages` contains hidden `user` messages with `MEMORY_INJECTION` metadata |

Filtering memory messages from UI / queries

If your application reads state.messages or calls stateStore.getMessages() to display the conversation, you should filter out hidden memory messages:

```typescript
import { COMMON_METADATA_KEYS } from '@helix-agents/core';

const visibleMessages = messages.filter((m) => m.metadata?.[COMMON_METADATA_KEYS.HIDDEN] !== true);
```

Alternatively, filter specifically for memory-injection messages:

```typescript
const nonMemoryMessages = messages.filter(
  (m) => m.metadata?.[COMMON_METADATA_KEYS.MEMORY_INJECTION] !== true
);
```

The HIDDEN key is the broader filter (also covers other framework-internal messages). The MEMORY_INJECTION key is specific to auto-injected memory messages.
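
The difference between the two filters is easiest to see on sample data. In this sketch the `KEYS` object stands in for COMMON_METADATA_KEYS with made-up string values, and `DemoMessage` is a hypothetical minimal message shape; the real key strings are not given in this guide.

```typescript
// Stand-in keys with assumed values; import COMMON_METADATA_KEYS in real code.
const KEYS = { HIDDEN: 'hidden', MEMORY_INJECTION: 'memoryInjection' } as const;

interface DemoMessage {
  role: string;
  content: string;
  metadata?: Record<string, unknown>;
}

const transcript: DemoMessage[] = [
  { role: 'user', content: 'What did we decide last week?' },
  // Auto-injected memories: hidden AND flagged as a memory injection.
  {
    role: 'user',
    content: '(memories)',
    metadata: { [KEYS.HIDDEN]: true, [KEYS.MEMORY_INJECTION]: true },
  },
  // Some other framework-internal message: hidden, but not a memory injection.
  { role: 'user', content: '(internal note)', metadata: { [KEYS.HIDDEN]: true } },
  { role: 'assistant', content: 'You chose option B.' },
];

// Broad filter: drops every hidden message.
const visible = transcript.filter((m) => m.metadata?.[KEYS.HIDDEN] !== true);

// Narrow filter: drops only memory injections, so the internal note remains.
const withoutMemories = transcript.filter(
  (m) => m.metadata?.[KEYS.MEMORY_INJECTION] !== true
);
```

For rendering a conversation to end users, the broad HIDDEN filter is usually the one you want; the narrow filter suits debugging views that should still show other internal messages.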

New core exports (custom runtimes only)

If you maintain a custom runtime that manually calls memory injection, use the new exports:

```typescript
import { collectInjectedMemoryIds, COMMON_METADATA_KEYS } from '@helix-agents/core';
// MemoryManager.buildInjectionMessage is on the memory package:
import { MemoryManager } from '@helix-agents/memory';

// Collect memory IDs already in the transcript (for dedup):
const excludeIds = collectInjectedMemoryIds(state.messages);

// Build a new injection message (returns null if no fresh memories):
const memoryMessage = await memoryManager.buildInjectionMessage({
  query: userMessage,
  context,
  excludeMemoryIds: excludeIds,
});

if (memoryMessage) {
  await stateStore.appendMessages(sessionId, [memoryMessage]);
}
```

collectInjectedMemoryIds and COMMON_METADATA_KEYS.MEMORY_INJECTION are exported from @helix-agents/core. MemoryManager.buildInjectionMessage is on @helix-agents/memory.
