Migrating to Cache Strategy API and Persisted Memory Injection

Overview

Two changes shipped together that affect how prompt caching is configured and how memory auto-injection works.

  1. Prompt caching is now opt-in and provider-specific. The old LLMConfig.caching: 'auto' field and its companions have been removed. You now choose a provider-specific strategy helper and set it on LLMConfig.cache.
  2. Memory auto-injection no longer injects an ephemeral SYSTEM message on every step. Memories are now retrieved once per turn and persisted as a hidden user message in the transcript (deduped by memory ID across turns).

1. Prompt Caching

What changed

| Old (removed) | New |
| --- | --- |
| LLMConfig.caching: 'auto' \| false | LLMConfig.cache: CacheStrategy \| CacheStrategy[] |
| (TTL was not separately configurable) | anthropicCache({ ttl: '1h' }) |
| applyCacheBreakpoints(messages) | applyCacheStrategies(cache, request) |
| detectCacheProvider(model) | No replacement; pick the helper for your provider |
| CacheBreakpointResult type | AppliedCacheResult type |

Why it changed

The old caching: 'auto' implementation tried to detect the provider from the model object and apply generic breakpoints. This was fragile and produced incorrect results for non-Anthropic providers. The new design makes caching explicit and opt-in: you pass a strategy function that knows how to annotate messages and tools for your specific provider. When cache is not set, no caching occurs — there is no silent default.

Migration steps

Before (v0.27 and earlier):

typescript
import { defineAgent } from '@helix-agents/core';
import { anthropic } from '@ai-sdk/anthropic';

const agent = defineAgent({
  name: 'my-agent',
  llmConfig: {
    model: anthropic('claude-3-5-sonnet-20241022'),
    caching: 'auto', // removed
  },
  // ...
});

After (v0.28+):

typescript
import { defineAgent, anthropicCache } from '@helix-agents/core';
import { anthropic } from '@ai-sdk/anthropic';

const agent = defineAgent({
  name: 'my-agent',
  llmConfig: {
    model: anthropic('claude-3-5-sonnet-20241022'),
    cache: anthropicCache({ ttl: '5m' }), // opt-in, provider-specific
  },
  // ...
});

Provider-specific helpers

Pick the helper that matches your model:

typescript
import { anthropicCache, openaiCache, xaiCache } from '@helix-agents/core';

// Anthropic (Claude) — places cache_control markers on system, tools, and
// rolling conversation breakpoints. Default ttl: '1h'.
cache: anthropicCache();
cache: anthropicCache({ ttl: '5m' });

// OpenAI — sets promptCacheKey from sessionId for cache affinity.
cache: openaiCache();

// xAI (Grok) — sets the x-grok-conv-id header from sessionId.
cache: xaiCache();

// Multiple strategies (applied in order):
cache: [anthropicCache(), openaiCache()];

Google / Gemini does not need a helper — it uses implicit prefix caching server-side. No googleCache() exists; leave cache unset.
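
When the provider is only known at runtime (for example, read from configuration), the choice above can live in one place. The following is a minimal sketch, not part of the library: `ProviderId`, `CacheChoice`, and `chooseCacheHelper` are hypothetical names, and the function returns the name of the helper to call rather than calling it, so the snippet runs without `@helix-agents/core` installed.

```typescript
// Hypothetical provider identifiers; adjust to whatever your app uses.
type ProviderId = 'anthropic' | 'openai' | 'xai' | 'google';

// Name of the helper to call, or null when `cache` should stay unset.
type CacheChoice = 'anthropicCache' | 'openaiCache' | 'xaiCache' | null;

function chooseCacheHelper(provider: ProviderId): CacheChoice {
  switch (provider) {
    case 'anthropic':
      return 'anthropicCache';
    case 'openai':
      return 'openaiCache';
    case 'xai':
      return 'xaiCache';
    case 'google':
      // Gemini caches prefixes implicitly on the server; no helper exists.
      return null;
  }
}
```

In real code each branch would return the helper call itself (for example `anthropicCache()`), and the `null` branch would simply omit `cache` from `llmConfig`.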

Removed exports

These exports have been removed. Two have renamed successors and one has none:

  • applyCacheBreakpoints — replaced by applyCacheStrategies (different signature; used by custom runtimes only)
  • detectCacheProvider — removed; pick the correct helper for your provider explicitly
  • CacheBreakpointResult — replaced by AppliedCacheResult

The LLMConfig.caching field is removed from the type. TypeScript will flag any remaining uses.
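
For plain JavaScript files, or for a quick audit before upgrading, a coarse text scan can list leftovers that the type checker would otherwise report. This is a sketch only: `findRemovedUsages` is a hypothetical helper, and it matches text rather than parsing code, so it may also hit comments and strings.

```typescript
// Identifiers removed in this release, per the list above.
const REMOVED_IDENTIFIERS = [
  'applyCacheBreakpoints',
  'detectCacheProvider',
  'CacheBreakpointResult',
] as const;

// Matches the removed `caching` property key, e.g. `caching: 'auto'`.
const CACHING_FIELD = /\bcaching\s*:/;

// Returns the removed names that still appear in `source`.
function findRemovedUsages(source: string): string[] {
  const hits: string[] = REMOVED_IDENTIFIERS.filter((name) =>
    new RegExp(`\\b${name}\\b`).test(source),
  );
  if (CACHING_FIELD.test(source)) hits.push('caching');
  return hits;
}
```

Running it over each source file's contents and printing non-empty results gives a rough to-do list for the migration.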


2. Memory Auto-Injection

What changed

Previously, when autoInject was enabled, the memory system injected relevant memories as an ephemeral SYSTEM message that was prepended fresh on every LLM call but never persisted to state.messages. This meant:

  • The system prompt was different every step (breaking prompt-cache affinity).
  • The same memories could appear twice if the conversation was resumed from a checkpoint.

Now, memories are persisted as a hidden user message (role 'user', metadata MEMORY_INJECTION: true, HIDDEN: true) appended to state.messages once per turn. The message is never mutated — fresh memories on later turns produce a new message. The stable prefix (system prompt + tools + conversation history) is never invalidated by memory changes, making the transcript cache-friendly.
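
The append-only, deduplicated behaviour can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not the framework's implementation: the `Msg` shape, the `memoryIds` metadata field, and the `injectedIds` and `injectForTurn` functions are all hypothetical, chosen only to show that earlier messages are left untouched and that each memory ID is injected at most once.

```typescript
// Hypothetical, simplified message shape (not the library's type).
interface Msg {
  role: 'system' | 'user' | 'assistant';
  content: string;
  metadata?: { memoryInjection?: boolean; hidden?: boolean; memoryIds?: string[] };
}

interface Memory {
  id: string;
  text: string;
}

// Gather the memory IDs already injected earlier in the transcript.
function injectedIds(messages: readonly Msg[]): Set<string> {
  const ids = new Set<string>();
  for (const m of messages) {
    if (m.metadata?.memoryInjection) {
      for (const id of m.metadata.memoryIds ?? []) ids.add(id);
    }
  }
  return ids;
}

// Called once per turn: appends one hidden user message holding only the
// memories not seen before. Returns a copy of the transcript if none are fresh.
function injectForTurn(messages: readonly Msg[], retrieved: Memory[]): Msg[] {
  const seen = injectedIds(messages);
  const fresh = retrieved.filter((mem) => !seen.has(mem.id));
  if (fresh.length === 0) return [...messages];
  const injection: Msg = {
    role: 'user',
    content: fresh.map((mem) => mem.text).join('\n'),
    metadata: { memoryInjection: true, hidden: true, memoryIds: fresh.map((mem) => mem.id) },
  };
  return [...messages, injection]; // earlier messages are never rewritten
}
```

Because every call only appends, the messages before the new injection stay identical from turn to turn, which is what keeps a cached prefix valid.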

Impact on applications

For most applications no code change is required. The autoInject config, search_memory/save_memory tools, and generation config are all unchanged.

Behavioural differences to be aware of:

| Before | After |
| --- | --- |
| Memory injected into SYSTEM message, ephemeral (not stored) | Memory injected as hidden user message, persisted to transcript |
| Same memories re-injected every LLM call within a turn | Deduped by memory ID: each unique memory injected at most once per session |
| Memory injection invalidates cached system-prompt prefix | Append-only hidden message preserves cached prefix |
| state.messages never contains memory injection messages | state.messages contains hidden user messages with MEMORY_INJECTION metadata |

Filtering memory messages from UI / queries

If your application reads state.messages or calls stateStore.getMessages() to display the conversation, you should filter out hidden memory messages:

typescript
import { COMMON_METADATA_KEYS } from '@helix-agents/core';

const visibleMessages = messages.filter((m) => m.metadata?.[COMMON_METADATA_KEYS.HIDDEN] !== true);

Alternatively, filter specifically for memory-injection messages:

typescript
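import { COMMON_METADATA_KEYS } from '@helix-agents/core';

// Drops only auto-injected memory messages; other hidden messages are kept.
const nonMemoryMessages = messages.filter(
  (m) => m.metadata?.[COMMON_METADATA_KEYS.MEMORY_INJECTION] !== true
);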

The HIDDEN key is the broader filter (also covers other framework-internal messages). The MEMORY_INJECTION key is specific to auto-injected memory messages.
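
To make the difference between the two filters concrete, here is a self-contained sketch. The `KEYS` object and its string values are stand-ins for `COMMON_METADATA_KEYS` (the real key strings may differ), `ChatMessage` is a simplified hypothetical type, and the `(internal)` entry is an invented example of another framework-internal hidden message.

```typescript
// Stand-in for COMMON_METADATA_KEYS; the real string values may differ.
const KEYS = { HIDDEN: 'hidden', MEMORY_INJECTION: 'memoryInjection' } as const;

interface ChatMessage {
  role: string;
  content: string;
  metadata?: Record<string, unknown>;
}

const transcript: ChatMessage[] = [
  { role: 'user', content: 'What did I order last week?' },
  // Auto-injected memory: both flags set.
  {
    role: 'user',
    content: '(memories)',
    metadata: { [KEYS.HIDDEN]: true, [KEYS.MEMORY_INJECTION]: true },
  },
  // Hypothetical other internal message: hidden, but not a memory injection.
  { role: 'user', content: '(internal)', metadata: { [KEYS.HIDDEN]: true } },
  { role: 'assistant', content: 'You ordered a keyboard.' },
];

// Broad filter: drops everything flagged hidden.
const visible = transcript.filter((m) => m.metadata?.[KEYS.HIDDEN] !== true);

// Narrow filter: drops only memory injections; other hidden messages remain.
const withoutMemory = transcript.filter(
  (m) => m.metadata?.[KEYS.MEMORY_INJECTION] !== true,
);

console.log(visible.length, withoutMemory.length); // 2 3
```

For a chat UI the broad `HIDDEN` filter is usually the safer choice, since it also hides any other internal messages.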

New core exports (custom runtimes only)

If you maintain a custom runtime that manually calls memory injection, use the new exports:

typescript
import { collectInjectedMemoryIds, COMMON_METADATA_KEYS } from '@helix-agents/core';
// MemoryManager.buildInjectionMessage is on the memory package:
import { MemoryManager } from '@helix-agents/memory';

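// `state`, `memoryManager`, `userMessage`, `context`, `stateStore`, and
// `sessionId` are assumed to be provided by your runtime.

// Collect memory IDs already in the transcript (for dedup):
const excludeIds = collectInjectedMemoryIds(state.messages);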

// Build a new injection message (returns null if no fresh memories):
const memoryMessage = await memoryManager.buildInjectionMessage({
  query: userMessage,
  context,
  excludeMemoryIds: excludeIds,
});

if (memoryMessage) {
  await stateStore.appendMessages(sessionId, [memoryMessage]);
}

collectInjectedMemoryIds and COMMON_METADATA_KEYS.MEMORY_INJECTION are exported from @helix-agents/core. MemoryManager.buildInjectionMessage is on @helix-agents/memory.

Released under the MIT License.