Memory
Memory gives agents the ability to learn from conversations and recall information across sessions. Instead of treating each conversation as a blank slate, agents can extract facts, preferences, and context — then automatically inject relevant memories into future interactions.
How It Works
The memory system has three parts:
- Retrieval — Before each LLM call, relevant memories are fetched and injected into the system prompt
- Generation — After each step, new memories are extracted from the conversation using an LLM
- Tools — The agent can explicitly search and save memories via the search_memory and save_memory tools
All three parts are optional and independently configurable.
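The three parts map to independent configuration blocks. As a rough sketch (option names are covered in detail in the sections below; the shape here is illustrative, not the full config type):

```typescript
// Sketch only: the three memory parts as independent toggles.
const memoryConfigSketch = {
  autoInject: { enabled: true },    // Retrieval: inject memories before each LLM call
  generation: { mode: 'realtime' }, // Generation: extract memories after each step
  tools: { searchMemory: true, saveMemory: true }, // Tools: explicit search/save
};
```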
Quick Start
1. Install the Memory Package
npm install @helix-agents/memory
2. Create a Memory Store
For development, use the in-memory store:
import { InMemoryMemoryStore } from '@helix-agents/memory';
const memoryStore = new InMemoryMemoryStore();
For production with Redis (requires RediSearch module):
npm install @helix-agents/memory-redis
import { RedisMemoryStore } from '@helix-agents/memory-redis';
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
const memoryStore = new RedisMemoryStore({
redis,
dimensions: 1536, // Must match your embedding model
});
await memoryStore.initialize();
3. Set Up an Embedding Adapter
Memory search uses vector embeddings for semantic similarity:
npm install @helix-agents/embedding-vercel
import { VercelEmbeddingAdapter } from '@helix-agents/embedding-vercel';
import { openai } from '@ai-sdk/openai';
const embeddingAdapter = new VercelEmbeddingAdapter(
openai.embedding('text-embedding-3-small'),
1536
);
4. Add Memory to Your Agent
import { defineAgent } from '@helix-agents/sdk';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { openai } from '@ai-sdk/openai';
const agent = defineAgent({
name: 'assistant',
systemPrompt: 'You are a helpful assistant.',
llmConfig: { model: openai('gpt-4o') },
memory: {
sources: [
{
name: 'user-prefs',
description: 'User preferences and personal information',
store: memoryStore,
entityId: (ctx) => ctx.sessionId, // Scope memories by session
},
],
llmAdapter: new VercelAIAdapter(),
llmConfig: { model: openai('gpt-4o-mini') }, // Cheaper model for memory ops
embeddingAdapter,
autoInject: { enabled: true },
generation: { mode: 'realtime' },
},
});
Now the agent automatically remembers information from conversations and recalls it in future sessions.
Memory Sources
A memory source is a named, scoped collection of memories. You can have multiple sources for different types of information (e.g., user preferences, project context, conversation history).
import type { MemorySource } from '@helix-agents/core';
const userPrefs: MemorySource = {
name: 'user-prefs',
description: 'User preferences like language, timezone, communication style',
store: memoryStore,
entityId: (ctx) => ctx.sessionId,
retrieval: {
threshold: 0.7, // Minimum similarity score (0-1)
strategy: 'semantic', // 'semantic' | 'keyword' | 'hybrid'
},
extractionPrompt: 'Focus on user preferences, settings, and personal details.',
};
Entity Resolution
The entityId function determines how memories are scoped. Memories are stored and retrieved per entity, so this controls who "owns" a memory.
// Scope by session (memories only visible within the same session)
entityId: (ctx) => ctx.sessionId,
// Scope by user (memories persist across sessions for the same user)
entityId: () => 'user-alice',
// Scope by a custom field from agent state
entityId: (ctx) => ctx.customState?.organizationId ?? ctx.sessionId,
Cross-Session Memory
To make memories persist across sessions, use a stable entity ID like a user ID rather than sessionId. Pass the user ID through custom state or derive it from your application context.
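For example, a resolver might prefer a stable user ID from custom state and fall back to session scoping when no user is known. A sketch (the `Ctx` shape and the `userId` field are hypothetical; the real context type comes from the SDK):

```typescript
// Hypothetical context shape for illustration.
interface Ctx {
  sessionId: string;
  customState?: { userId?: string };
}

// Prefer a stable user ID so memories survive across sessions;
// fall back to per-session scoping when no user is known.
const resolveEntityId = (ctx: Ctx): string =>
  ctx.customState?.userId ? `user:${ctx.customState.userId}` : `session:${ctx.sessionId}`;
```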
Multiple Sources
memory: {
sources: [
{
name: 'user-prefs',
description: 'User preferences and personal information',
store: userMemoryStore,
entityId: () => 'user-alice',
},
{
name: 'project-context',
description: 'Project details, requirements, and decisions',
store: projectMemoryStore,
entityId: (ctx) => ctx.customState?.projectId ?? 'default',
},
],
// ...
}
When multiple sources are configured, the memory system uses an LLM to route each extracted memory to the appropriate source.
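Conceptually, the router matches each extracted memory against the source descriptions and picks the best fit. A toy keyword-overlap sketch of that idea (the framework actually routes with an LLM call, not this heuristic):

```typescript
interface SourceInfo { name: string; description: string; }

// Toy router: score each source by how many of its description words
// appear in the memory text, and pick the highest-scoring source.
function routeMemory(text: string, sources: SourceInfo[]): string {
  const words = new Set(text.toLowerCase().split(/\W+/));
  let best = sources[0].name;
  let bestScore = -1;
  for (const s of sources) {
    const score = s.description.toLowerCase().split(/\W+/)
      .filter((w) => w.length > 3 && words.has(w)).length;
    if (score > bestScore) { bestScore = score; best = s.name; }
  }
  return best;
}
```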
Retrieval & Auto-Injection
When autoInject is enabled, the memory system automatically:
- Embeds the user's message
- Searches all sources for relevant memories
- Injects matching memories into the LLM context as a system message
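The injected context might be assembled along these lines (a sketch of the idea only; the actual message format is internal to the framework):

```typescript
interface RetrievedMemory { content: string; source: string; score: number; }

// Sketch: keep the top N memories per source (highest similarity first)
// and render them as a system-message block.
function buildMemoryBlock(memories: RetrievedMemory[], maxPerSource = 10): string {
  const perSource = new Map<string, RetrievedMemory[]>();
  for (const m of [...memories].sort((a, b) => b.score - a.score)) {
    const bucket = perSource.get(m.source) ?? [];
    if (bucket.length < maxPerSource) bucket.push(m);
    perSource.set(m.source, bucket);
  }
  const lines: string[] = ['Relevant memories:'];
  perSource.forEach((bucket, source) => {
    for (const m of bucket) lines.push(`- [${source}] ${m.content}`);
  });
  return lines.join('\n');
}
```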
memory: {
// ...
autoInject: {
enabled: true,
maxPerSource: 10, // Max memories to inject per source (default: 10)
},
}
Search Strategies
Each source can use a different search strategy:
| Strategy | How it works | Best for |
|---|---|---|
| semantic | Vector similarity search | Finding conceptually related memories |
| keyword | Full-text search | Finding exact terms or names |
| hybrid | Combines semantic + keyword via RRF | Best of both worlds |
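Reciprocal rank fusion (RRF) merges the two ranked result lists by summing 1 / (k + rank) for each item across lists. A minimal sketch (k = 60 is the constant commonly used in the RRF literature; the store's internals may differ):

```typescript
// Merge ranked lists of result IDs with reciprocal rank fusion (RRF).
function rrfMerge(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // rank is 0-based here; RRF uses 1-based ranks.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  const merged: { id: string; score: number }[] = [];
  scores.forEach((score, id) => merged.push({ id, score }));
  merged.sort((a, b) => b.score - a.score);
  return merged.map((m) => m.id);
}
```

An item ranked highly in both lists (like a memory that matches semantically and by keyword) outranks one that appears in only one list.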
retrieval: {
strategy: 'hybrid',
threshold: 0.5, // Lower threshold = more results
}
Store Capabilities
Not all stores support all strategies. InMemoryMemoryStore supports semantic and keyword search. RedisMemoryStore and CloudflareMemoryStore (full mode) support all three including hybrid. Check store.capabilities() to see what's available.
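A defensive pattern is to downgrade the requested strategy to the best one the store supports. A sketch (the capability flags mirror the `capabilities()` shape shown later on this page; the fallback policy is an assumption, not framework behavior):

```typescript
interface StoreCapabilities {
  semanticSearch: boolean;
  keywordSearch: boolean;
  hybridSearch: boolean;
}
type Strategy = 'semantic' | 'keyword' | 'hybrid';

// Fall back to the best strategy the store actually supports.
function resolveStrategy(requested: Strategy, caps: StoreCapabilities): Strategy {
  if (requested === 'hybrid' && caps.hybridSearch) return 'hybrid';
  if ((requested === 'semantic' || requested === 'hybrid') && caps.semanticSearch) return 'semantic';
  if (caps.keywordSearch) return 'keyword';
  throw new Error('store supports no search strategy');
}
```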
Memory Generation
Memory generation extracts facts from conversations and stores them for future retrieval.
Realtime Mode
Extracts memories after each agent step:
memory: {
// ...
generation: {
mode: 'realtime',
realtime: {
interval: 1, // Extract every N steps (default: 1)
},
},
}
Async Mode
Extracts memories after the agent completes:
memory: {
// ...
generation: {
mode: 'async',
async: {
executor: customExecutor, // Optional custom executor
},
},
}
Both Modes
Run realtime extraction during execution and async extraction on completion:
generation: {
mode: 'both',
}
Custom Extraction Criteria
Each source can specify what kind of information to extract:
{
name: 'user-prefs',
description: 'User preferences',
store: memoryStore,
entityId: (ctx) => ctx.sessionId,
extractionPrompt: 'Extract user preferences, favorite tools, communication style, and timezone.',
}
Embedding Pipeline
When a memory is saved (via extraction or the save_memory tool), embedding generation is handled separately from storage. This means memories are immediately searchable via keyword/full-text search while vector embeddings are computed asynchronously.
Each memory has an embeddingStatus field: 'pending' (saved but not yet embedded) or 'complete' (embedding computed and stored). The embedding pipeline is controlled by the embeddingExecutor config option.
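The lifecycle can be pictured as a one-way transition per memory record. A sketch (the `StoredMemory` shape is illustrative, not the framework's `Memory` type):

```typescript
type EmbeddingStatus = 'pending' | 'complete';

// Illustrative record shape: a memory is saved as 'pending' (keyword-searchable
// right away) and flips to 'complete' once its vector has been stored.
interface StoredMemory {
  id: string;
  content: string;
  embeddingStatus: EmbeddingStatus;
  embedding?: number[];
}

function markEmbedded(memory: StoredMemory, embedding: number[]): StoredMemory {
  return { ...memory, embedding, embeddingStatus: 'complete' };
}

const saved: StoredMemory = { id: 'm1', content: 'Prefers TypeScript', embeddingStatus: 'pending' };
const embedded = markEmbedded(saved, [0.1, 0.2, 0.3]);
```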
InlineEmbeddingExecutor (Default)
Computes embeddings synchronously before returning. Simple and correct — every saved memory immediately has an embedding. This is the default behavior.
memory: {
// ...
// No embeddingExecutor needed — InlineEmbeddingExecutor is the default
}
BackgroundEmbeddingExecutor
Fire-and-forget: saves the memory immediately, then embeds in the background. The agent loop is not blocked by embedding API calls. Failed embeddings leave memories in 'pending' status for later recovery.
import { BackgroundEmbeddingExecutor } from '@helix-agents/memory';
memory: {
// ...
embeddingExecutor: new BackgroundEmbeddingExecutor({
maxConcurrency: 50, // Max parallel embedding tasks (default: 50)
}),
}
When to Use Background Embedding
Use BackgroundEmbeddingExecutor in production when embedding API latency (100-300ms per call) is slowing down your agent loop. Memories are still keyword-searchable immediately — only semantic/vector search requires the embedding to complete.
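The pattern behind background embedding can be sketched as a fire-and-forget task runner with a drain step for shutdown (illustrative only, not the shipped implementation; the real executor also caps concurrency):

```typescript
// Sketch: the caller is never blocked by the embedding call; failures are
// swallowed so the memory simply stays in 'pending' status for later recovery.
class BackgroundRunner {
  private inFlight = new Set<Promise<void>>();

  schedule(task: () => Promise<void>): void {
    const p: Promise<void> = task().catch(() => {
      // Leave the memory 'pending'; processUnembeddedMemories() can retry it.
    });
    this.inFlight.add(p);
    void p.then(() => this.inFlight.delete(p));
  }

  // Await all in-flight tasks (what shutdown() does for the real executor).
  async drain(): Promise<void> {
    await Promise.all(Array.from(this.inFlight));
  }
}
```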
Recovery with processUnembeddedMemories
If background embedding fails (network errors, service outages), memories remain in 'pending' status. Use MemoryManager.processUnembeddedMemories() to retry them:
import { MemoryManager } from '@helix-agents/memory';
const manager = new MemoryManager(memoryConfig);
// Run on a schedule (cron, startup, etc.)
const processed = await manager.processUnembeddedMemories(50); // batch size
console.log(`Recovered ${processed} pending memories`);
This queries all stores for memories with embeddingStatus: 'pending', computes their embeddings, and updates them to 'complete'. Stores that don't support getMemoriesByEmbeddingStatus are skipped.
Deduplication
When memories are extracted, they may overlap with existing memories. The dedup system prevents duplicates.
LLM Dedup (Default)
Uses an LLM to compare new memories against existing ones and decide whether to ADD, UPDATE, DELETE, or skip:
memory: {
// ...
dedup: { strategy: 'llm' }, // default
}
Similarity Dedup
Uses embedding similarity to detect duplicates (faster, no LLM call):
dedup: {
strategy: 'similarity',
similarityThreshold: 0.9, // Cosine similarity threshold for considering duplicates
}
No Dedup
Skip deduplication entirely:
dedup: {
strategy: 'none',
}
Memory Tools
By default, agents with memory get two tools:
search_memory
Lets the agent explicitly search for memories:
search_memory({ query: "user's preferred language", source: "user-prefs" })
save_memory
Lets the agent explicitly save something to memory:
save_memory({
content: "User prefers TypeScript over JavaScript",
context: "Language preference discussion",
source: "user-prefs"
})
Disabling Tools
You can disable either tool independently:
memory: {
// ...
tools: {
searchMemory: false, // Disable search_memory tool
saveMemory: true, // Keep save_memory tool
},
}
Embedding Adapters
The memory system needs an embedding adapter for vector search. The framework provides two adapters and an interface for custom implementations.
VercelEmbeddingAdapter
Wraps the Vercel AI SDK's embedding functions:
import { VercelEmbeddingAdapter } from '@helix-agents/embedding-vercel';
import { openai } from '@ai-sdk/openai';
const adapter = new VercelEmbeddingAdapter(
openai.embedding('text-embedding-3-small'),
1536 // dimensions
);
WorkersAIEmbeddingAdapter
Uses Cloudflare Workers AI for embeddings — runs on the edge with no external API calls:
import { WorkersAIEmbeddingAdapter } from '@helix-agents/embedding-cloudflare';
const adapter = new WorkersAIEmbeddingAdapter({
ai: env.AI,
model: '@cf/baai/bge-base-en-v1.5',
dimensions: 768,
});
Custom Adapter
Implement the EmbeddingAdapter interface:
import type { EmbeddingAdapter } from '@helix-agents/core';
class MyEmbeddingAdapter implements EmbeddingAdapter {
readonly dimensions = 768;
async embed(text: string): Promise<number[]> {
// Your embedding logic
return await myEmbeddingAPI.embed(text);
}
async embedMany(texts: string[]): Promise<number[][]> {
return await myEmbeddingAPI.embedBatch(texts);
}
}
Memory Stores
InMemoryMemoryStore
For development and testing. Data is lost when the process exits.
import { InMemoryMemoryStore } from '@helix-agents/memory';
const store = new InMemoryMemoryStore();
Capabilities: semantic search, keyword search.
RedisMemoryStore
For production. Requires Redis with the RediSearch module (available in the redis-stack Docker image).
npm install @helix-agents/memory-redis
import { RedisMemoryStore } from '@helix-agents/memory-redis';
import Redis from 'ioredis';
const store = new RedisMemoryStore({
redis: new Redis(process.env.REDIS_URL),
dimensions: 1536,
prefix: 'myapp:memory', // Optional key prefix (default: 'helix:memory')
maxCacheSize: 50_000, // Optional LRU cache size for ID lookups (default: 50000, 0 to disable)
});
await store.initialize(); // Creates RediSearch indexes lazily per source
Capabilities: semantic search, keyword search, hybrid search, metadata filtering.
Redis Stack Required
Standard Redis does not include the RediSearch module. Use the redis/redis-stack Docker image or a Redis provider that supports modules (e.g., Redis Cloud).
CloudflareMemoryStore
For Cloudflare Workers deployments. Uses D1 for storage, Vectorize for semantic search, and Queues for reliable D1-to-Vectorize synchronization.
npm install @helix-agents/memory-cloudflare
Full mode (D1 + Vectorize + Queues):
import { CloudflareMemoryStore } from '@helix-agents/memory-cloudflare';
const store = new CloudflareMemoryStore({
d1: env.MEMORY_DB,
vectorize: env.VECTORIZE_INDEX,
syncQueue: env.MEMORY_SYNC_QUEUE,
});
Capabilities: semantic search, keyword search, hybrid search, metadata filtering.
D1-only mode (no semantic search):
const store = new CloudflareMemoryStore({
d1: env.MEMORY_DB,
});
Capabilities: keyword search, metadata filtering.
The store runs D1 migrations automatically on first use. Vectorize sync is handled asynchronously via Cloudflare Queues — add a queue consumer to your worker:
import { createVectorizeSyncHandler } from '@helix-agents/memory-cloudflare';
export default {
async queue(batch, env) {
const handler = createVectorizeSyncHandler({ vectorize: env.VECTORIZE_INDEX });
await handler.processBatch(batch);
},
};
WorkersAIEmbeddingAdapter
Pair the Cloudflare store with the Workers AI embedding adapter:
npm install @helix-agents/embedding-cloudflare
import { WorkersAIEmbeddingAdapter } from '@helix-agents/embedding-cloudflare';
const embeddingAdapter = new WorkersAIEmbeddingAdapter({
ai: env.AI,
model: '@cf/baai/bge-base-en-v1.5',
dimensions: 768,
});
See the OpenNext + Cloudflare DO example for a complete setup.
Custom Store
Implement the MemoryStore interface from @helix-agents/core:
import type {
MemoryStore,
Memory,
MemorySearchQuery,
MemorySearchResult,
ListEntitiesOptions,
PaginatedEntities,
} from '@helix-agents/core';
class MyMemoryStore implements MemoryStore {
async addMemories(memories: Memory[]): Promise<string[]> {
/* ... */
}
async getMemory(id: string): Promise<Memory | null> {
/* ... */
}
async updateMemory(id: string, updates: Partial<Memory>): Promise<void> {
/* ... */
}
async deleteMemory(id: string): Promise<void> {
/* ... */
}
async search(query: MemorySearchQuery): Promise<MemorySearchResult[]> {
/* ... */
}
async getMemoriesByEntity(entityId: string, sourceName: string): Promise<Memory[]> {
/* ... */
}
async deleteMemoriesByEntity(entityId: string, sourceName: string): Promise<number> {
/* ... */
}
async listEntities(options?: ListEntitiesOptions): Promise<PaginatedEntities> {
/* ... */
}
capabilities() {
return {
semanticSearch: true,
keywordSearch: false,
hybridSearch: false,
metadataFiltering: false,
};
}
// Optional: Enable processUnembeddedMemories() recovery for this store.
// Omit this method if your store doesn't track embedding status.
async getMemoriesByEmbeddingStatus(
status: 'pending' | 'complete',
limit: number
): Promise<Memory[]> {
/* ... */
}
}
Full Configuration Reference
memory: {
// Required: Memory sources
sources: MemorySource[],
// Required: LLM adapter for memory extraction/dedup/routing
llmAdapter: LLMAdapter,
// Required: LLM config for memory operations
llmConfig: LLMConfig,
// Required: Embedding adapter for vector search
embeddingAdapter: EmbeddingAdapter,
// Optional: Embedding executor (default: InlineEmbeddingExecutor)
// Controls how embeddings are computed after memory save.
// - InlineEmbeddingExecutor: synchronous, blocks until embedding completes
// - BackgroundEmbeddingExecutor: fire-and-forget, returns immediately
embeddingExecutor?: EmbeddingExecutor,
// Optional: Dedup strategy (default: { strategy: 'llm' })
dedup?: {
strategy: 'llm' | 'similarity' | 'none',
similarityThreshold?: number, // For 'similarity' strategy (0-1)
},
// Optional: Generation mode (default: { mode: 'realtime' })
generation?: {
mode: 'realtime' | 'async' | 'both',
realtime?: { interval: number },
async?: { executor: MemoryExtractionExecutor },
maxMessages?: number, // Max messages sent to extraction LLM
},
// Optional: Auto-injection config
autoInject?: {
enabled: boolean, // default: true
maxPerSource?: number, // default: 10
},
// Optional: Tool availability
tools?: {
searchMemory?: boolean, // default: true
saveMemory?: boolean, // default: true
},
// Optional: Logger
logger?: Logger,
}
Lifecycle Management
Shutdown
When using BackgroundEmbeddingExecutor or async extraction executors, call MemoryManager.shutdown() before process exit to drain in-flight tasks:
import { MemoryManager } from '@helix-agents/memory';
const manager = new MemoryManager(memoryConfig);
// ... agent execution ...
// Before process exit: drain pending embedding and extraction tasks
await manager.shutdown();
This drains both the embedding executor and the extraction executor (if configured). Executors without a shutdown method are skipped. For runtimes that dispatch work to external systems (Temporal workflows, Cloudflare Workflows), shutdown is a no-op since the work runs outside the process.
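In a long-running Node.js service, the drain is typically wired to shutdown signals. A sketch (the `Shutdownable` interface is a stand-in for a MemoryManager instance; the signal handling is standard Node.js, not framework API):

```typescript
interface Shutdownable { shutdown(): Promise<void>; }

// Sketch: drain pending embedding/extraction tasks before the process exits.
function registerShutdownHook(manager: Shutdownable): void {
  const drain = async (signal: string) => {
    console.log(`${signal} received, draining memory executors...`);
    await manager.shutdown();
    process.exit(0);
  };
  process.once('SIGTERM', () => void drain('SIGTERM'));
  process.once('SIGINT', () => void drain('SIGINT'));
}
```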
Runtime Support
Memory works with all three runtimes:
| Runtime | Extraction | Notes |
|---|---|---|
| JS Runtime | Inline (in-process) | Simplest setup, extraction runs in the agent loop |
| Temporal Runtime | Activity-based | Extraction runs as a Temporal activity for durability |
| Cloudflare Runtime | DO-based | Extraction runs within the Durable Object |
The memory configuration is part of the agent definition, so switching runtimes requires no changes to your memory setup.
Examples
Several examples demonstrate memory integration across different runtimes and stores:
- Next.js + Redis (examples/nextjs-redis) — Production setup with RedisMemoryStore, VercelEmbeddingAdapter, BackgroundEmbeddingExecutor, and similarity dedup. Best reference for production deployments.
- Research Assistant (Temporal) — InMemoryMemoryStore with MockEmbeddingAdapter for development. Shows async extraction in durable workflows.
- OpenNext + Cloudflare DO — CloudflareMemoryStore with WorkersAIEmbeddingAdapter inside a Durable Object. Uses D1, Vectorize, and Queues for durable cross-session memory with hybrid search.
Browsing Memory Entities
The listEntities() method returns distinct entity/source pairs with memory counts. Use this to browse users or contexts that have stored memories, identify the most active entities, or audit memory usage.
import { RedisMemoryStore } from '@helix-agents/memory-redis';
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
const memoryStore = new RedisMemoryStore({ redis, dimensions: 1536 });
await memoryStore.initialize();
// List all entities
const result = await memoryStore.listEntities();
console.log(`Found ${result.total} entity/source pairs`);
for (const entity of result.entities) {
console.log(`${entity.entityId} (${entity.sourceName}): ${entity.memoryCount} memories`);
console.log(` Oldest: ${entity.oldestMemoryAt}, Newest: ${entity.newestMemoryAt}`);
}
// Find most active entities
const mostActive = await memoryStore.listEntities({
orderBy: { field: 'memoryCount', direction: 'desc' },
limit: 10,
});
// Filter by source
const userPrefs = await memoryStore.listEntities({
sourceName: 'user-prefs',
});
See Querying Guide for complete documentation including filtering options and pagination.
Next Steps
- Defining Agents — Full agent configuration reference
- Defining Tools — Learn about the tools system
- Redis Store — Production state storage with Redis
- Querying — Cross-session queries for analytics and monitoring