Memory
Memory gives agents the ability to learn from conversations and recall information across sessions. Instead of treating each conversation as a blank slate, agents can extract facts, preferences, and context — then automatically inject relevant memories into future interactions.
How It Works
The memory system has three parts:
- Retrieval — Before each LLM call, relevant memories are fetched and injected into the system prompt
- Generation — After each step, new memories are extracted from the conversation using an LLM
- Tools — The agent can explicitly search and save memories via the search_memory and save_memory tools
All three parts are optional and independently configurable.
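The three parts map to independent configuration blocks. As a rough sketch (option names are covered in detail in the sections below; the shape here is illustrative, not the full config type):

```typescript
// Sketch only: the three memory parts as independent toggles.
const memoryConfigSketch = {
  autoInject: { enabled: true },    // Retrieval: inject memories before each LLM call
  generation: { mode: 'realtime' }, // Generation: extract memories after each step
  tools: { searchMemory: true, saveMemory: true }, // Tools: explicit search/save
};
```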
Quick Start
1. Install the Memory Package
npm install @helix-agents/memory
2. Create a Memory Store
For development, use the in-memory store:
import { InMemoryMemoryStore } from '@helix-agents/memory';
const memoryStore = new InMemoryMemoryStore();
For production with Redis (requires RediSearch module):
npm install @helix-agents/memory-redis
import { RedisMemoryStore } from '@helix-agents/memory-redis';
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
const memoryStore = new RedisMemoryStore({
redis,
dimensions: 1536, // Must match your embedding model
});
await memoryStore.initialize();
3. Set Up an Embedding Adapter
Memory search uses vector embeddings for semantic similarity:
npm install @helix-agents/embedding-vercel
import { VercelEmbeddingAdapter } from '@helix-agents/embedding-vercel';
import { openai } from '@ai-sdk/openai';
const embeddingAdapter = new VercelEmbeddingAdapter(
openai.embedding('text-embedding-3-small'),
1536
);
4. Add Memory to Your Agent
import { defineAgent } from '@helix-agents/sdk';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { openai } from '@ai-sdk/openai';
const agent = defineAgent({
name: 'assistant',
systemPrompt: 'You are a helpful assistant.',
llmConfig: { model: openai('gpt-4o') },
memory: {
sources: [
{
name: 'user-prefs',
description: 'User preferences and personal information',
store: memoryStore,
entityId: (ctx) => ctx.sessionId, // Scope memories by session
},
],
llmAdapter: new VercelAIAdapter(),
llmConfig: { model: openai('gpt-4o-mini') }, // Cheaper model for memory ops
embeddingAdapter,
autoInject: { enabled: true },
generation: { mode: 'realtime' },
},
});
Now the agent automatically remembers information from conversations and recalls it in future sessions.
Memory Sources
A memory source is a named, scoped collection of memories. You can have multiple sources for different types of information (e.g., user preferences, project context, conversation history).
import type { MemorySource } from '@helix-agents/core';
const userPrefs: MemorySource = {
name: 'user-prefs',
description: 'User preferences like language, timezone, communication style',
store: memoryStore,
entityId: (ctx) => ctx.sessionId,
retrieval: {
threshold: 0.7, // Minimum similarity score (0-1)
strategy: 'semantic', // 'semantic' | 'keyword' | 'hybrid'
},
extractionPrompt: 'Focus on user preferences, settings, and personal details.',
};
Entity Resolution
The entityId function determines how memories are scoped. Memories are stored and retrieved per entity, so this controls who "owns" a memory.
// Scope by session (memories only visible within the same session)
entityId: (ctx) => ctx.sessionId,
// Scope by user (memories persist across sessions for the same user)
entityId: () => 'user-alice',
// Scope by a custom field from agent state
entityId: (ctx) => ctx.customState?.organizationId ?? ctx.sessionId,
Cross-Session Memory
To make memories persist across sessions, use a stable entity ID like a user ID rather than sessionId. Pass the user ID through custom state or derive it from your application context.
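For example, a resolver might prefer a stable user ID from custom state and fall back to session scoping when no user is known. A sketch (the `Ctx` shape and the `userId` field are hypothetical; the real context type comes from the SDK):

```typescript
// Hypothetical context shape for illustration.
interface Ctx {
  sessionId: string;
  customState?: { userId?: string };
}

// Prefer a stable user ID so memories survive across sessions;
// fall back to per-session scoping when no user is known.
const resolveEntityId = (ctx: Ctx): string =>
  ctx.customState?.userId ? `user:${ctx.customState.userId}` : `session:${ctx.sessionId}`;
```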
Multiple Sources
memory: {
sources: [
{
name: 'user-prefs',
description: 'User preferences and personal information',
store: userMemoryStore,
entityId: () => 'user-alice',
},
{
name: 'project-context',
description: 'Project details, requirements, and decisions',
store: projectMemoryStore,
entityId: (ctx) => ctx.customState?.projectId ?? 'default',
},
],
// ...
}
When multiple sources are configured, the memory system uses an LLM to route each extracted memory to the appropriate source.
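Conceptually, the router matches each extracted memory against the source descriptions and picks the best fit. A toy keyword-overlap sketch of that idea (the framework actually routes with an LLM call, not this heuristic):

```typescript
interface SourceInfo { name: string; description: string; }

// Toy router: score each source by how many of its description words
// appear in the memory text, and pick the highest-scoring source.
function routeMemory(text: string, sources: SourceInfo[]): string {
  const words = new Set(text.toLowerCase().split(/\W+/));
  let best = sources[0].name;
  let bestScore = -1;
  for (const s of sources) {
    const score = s.description.toLowerCase().split(/\W+/)
      .filter((w) => w.length > 3 && words.has(w)).length;
    if (score > bestScore) { bestScore = score; best = s.name; }
  }
  return best;
}
```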
Retrieval & Auto-Injection
When autoInject is enabled, the memory system automatically:
- Embeds the user's message
- Searches all sources for relevant memories
- Injects matching memories into the LLM context as a system message
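The injected context might be assembled along these lines (a sketch of the idea only; the actual message format is internal to the framework):

```typescript
interface RetrievedMemory { content: string; source: string; score: number; }

// Sketch: keep the top N memories per source (highest similarity first)
// and render them as a system-message block.
function buildMemoryBlock(memories: RetrievedMemory[], maxPerSource = 10): string {
  const perSource = new Map<string, RetrievedMemory[]>();
  for (const m of [...memories].sort((a, b) => b.score - a.score)) {
    const bucket = perSource.get(m.source) ?? [];
    if (bucket.length < maxPerSource) bucket.push(m);
    perSource.set(m.source, bucket);
  }
  const lines: string[] = ['Relevant memories:'];
  perSource.forEach((bucket, source) => {
    for (const m of bucket) lines.push(`- [${source}] ${m.content}`);
  });
  return lines.join('\n');
}
```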
memory: {
// ...
autoInject: {
enabled: true,
maxPerSource: 10, // Max memories to inject per source (default: 10)
},
}
Search Strategies
Each source can use a different search strategy:
| Strategy | How it works | Best for |
|---|---|---|
| semantic | Vector similarity search | Finding conceptually related memories |
| keyword | Full-text search | Finding exact terms or names |
| hybrid | Combines semantic + keyword via RRF | Best of both worlds |
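Reciprocal rank fusion (RRF) merges the two ranked result lists by summing 1 / (k + rank) for each item across lists. A minimal sketch (k = 60 is the constant commonly used in the RRF literature; the store's internals may differ):

```typescript
// Merge ranked lists of result IDs with reciprocal rank fusion (RRF).
function rrfMerge(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // rank is 0-based here; RRF uses 1-based ranks.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  const merged: { id: string; score: number }[] = [];
  scores.forEach((score, id) => merged.push({ id, score }));
  merged.sort((a, b) => b.score - a.score);
  return merged.map((m) => m.id);
}
```

An item ranked highly in both lists (like a memory that matches semantically and by keyword) outranks one that appears in only one list.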
retrieval: {
strategy: 'hybrid',
threshold: 0.5, // Lower threshold = more results
}
Store Capabilities
Not all stores support all strategies. InMemoryMemoryStore supports semantic and keyword search. RedisMemoryStore and CloudflareMemoryStore (full mode) support all three including hybrid. Check store.capabilities() to see what's available.
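A defensive pattern is to downgrade the requested strategy to the best one the store supports. A sketch (the capability flags mirror the `capabilities()` shape shown later on this page; the fallback policy is an assumption, not framework behavior):

```typescript
interface StoreCapabilities {
  semanticSearch: boolean;
  keywordSearch: boolean;
  hybridSearch: boolean;
}
type Strategy = 'semantic' | 'keyword' | 'hybrid';

// Fall back to the best strategy the store actually supports.
function resolveStrategy(requested: Strategy, caps: StoreCapabilities): Strategy {
  if (requested === 'hybrid' && caps.hybridSearch) return 'hybrid';
  if ((requested === 'semantic' || requested === 'hybrid') && caps.semanticSearch) return 'semantic';
  if (caps.keywordSearch) return 'keyword';
  throw new Error('store supports no search strategy');
}
```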
Memory Generation
Memory generation extracts facts from conversations and stores them for future retrieval.
Realtime Mode
Extracts memories after each agent step:
memory: {
// ...
generation: {
mode: 'realtime',
realtime: {
interval: 1, // Extract every N steps (default: 1)
},
},
}
Async Mode
Extracts memories after the agent completes:
memory: {
// ...
generation: {
mode: 'async',
async: {
executor: customExecutor, // Optional custom executor
},
},
}
Both Modes
Run realtime extraction during execution and async extraction on completion:
generation: {
mode: 'both',
}
Custom Extraction Criteria
Each source can specify what kind of information to extract:
{
name: 'user-prefs',
description: 'User preferences',
store: memoryStore,
entityId: (ctx) => ctx.sessionId,
extractionPrompt: 'Extract user preferences, favorite tools, communication style, and timezone.',
}
Embedding Pipeline
When a memory is saved (via extraction or the save_memory tool), embedding generation is handled separately from storage. This means memories are immediately searchable via keyword/full-text search while vector embeddings are computed asynchronously.
Each memory has an embeddingStatus field: 'pending' (saved but not yet embedded) or 'complete' (embedding computed and stored). The embedding pipeline is controlled by the embeddingExecutor config option.
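The lifecycle can be pictured as a one-way transition per memory record. A sketch (the `StoredMemory` shape is illustrative, not the framework's `Memory` type):

```typescript
type EmbeddingStatus = 'pending' | 'complete';

// Illustrative record shape: a memory is saved as 'pending' (keyword-searchable
// right away) and flips to 'complete' once its vector has been stored.
interface StoredMemory {
  id: string;
  content: string;
  embeddingStatus: EmbeddingStatus;
  embedding?: number[];
}

function markEmbedded(memory: StoredMemory, embedding: number[]): StoredMemory {
  return { ...memory, embedding, embeddingStatus: 'complete' };
}

const saved: StoredMemory = { id: 'm1', content: 'Prefers TypeScript', embeddingStatus: 'pending' };
const embedded = markEmbedded(saved, [0.1, 0.2, 0.3]);
```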
InlineEmbeddingExecutor (Default)
Computes embeddings synchronously before returning. Simple and correct — every saved memory immediately has an embedding. This is the default behavior.
memory: {
// ...
// No embeddingExecutor needed — InlineEmbeddingExecutor is the default
}
BackgroundEmbeddingExecutor
Fire-and-forget: saves the memory immediately, then embeds in the background. The agent loop is not blocked by embedding API calls. Failed embeddings leave memories in 'pending' status for later recovery.
import { BackgroundEmbeddingExecutor } from '@helix-agents/memory';
memory: {
// ...
embeddingExecutor: new BackgroundEmbeddingExecutor({
maxConcurrency: 50, // Max parallel embedding tasks (default: 50)
}),
}
When to Use Background Embedding
Use BackgroundEmbeddingExecutor in production when embedding API latency (100-300ms per call) is slowing down your agent loop. Memories are still keyword-searchable immediately — only semantic/vector search requires the embedding to complete.
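The pattern behind background embedding can be sketched as a fire-and-forget task runner with a drain step for shutdown (illustrative only, not the shipped implementation; the real executor also caps concurrency):

```typescript
// Sketch: the caller is never blocked by the embedding call; failures are
// swallowed so the memory simply stays in 'pending' status for later recovery.
class BackgroundRunner {
  private inFlight = new Set<Promise<void>>();

  schedule(task: () => Promise<void>): void {
    const p: Promise<void> = task().catch(() => {
      // Leave the memory 'pending'; processUnembeddedMemories() can retry it.
    });
    this.inFlight.add(p);
    void p.then(() => this.inFlight.delete(p));
  }

  // Await all in-flight tasks (what shutdown() does for the real executor).
  async drain(): Promise<void> {
    await Promise.all(Array.from(this.inFlight));
  }
}
```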
Recovery with processUnembeddedMemories
If background embedding fails (network errors, service outages), memories remain in 'pending' status. Use MemoryManager.processUnembeddedMemories() to retry them:
import { MemoryManager } from '@helix-agents/memory';
const manager = new MemoryManager(memoryConfig);
// Run on a schedule (cron, startup, etc.)
const processed = await manager.processUnembeddedMemories(50); // batch size
console.log(`Recovered ${processed} pending memories`);
This queries all stores for memories with embeddingStatus: 'pending', computes their embeddings, and updates them to 'complete'. Stores that don't support getMemoriesByEmbeddingStatus are skipped.
Deduplication
When memories are extracted, they may overlap with existing memories. The dedup system prevents duplicates.
LLM Dedup (Default)
Uses an LLM to compare new memories against existing ones and decide whether to ADD, UPDATE, DELETE, or skip:
memory: {
// ...
dedup: { strategy: 'llm' }, // default
}
Similarity Dedup
Uses embedding similarity to detect duplicates (faster, no LLM call):
dedup: {
strategy: 'similarity',
similarityThreshold: 0.9, // Cosine similarity threshold for considering duplicates
}
No Dedup
Skip deduplication entirely:
dedup: {
strategy: 'none',
}
Memory Tools
By default, agents with memory get two tools:
search_memory
Lets the agent explicitly search for memories:
search_memory({ query: "user's preferred language", source: "user-prefs" })
save_memory
Lets the agent explicitly save something to memory:
save_memory({
content: "User prefers TypeScript over JavaScript",
context: "Language preference discussion",
source: "user-prefs"
})
Disabling Tools
You can disable either tool independently:
memory: {
// ...
tools: {
searchMemory: false, // Disable search_memory tool
saveMemory: true, // Keep save_memory tool
},
}
Embedding Adapters
The memory system needs an embedding adapter for vector search. The framework provides two adapters and an interface for custom implementations.
VercelEmbeddingAdapter
Wraps the Vercel AI SDK's embedding functions:
import { VercelEmbeddingAdapter } from '@helix-agents/embedding-vercel';
import { openai } from '@ai-sdk/openai';
const adapter = new VercelEmbeddingAdapter(
openai.embedding('text-embedding-3-small'),
1536 // dimensions
);
WorkersAIEmbeddingAdapter
Uses Cloudflare Workers AI for embeddings — runs on the edge with no external API calls:
import { WorkersAIEmbeddingAdapter } from '@helix-agents/embedding-cloudflare';
const adapter = new WorkersAIEmbeddingAdapter({
ai: env.AI,
model: '@cf/baai/bge-base-en-v1.5',
dimensions: 768,
});
Custom Adapter
Implement the EmbeddingAdapter interface:
import type { EmbeddingAdapter } from '@helix-agents/core';
class MyEmbeddingAdapter implements EmbeddingAdapter {
readonly dimensions = 768;
async embed(text: string): Promise<number[]> {
// Your embedding logic
return await myEmbeddingAPI.embed(text);
}
async embedMany(texts: string[]): Promise<number[][]> {
return await myEmbeddingAPI.embedBatch(texts);
}
}
Memory Stores
InMemoryMemoryStore
For development and testing. Data is lost when the process exits.
import { InMemoryMemoryStore } from '@helix-agents/memory';
const store = new InMemoryMemoryStore();
Capabilities: semantic search, keyword search.
RedisMemoryStore
For production. Requires Redis with the RediSearch module (available in the redis-stack Docker image).
npm install @helix-agents/memory-redis
import { RedisMemoryStore } from '@helix-agents/memory-redis';
import Redis from 'ioredis';
const store = new RedisMemoryStore({
redis: new Redis(process.env.REDIS_URL),
dimensions: 1536,
prefix: 'myapp:memory', // Optional key prefix (default: 'helix:memory')
maxCacheSize: 50_000, // Optional LRU cache size for ID lookups (default: 50000, 0 to disable)
});
await store.initialize(); // Creates RediSearch indexes lazily per source
Capabilities: semantic search, keyword search, hybrid search, metadata filtering.
Redis Stack Required
Standard Redis does not include the RediSearch module. Use the redis/redis-stack Docker image or a Redis provider that supports modules (e.g., Redis Cloud).
CloudflareMemoryStore
For Cloudflare Workers deployments. Uses D1 for storage, Vectorize for semantic search, and Queues for reliable D1-to-Vectorize synchronization.
npm install @helix-agents/memory-cloudflare
Full mode (D1 + Vectorize + Queues):
import { CloudflareMemoryStore } from '@helix-agents/memory-cloudflare';
const store = new CloudflareMemoryStore({
d1: env.MEMORY_DB,
vectorize: env.VECTORIZE_INDEX,
syncQueue: env.MEMORY_SYNC_QUEUE,
});
Capabilities: semantic search, keyword search, hybrid search, metadata filtering.
D1-only mode (no semantic search):
const store = new CloudflareMemoryStore({
d1: env.MEMORY_DB,
});
Capabilities: keyword search, metadata filtering.
The store runs D1 migrations automatically on first use. Vectorize sync is handled asynchronously via Cloudflare Queues — add a queue consumer to your worker:
import { createVectorizeSyncHandler } from '@helix-agents/memory-cloudflare';
export default {
async queue(batch, env) {
const handler = createVectorizeSyncHandler({ vectorize: env.VECTORIZE_INDEX });
await handler.processBatch(batch);
},
};
WorkersAIEmbeddingAdapter
Pair the Cloudflare store with the Workers AI embedding adapter:
npm install @helix-agents/embedding-cloudflare
import { WorkersAIEmbeddingAdapter } from '@helix-agents/embedding-cloudflare';
const embeddingAdapter = new WorkersAIEmbeddingAdapter({
ai: env.AI,
model: '@cf/baai/bge-base-en-v1.5',
dimensions: 768,
});
See the OpenNext + Cloudflare DO example for a complete setup.
Custom Store
Implement the MemoryStore interface from @helix-agents/core:
import type {
MemoryStore,
Memory,
MemorySearchQuery,
MemorySearchResult,
ListEntitiesOptions,
PaginatedEntities,
} from '@helix-agents/core';
class MyMemoryStore implements MemoryStore {
async addMemories(memories: Memory[]): Promise<string[]> {
/* ... */
}
async getMemory(id: string): Promise<Memory | null> {
/* ... */
}
async updateMemory(id: string, updates: Partial<Memory>): Promise<void> {
/* ... */
}
async deleteMemory(id: string): Promise<void> {
/* ... */
}
async search(query: MemorySearchQuery): Promise<MemorySearchResult[]> {
/* ... */
}
async getMemoriesByEntity(entityId: string, sourceName: string): Promise<Memory[]> {
/* ... */
}
async deleteMemoriesByEntity(entityId: string, sourceName: string): Promise<number> {
/* ... */
}
async listEntities(options?: ListEntitiesOptions): Promise<PaginatedEntities> {
/* ... */
}
capabilities() {
return {
semanticSearch: true,
keywordSearch: false,
hybridSearch: false,
metadataFiltering: false,
};
}
// Optional: Enable processUnembeddedMemories() recovery for this store.
// Omit this method if your store doesn't track embedding status.
async getMemoriesByEmbeddingStatus(
status: 'pending' | 'complete',
limit: number
): Promise<Memory[]> {
/* ... */
}
}
Full Configuration Reference
memory: {
// Required: Memory sources
sources: MemorySource[],
// Required: LLM adapter for memory extraction/dedup/routing
llmAdapter: LLMAdapter,
// Required: LLM config for memory operations
llmConfig: LLMConfig,
// Required: Embedding adapter for vector search
embeddingAdapter: EmbeddingAdapter,
// Optional: Embedding executor (default: InlineEmbeddingExecutor)
// Controls how embeddings are computed after memory save.
// - InlineEmbeddingExecutor: synchronous, blocks until embedding completes
// - BackgroundEmbeddingExecutor: fire-and-forget, returns immediately
embeddingExecutor?: EmbeddingExecutor,
// Optional: Dedup strategy (default: { strategy: 'llm' })
dedup?: {
strategy: 'llm' | 'similarity' | 'none',
similarityThreshold?: number, // For 'similarity' strategy (0-1)
},
// Optional: Generation mode (default: { mode: 'realtime' })
generation?: {
mode: 'realtime' | 'async' | 'both',
realtime?: { interval: number },
async?: { executor: MemoryExtractionExecutor },
maxMessages?: number, // Max messages sent to extraction LLM
},
// Optional: Auto-injection config
autoInject?: {
enabled: boolean, // default: true
maxPerSource?: number, // default: 10
},
// Optional: Tool availability
tools?: {
searchMemory?: boolean, // default: true
saveMemory?: boolean, // default: true
},
// Optional: Logger
logger?: Logger,
}
Lifecycle Management
Shutdown
When using BackgroundEmbeddingExecutor or async extraction executors, call MemoryManager.shutdown() before process exit to drain in-flight tasks:
import { MemoryManager } from '@helix-agents/memory';
const manager = new MemoryManager(memoryConfig);
// ... agent execution ...
// Before process exit: drain pending embedding and extraction tasks
await manager.shutdown();
This drains both the embedding executor and the extraction executor (if configured). Executors without a shutdown method are skipped. For runtimes that dispatch work to external systems (Temporal workflows, Cloudflare Workflows), shutdown is a no-op since the work runs outside the process.
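In a long-running Node.js service, the drain is typically wired to shutdown signals. A sketch (the `Shutdownable` interface is a stand-in for a MemoryManager instance; the signal handling is standard Node.js, not framework API):

```typescript
interface Shutdownable { shutdown(): Promise<void>; }

// Sketch: drain pending embedding/extraction tasks before the process exits.
function registerShutdownHook(manager: Shutdownable): void {
  const drain = async (signal: string) => {
    console.log(`${signal} received, draining memory executors...`);
    await manager.shutdown();
    process.exit(0);
  };
  process.once('SIGTERM', () => void drain('SIGTERM'));
  process.once('SIGINT', () => void drain('SIGINT'));
}
```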
Runtime Support
Memory works with all three runtimes:
| Runtime | Extraction | Notes |
|---|---|---|
| JS Runtime | Inline (in-process) | Simplest setup, extraction runs in the agent loop |
| Temporal Runtime | Activity-based | Extraction runs as a Temporal activity for durability |
| Cloudflare Runtime | DO-based | Extraction runs within the Durable Object |
The memory configuration is part of the agent definition, so switching runtimes requires no changes to your memory setup.
Examples
Several examples demonstrate memory integration across different runtimes and stores:
- Next.js + Redis (examples/nextjs-redis) — Production setup with RedisMemoryStore, VercelEmbeddingAdapter, BackgroundEmbeddingExecutor, and similarity dedup. Best reference for production deployments.
- Research Assistant (Temporal) — InMemoryMemoryStore with MockEmbeddingAdapter for development. Shows async extraction in durable workflows.
- OpenNext + Cloudflare DO — CloudflareMemoryStore with WorkersAIEmbeddingAdapter inside a Durable Object. Uses D1, Vectorize, and Queues for durable cross-session memory with hybrid search.
Browsing Memory Entities
The listEntities() method returns distinct entity/source pairs with memory counts. Use this to browse users or contexts that have stored memories, identify the most active entities, or audit memory usage.
import { RedisMemoryStore } from '@helix-agents/memory-redis';
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
const memoryStore = new RedisMemoryStore({ redis, dimensions: 1536 });
await memoryStore.initialize();
// List all entities
const result = await memoryStore.listEntities();
console.log(`Found ${result.total} entity/source pairs`);
for (const entity of result.entities) {
console.log(`${entity.entityId} (${entity.sourceName}): ${entity.memoryCount} memories`);
console.log(` Oldest: ${entity.oldestMemoryAt}, Newest: ${entity.newestMemoryAt}`);
}
// Find most active entities
const mostActive = await memoryStore.listEntities({
orderBy: { field: 'memoryCount', direction: 'desc' },
limit: 10,
});
// Filter by source
const userPrefs = await memoryStore.listEntities({
sourceName: 'user-prefs',
});
See Querying Guide for complete documentation including filtering options and pagination.
Next Steps
- Defining Agents — Full agent configuration reference
- Defining Tools — Learn about the tools system
- Redis Store — Production state storage with Redis
- Querying — Cross-session queries for analytics and monitoring