Cloudflare Durable Objects Runtime

The Durable Objects (DO) runtime executes agents directly inside a Durable Object, bypassing Cloudflare's subrequest limits by streaming directly to WebSocket and SSE connections.

When to Use DO Runtime

Choose the DO runtime when you need heavy streaming (>100 chunks per execution), real-time WebSocket/SSE connections, or simpler single-DO-per-run architecture. For step-level durability with automatic retries, consider the Workflows runtime instead.

Why Durable Objects?

Cloudflare Workers have a 1000 subrequest limit per invocation. In the Workflows runtime, each stream chunk write to the streaming Durable Object counts as a subrequest. For streaming-heavy agents, this can quickly exhaust the limit.

The DO runtime solves this by:

  • Hosting agent execution inside the DO - The JSAgentExecutor runs within the Durable Object
  • Streaming directly to connections - WebSocket/SSE writes don't count as subrequests
  • Using local SQLite - State is stored in DO's built-in SQLite, not external D1
  • Only LLM calls consume subrequests - The only external calls are to the LLM API
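
The budget argument above can be sketched as simple arithmetic. This is a rough model under stated assumptions, not Cloudflare's exact accounting: it assumes one subrequest per stream chunk plus one per LLM call in the Workflows runtime, and only LLM calls in the DO runtime. The names `estimateSubrequests`, `fitsBudget`, and `RunProfile` are hypothetical.

```typescript
// Per-invocation limit quoted in the text above.
const SUBREQUEST_LIMIT = 1000;

type Runtime = 'workflows' | 'durable-object';

interface RunProfile {
  streamChunks: number; // chunks written to the stream during one run
  llmCalls: number; // external LLM API calls during one run
}

// Assumed counting model: chunk writes only count in the Workflows runtime.
function estimateSubrequests(profile: RunProfile, runtime: Runtime): number {
  return runtime === 'workflows'
    ? profile.streamChunks + profile.llmCalls
    : profile.llmCalls;
}

function fitsBudget(profile: RunProfile, runtime: Runtime): boolean {
  return estimateSubrequests(profile, runtime) <= SUBREQUEST_LIMIT;
}

const heavyRun: RunProfile = { streamChunks: 1200, llmCalls: 8 };
console.log(fitsBudget(heavyRun, 'workflows')); // false (1208 > 1000)
console.log(fitsBudget(heavyRun, 'durable-object')); // true (8 <= 1000)
```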

Key Benefits

| Feature | Description |
| --- | --- |
| FREE Streaming | WebSocket/SSE writes don't count against subrequest limits |
| Built-in SQLite | No D1 database needed - state lives in the DO |
| Hibernation | Cost-efficient sleep/wake with state preservation |
| Simpler Architecture | One DO = one agent run (no coordination needed) |
| Native WebSocket/SSE | Real-time bidirectional communication |

Session vs Run Identifiers

  • sessionId: Primary key for state storage. Each Durable Object hosts one session.
  • runId: Identifies a specific execution within a session. Multiple runs can occur within one session (e.g., after interrupts).

Use sessionId for DO naming (e.g., env.AGENTS.idFromName(`session:${sessionId}`)). The runId is execution metadata for tracing.
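
A minimal sketch of the naming rule, assuming a hypothetical helper `doNameForSession` and a hypothetical `RunRef` shape. It shows that two runs in the same session map to the same Durable Object name, so the runId never influences which DO is addressed.

```typescript
// Hypothetical helper: the DO name is derived from the session ID only.
function doNameForSession(sessionId: string): string {
  return `session:${sessionId}`;
}

// Hypothetical shape pairing a run with its session.
interface RunRef {
  sessionId: string;
  runId: string;
}

const firstRun: RunRef = { sessionId: 'abc', runId: 'run-1' };
const resumedRun: RunRef = { sessionId: 'abc', runId: 'run-2' }; // e.g. after an interrupt

// Both runs resolve to the same name, so they share the session's stored state.
console.log(doNameForSession(firstRun.sessionId)); // "session:abc"
console.log(
  doNameForSession(firstRun.sessionId) === doNameForSession(resumedRun.sessionId)
); // true
```

In a Worker, the returned string would be passed to env.AGENTS.idFromName(...) as in the example above.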

Architecture

mermaid
graph TB
    subgraph Edge ["Edge Location"]
        subgraph CFWorker ["Cloudflare Worker"]
            W1["Routes requests to DOs<br/>Creates LLM adapters<br/>Manages DO stubs"]
        end

        CFWorker --> AgentDO

        subgraph AgentDO ["AgentServer (Durable Object)"]
            subgraph Executor ["JSAgentExecutor"]
                E1["Agent loop execution<br/>Tool execution<br/>Sub-agent spawning"]
            end

            subgraph Stores [" "]
                direction LR
                StateStore["DOStateStore<br/>(SQLite)"]
                UsageStore["DOUsageStore<br/>(SQLite)"]
                StreamMgr["DOStreamManager<br/>(WebSocket/SSE)"]
            end
        end

        AgentDO --> Clients

        subgraph Clients [" "]
            direction LR
            WS["<b>WebSocket Clients</b><br/>Real-time, survives<br/>hibernation"]
            SSE["<b>SSE Clients</b><br/>One-way, simpler"]
        end
    end

Prerequisites

  • Cloudflare account with Workers Paid plan
  • Wrangler CLI: npm install -g wrangler
  • No additional services needed (D1, etc.) - SQLite is built into the DO

Installation

bash
npm install @helix-agents/runtime-cloudflare @helix-agents/core

The partyserver dependency is included automatically.

Quick Start (Composition API)

The composition API using createAgentServer() is the recommended approach for building AgentServer instances.

1. Configure wrangler.toml

toml
name = "my-agent-worker"
main = "src/worker.ts"
compatibility_date = "2024-01-01"

# Durable Object binding with SQLite support
[[durable_objects.bindings]]
name = "AGENTS"
class_name = "MyAgentServer"

# Enable SQLite for the DO class
[[migrations]]
tag = "v1"
new_sqlite_classes = ["MyAgentServer"]

Important

Use new_sqlite_classes instead of new_classes to enable SQLite storage in the Durable Object. Without this, the DO won't have access to the this.sql method.

2. Create AgentServer with Composition

typescript
// src/agent-server.ts
import { createAgentServer, AgentRegistry } from '@helix-agents/runtime-cloudflare';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { createAnthropic } from '@ai-sdk/anthropic';
import { createOpenAI } from '@ai-sdk/openai';
import { myAgent } from './agent.js';

export interface Env {
  AGENTS: DurableObjectNamespace;
  ANTHROPIC_API_KEY: string;
  OPENAI_API_KEY: string;
}

// Create and populate registry
const registry = new AgentRegistry();
registry.register(myAgent);

// Create AgentServer using composition
export const MyAgentServer = createAgentServer<Env>({
  // Required: Factory to create LLM adapters
  llmAdapter: (env, ctx) =>
    new VercelAIAdapter({
      anthropic: createAnthropic({ apiKey: env.ANTHROPIC_API_KEY }),
    }),

  // Required: Agent registry or resolver function
  agents: registry,

  // Optional: Required if using createSubAgentTool() — see Sub-Agents section
  // subAgentNamespace: (env) => env.AGENTS,

  // Optional: Lifecycle hooks
  hooks: {
    beforeStart: async ({ executionState, body }) => {
      // "Last message wins" - interrupt existing execution
      if (executionState.isExecuting) {
        await executionState.interrupt('New message received');
      }
      console.log(`Starting agent: ${body.agentType}`);
    },
    afterStart: async ({ sessionId, agentType }) => {
      console.log(`Agent ${agentType} started (session: ${sessionId})`);
    },
    onComplete: async ({ sessionId, status, result, error }) => {
      console.log(`Execution complete: ${status}`);
    },
  },
});

3. Create Worker Entry Point

typescript
// src/worker.ts
import { MyAgentServer } from './agent-server.js';
import type { Env } from './agent-server.js';

// Export the AgentServer DO class
export { MyAgentServer };

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Start new agent execution
    if (url.pathname === '/execute' && request.method === 'POST') {
      const { message } = await request.json<{ message: string | UserInputMessage[] }>();

      // Generate unique session ID
      const sessionId = crypto.randomUUID();

      // Get DO stub for this session
      const doId = env.AGENTS.idFromName(`session:${sessionId}`);
      const stub = env.AGENTS.get(doId);

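      // Start execution using the new request format.
      // Durable Object stubs expect an absolute URL, so resolve the path against the incoming request.
      const response = await stub.fetch(new URL('/start', request.url), {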
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          agentType: 'researcher', // Agent type from registry
          input: { message },
          sessionId,
        }),
      });

      const result = await response.json<{ sessionId: string; streamId: string }>();

      return Response.json({
        sessionId: result.sessionId,
        streamUrl: `/stream/${result.sessionId}`,
        websocketUrl: `/ws/${result.sessionId}`,
      });
    }

    // SSE streaming endpoint
    if (url.pathname.startsWith('/stream/')) {
      const sessionId = url.pathname.split('/').pop()!;
      const doId = env.AGENTS.idFromName(`session:${sessionId}`);
      const stub = env.AGENTS.get(doId);

      // Forward to DO's SSE endpoint, keeping query params such as fromSequence
      return stub.fetch(new URL(`/sse${url.search}`, request.url), request);
    }

    // WebSocket upgrade
    if (url.pathname.startsWith('/ws/')) {
      const sessionId = url.pathname.split('/').pop()!;
      const doId = env.AGENTS.idFromName(`session:${sessionId}`);
      const stub = env.AGENTS.get(doId);

      // Forward WebSocket upgrade to DO
      return stub.fetch(request);
    }

    // Get status
    if (url.pathname.startsWith('/status/')) {
      const sessionId = url.pathname.split('/').pop()!;
      const doId = env.AGENTS.idFromName(`session:${sessionId}`);
      const stub = env.AGENTS.get(doId);

      return stub.fetch(new URL('/status', request.url));
    }

    return new Response('Not found', { status: 404 });
  },
};

4. Define Your Agent

typescript
// src/agent.ts
import { defineAgent, defineTool } from '@helix-agents/core';
import { z } from 'zod';

const searchTool = defineTool({
  name: 'search',
  description: 'Search for information',
  inputSchema: z.object({
    query: z.string(),
  }),
  execute: async ({ query }) => {
    // Your search implementation
    return { results: [`Found: ${query}`] };
  },
});

export const myAgent = defineAgent({
  name: 'researcher',
  description: 'Research assistant',
  systemPrompt: 'You are a helpful research assistant.',
  tools: [searchTool],
  outputSchema: z.object({
    summary: z.string(),
    sources: z.array(z.string()),
  }),
  maxSteps: 10,
});

DurableObjectAgentConfig

The createAgentServer<TEnv>(config) factory accepts this configuration:

typescript
interface DurableObjectAgentConfig<TEnv = unknown> {
  /**
   * Factory to create LLM adapters.
   * Called once per execution with environment and context.
   * Required.
   */
  llmAdapter: LLMAdapterFactory<TEnv>;

  /**
   * Agent resolution strategy.
   * Can be an AgentRegistry instance or a resolver function.
   * Required.
   */
  agents: AgentResolverInterface<TEnv> | AgentResolver<TEnv>;

  /**
   * Optional lifecycle hooks for customization.
   */
  hooks?: ServerHooks<TEnv>;

  /**
   * Optional custom endpoints.
   * Keys are paths (e.g., "/my-endpoint"), values are handlers.
   * Custom endpoints are checked BEFORE built-in endpoints.
   */
  endpoints?: Record<string, EndpointHandler<TEnv>>;

  /**
   * Optional usage store factory.
   * If not provided, uses the DO-local DOUsageStore.
   */
  usageStore?: UsageStoreFactory<TEnv>;

  /**
   * Optional logger.
   * If not provided, uses noopLogger.
   */
  logger?: Logger;

  /**
   * Optional namespace for routing sub-agent calls to sibling DOs.
   * Required when using createSubAgentTool() — see Sub-Agents section below.
   * Each sub-agent session gets its own isolated DO instance.
   */
  subAgentNamespace?: (env: TEnv) => DurableObjectNamespace;
}

LLMAdapterFactory

Creates LLM adapters with access to environment and execution context:

typescript
type LLMAdapterFactory<TEnv> = (env: TEnv, context: LLMAdapterContext) => LLMAdapter;

interface LLMAdapterContext {
  sessionId: string;
  agentType: string;
  runId: string;
}

Example: Per-agent model selection

typescript
llmAdapter: (env, ctx) => {
  // Choose model based on agent type
  if (ctx.agentType === 'fast-responder') {
    return new VercelAIAdapter({
      openai: createOpenAI({ apiKey: env.OPENAI_API_KEY }),
      defaultModel: 'gpt-4o-mini',
    });
  }
  return new VercelAIAdapter({
    anthropic: createAnthropic({ apiKey: env.ANTHROPIC_API_KEY }),
    defaultModel: 'claude-sonnet-4-20250514',
  });
},

AgentRegistry with Factories

The AgentRegistry supports both static registration and factory functions for dynamic agent creation.

Static Registration

For agents that don't need runtime dependencies:

typescript
import { AgentRegistry } from '@helix-agents/runtime-cloudflare';

const registry = new AgentRegistry();
registry.register(researchAgent);
registry.register(summarizerAgent);

Factory Registration

For agents that need environment bindings, per-request configuration, or dependency injection:

typescript
registry.registerFactory<Env>('brainstorm', (ctx) => {
  return createBrainstormAgent({
    tavilyApiKey: ctx.env.TAVILY_API_KEY,
    database: ctx.env.DATABASE,
    userId: ctx.userId,
  });
});

AgentFactoryContext

Factory functions receive this context:

typescript
interface AgentFactoryContext<TEnv = unknown> {
  /** Environment bindings */
  env: TEnv;
  /** Session ID for the execution */
  sessionId: string;
  /** Run ID for this execution */
  runId: string;
  /** Optional user identifier */
  userId?: string;
}

Using resolve() vs get()

typescript
// get() - Static agents only (legacy, for backward compatibility)
const staticAgent = registry.get('researcher');

// resolve() - Handles both static and factory agents (preferred)
const resolvedAgent = registry.resolve('brainstorm', {
  env,
  sessionId: 'session-123',
  runId: 'run-456',
  userId: 'user-789',
});

Registry Methods

| Method | Description |
| --- | --- |
| register(config) | Register a static agent configuration |
| registerFactory(type, factory) | Register a factory function |
| replace(config) | Overwrite an existing static registration |
| resolve(type, context) | Resolve agent (factory-first, then static) |
| get(type) | Get static agent only (legacy) |
| has(type) | Check if agent type is registered |
| unregister(type) | Remove an agent registration |
| getRegisteredTypes() | Get all registered agent type names |
| clear() | Remove all registrations (for testing) |

registry.replace(config)

Overwrites an existing static agent registration with a new config, returning whether a prior registration existed. Throws if a factory is registered under the same name.

typescript
const registry = new AgentRegistry();
registry.register(myAgent);

// Later: swap the registered reference (e.g., to attach hooks)
const agentWithHooks = { ...myAgent, hooks: makeHooks() };
const hadPrior = registry.replace(agentWithHooks);
// hadPrior === true

When to use: The Cloudflare Workflows runtime resolves agents BY NAME from the registry — workflow inputs only carry the agent name, not the agent reference. Per-call hooks attached to the agent passed into executor.execute(agent, ...) aren't honored unless the registered reference itself carries them. replace() is the canonical way to swap the reference (most commonly in tests that exercise per-test hooks). JS / DBOS runtimes read the agent reference inline at execute() time and don't need this API.

Don't replace factories. If a factory is registered under the same name, replace() throws. Use unregister() followed by registerFactory() to replace a factory.
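
The resolution and replacement rules described above can be modeled with a toy registry. This is a sketch of the documented semantics only, not the library's implementation: `MiniRegistry`, its `AgentConfig` shape, and the error messages are all hypothetical.

```typescript
// Toy model; the real AgentRegistry may differ in details such as duplicate handling.
interface AgentConfig {
  name: string;
  description?: string;
}
type Factory<C> = (ctx: C) => AgentConfig;

class MiniRegistry<C> {
  private statics = new Map<string, AgentConfig>();
  private factories = new Map<string, Factory<C>>();

  register(config: AgentConfig): void {
    this.statics.set(config.name, config);
  }

  registerFactory(type: string, factory: Factory<C>): void {
    this.factories.set(type, factory);
  }

  // Overwrites a static entry and reports whether one existed before.
  replace(config: AgentConfig): boolean {
    if (this.factories.has(config.name)) {
      throw new Error(`"${config.name}" is registered as a factory`);
    }
    const hadPrior = this.statics.has(config.name);
    this.statics.set(config.name, config);
    return hadPrior;
  }

  // Factory-first, then static, as listed in the methods table.
  resolve(type: string, ctx: C): AgentConfig {
    const factory = this.factories.get(type);
    if (factory) return factory(ctx);
    const config = this.statics.get(type);
    if (!config) throw new Error(`Unknown agent type: ${type}`);
    return config;
  }
}

const registry = new MiniRegistry<{ userId: string }>();
registry.register({ name: 'researcher' });
registry.registerFactory('brainstorm', (ctx) => ({
  name: 'brainstorm',
  description: `for ${ctx.userId}`,
}));

console.log(registry.replace({ name: 'researcher', description: 'v2' })); // true
console.log(registry.resolve('brainstorm', { userId: 'user-789' }).description); // "for user-789"
```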

AgentNotFoundError

When an agent type is not found, AgentNotFoundError provides helpful information:

typescript
import { AgentNotFoundError } from '@helix-agents/runtime-cloudflare';

try {
  const agent = registry.resolve('unknown', context);
} catch (error) {
  if (error instanceof AgentNotFoundError) {
    console.log(`Unknown type: ${error.agentType}`);
    console.log(`Available: ${error.availableTypes.join(', ')}`);
  }
}

ExecutionState

The ExecutionState interface provides hooks with a clean way to check and control execution:

typescript
interface ExecutionState {
  /** Session ID of current execution, or null if none */
  readonly sessionId: string | null;

  /** Run ID of current execution, or null if none */
  readonly runId: string | null;

  /** Whether an execution is currently running */
  readonly isExecuting: boolean;

  /** Current execution status */
  readonly status: 'idle' | 'running' | 'paused' | 'interrupted' | 'completed' | 'failed';

  /** Get the current execution handle (if any) */
  getHandle<T = unknown>(): AgentExecutionHandle<T> | null;

  /** Interrupt the current execution (soft stop, resumable) */
  interrupt(reason?: string): Promise<void>;

  /** Abort the current execution (hard stop, not resumable) */
  abort(reason?: string): Promise<void>;
}

Example: "Last message wins" pattern

typescript
hooks: {
  beforeStart: async ({ executionState }) => {
    if (executionState.isExecuting) {
      await executionState.interrupt('New message received');
    }
  },
}

Lifecycle Hooks

All hooks are optional. They enable customization without subclassing.

typescript
interface ServerHooks<TEnv = unknown> {
  /**
   * Called before starting execution.
   * Can abort by returning { abort: true, response: Response }.
   */
  beforeStart?: (context: BeforeStartContext<TEnv>) => Promise<void | AbortResult>;

  /**
   * Called after execution handle is created (before it completes).
   */
  afterStart?: (context: AfterStartContext<TEnv>) => Promise<void>;

  /**
   * Called before resuming execution.
   * Can abort by returning { abort: true, response: Response }.
   */
  beforeResume?: (context: BeforeResumeContext<TEnv>) => Promise<void | AbortResult>;

  /**
   * Called after resume handle is created.
   */
  afterResume?: (context: AfterResumeContext<TEnv>) => Promise<void>;

  /**
   * Called before retrying execution.
   * Can abort by returning { abort: true, response: Response }.
   */
  beforeRetry?: (context: BeforeRetryContext<TEnv>) => Promise<void | AbortResult>;

  /**
   * Called after retry handle is created.
   */
  afterRetry?: (context: AfterRetryContext<TEnv>) => Promise<void>;

  /**
   * Called when execution completes (success, failure, or interrupt).
   */
  onComplete?: (context: CompleteContext<TEnv>) => Promise<void>;

  /**
   * Transform incoming /start request to standard format.
   */
  transformRequest?: (rawBody: unknown, env: TEnv) => Promise<StartAgentRequestV2>;

  /**
   * Transform outgoing /start response.
   */
  transformResponse?: (
    response: { sessionId: string; streamId: string; status: 'started' | 'resumed' },
    context: { sessionId: string; agentType: string }
  ) => Promise<unknown>;
}

Hook Contexts

BeforeStartContext:

typescript
interface BeforeStartContext<TEnv> {
  env: TEnv;
  request: Request;
  body: StartAgentRequestV2;
  executionState: ExecutionState;
}

AfterStartContext:

typescript
interface AfterStartContext<TEnv> {
  env: TEnv;
  sessionId: string;
  runId: string;
  agentType: string;
  handle: AgentExecutionHandle<unknown>;
}

CompleteContext:

typescript
interface CompleteContext<TEnv> {
  env: TEnv;
  sessionId: string;
  runId: string;
  status: 'completed' | 'failed' | 'interrupted';
  result?: unknown;
  error?: Error;
}

Aborting Requests

Hooks can abort requests by returning an AbortResult:

typescript
hooks: {
  beforeStart: async ({ body }) => {
    if (!body.userId) {
      return {
        abort: true,
        response: Response.json({ error: 'User ID required' }, { status: 401 }),
      };
    }
  },
}
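
To make the abort contract concrete, here is a sketch of how a server could run a beforeStart hook and short-circuit on an AbortResult. The runner `handleStart`, the guard `isAbortResult`, and the `SimpleResponse` stand-in for Response are hypothetical and are not the library's internals.

```typescript
// SimpleResponse stands in for the Fetch API Response so the sketch runs anywhere.
interface SimpleResponse {
  status: number;
  body: unknown;
}
interface AbortResult {
  abort: true;
  response: SimpleResponse;
}
type BeforeStart<B> = (ctx: { body: B }) => Promise<void | AbortResult>;

function isAbortResult(value: unknown): value is AbortResult {
  return (
    typeof value === 'object' &&
    value !== null &&
    (value as { abort?: unknown }).abort === true
  );
}

async function handleStart<B>(
  body: B,
  beforeStart: BeforeStart<B> | undefined,
  start: (body: B) => Promise<SimpleResponse>
): Promise<SimpleResponse> {
  const outcome = beforeStart ? await beforeStart({ body }) : undefined;
  // An AbortResult replaces the normal response and skips execution entirely.
  if (isAbortResult(outcome)) return outcome.response;
  return start(body);
}

// Mirrors the hook above: reject requests that carry no userId.
const requireUser: BeforeStart<{ userId?: string }> = async ({ body }) => {
  if (!body.userId) {
    return {
      abort: true,
      response: { status: 401, body: { error: 'User ID required' } },
    };
  }
  return undefined;
};
```

A hook that returns nothing lets the request proceed, which is why the guard checks for the abort flag rather than for any return value.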

Request/Response Transformation

Transform custom request formats to the standard format:

typescript
hooks: {
  transformRequest: async (rawBody, env) => {
    const body = rawBody as { prompt: string; conversationId: string };
    return {
      agentType: 'brainstorm',
      input: { message: body.prompt },
      sessionId: body.conversationId,
    };
  },

  transformResponse: async (response, { sessionId }) => {
    return {
      id: sessionId,
      status: 'processing',
    };
  },
}

Custom Endpoints

Add custom HTTP endpoints that are checked before built-in endpoints:

typescript
export const MyAgentServer = createAgentServer<Env>({
  llmAdapter: (env) =>
    new VercelAIAdapter({
      /* ... */
    }),
  agents: registry,

  endpoints: {
    '/health': async (request, { executionState }) => {
      return Response.json({
        status: 'healthy',
        isExecuting: executionState.isExecuting,
      });
    },

    '/custom-action': async (request, { env, logger }) => {
      if (request.method !== 'POST') {
        return Response.json({ error: 'Method not allowed' }, { status: 405 });
      }
      const body = await request.json();
      logger.info('Custom action', { body });
      return Response.json({ success: true });
    },
  },
});

EndpointHandler

typescript
type EndpointHandler<TEnv> = (
  request: Request,
  context: EndpointContext<TEnv>
) => Promise<Response>;

interface EndpointContext<TEnv> {
  env: TEnv;
  executionState: ExecutionState;
  logger: Logger;
}
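
The "custom endpoints are checked before built-in endpoints" rule can be pictured as a two-step lookup. The dispatcher below is a hypothetical illustration with string-returning handlers; it assumes exact path matching, which the source does not spell out.

```typescript
type Handler = (path: string) => string;

// Stand-ins for two of the built-in routes.
const builtIn: Record<string, Handler> = {
  '/status': () => 'built-in status',
  '/start': () => 'built-in start',
};

// Custom handlers win when both maps define the same path.
function dispatch(path: string, custom: Record<string, Handler>): string {
  const handler = custom[path] ?? builtIn[path];
  return handler ? handler(path) : 'Not found';
}

const custom: Record<string, Handler> = {
  '/health': () => 'custom health',
  '/status': () => 'custom status', // shadows the built-in route
};

console.log(dispatch('/status', custom)); // "custom status"
console.log(dispatch('/start', custom)); // "built-in start"
console.log(dispatch('/missing', custom)); // "Not found"
```

Under this model, registering a custom key that matches a built-in path would shadow the built-in handler, so distinct paths are the safer choice unless an override is intended.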

New Request Format

The composition API uses a new request format that identifies the agent by agentType instead of carrying the full AgentConfig:

StartAgentRequestV2

typescript
interface StartAgentRequestV2 {
  /** Agent type to execute (looked up in registry/resolver) */
  agentType: string;

  /** Input for the agent */
  input: {
    /** String message or array of UserInputMessage for multi-message input */
    message: string | UserInputMessage[];
    state?: Record<string, unknown>;
    messages?: Message[];
  };

  /** Session identifier (required) */
  sessionId: string;

  /** Optional user identifier */
  userId?: string;

  /** Optional tags for the session */
  tags?: string[];

  /** Optional metadata for the session */
  metadata?: Record<string, string>;
}

/** Individual message in a multi-message input */
interface UserInputMessage {
  role: 'user';
  content: string;
  metadata?: Record<string, unknown>;
  files?: FileInput[];
}

interface FileInput {
  data: string; // Base64 data URI
  mediaType: string; // MIME type
  filename?: string;
}
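
Because the message field accepts two shapes, a transformRequest hook or a Worker route may want to validate and normalize the body before forwarding it. The helpers below are a hypothetical sketch; `validateStartRequest` and `normalizeMessages` are not part of the library, and the checks cover only the required fields listed above.

```typescript
// Local copy of the documented shape (files omitted for brevity).
interface UserInputMessage {
  role: 'user';
  content: string;
  metadata?: Record<string, unknown>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Returns a list of problems; an empty list means the required fields are present.
function validateStartRequest(raw: unknown): string[] {
  if (!isRecord(raw)) return ['body must be an object'];
  const errors: string[] = [];
  const agentType = raw['agentType'];
  const sessionId = raw['sessionId'];
  if (typeof agentType !== 'string' || agentType === '') errors.push('agentType is required');
  if (typeof sessionId !== 'string' || sessionId === '') errors.push('sessionId is required');
  const input = raw['input'];
  const message = isRecord(input) ? input['message'] : undefined;
  if (typeof message !== 'string' && !Array.isArray(message)) {
    errors.push('input.message must be a string or an array');
  }
  return errors;
}

// Coerces either accepted shape into an array of user messages.
function normalizeMessages(message: string | UserInputMessage[]): UserInputMessage[] {
  return typeof message === 'string' ? [{ role: 'user', content: message }] : message;
}

console.log(validateStartRequest({ agentType: 'researcher', sessionId: 's1', input: { message: 'hi' } })); // []
console.log(normalizeMessages('hi')); // [ { role: 'user', content: 'hi' } ]
```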

ResumeAgentRequestV2

typescript
interface ResumeAgentRequestV2 {
  /** Agent type (for context) */
  agentType: string;

  /** Resume options */
  options: {
    mode?: 'continue' | 'retry' | 'branch';
    checkpointId?: string;
    modifyState?: unknown;
    appendMessages?: Message[];
  };
}

AgentServer HTTP Endpoints

The AgentServer exposes the following HTTP endpoints:

| Endpoint | Method | Description |
| --- | --- | --- |
| /start | POST | Start new agent execution |
| /resume | POST | Resume paused/interrupted agent |
| /retry | POST | Retry a failed agent |
| /interrupt | POST | Request soft interruption (legacy; see /chat/{id}/interrupt) |
| /abort | POST | Force stop execution immediately |
| /status | GET | Get current execution status |
| /sse | GET | SSE streaming connection |
| /history | GET | Get historical stream chunks |
| /usage | GET | Get usage data (tokens, tool calls) |
| /messages | GET | Get conversation messages (paginated) |
| /snapshot | GET | Get UI state snapshot for initialization |
| /workspace | GET | Operator workspace registry snapshot |
| /chat | POST | v7: Unified entry (fresh / continue / resume / attach) |
| /chat/{id}/stream | GET | v7: Re-attach to in-flight stream after submit/interrupt |
| /chat/{id}/submit-tool-result | POST | v7: Durable submit for client/approval tools |
| /chat/{id}/interrupt | POST | v7: Durable interrupt (writes flag, returns 202) |
| /chat/{id}/abort | POST | v7: Hard abort (writes terminal status) |

The five /chat/... routes form the v7 stateless HITL chat protocol, wired via the chatHandler config field on createAgentServer (typically handleChatStream from @helix-agents/ai-sdk). See v6 → v7 migration guide for the full protocol.

Why /submit-tool-result auto-continues the run on DO

A Durable Object has no long-running process between requests: each request wakes the DO, runs, and the DO is free to be evicted afterward. So when a client-tool result is submitted, /submit-tool-result does the durable write and calls ensureExecutionContinues() inline, which spins up a fire-and-forget executor.resume() to advance the suspended run. This is the only mechanism that can advance the run on DO; removing it would strand the run (the result would sit undrained until some unrelated request happened to wake the DO).

This differs from JS / Temporal / Cloudflare Workflows, where a follow-up resume() request drives continuation, and matches DBOS, which auto-continues via its live DBOS.recv(). See the client-tool continuation model for the full per-runtime breakdown.

The follow-up /chat/{id}/stream (or the AI SDK's helixSAW-driven request) simply re-attaches to the already-running stream to read the continuation live; it does not re-drive it, and it does not emit data-stream-resync.

The /workspace route is the wire-level companion to round-5 D6's JSAgentExecutor.getWorkspaceRegistry(sessionId) accessor. See the @helix-agents/agent-server README on GitLab for the response shape and the 404 disambiguation between RUNTIME_NO_WORKSPACE_SUPPORT and WORKSPACES_NOT_FOUND. The DO base class does NOT auto-wire this route today (the DO constructs JSAgentExecutor instances per-request, not statefully); operators wanting introspection from inside a DO subclass should follow the in-process pattern in docs/workspaces/runbook.md.

Starting Execution

typescript
// Request with a string message
const response = await stub.fetch('/start', {
  method: 'POST',
  body: JSON.stringify({
    agentType: 'researcher', // Agent type from registry
    input: { message: 'Hello' }, // Input message (string or UserInputMessage[])
    sessionId: 'session-123', // Session ID for conversation continuity
    userId: 'user-456', // Optional: user context
    tags: ['test'], // Optional: tags
    metadata: { key: 'value' }, // Optional: metadata
  }),
});

// Request with multiple messages (context injection, file attachments)
const response = await stub.fetch('/start', {
  method: 'POST',
  body: JSON.stringify({
    agentType: 'researcher',
    input: {
      message: [
        {
          role: 'user',
          content: 'Background: user is on the enterprise plan',
          metadata: { source: 'system' },
        },
        { role: 'user', content: 'What features do I have access to?' },
      ],
    },
    sessionId: 'session-123',
  }),
});

// Response
interface AgentStartResponse {
  sessionId: string;
  streamId: string;
  status: 'started' | 'resumed';
}

Resuming Execution

typescript
const response = await stub.fetch('/resume', {
  method: 'POST',
  body: JSON.stringify({
    agentType: 'researcher',
    options: {
      mode: 'continue',           // 'continue' | 'retry' | 'branch'
      checkpointId: 'cp-123',     // Optional: resume from checkpoint
      modifyState: { reviewed: true }, // Optional: state changes before resume (illustrative value; must be JSON-serializable, since JSON.stringify drops functions)
      appendMessages: [...],      // Optional: add messages before resume
    },
  }),
});

Interrupting Execution

typescript
// Soft interrupt - agent finishes current step
await stub.fetch('/interrupt', {
  method: 'POST',
  body: JSON.stringify({ reason: 'user_requested' }),
});

// Hard abort - stops immediately
await stub.fetch('/abort', {
  method: 'POST',
  body: JSON.stringify({ reason: 'timeout' }),
});

Getting Status

typescript
const response = await stub.fetch('/status');
const status = await response.json<AgentStatusResponse>();

interface AgentStatusResponse {
  sessionId: string | null;
  status: string;
  // v6: 'active' | 'completed' | 'failed' | 'paused' | 'interrupted'
  // v7 adds three suspended_* variants for the stateless suspension model:
  //   'suspended_client_tool'        — pending client-executed tool
  //   'suspended_awaiting_children'  — sub-agent suspension cascade
  //   'suspended_step_partial'       — mixed phase-1/phase-2 batch
  stepCount: number;
  output?: unknown;
  error?: string;
  checkpointId?: string;
  isExecuting: boolean; // true if currently running
}
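
Since status is typed as a plain string, clients may want a small helper to spot the v7 suspended_* variants. The sketch below is hypothetical; it assumes the part after the prefix corresponds to the suspension reason, which is suggested by the naming in this document but not stated outright.

```typescript
const SUSPENDED_PREFIX = 'suspended_';

// True for 'suspended_client_tool', 'suspended_awaiting_children', 'suspended_step_partial'.
function isSuspended(status: string): boolean {
  return status.startsWith(SUSPENDED_PREFIX);
}

// Assumed mapping: strip the prefix to get the reason; null when not suspended.
function suspensionReason(status: string): string | null {
  return isSuspended(status) ? status.slice(SUSPENDED_PREFIX.length) : null;
}

console.log(isSuspended('suspended_client_tool')); // true
console.log(suspensionReason('suspended_awaiting_children')); // "awaiting_children"
console.log(suspensionReason('active')); // null
```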

Getting History

typescript
const response = await stub.fetch('/history?fromSequence=0&limit=100');
const history = await response.json<{
  chunks: Array<{ sequence: number; chunk: StreamChunk }>;
  hasMore: boolean;
  latestSequence: number;
}>();
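
Reading a long history means following hasMore across pages. The loop below is a sketch with the page fetch injected as a function, so it can run without a Durable Object. It assumes fromSequence is inclusive and that the next page starts one past the last sequence seen; `readAllHistory` and `FetchPage` are hypothetical names.

```typescript
interface HistoryPage<C> {
  chunks: Array<{ sequence: number; chunk: C }>;
  hasMore: boolean;
  latestSequence: number;
}

// Injected so the paging logic is independent of how the request is made.
type FetchPage<C> = (fromSequence: number, limit: number) => Promise<HistoryPage<C>>;

async function readAllHistory<C>(fetchPage: FetchPage<C>, limit = 100): Promise<C[]> {
  const collected: C[] = [];
  let fromSequence = 0;
  for (;;) {
    const page = await fetchPage(fromSequence, limit);
    let lastSeen = fromSequence - 1;
    for (const entry of page.chunks) {
      collected.push(entry.chunk);
      lastSeen = entry.sequence;
    }
    // Stop on the last page, or on an empty page to avoid looping forever.
    if (!page.hasMore || page.chunks.length === 0) break;
    // Assumption: fromSequence is inclusive, so resume one past the last chunk.
    fromSequence = lastSeen + 1;
  }
  return collected;
}
```

In a Worker, fetchPage could wrap the /history request shown above and parse its JSON body.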

Getting Messages

Fetch conversation messages with pagination:

typescript
const response = await stub.fetch('/messages?offset=0&limit=50');
const data = await response.json<{
  messages: UIMessage[];
  total: number;
  offset: number;
  limit: number;
  hasMore: boolean;
}>();

Getting Snapshot

Get a complete UI state snapshot for initializing frontend state (e.g., on page refresh):

typescript
const response = await stub.fetch('/snapshot');
const snapshot = await response.json<{
  runId: string;
  messages: UIMessage[];
  state: TState | null;
  streamSequence: number;
  status: 'active' | 'paused' | 'ended' | 'failed';
  timestamp: number;

  // System-level SessionState fields. Emitted only when set (absent keys
  // mean "none"). These let `DOStateStoreClient.loadState` — the read view
  // used by `createCloudflareChatHandler` — reconstruct a faithful
  // SessionState. The most important is `pendingClientToolCalls`: without
  // it the chat handler can't validate client-tool-result submissions and
  // rejects every resume as `unknown-tool-call` (the agent stays paused
  // forever). `agentType` / `version` replace the historical `'unknown'` /
  // `1` placeholders.
  agentType?: string;
  version?: number;
  error?: string;
  interruptContext?: InterruptContext;
  pendingClientToolCalls?: Record<string, PendingClientToolCall>;
  clientToolCallOwnership?: Record<string, string>;
  completedClientToolCalls?: Record<string, number>;
  rootSessionId?: string;
  parentSessionId?: string;
}>();

// Use for initializing useChat
const { messages, setMessages } = useChat({
  initialMessages: snapshot.messages,
});

Upgrading

These system-level fields were added in @helix-agents/runtime-cloudflare@5.5.0. Older DO peers omit them; DOStateStoreClient falls back to the historical agentType: 'unknown' / version: 1 placeholders and treats missing pendingClientToolCalls as "no pending entries", so there is no rolling-deploy hazard. If client-tool resume on the Cloudflare DO runtime leaves the agent stuck in paused with a data-resume-rejected (reason: 'unknown-tool-call') event, upgrade runtime-cloudflare to ≥ 5.5.0.

Streaming

WebSocket Streaming

WebSocket connections survive DO hibernation and provide real-time bidirectional communication:

typescript
// Client-side
const ws = new WebSocket(`wss://your-worker.workers.dev/ws/${sessionId}?fromSequence=0`);

ws.onopen = () => {
  console.log('Connected');
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'chunk') {
    handleChunk(data.chunk);
  } else if (data.type === 'end') {
    console.log('Stream ended', data.finalOutput);
    ws.close();
  } else if (data.type === 'fail') {
    console.error('Stream failed', data.error);
    ws.close();
  } else if (data.type === 'pong') {
    // Response to ping
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error', error);
};

// Send ping to keep alive
setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'ping' }));
  }
}, 30000);

SSE Streaming

Server-Sent Events for one-way streaming (simpler, wider browser support):

typescript
// Client-side
const eventSource = new EventSource(
  `https://your-worker.workers.dev/stream/${sessionId}?fromSequence=0`
);

eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case 'chunk':
      handleChunk(data.chunk);
      break;
    case 'end':
      console.log('Complete', data.finalOutput);
      eventSource.close();
      break;
    case 'fail':
      console.error('Failed', data.error);
      eventSource.close();
      break;
  }
};

eventSource.onerror = (error) => {
  console.error('SSE error', error);
  eventSource.close();
};

// SSE heartbeats are sent automatically every 30 seconds

Stream Events

Both WebSocket and SSE receive the same event types:

typescript
type StreamEvent =
  | { type: 'chunk'; sequence: number; chunk: StreamChunk }
  | { type: 'end'; finalOutput: unknown }
  | { type: 'fail'; error: string };

type StreamChunk =
  | { type: 'text_delta'; delta: string; agentId: string; ... }
  | { type: 'tool_call_start'; toolName: string; ... }
  | { type: 'tool_result'; result: unknown; ... }
  | { type: 'run_completed'; output: unknown; ... }
  | { type: 'run_interrupted'; reason: string; ... }
  // v7 additions:
  | { type: 'tool_approval_request'; toolCallId: string; toolName: string; input: unknown; ... }
  | { type: 'tool_approval_response'; toolCallId: string; decision: 'approve' | 'reject'; ... }
  | { type: 'run_resumed'; previousRunId: string; ... }
  | { type: 'run_suspended'; reason: 'client_tool' | 'awaiting_children' | 'step_partial'; ... }
  // ... and more
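
One way to consume these events is a small reducer that tracks where to resume after a reconnect. This is a sketch under assumptions: it treats fromSequence as inclusive, assumes sequences increase by one, and ignores replayed chunks. `StreamState` and `reduceStream` are hypothetical names.

```typescript
type StreamEvent<C = unknown> =
  | { type: 'chunk'; sequence: number; chunk: C }
  | { type: 'end'; finalOutput: unknown }
  | { type: 'fail'; error: string };

interface StreamState {
  nextSequence: number; // value to send as ?fromSequence= when reconnecting
  done: boolean;
  error: string | null;
  finalOutput: unknown;
}

const initialState: StreamState = {
  nextSequence: 0,
  done: false,
  error: null,
  finalOutput: undefined,
};

function reduceStream(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case 'chunk':
      // Skip chunks that were already seen (for example, replays after a reconnect).
      if (event.sequence < state.nextSequence) return state;
      return { ...state, nextSequence: event.sequence + 1 };
    case 'end':
      return { ...state, done: true, finalOutput: event.finalOutput };
    case 'fail':
      return { ...state, done: true, error: event.error };
  }
}

let state = initialState;
state = reduceStream(state, { type: 'chunk', sequence: 0, chunk: 'Hel' });
state = reduceStream(state, { type: 'chunk', sequence: 1, chunk: 'lo' });
console.log(state.nextSequence); // 2
```

Each parsed WebSocket or SSE message can be fed through the reducer, and state.nextSequence supplies the fromSequence query parameter for the next connection.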

LLM Adapter Handling

Critical

LLM adapters cannot be serialized over HTTP. They must be created within the worker/DO context.

With the composition API, the llmAdapter factory handles this automatically:

typescript
export const MyAgentServer = createAgentServer<Env>({
  // Called inside the DO for each execution
  llmAdapter: (env) =>
    new VercelAIAdapter({
      openai: createOpenAI({ apiKey: env.OPENAI_API_KEY }),
      defaultModel: 'gpt-4o',
    }),
  // ...
});

Hibernation (v7 Stateless Model)

The DO runtime uses a fully stateless suspension model in v7. The DO closes its SSE response and becomes hibernation-eligible at every Human-In-The-Loop (HITL) suspension boundary, instead of holding the request open across HITL waits.

v7 Behavior

  • The DO request handler closes its SSE stream when an agent reaches a suspension boundary ('suspended_client_tool', 'suspended_awaiting_children', 'suspended_step_partial').
  • DOs become hibernation-eligible at every suspension boundary — no in-memory promises pin the instance.
  • SQLite state (including suspension_context) persists across hibernation; the DO has zero in-memory state to restore on wake.
  • Idle CPU billing on long-suspended HITL sessions drops sharply (typical reduction of ~80% wall-time billing on multi-minute approvals).
  • Clients reconnect via the new GET /chat/{id}/stream route after submitting a tool result. The previous "hold the SSE open through HITL" pattern is gone.

Resume Protocol

typescript
// 1. Submit a client-tool result (durable, idempotent)
await fetch(`/chat/${sessionId}/submit-tool-result`, {
  method: 'POST',
  body: JSON.stringify({ toolCallId, result }),
});

// 2. Re-attach to the (newly-resumed) stream
const eventSource = new EventSource(`/chat/${sessionId}/stream`);

The framework handles the rest: the durable submit lands in pendingClientToolCalls, the DO wakes and observes the change, and the resume drains pending entries before continuing.

Durable Interrupt (No More INTERRUPT_NOT_LOCAL)

The INTERRUPT_NOT_LOCAL 503 error code is removed in v7. Interrupt is now durable:

  • POST /chat/{id}/interrupt writes a flag to durable state and returns 202 Accepted immediately.
  • The running DO observes the flag opportunistically (within ~5s by default; configurable via the host server's interruptObservationDeadlineMs option).
  • Cross-replica interrupts no longer fail — they succeed asynchronously.
  • If the observation deadline elapses without the flag being observed, the route returns 504.
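
A client can treat the interrupt as fire-and-confirm by branching on the two status codes named above. This is a hedged sketch: `requestInterrupt`, `classifyInterruptStatus`, and the `FetchLike` type are hypothetical helpers, and the `{ reason }` body shape is assumed from the interrupt example elsewhere in this guide.

```typescript
type InterruptOutcome = 'accepted' | 'observation_timeout' | 'unexpected';

// Maps the status codes mentioned in the list above to a client-side outcome.
function classifyInterruptStatus(status: number): InterruptOutcome {
  if (status === 202) return 'accepted'; // flag written; observed asynchronously
  if (status === 504) return 'observation_timeout'; // deadline elapsed unobserved
  return 'unexpected';
}

// Minimal fetch shape so the helper can be exercised without a network.
type FetchLike = (
  url: string,
  init: { method: string; body: string }
) => Promise<{ status: number }>;

async function requestInterrupt(
  fetchImpl: FetchLike,
  sessionId: string,
  reason: string
): Promise<InterruptOutcome> {
  const res = await fetchImpl(`/chat/${encodeURIComponent(sessionId)}/interrupt`, {
    method: 'POST',
    body: JSON.stringify({ reason }), // assumed body shape
  });
  return classifyInterruptStatus(res.status);
}
```

On `'observation_timeout'`, a reasonable client reaction is to poll the session status before retrying, since the flag was still written to durable state.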

Keeping DOs Alive (Optional)

If you want the DO to remain warm across active streaming sessions (e.g., for an active multi-turn conversation without HITL), keep a WebSocket connection alive:

typescript
setInterval(() => ws.send(JSON.stringify({ type: 'ping' })), 30000);

For HITL flows this is unnecessary — the DO is designed to hibernate during waits.

Multi-Turn Conversations

Continue conversations across multiple messages:

typescript
// First message - creates a new session
const sessionId = 'session-123';
await stub.fetch('/start', {
  method: 'POST',
  body: JSON.stringify({
    agentType: 'researcher',
    input: { message: 'Hello, my name is Alice' },
    sessionId,
  }),
});

// Wait for completion...

// Continue the conversation - same sessionId
await stub.fetch('/start', {
  method: 'POST',
  body: JSON.stringify({
    agentType: 'researcher',
    input: { message: 'What is my name?' },
    sessionId, // Reuse the same session
  }),
});

// The agent remembers: "Your name is Alice"

Multi-Message Input

The message field in input accepts either a string or a UserInputMessage[] array. This lets you inject context alongside the user's question or attach files:

typescript
await stub.fetch('/start', {
  method: 'POST',
  body: JSON.stringify({
    agentType: 'researcher',
    input: {
      message: [
        {
          role: 'user',
          content: 'System context: user is a premium subscriber',
          metadata: { source: 'system' },
        },
        { role: 'user', content: 'What premium features can I use?' },
      ],
    },
    sessionId,
  }),
});

// Messages can include file attachments
await stub.fetch('/start', {
  method: 'POST',
  body: JSON.stringify({
    agentType: 'researcher',
    input: {
      message: [
        {
          role: 'user',
          content: 'Analyze this image',
          files: [
            { data: 'data:image/png;base64,...', mediaType: 'image/png', filename: 'chart.png' },
          ],
        },
      ],
    },
    sessionId,
  }),
});

String and multi-message inputs can be mixed freely across turns in the same session.
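
Code that consumes `input.message` usually wants one shape. The sketch below normalizes both forms into an array. The `UserInputMessage` and `FileAttachment` interfaces here are assumptions inferred from the examples above, not the library's published types, and `normalizeMessageInput` is a hypothetical helper.

```typescript
// Assumed shapes, inferred from the request bodies shown above.
interface FileAttachment {
  data: string;
  mediaType: string;
  filename?: string;
}

interface UserInputMessage {
  role: 'user';
  content: string;
  metadata?: Record<string, unknown>;
  files?: FileAttachment[];
}

type MessageInput = string | UserInputMessage[];

// A bare string becomes a single user message; arrays pass through unchanged.
function normalizeMessageInput(input: MessageInput): UserInputMessage[] {
  return typeof input === 'string' ? [{ role: 'user', content: input }] : input;
}
```

With this in place, a turn that sends `'What is my name?'` and a turn that sends a two-element array can be handled by the same downstream code.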

Interrupt & Resume

Full interrupt/resume support is available:

typescript
// Start execution
await stub.fetch('/start', {
  method: 'POST',
  body: JSON.stringify({
    agentType: 'researcher',
    input: { message: 'Long research task' },
    sessionId,
  }),
});

// Interrupt later
await stub.fetch('/interrupt', {
  method: 'POST',
  body: JSON.stringify({ reason: 'user_pause' }),
});

// Check if resumable
const statusResponse = await stub.fetch('/status');
const { status } = await statusResponse.json();

if (status === 'paused' || status === 'interrupted') {
  // Resume
  await stub.fetch('/resume', {
    method: 'POST',
    body: JSON.stringify({
      agentType: 'researcher',
      options: { mode: 'continue' },
    }),
  });
}

DOStateStore

The DOStateStore uses the DO's built-in SQLite for persistence:

typescript
import { DOStateStore } from '@helix-agents/runtime-cloudflare';

// Created internally by AgentServer
const stateStore = new DOStateStore({
  sql: this.sql, // PartyServer's sql tagged template
  logger: this.logger,
});

// Implements SessionStateStore interface
await stateStore.saveState(sessionId, state);
const loaded = await stateStore.loadState(sessionId);
await stateStore.appendMessages(sessionId, messages);

State size limit (StateSizeExceededError)

Cloudflare DO SQL has a documented 2 MiB per-row size limit. Because the staging row carries BOTH patches (RFC 6902 patch list) AND merge_changes (the merge representation) for each tool call, large state writes hit this cap surprisingly fast — a ~900 KB payload becomes a ~1.8 MB row before any encoding overhead.

DOStateStore.stageChanges() runs a fail-fast pre-flight size check before issuing the INSERT. When the serialized payload exceeds the configured threshold, it throws a typed StateSizeExceededError carrying the exact byte counts and the offending top-level state keys. Triage then takes seconds, rather than the hours needed to decode the original opaque message: DO state operation 'stageChanges' failed for sessionId: ...

typescript
import {
  DOStateStore,
  StateSizeExceededError,
  DEFAULT_MAX_STAGED_CHANGES_BYTES, // 1.8 MiB
} from '@helix-agents/runtime-cloudflare';

// Configure (defaults to 1.8 MiB, leaving headroom under the 2 MiB cap)
const stateStore = new DOStateStore({
  sql: this.sql,
  // Tune lower if you want stricter agent-author guardrails:
  maxStagedChangesBytes: 1_500_000,
  // Or set Infinity to disable the pre-flight (NOT recommended — SQLite
  // will still reject oversized rows, just with a far less informative error).
});

try {
  await stateStore.stageChanges(sessionId, stepId, changes);
} catch (err) {
  if (err instanceof StateSizeExceededError) {
    console.error(
      `Tool ${err.toolCallId} wrote ${err.totalBytes}B (limit ${err.limit}B). ` +
        `Offending keys: ${err.changedKeys.join(', ')}`
    );
    // Recommended remediation: persist large blobs to D1/R2/Postgres
    // and store only the id/reference in agent state.
  }
  throw err;
}
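
The doubling described above (patches plus merge_changes in one row) can be estimated before a tool ever writes state. This is a rough sketch under stated assumptions: `ASSUMED_LIMIT_BYTES` approximates the "1.8 MiB" default, since the exact value of DEFAULT_MAX_STAGED_CHANGES_BYTES is not shown here, and `estimateStagedRow` is a hypothetical helper rather than the store's actual pre-flight check.

```typescript
// Assumption: "1.8 MiB" rounded down to whole bytes.
const ASSUMED_LIMIT_BYTES = Math.floor(1.8 * 1024 * 1024);

// UTF-8 length of a string, computed without platform-specific APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair encodes as one 4-byte sequence
        i++;
      } else bytes += 3; // lone surrogate, counted as a 3-byte replacement
    } else bytes += 3;
  }
  return bytes;
}

function serializedBytes(value: unknown): number {
  const json = JSON.stringify(value);
  return json === undefined ? 0 : utf8ByteLength(json);
}

interface StagedSizeEstimate {
  patchesBytes: number;
  mergeChangesBytes: number;
  totalBytes: number;
  fits: boolean;
}

// Both representations land in the same row, so their sizes add up.
function estimateStagedRow(
  patches: unknown,
  mergeChanges: unknown,
  limit: number = ASSUMED_LIMIT_BYTES
): StagedSizeEstimate {
  const patchesBytes = serializedBytes(patches);
  const mergeChangesBytes = serializedBytes(mergeChanges);
  const totalBytes = patchesBytes + mergeChangesBytes;
  return { patchesBytes, mergeChangesBytes, totalBytes, fits: totalBytes <= limit };
}
```

A tool author could call a helper like this on a large result and switch to storing a reference when `fits` is false.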

When SQLite itself rejects a row (e.g. you raised the threshold above the platform limit, or some other column pushed the row over), stageChanges wraps the underlying error in a DOStateError whose message includes the serialized sizes alongside the SQLite error class and message. Both the typed cause field and the standard ES2022 Error.cause chain are populated so V8's stack-trace formatter and structured loggers can walk the chain.
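
Since the ES2022 `Error.cause` chain is populated, a logger can flatten it into a list of messages. The helper below is a generic sketch (`causeChain` is a hypothetical name) and does not depend on DOStateError's exact fields.

```typescript
// Walks err.cause links and returns "Name: message" for each Error in the chain.
function causeChain(err: unknown): string[] {
  const messages: string[] = [];
  const seen = new Set<unknown>(); // guards against cyclic cause references
  let current: unknown = err;
  while (current instanceof Error && !seen.has(current)) {
    seen.add(current);
    messages.push(`${current.name}: ${current.message}`);
    current = (current as Error & { cause?: unknown }).cause;
  }
  return messages;
}
```

Logging `causeChain(err).join(' <- ')` in a catch block gives a one-line summary that keeps the underlying SQLite message visible next to the wrapper.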

See docs/guide/state.md § "State Size and Storage Limits" for the architectural pattern (state holds control flow, durable stores hold artifacts).

Schema (V5)

The DOStateStore creates these tables automatically and migrates lazily on first DO access. The current DO_SCHEMA_VERSION is 5 (V4 added completed_client_tool_calls; V5 added suspension_context for the v7 stateless suspension model).

sql
-- Main state table (single row per DO)
CREATE TABLE state (
  id INTEGER PRIMARY KEY DEFAULT 1,
  session_id TEXT NOT NULL,
  agent_type TEXT NOT NULL,
  stream_id TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'active',
  step_count INTEGER DEFAULT 0,
  custom_state TEXT,
  output TEXT,
  error TEXT,
  checkpoint_id TEXT,
  completed_client_tool_calls TEXT,        -- V4
  suspension_context TEXT DEFAULT NULL,    -- V5 (suspendedAwaitingChildren, suspendedStepId,
                                           --     tracingContext, expiresAt)
  ...
);

-- Messages table
CREATE TABLE messages (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  role TEXT NOT NULL,
  content TEXT,
  tool_calls TEXT,
  ...
);

-- Stream chunks table
CREATE TABLE stream_chunks (
  sequence INTEGER PRIMARY KEY,
  chunk TEXT NOT NULL,
  created_at INTEGER NOT NULL
);

See docs/storage/cloudflare.md for the full migration progression.

DOStreamManager

The DOStreamManager writes directly to connected clients:

typescript
import { DOStreamManager } from '@helix-agents/runtime-cloudflare';

// Created internally by AgentServer
const streamManager = new DOStreamManager({
  sql: this.sql,
  getConnections: () => this.getConnections(),
  broadcast: (msg) => this.broadcast(msg),
  sseConnections: this.sseConnections,
  logger: this.logger,
});

// Writes to SQLite AND broadcasts to all connected clients
const writer = await streamManager.createWriter(streamId, agentId, agentType);
await writer.write(chunk); // FREE - no subrequest!
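
The persist-then-broadcast pattern, together with replay for late or reconnecting clients, can be illustrated with an in-memory stand-in. This is a conceptual sketch only: `InMemoryStreamLog` is hypothetical, it does not reflect DOStreamManager's internals, and it assumes `fromSequence` means "replay chunks with a sequence greater than this value".

```typescript
interface StoredChunk {
  sequence: number;
  chunk: string; // JSON-serialized, mirroring the stream_chunks table
  createdAt: number;
}

type ChunkListener = (row: StoredChunk) => void;

class InMemoryStreamLog {
  private rows: StoredChunk[] = [];
  private listeners = new Set<ChunkListener>();

  // Persist first, then fan out to whoever is connected right now.
  write(chunk: unknown): number {
    const row: StoredChunk = {
      sequence: this.rows.length + 1,
      chunk: JSON.stringify(chunk),
      createdAt: Date.now(),
    };
    this.rows.push(row);
    this.listeners.forEach((listener) => listener(row));
    return row.sequence;
  }

  // Replays stored rows after `fromSequence`, then subscribes to live writes.
  subscribe(listener: ChunkListener, fromSequence = 0): () => void {
    this.rows.filter((row) => row.sequence > fromSequence).forEach(listener);
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}
```

Because every chunk is stored before it is broadcast, a client that drops at sequence 2 can resubscribe with `fromSequence = 2` and receive only what it missed.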

Sub-Agents

Sub-agents work transparently in the DO runtime — use createSubAgentTool() exactly as you would in other runtimes. The only requirement is configuring subAgentNamespace so the DO runtime knows which namespace to use when spawning sibling DOs.

Why Sibling DOs?

Each DO instance hosts a single session backed by its own SQLite database. Running a sub-agent inside the same DO as its parent would corrupt the parent's state. Instead, the DO runtime automatically routes createSubAgentTool() calls to sibling DO instances — each sub-agent session gets its own isolated DO with its own SQLite.

This is handled internally by DOStubTransport, an implementation of the RemoteAgentTransport interface that routes calls to sibling DOs within the same namespace. You don't interact with DOStubTransport directly — it is wired up automatically when subAgentNamespace is configured. Agent definitions don't need to change; the tool rewriting happens at execution time.

Configuration

Add subAgentNamespace to your createAgentServer config:

typescript
// wrangler.toml
// [[durable_objects.bindings]]
// name = "AGENTS"
// class_name = "MyAgentServer"

interface Env {
  AGENTS: DurableObjectNamespace;
  ANTHROPIC_API_KEY: string;
}

export const MyAgentServer = createAgentServer<Env>({
  llmAdapter: (env) =>
    new VercelAIAdapter({
      anthropic: createAnthropic({ apiKey: env.ANTHROPIC_API_KEY }),
    }),
  agents: registry,

  // Point to the same DO namespace used for the parent
  subAgentNamespace: (env) => env.AGENTS,
});

The sub-agent DO instances are resolved by session ID using namespace.idFromName(sessionId), so each sub-agent session gets a deterministic, isolated DO.

subAgentNamespace is required for sub-agents

If you use createSubAgentTool() in your agents but do not configure subAgentNamespace, the sub-agent tools will not be rewritten. The sub-agent will execute in-process within the same DO instance as the parent, sharing its single-session SQLite database. This will corrupt the parent's state. Always set subAgentNamespace when your agents include sub-agent tools.

Defining Sub-Agent Tools

Define sub-agents and tools exactly as documented in Sub-Agent Orchestration — no DO-specific changes needed. The only requirement is that both parent and sub-agents are registered in the same AgentRegistry, since the sibling DO resolves the sub-agent type from the same registry.

typescript
import { defineAgent, createSubAgentTool } from '@helix-agents/core';
import { AgentRegistry } from '@helix-agents/runtime-cloudflare';
import { z } from 'zod';

const AnalyzerAgent = defineAgent({
  name: 'analyzer',
  systemPrompt: 'You analyze text for sentiment and key topics.',
  outputSchema: z.object({
    sentiment: z.enum(['positive', 'negative', 'neutral']),
    topics: z.array(z.string()),
  }),
  maxSteps: 5,
});

const analyzerTool = createSubAgentTool(
  AnalyzerAgent,
  z.object({ text: z.string().describe('Text to analyze') }),
  {
    description: 'Analyze text for sentiment and topics',
    timeoutMs: 60_000, // 60 second timeout (default: 10 minutes)
  }
);

const OrchestratorAgent = defineAgent({
  name: 'orchestrator',
  systemPrompt: 'Coordinate analysis tasks.',
  tools: [analyzerTool],
  outputSchema: z.object({ summary: z.string() }),
});

// Both parent and sub-agents must be in the same registry
const registry = new AgentRegistry();
registry.register(OrchestratorAgent);
registry.register(AnalyzerAgent);

LLM adapter for sub-agents

Sub-agent DOs use the same llmAdapter factory configured in createAgentServer. Any llmConfig specified on the agent definition is passed to the adapter, but the adapter itself is always created by the server-level factory. Use the LLMAdapterContext.agentType parameter if you need per-agent model selection.
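
One way to do per-agent selection is a lookup keyed by agent type. In this sketch, `AdapterContextSketch` models only the `agentType` field mentioned above; the full LLMAdapterContext shape and how the factory receives it are not shown here and are assumptions. The model IDs are placeholders.

```typescript
// Hypothetical context shape: only `agentType` comes from the text above.
interface AdapterContextSketch {
  agentType: string;
}

// Illustrative mapping; agent names match the examples in this section.
const MODEL_BY_AGENT_TYPE = new Map<string, string>([
  ['orchestrator', 'gpt-4o'],
  ['analyzer', 'gpt-4o-mini'],
]);

const DEFAULT_MODEL_ID = 'gpt-4o';

// Unknown agent types fall back to the default model.
function selectModelId(ctx: AdapterContextSketch): string {
  return MODEL_BY_AGENT_TYPE.get(ctx.agentType) ?? DEFAULT_MODEL_ID;
}
```

The returned ID would then be passed to whichever model constructor the llmAdapter factory already uses.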

How It Works

When a parent agent calls a sub-agent tool:

  1. Tool rewriting - At execution start, createSubAgentTool() tools are transparently converted to createRemoteSubAgentTool() calls backed by DOStubTransport
  2. Session ID - A deterministic session ID is generated: ${parentSessionId}-remote-${toolCallId}
  3. Sibling DO - DOStubTransport resolves a sibling DO via namespace.idFromName(sessionId) and calls its /subagent/:agentType/start endpoint
  4. Isolated execution - The sibling DO runs the sub-agent with its own SQLite state
  5. Streaming - Events stream back through SSE from the sibling DO and are proxied into the parent's stream

The parent stream includes subagent_start/subagent_end events identical to local sub-agents — frontends don't need to distinguish between runtimes.
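
The session ID scheme from step 2 is simple enough to reproduce when you need to locate a sub-agent's DO yourself, for example to query its usage. `subAgentSessionId` below is a hypothetical helper that only mirrors the documented template.

```typescript
// Mirrors the documented template: ${parentSessionId}-remote-${toolCallId}
function subAgentSessionId(parentSessionId: string, toolCallId: string): string {
  return `${parentSessionId}-remote-${toolCallId}`;
}

// The same inputs always yield the same ID, so a retried tool call
// resolves to the same sibling DO rather than a fresh one.
const first = subAgentSessionId('session-123', 'call_1');
const retry = subAgentSessionId('session-123', 'call_1');

// Nested sub-agents compose: the child's ID becomes the next parent ID.
const grandchild = subAgentSessionId(first, 'call_7');
```

Passing the resulting string to `namespace.idFromName()` would address the same DO that the transport resolved in step 3.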

Persistent Sub-Agents

Persistent sub-agents in the DO runtime work the same as ephemeral sub-agents at the infrastructure level: each child gets its own sibling DO instance. The difference is in lifecycle management.

Auto-injected companion tools (v7): Agents that use persistent sub-agents have these companion tools auto-injected at registration time. The pre-v7 gating is lifted, so the tools are unconditionally available on the DO runtime in v7:

| Tool | Purpose |
| --- | --- |
| companion__spawnAgent | Spawn a new persistent child (blocking or non-blocking) |
| companion__sendMessage | Send a follow-up message to an already-spawned child |
| companion__listChildren | List the parent's persistent children |
| companion__getChildStatus | Query a child's current status |
| companion__waitForResult | Block until a child reaches a terminal status |
| companion__terminateChild | Forcibly stop a child |

  • The subAgentNamespace configuration is required (same as for ephemeral sub-agents).
  • Persistent children maintain state across multiple interactions within their DO's SQLite.
  • The DO V4 migration adds mode and name columns to __agents_sub_session_refs for persistent child tracking.

No additional DO configuration is needed beyond what is already documented for ephemeral sub-agents.

Re-spawning a completed persistent child continues it on its preserved session (memory retained) rather than recreating it — see Re-consulting a persistent companion (the critic loop).

DOUsageStore

The DOUsageStore tracks token usage, tool calls, and sub-agent invocations using the DO's built-in SQLite:

typescript
import { DOUsageStore } from '@helix-agents/runtime-cloudflare';

// Created internally by AgentServer
const usageStore = new DOUsageStore({
  sql: this.sql, // PartyServer's sql tagged template
  sessionId: 'session-123',
  logger: this.logger,
});

// Implements UsageStore interface
await usageStore.recordEntry(entry);
const entries = await usageStore.getEntries(sessionId);
const rollup = await usageStore.getRollup(sessionId);

Getting Usage via HTTP

typescript
// Get usage rollup for parent agent only
const usageResponse = await stub.fetch('/usage');
const rollup = await usageResponse.json();

// Get usage rollup including sub-agents (note: limited in DO runtime)
const rollupResponse = await stub.fetch('/usage?mode=rollup');

// Get raw entries
const entriesResponse = await stub.fetch('/usage?mode=entries');

Usage Response

typescript
interface TokenCounts {
  prompt?: number;
  completion?: number;
  reasoning?: number;
  cached?: number;
  total?: number;
}

interface UsageRollup {
  runId: string;

  // Token usage for this agent only
  tokens: TokenCounts;
  tokensByModel: Record<string, TokenCounts>;

  // Token usage including sub-agents (see limitation below)
  tokensIncludingSubAgents: TokenCounts;

  // Tool execution statistics
  toolStats: {
    totalCalls: number;
    successfulCalls: number;
    failedCalls: number;
    totalDurationMs: number;
    byTool: Record<
      string,
      {
        calls: number;
        successfulCalls: number;
        failedCalls: number;
        totalDurationMs: number;
      }
    >;
  };

  // Sub-agent execution statistics
  subAgentStats: {
    totalCalls: number;
    successfulCalls: number;
    failedCalls: number;
    totalDurationMs: number;
    byType: Record<
      string,
      {
        calls: number;
        successfulCalls: number;
        failedCalls: number;
        totalDurationMs: number;
      }
    >;
  };

  // Custom metrics aggregated by type, then by name
  // Example: { 'api_calls': { 'tavily': 5 }, 'bytes': { 'input': 1024 } }
  custom: Record<string, Record<string, number>>;

  // Timing
  startedAt?: number;
  lastUpdatedAt?: number;
  entryCount: number;
}

Sub-Agent Usage Limitation

In the DO runtime, sub-agents run in separate Durable Objects. Cross-DO usage aggregation is not currently supported. The tokensIncludingSubAgents field will only include the parent agent's tokens. For accurate sub-agent usage, query each sub-agent's DO directly.
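
Until cross-DO aggregation exists, totals can be combined on the caller's side after fetching each DO's rollup. This sketch assumes you already hold the parent and sub-agent rollups. `addTokenCounts` and `aggregateTokens` are hypothetical helpers, and `TokenCounts` is repeated here so the block is self-contained.

```typescript
interface TokenCounts {
  prompt?: number;
  completion?: number;
  reasoning?: number;
  cached?: number;
  total?: number;
}

const TOKEN_FIELDS = ['prompt', 'completion', 'reasoning', 'cached', 'total'] as const;

// Adds two TokenCounts field by field; a field stays absent if neither side has it.
function addTokenCounts(a: TokenCounts, b: TokenCounts): TokenCounts {
  const sum: TokenCounts = {};
  for (const field of TOKEN_FIELDS) {
    const left = a[field];
    const right = b[field];
    if (left !== undefined || right !== undefined) {
      sum[field] = (left ?? 0) + (right ?? 0);
    }
  }
  return sum;
}

// Folds the `tokens` field of the parent rollup and every sub-agent rollup.
function aggregateTokens(rollups: Array<{ tokens: TokenCounts }>): TokenCounts {
  return rollups.reduce<TokenCounts>((acc, r) => addTokenCounts(acc, r.tokens), {});
}
```

Each sub-agent rollup would come from a `/usage` request against that sub-agent's own DO stub.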

Migration from Legacy AgentServer API

The subclass-based AgentServer API has been removed. Use createAgentServer() instead:

typescript
import { createAgentServer, AgentRegistry } from '@helix-agents/runtime-cloudflare';

const registry = new AgentRegistry();
registry.register(myAgent);

export const MyAgentServer = createAgentServer<Env>({
  llmAdapter: (env) =>
    new VercelAIAdapter({
      model: openai('gpt-4o', { apiKey: env.OPENAI_API_KEY }),
    }),
  agents: registry,
  hooks: {
    beforeStart: async ({ executionState }) => {
      if (executionState.isExecuting) {
        await executionState.interrupt('superseded');
      }
    },
  },
});

Limitations

LLM Adapter Serialization

LLM adapters contain functions and API clients that can't be serialized. With the composition API, the llmAdapter factory handles this automatically since it runs inside the DO.

Single DO Per Run

Each session gets its own DO instance, and a session can span several runs (see Session vs Run Identifiers). This means:

  • No shared state between sessions (use D1 if needed)
  • Each session has its own SQLite database
  • Geographic pinning based on first access

Geographic Pinning

DOs are pinned to a region on first access:

typescript
// First access pins to nearest region
const stub = env.AGENTS.get(doId);

// All subsequent access goes to that region
// Consider region affinity for latency-sensitive apps

Memory Limits

DOs have memory limits. For very long conversations:

  • Consider truncating message history
  • Use checkpoints to manage state size
  • Monitor memory usage in production
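
A simple form of history truncation keeps system messages plus the most recent N others. The sketch below is framework-agnostic: `ChatMessage` is a simplified assumption and `truncateHistory` is a hypothetical helper, not a helix-agents API.

```typescript
// Simplified message shape for illustration only.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Keeps every system message, then the last `maxRecent` non-system messages.
function truncateHistory(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  // slice(-0) would return the whole array, so zero is handled explicitly.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}
```

A count-based cut can separate a tool call from its result, so a production version would likely need to cut on turn boundaries instead.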

No Step-Level Durability

Unlike Workflows, the DO runtime doesn't have automatic step-level retries:

  • If the DO crashes mid-execution, state reflects last committed step
  • Use checkpoints for crash recovery
  • Consider Workflows runtime for critical durability needs

Best Practices

1. Keep Connections Alive

typescript
// Client-side keepalive
setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'ping' }));
  }
}, 30000);

2. Handle Reconnection

typescript
// Track last received sequence
let lastSequence = 0;

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.sequence) {
    lastSequence = data.sequence;
  }
};

// Reconnect from last sequence
function reconnect() {
  ws = new WebSocket(`wss://.../ws/${sessionId}?fromSequence=${lastSequence}`);
}

3. Use Proper Error Handling

typescript
try {
  const response = await stub.fetch('/start', { ... });
  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.error);
  }
} catch (error) {
  // Handle DO errors (network, timeout, etc.)
  console.error('DO error:', error);
}

4. Monitor Subrequest Usage

typescript
// The only subrequests should be LLM API calls
// Monitor via Cloudflare dashboard to ensure streaming is free

Deployment

Development

bash
wrangler dev

Production

bash
wrangler deploy

Secrets

bash
wrangler secret put OPENAI_API_KEY

Released under the MIT License.