@helix-agents/runtime-dbos

DBOS Transact runtime for durable agent execution on Postgres. Provides crash recovery, automatic step retries, and durable interrupt/abort. Implements the standard AgentExecutor interface, so the public surface mirrors JSAgentExecutor and TemporalAgentExecutor.

DBOS-native suspension

Unlike the v7 stateless-suspension runtimes (JS, Temporal, Cloudflare) that suspend via a durable-state suspensionContext, runtime-dbos uses DBOS-native DBOS.recv / DBOS.send primitives over Postgres-backed workflow replay. Functionally equivalent for callers, architecturally separate. See the DBOS Runtime guide.

Known limitations

Memory auto-injection/extraction is unsupported. The memory: config field is accepted at the type level but is never invoked on DBOS. Agents requiring memory recall or storage should run on the JS or Cloudflare runtimes.

Prompt caching is supported

LLMConfig.cache strategies (anthropicCache, openaiCache, xaiCache, or a custom CacheStrategy) are applied on DBOS at parity with the other runtimes. Strategies are not JSON-serializable, so the executor registers the live strategy in a process-local registry (alongside the live model) and the workflow body applies it before each LLM call.

Installation

bash

npm install @helix-agents/runtime-dbos @dbos-inc/dbos-sdk

DBOSAgentExecutor

The top-level AgentExecutor implementation. The constructor binds every @DBOS.step's static dependencies and calls registerDBOSAgentWorkflows() (idempotent).

typescript

import { DBOS } from '@dbos-inc/dbos-sdk';
import { DBOSAgentExecutor, registerDBOSAgentWorkflows } from '@helix-agents/runtime-dbos';
import { PostgresStateStore } from '@helix-agents/store-postgres';
import { RedisStreamManager } from '@helix-agents/store-redis';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { openai } from '@ai-sdk/openai'; // model factory used in the llmAdapter config below

registerDBOSAgentWorkflows(); // before DBOS.launch()
await DBOS.launch();

const executor = new DBOSAgentExecutor({
  stateStore: new PostgresStateStore({ connectionString: process.env.DATABASE_URL! }),
  streamManager: new RedisStreamManager({ url: process.env.REDIS_URL! }),
  llmAdapter: new VercelAIAdapter({ model: openai('gpt-4o') }),
});

DBOSAgentExecutorConfig

Constructor configuration. stateStore, streamManager, and llmAdapter are required; all other fields are optional.

typescript

interface DBOSAgentExecutorConfig {
  /** State store for session data, messages, and checkpoints. `PostgresStateStore` recommended. */
  stateStore: SessionStateStore;
  /** Stream manager for real-time token streaming. `RedisStreamManager` required in production. */
  streamManager: StreamManager;
  /** LLM adapter (e.g., `VercelAIAdapter`). */
  llmAdapter: LLMAdapter;
  /** Optional usage store for token-count tracking. */
  usageStore?: UsageStore;
  /** Optional hook manager for lifecycle hooks. */
  hookManager?: HookManager;
  /** Optional logger (compatible with pino, winston, etc.). Defaults to no-op. */
  logger?: Logger;
  /** Default idle timeout for persistent workflows (ms). Default: 86_400_000 (24h). */
  defaultPersistentIdleTimeoutMs?: number;
  /** Per-runtime tool retry policy. Defaults match runtime-temporal. */
  toolRetryPolicy?: ToolRetryPolicy;
}

No registry argument

DBOSAgentExecutor does not take an AgentRegistry. Agents are registered lazily per execute() / resume() / retry() call. Pass stateStore, streamManager, and llmAdapter directly.

The default tool retry policy:

typescript

const DEFAULT_TOOL_RETRY_POLICY = {
  retriesAllowed: true,
  maxAttempts: 3,
  initialIntervalSeconds: 1.0,
  backoffCoefficient: 2.0,
  maximumIntervalSeconds: 60.0,
};
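
To get a feel for what these defaults imply, the sketch below computes the wait before each retry. It assumes the policy follows conventional capped exponential backoff and that maxAttempts counts the first try; neither is stated on this page, and retryDelaysSeconds is a hypothetical helper, not part of the package.

```typescript
// Local copy of the documented policy shape so the sketch runs on its own.
interface ToolRetryPolicy {
  retriesAllowed?: boolean;
  maxAttempts?: number;
  initialIntervalSeconds?: number;
  backoffCoefficient?: number;
  maximumIntervalSeconds?: number;
}

const DEFAULT_TOOL_RETRY_POLICY: Required<ToolRetryPolicy> = {
  retriesAllowed: true,
  maxAttempts: 3,
  initialIntervalSeconds: 1.0,
  backoffCoefficient: 2.0,
  maximumIntervalSeconds: 60.0,
};

// Hypothetical helper: seconds to wait before retry 1, retry 2, and so on.
// ASSUMPTION: delay = initial * coefficient^retryIndex, capped at the maximum.
// ASSUMPTION: maxAttempts includes the first try, so there are maxAttempts - 1 retries.
function retryDelaysSeconds(overrides: ToolRetryPolicy = {}): number[] {
  const policy = { ...DEFAULT_TOOL_RETRY_POLICY, ...overrides };
  if (!policy.retriesAllowed) return [];
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maxAttempts - 1; retry++) {
    const uncapped =
      policy.initialIntervalSeconds * Math.pow(policy.backoffCoefficient, retry);
    delays.push(Math.min(uncapped, policy.maximumIntervalSeconds));
  }
  return delays;
}
```

Under these assumptions the defaults give two retries, after 1 s and 2 s. Raising maxAttempts to 8 reaches the 60 s cap on the last retry.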

Methods

The executor implements the standard AgentExecutor interface plus internal helpers invoked by handles.

execute

Starts a new agent execution (standard mode) or routes a message into an existing persistent session (persistent mode). options.sessionId is required — execute() throws if it is omitted.

typescript

execute<TState, TOutput>(
  agent: AgentConfig<z.ZodType<TState>, z.ZodType<TOutput>>,
  input: AgentInput<TState>,
  options?: ExecuteOptions
): Promise<AgentExecutionHandle<TOutput>>;

In standard mode, starts a new short-lived DBOS workflow; a second call for an already-running session returns a handle to the existing run rather than duplicating execution. In persistent mode, the first call starts the recv-loop workflow and subsequent calls DBOS.send a message into it.

getHandle

Reconnects to an existing session without starting new execution. Returns null if the session does not exist.

typescript

getHandle<TState, TOutput>(
  agent: AgentConfig<z.ZodType<TState>, z.ZodType<TOutput>>,
  sessionId: string,
  options?: Pick<ExecuteOptions, 'usageStore'>
): Promise<AgentExecutionHandle<TOutput> | null>;

resume

Resumes execution after an interrupt or pause. Atomically transitions the session status from interrupted to active using compare-and-swap (CAS), then starts a new DBOS workflow (standard mode) or sends a wake signal (persistent mode).

typescript

resume<TState, TOutput>(
  agent: AgentConfig<z.ZodType<TState>, z.ZodType<TOutput>>,
  sessionId: string,
  options?: ResumeOptions  // mode: 'continue' | 'with_message' | 'with_confirmation' | 'from_checkpoint'
): Promise<AgentExecutionHandle<TOutput>>;
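
The compare-and-swap transition described above is what keeps two concurrent resume() calls from both starting work. A minimal in-memory sketch of the idea follows; ToySessionStore and casStatus are hypothetical names for illustration, and the real transition happens inside the configured state store.

```typescript
// 'interrupted' and 'active' come from the description above; other statuses are omitted.
type SessionStatus = 'active' | 'interrupted';

// Hypothetical in-memory stand-in for the session state store.
class ToySessionStore {
  private status = new Map<string, SessionStatus>();

  set(sessionId: string, status: SessionStatus): void {
    this.status.set(sessionId, status);
  }

  // Compare-and-swap: write `next` only if the current value equals `expected`.
  // Returns true when this caller won the transition.
  casStatus(sessionId: string, expected: SessionStatus, next: SessionStatus): boolean {
    if (this.status.get(sessionId) !== expected) return false;
    this.status.set(sessionId, next);
    return true;
  }
}

const store = new ToySessionStore();
store.set('session-1', 'interrupted');

// The first caller wins the transition.
const firstResume = store.casStatus('session-1', 'interrupted', 'active');
// The second caller loses: the status is already 'active'.
const secondResume = store.casStatus('session-1', 'interrupted', 'active');
```

How the runtime reports a lost transition to the caller is not covered here.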

retry

Retries a failed session. Unlike the JS and Temporal runtimes, DBOS does not restore from a checkpoint. Instead, it starts a fresh workflow that re-runs the agent loop from scratch against the session's message history (the current checkpointId is forwarded only so onAgentResumed sees the right metadata). State accumulated in the session is preserved; only the LLM call sequence reruns from step 0. If no checkpoint exists (the run failed before its first checkpoint), behavior is identical: the agent loop restarts from the session's message history.

typescript

retry<TState, TOutput>(
  agent: AgentConfig<z.ZodType<TState>, z.ZodType<TOutput>>,
  sessionId: string,
  options?: RetryOptions
): Promise<AgentExecutionHandle<TOutput>>;

submitToolResult

Submits the outcome of a client-executed tool call (or an approval decision) back into the durable agent loop. Routing flows through the shared routeSubmitToolResult helper, which resolves the owning workflow and delivers the payload via DBOS.send(workflowId, payload, toolCallId) — the toolCallId doubles as both the recv topic and the idempotency key.

typescript

submitToolResult(params: SubmitToolResult): Promise<SubmitToolResultResponse>;

SubmitToolResult is a discriminated union on kind:

{ kind: 'client-tool-result', sessionId, toolCallId, result?, error? }: a client tool's output (or error).
{ kind: 'approval-response', sessionId, toolCallId, approvalId, approved, reason? }: an approval-gate decision.
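
As an illustration of how the kind discriminant narrows the payload, here is a small sketch. The field names follow the two shapes above, but the field types (string, boolean, unknown) are assumptions, and summarize is a hypothetical helper.

```typescript
// Field names are taken from the shapes above; the field TYPES are assumptions.
type SubmitToolResult =
  | {
      kind: 'client-tool-result';
      sessionId: string;
      toolCallId: string;
      result?: unknown;
      error?: string;
    }
  | {
      kind: 'approval-response';
      sessionId: string;
      toolCallId: string;
      approvalId: string;
      approved: boolean;
      reason?: string;
    };

// Hypothetical helper: switching on `kind` lets TypeScript narrow to one variant.
function summarize(params: SubmitToolResult): string {
  switch (params.kind) {
    case 'client-tool-result':
      return params.error !== undefined
        ? `tool call ${params.toolCallId} failed: ${params.error}`
        : `tool call ${params.toolCallId} returned a result`;
    case 'approval-response':
      return `approval ${params.approvalId} ${params.approved ? 'granted' : 'denied'}`;
    default: {
      // Exhaustiveness check: fails to compile if a new variant is added.
      const unreachable: never = params;
      return unreachable;
    }
  }
}
```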

The response status is one of 'accepted', 'already_completed', or 'unknown_tool_call'. Submission is exactly-once: concurrent / retried submits on the same toolCallId are serialized by an OCC-stamped submittedAt marker (first writer returns 'accepted'; losers return 'already_completed'), and the DBOS.send idempotency key is scoped by destination workflow ID. A durable completedClientToolCalls marker makes 'already_completed' survive executor restart and persistent-workflow exit.
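
The first-writer-wins behavior can be modeled in a few lines. This is a single-process toy, not the runtime's implementation: ToySubmitLedger is a hypothetical class, a Map stands in for the OCC-stamped submittedAt marker, and real serialization of concurrent writers happens in the state store.

```typescript
type SubmitStatus = 'accepted' | 'already_completed' | 'unknown_tool_call';

// Hypothetical toy ledger modelling the three documented response statuses.
class ToySubmitLedger {
  private pending = new Set<string>();
  private submittedAt = new Map<string, number>();

  // Called when the agent loop starts waiting on a client tool call.
  registerPending(toolCallId: string): void {
    this.pending.add(toolCallId);
  }

  submit(toolCallId: string, now: number): SubmitStatus {
    // A marker already exists: an earlier submit won.
    if (this.submittedAt.has(toolCallId)) return 'already_completed';
    // Nothing is waiting on this ID.
    if (!this.pending.has(toolCallId)) return 'unknown_tool_call';
    // First writer: stamp the marker and accept.
    this.submittedAt.set(toolCallId, now);
    this.pending.delete(toolCallId);
    return 'accepted';
  }
}

const ledger = new ToySubmitLedger();
ledger.registerPending('call-1');

const first = ledger.submit('call-1', 1000);       // 'accepted'
const duplicate = ledger.submit('call-1', 1001);   // 'already_completed'
const stranger = ledger.submit('call-999', 1002);  // 'unknown_tool_call'
```

Because a duplicate submit returns 'already_completed' rather than an error, client retry logic can treat both 'accepted' and 'already_completed' as success.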

Clients always submit against the root sessionId. The framework demuxes toolCallId → owner workflow (for sub-agent-owned tool calls). Submit-side outputSchema validation is currently skipped on DBOS (the registry stores SerializedAgent records without Zod schemas).

describeCapabilities / getWorkspaceRegistry

typescript

describeCapabilities(): { workspaceProviderKinds: readonly string[] }; // returns { workspaceProviderKinds: [] }
getWorkspaceRegistry(): undefined;

runtime-dbos does not support workspaces. describeCapabilities() returns an empty list so AgentServer surfaces a RUNTIME_NO_WORKSPACE_SUPPORT 404 on the workspace HTTP route.

Internal handle callbacks

_interrupt(sessionId, workflowId) and _abort(sessionId, workflowId, mode) are internal methods invoked by handle.interrupt() / handle.abort(). Callers use the handle methods rather than these directly. In persistent mode, interrupt cancels the current turn (the workflow returns to its recv loop and hibernates again); abort kills the entire workflow.

AgentExecutionHandle

Returned by execute(), resume(), retry(), and (when found) getHandle(). Provides the standard handle surface: stream(), result(), getState(), canResume(), resume(options?), retry(options?), abort(reason?), interrupt(reason?), and send(input) for post-completion continuation. Executor callbacks (resume / retry / send) are injected into each handle at construction so they delegate back to the executor without holding a direct reference to it.

Key Exports

typescript

import {
  DBOSAgentExecutor,
  registerDBOSAgentWorkflows,
  registerAgent,
  bindCallLLMStep,
  DbosClientToolResolver,
  ClientToolStep,
  bindClientToolStep,
  resolveOwnerWorkflowId,
  DBOSPersistentSessionTerminatedError,
  DBOSWorkflowNotFoundError,
  DBOSModeMismatchError,
} from '@helix-agents/runtime-dbos';

import type {
  DBOSAgentExecutorConfig,
  DBOSAgentMode,
  ToolRetryPolicy,
  CompletionReason,
} from '@helix-agents/runtime-dbos';

registerDBOSAgentWorkflows

Evaluates the @DBOS.workflow() decorators and wires the sub-agent start-workflow callback. Must be called before DBOS.launch() and before any execute() call. Idempotent: subsequent calls are no-ops. The DBOSAgentExecutor constructor calls it internally; call it yourself at app startup to make the dependency explicit (as shown in the DBOSAgentExecutor example).

typescript

function registerDBOSAgentWorkflows(): void;

registerAgent

Populates the process-local registries (tools, agent definitions, live LanguageModel, remote sub-agent transports) for an agent.

typescript

function registerAgent(agent: AgentConfig): void;

Must be called before DBOS.launch() when a process may host a recovered workflow without first going through execute() / resume() / retry() (e.g., a submit-only or recovery-only process). Those executor methods call it internally, so the common path is covered. Idempotent; re-registering the same agent.name with a different live LanguageModel logs a warning and overwrites (last writer wins).
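
The "last writer wins, with a warning" rule for re-registration can be sketched as a process-local map. ToyModelRegistry and LiveModel are hypothetical names used only for this illustration; the real registries also hold tools, agent definitions, and sub-agent transports.

```typescript
// Hypothetical stand-in for a live, non-serializable model object.
interface LiveModel {
  id: string;
}

// Hypothetical process-local registry keyed by agent name.
class ToyModelRegistry {
  private models = new Map<string, LiveModel>();
  private warn: (message: string) => void;

  constructor(warn: (message: string) => void) {
    this.warn = warn;
  }

  register(agentName: string, model: LiveModel): void {
    const existing = this.models.get(agentName);
    // Re-registering the identical object is a silent no-op (idempotent).
    if (existing !== undefined && existing !== model) {
      this.warn(`agent "${agentName}" re-registered with a different model; overwriting`);
    }
    this.models.set(agentName, model); // last writer wins
  }

  get(agentName: string): LiveModel | undefined {
    return this.models.get(agentName);
  }
}

const warnings: string[] = [];
const registry = new ToyModelRegistry((message) => warnings.push(message));

const modelA: LiveModel = { id: 'model-a' };
const modelB: LiveModel = { id: 'model-b' };

registry.register('support-agent', modelA);
registry.register('support-agent', modelA); // same object: no warning
registry.register('support-agent', modelB); // different object: warns, then overwrites
```

Since the map lives in process memory, a freshly started recovery-only process begins with it empty, which is why such a process needs the registration call before DBOS.launch().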

bindCallLLMStep

typescript

function bindCallLLMStep(llmAdapter: LLMAdapter, streamManager?: StreamManager): void;

Wires the LLM adapter (and optionally a stream manager) into the DBOS CallLLMStep static slot. Internal — the DBOSAgentExecutor constructor calls it from your config. Exposed only because e2e test infrastructure needs to swap the adapter between runs. Application code should configure the adapter via the constructor, not by calling this directly.

Client-tool exports

The public client-tool surface is small; applications normally only need these for advanced custom-executor integration or tests.

DbosClientToolResolver — the resolver implementation; register additional resolver behavior in a custom executor.
ClientToolStep / bindClientToolStep — the durable step for pending/ownership writes and its binder.
resolveOwnerWorkflowId(stateStore, ownerSessionId, agentType) — resolves an owner session's live DBOS workflow ID (standard-mode IDs are deterministic (agentType, sessionId, runId) triples; persistent-mode IDs are read from state.metadata.dbosWorkflowId).

Errors

typescript

import {
  DBOSModeMismatchError,
  DBOSWorkflowNotFoundError,
  DBOSPersistentSessionTerminatedError,
} from '@helix-agents/runtime-dbos';

DBOSModeMismatchError ({ sessionId, expected, actual }) — thrown when execute() is called with a mode that conflicts with the mode recorded at session creation (standard vs persistent). A session's mode is fixed at first execute(). Recovery: use the same mode, or start a new session with a different sessionId.
DBOSWorkflowNotFoundError ({ workflowId }) — a DBOS workflow ID expected to exist (based on the Helix state store) cannot be found in the DBOS system database (e.g., the DBOS system DB was reset independently, or retention GC'd the workflow). Recovery: executor.retry() from the last Helix checkpoint.
DBOSPersistentSessionTerminatedError ({ sessionId }) — routing a message to a persistent session failed because its workflow terminated and a concurrent start attempt also failed (CAS conflict). Rare; typically a transient high-concurrency conflict. Recovery: retry execute() once.

See the Error Reference deep-dive for the full catalog.
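
The "retry execute() once" recovery can be wrapped in a small helper. In this sketch, retryOnce is a hypothetical helper and SessionTerminatedStandIn is a local stand-in class so the example runs on its own; application code would test against the real DBOSPersistentSessionTerminatedError instead.

```typescript
// Local stand-in for the package's error class, for illustration only.
class SessionTerminatedStandIn extends Error {
  readonly sessionId: string;

  constructor(sessionId: string) {
    super(`persistent session ${sessionId} terminated`);
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'SessionTerminatedStandIn';
    this.sessionId = sessionId;
  }
}

// Hypothetical helper: run `op`, and retry exactly once if the error is retryable.
async function retryOnce<T>(
  op: () => Promise<T>,
  isRetryable: (err: unknown) => boolean
): Promise<T> {
  try {
    return await op();
  } catch (err) {
    if (!isRetryable(err)) throw err;
    return op(); // second and final attempt; any error from it propagates
  }
}

const isTerminated = (err: unknown): boolean => err instanceof SessionTerminatedStandIn;
```

In an application, op would wrap the executor.execute(...) call and the predicate would be err instanceof DBOSPersistentSessionTerminatedError. Other errors, such as DBOSModeMismatchError, pass through untouched because retrying cannot fix them.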

Modes & Types

typescript

type DBOSAgentMode = 'standard' | 'persistent';

interface ToolRetryPolicy {
  retriesAllowed?: boolean; // default true
  maxAttempts?: number; // default 3
  initialIntervalSeconds?: number; // default 1.0
  backoffCoefficient?: number; // default 2.0
  maximumIntervalSeconds?: number; // default 60.0
}

CompletionReason is re-exported from @helix-agents/core — the user-facing reason a run halted (e.g. 'finish_tool', 'max_steps', 'stop_when', 'idle_timeout', 'shutdown', 'interrupted', 'aborted', 'failed'). The persistent-mode-specific reasons ('idle_timeout', 'shutdown') only arise on the DBOS runtime.
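
A small sketch of separating the persistent-mode-only reasons from the rest. It uses a local union containing only the values listed above (the real CompletionReason in @helix-agents/core may have more members), and isPersistentOnlyReason is a hypothetical helper.

```typescript
// Only the reasons named above; the real union may contain more members.
type ListedCompletionReason =
  | 'finish_tool'
  | 'max_steps'
  | 'stop_when'
  | 'idle_timeout'
  | 'shutdown'
  | 'interrupted'
  | 'aborted'
  | 'failed';

// The two reasons described above as specific to persistent mode.
const PERSISTENT_ONLY_REASONS: ReadonlySet<ListedCompletionReason> =
  new Set<ListedCompletionReason>(['idle_timeout', 'shutdown']);

// Hypothetical helper for code shared across runtimes.
function isPersistentOnlyReason(reason: ListedCompletionReason): boolean {
  return PERSISTENT_ONLY_REASONS.has(reason);
}
```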
