@helix-agents/runtime-dbos
DBOS Transact runtime for durable agent execution on Postgres. Provides crash recovery, automatic step retries, and durable interrupt/abort. Implements the standard AgentExecutor interface, so the public surface mirrors JSAgentExecutor and TemporalAgentExecutor.
DBOS-native suspension
Unlike the v7 stateless-suspension runtimes (JS, Temporal, Cloudflare) that suspend via a durable-state suspensionContext, runtime-dbos uses DBOS-native DBOS.recv / DBOS.send primitives over Postgres-backed workflow replay. Functionally equivalent for callers, architecturally separate. See the DBOS Runtime guide.
Known limitations
LLMConfig.cache is a no-op on DBOS. Cache strategies are not serializable across the durable step boundary, so prompt-cache annotations are never applied. Setting cache on LLMConfig has no effect.
Memory auto-injection/extraction is unsupported. The memory: config field is accepted at the type level but is never invoked on DBOS. Agents requiring memory recall or storage should run on the JS or Cloudflare runtimes.
Installation
npm install @helix-agents/runtime-dbos @dbos-inc/dbos-sdk
DBOSAgentExecutor
The top-level AgentExecutor implementation. The constructor binds every @DBOS.step's static dependencies and calls registerDBOSAgentWorkflows() (idempotent).
import { DBOS } from '@dbos-inc/dbos-sdk';
import { openai } from '@ai-sdk/openai'; // provides the `openai(...)` model factory used below
import { DBOSAgentExecutor, registerDBOSAgentWorkflows } from '@helix-agents/runtime-dbos';
import { PostgresStateStore } from '@helix-agents/store-postgres';
import { RedisStreamManager } from '@helix-agents/store-redis';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
registerDBOSAgentWorkflows(); // before DBOS.launch()
await DBOS.launch();
const executor = new DBOSAgentExecutor({
  stateStore: new PostgresStateStore({ connectionString: process.env.DATABASE_URL! }),
  streamManager: new RedisStreamManager({ url: process.env.REDIS_URL! }),
  llmAdapter: new VercelAIAdapter({ model: openai('gpt-4o') }),
});
DBOSAgentExecutorConfig
Constructor configuration. stateStore, streamManager, and llmAdapter are required; all other fields are optional.
interface DBOSAgentExecutorConfig {
  /** State store for session data, messages, and checkpoints. `PostgresStateStore` recommended. */
  stateStore: SessionStateStore;
  /** Stream manager for real-time token streaming. `RedisStreamManager` required in production. */
  streamManager: StreamManager;
  /** LLM adapter (e.g., `VercelAIAdapter`). */
  llmAdapter: LLMAdapter;
  /** Optional usage store for token-count tracking. */
  usageStore?: UsageStore;
  /** Optional hook manager for lifecycle hooks. */
  hookManager?: HookManager;
  /** Optional logger (compatible with pino, winston, etc.). Defaults to no-op. */
  logger?: Logger;
  /** Default idle timeout for persistent workflows (ms). Default: 86_400_000 (24h). */
  defaultPersistentIdleTimeoutMs?: number;
  /** Per-runtime tool retry policy. Defaults match runtime-temporal. */
  toolRetryPolicy?: ToolRetryPolicy;
}
No registry argument
DBOSAgentExecutor does not take an AgentRegistry. Agents are registered lazily per execute() / resume() / retry() call. Pass stateStore, streamManager, and llmAdapter directly.
The default tool retry policy:
const DEFAULT_TOOL_RETRY_POLICY = {
  retriesAllowed: true,
  maxAttempts: 3,
  initialIntervalSeconds: 1.0,
  backoffCoefficient: 2.0,
  maximumIntervalSeconds: 60.0,
};
Methods
The executor implements the standard AgentExecutor interface plus internal helpers invoked by handles.
execute
Starts a new agent execution (standard mode) or routes a message into an existing persistent session (persistent mode). options.sessionId is required — execute() throws if it is omitted.
execute<TState, TOutput>(
  agent: AgentConfig<z.ZodType<TState>, z.ZodType<TOutput>>,
  input: AgentInput<TState>,
  options?: ExecuteOptions
): Promise<AgentExecutionHandle<TOutput>>;
In standard mode, starts a new short-lived DBOS workflow; a second call for an already-running session returns a handle to the existing run rather than duplicating execution. In persistent mode, the first call starts the recv-loop workflow and subsequent calls DBOS.send a message into it.
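The routing rules above can be read as a small decision table. The following is a toy in-memory model of those rules, not the runtime's implementation: `ExecuteRoutingModel`, `Mode`, and `Action` are hypothetical names, nothing is imported from the library, and the sketch only covers sessions whose workflow is still running.

```typescript
// Toy model of the execute() routing rules described above.
// All names are illustrative; the real runtime does this inside DBOS workflows.
type Mode = 'standard' | 'persistent';
type Action = 'start_workflow' | 'return_existing_handle' | 'send_message';

class ExecuteRoutingModel {
  // sessionId -> mode recorded on the first execute() call for that session
  private readonly running = new Map<string, Mode>();

  execute(sessionId: string, mode: Mode): Action {
    const recorded = this.running.get(sessionId);
    if (recorded === undefined) {
      // First call for this session: the mode is fixed here.
      this.running.set(sessionId, mode);
      return 'start_workflow';
    }
    if (recorded !== mode) {
      // The real runtime reports this case as DBOSModeMismatchError.
      throw new Error(`mode mismatch: expected ${recorded}, got ${mode}`);
    }
    // Standard mode reuses the running workflow; persistent mode routes a message into it.
    return mode === 'standard' ? 'return_existing_handle' : 'send_message';
  }
}

const model = new ExecuteRoutingModel();
console.log(model.execute('s1', 'standard'));   // start_workflow
console.log(model.execute('s1', 'standard'));   // return_existing_handle
console.log(model.execute('s2', 'persistent')); // start_workflow
console.log(model.execute('s2', 'persistent')); // send_message
```

What happens once a session's workflow has finished is deliberately left out of the model, since it depends on runtime behavior not covered here.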
getHandle
Reconnects to an existing session without starting new execution. Returns null if the session does not exist.
getHandle<TState, TOutput>(
  agent: AgentConfig<z.ZodType<TState>, z.ZodType<TOutput>>,
  sessionId: string,
  options?: Pick<ExecuteOptions, 'usageStore'>
): Promise<AgentExecutionHandle<TOutput> | null>;
resume
Resumes execution after an interrupt or pause. CAS-transitions the session status from interrupted to active, then starts a new DBOS workflow (standard mode) or sends a wake signal (persistent mode).
resume<TState, TOutput>(
  agent: AgentConfig<z.ZodType<TState>, z.ZodType<TOutput>>,
  sessionId: string,
  options?: ResumeOptions // mode: 'continue' | 'with_message' | 'with_confirmation' | 'from_checkpoint'
): Promise<AgentExecutionHandle<TOutput>>;
retry
Retries a failed session. Unlike the JS and Temporal runtimes, DBOS does not restore from a checkpoint. Instead, it starts a fresh workflow that re-runs the agent loop from scratch against the session's message history (the current checkpointId is forwarded only so that onAgentResumed sees the right metadata). State accumulated in the session is preserved; only the LLM call sequence reruns from step 0. If no checkpoint exists (the run failed before its first checkpoint), the behavior is identical: the agent loop restarts from the session's message history.
retry<TState, TOutput>(
  agent: AgentConfig<z.ZodType<TState>, z.ZodType<TOutput>>,
  sessionId: string,
  options?: RetryOptions
): Promise<AgentExecutionHandle<TOutput>>;
submitToolResult
Submits the outcome of a client-executed tool call (or an approval decision) back into the durable agent loop. Routing flows through the shared routeSubmitToolResult helper, which resolves the owning workflow and delivers the payload via DBOS.send(workflowId, payload, toolCallId) — the toolCallId doubles as both the recv topic and the idempotency key.
submitToolResult(params: SubmitToolResult): Promise<SubmitToolResultResponse>;
SubmitToolResult is a discriminated union on kind:
- { kind: 'client-tool-result', sessionId, toolCallId, result?, error? }: a client tool's output (or error).
- { kind: 'approval-response', sessionId, toolCallId, approvalId, approved, reason? }: an approval-gate decision.
The response status is one of 'accepted', 'already_completed', or 'unknown_tool_call'. Submission is exactly-once: concurrent / retried submits on the same toolCallId are serialized by an OCC-stamped submittedAt marker (first writer returns 'accepted'; losers return 'already_completed'), and the DBOS.send idempotency key is scoped by destination workflow ID. A durable completedClientToolCalls marker makes 'already_completed' survive executor restart and persistent-workflow exit.
Clients always submit against the root sessionId. The framework demuxes toolCallId → owner workflow (for sub-agent-owned tool calls). Submit-side outputSchema validation is currently skipped on DBOS (the registry stores SerializedAgent records without Zod schemas).
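As an illustration of the payload union and the three response statuses, here is a minimal client-side sketch. The types are local mirrors of the shapes listed above rather than imports from the library; the field types (`unknown` for `result` and `error`, `string` for `reason`) and the helpers `actionFor` and `describe` are assumptions made for this example.

```typescript
// Local mirrors of the documented shapes. Field types are assumptions of this
// sketch; the real types are exported by the library.
type SubmitToolResult =
  | { kind: 'client-tool-result'; sessionId: string; toolCallId: string; result?: unknown; error?: unknown }
  | { kind: 'approval-response'; sessionId: string; toolCallId: string; approvalId: string; approved: boolean; reason?: string };

type SubmitStatus = 'accepted' | 'already_completed' | 'unknown_tool_call';

// Hypothetical client-side reactions to each status.
type ClientAction = 'done' | 'ignore_duplicate' | 'report_stale_call';

// Exhaustive mapping: the `never` assignment fails to compile if a status is added
// to SubmitStatus without a matching case.
function actionFor(status: SubmitStatus): ClientAction {
  switch (status) {
    case 'accepted':
      return 'done';
    case 'already_completed':
      return 'ignore_duplicate'; // another submit for this toolCallId won the race
    case 'unknown_tool_call':
      return 'report_stale_call';
    default: {
      const unreachable: never = status;
      throw new Error(`unhandled status: ${String(unreachable)}`);
    }
  }
}

// Narrowing on `kind` gives access to the variant-specific fields.
function describe(p: SubmitToolResult): string {
  return p.kind === 'client-tool-result'
    ? `tool ${p.toolCallId} ${p.error === undefined ? 'succeeded' : 'failed'}`
    : `approval ${p.approvalId} ${p.approved ? 'granted' : 'denied'}`;
}
```

Because retried submits on the same toolCallId resolve to 'already_completed', a client can treat that status as success for a retried request rather than as an error.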
describeCapabilities / getWorkspaceRegistry
describeCapabilities(): { workspaceProviderKinds: readonly string[] }; // returns { workspaceProviderKinds: [] }
getWorkspaceRegistry(): undefined;
runtime-dbos does not support workspaces. describeCapabilities() returns an empty list so AgentServer surfaces a RUNTIME_NO_WORKSPACE_SUPPORT 404 on the workspace HTTP route.
Internal handle callbacks
_interrupt(sessionId, workflowId) and _abort(sessionId, workflowId, mode) are internal methods invoked by handle.interrupt() / handle.abort(). Callers use the handle methods rather than these directly. In persistent mode, interrupt cancels the current turn (the workflow returns to its recv loop and hibernates again); abort kills the entire workflow.
AgentExecutionHandle
Returned by execute(), resume(), retry(), and (when found) getHandle(). Provides the standard handle surface: stream(), result(), getState(), canResume(), resume(options?), retry(options?), abort(reason?), interrupt(reason?), and send(input) for post-completion continuation. Executor callbacks (resume / retry / send) are injected into each handle at construction so they delegate back to the executor without holding a direct reference to it.
Key Exports
import {
  DBOSAgentExecutor,
  registerDBOSAgentWorkflows,
  registerAgent,
  bindCallLLMStep,
  DbosClientToolResolver,
  ClientToolStep,
  bindClientToolStep,
  resolveOwnerWorkflowId,
  DBOSPersistentSessionTerminatedError,
  DBOSWorkflowNotFoundError,
  DBOSModeMismatchError,
} from '@helix-agents/runtime-dbos';
import type {
  DBOSAgentExecutorConfig,
  DBOSAgentMode,
  ToolRetryPolicy,
  CompletionReason,
} from '@helix-agents/runtime-dbos';
registerDBOSAgentWorkflows
Evaluates the @DBOS.workflow() decorators and wires the sub-agent start-workflow callback. Must be called before DBOS.launch() and before any execute() call. Idempotent — subsequent calls are no-ops. The DBOSAgentExecutor constructor calls it internally; call it yourself at app startup to make the dependency explicit (as shown in Setup).
function registerDBOSAgentWorkflows(): void;
registerAgent
Populates the process-local registries (tools, agent definitions, live LanguageModel, remote sub-agent transports) for an agent.
function registerAgent(agent: AgentConfig): void;
Must be called before DBOS.launch() when a process may host a recovered workflow without first going through execute() / resume() / retry() (e.g., a submit-only or recovery-only process). Those executor methods call it internally, so the common path is covered. Idempotent; re-registering the same agent.name with a different live LanguageModel logs a warning and overwrites (last writer wins).
bindCallLLMStep
function bindCallLLMStep(llmAdapter: LLMAdapter, streamManager?: StreamManager): void;
Wires the LLM adapter (and optionally a stream manager) into the DBOS CallLLMStep static slot. Internal: the DBOSAgentExecutor constructor calls it from your config. Exposed only because e2e test infrastructure needs to swap the adapter between runs. Application code should configure the adapter via the constructor, not by calling this directly.
Client-tool exports
The public client-tool surface is small; applications normally only need these for advanced custom-executor integration or tests.
- DbosClientToolResolver: the resolver implementation; register additional resolver behavior in a custom executor.
- ClientToolStep / bindClientToolStep: the durable step for pending/ownership writes and its binder.
- resolveOwnerWorkflowId(stateStore, ownerSessionId, agentType): resolves an owner session's live DBOS workflow ID (standard-mode IDs are deterministic (agentType, sessionId, runId) triples; persistent-mode IDs are read from state.metadata.dbosWorkflowId).
Errors
import {
  DBOSModeMismatchError,
  DBOSWorkflowNotFoundError,
  DBOSPersistentSessionTerminatedError,
} from '@helix-agents/runtime-dbos';
- DBOSModeMismatchError({ sessionId, expected, actual }): thrown when execute() is called with a mode that conflicts with the mode recorded at session creation (standard vs persistent). A session's mode is fixed at first execute(). Recovery: use the same mode, or start a new session with a different sessionId.
- DBOSWorkflowNotFoundError({ workflowId }): a DBOS workflow ID expected to exist (based on the Helix state store) cannot be found in the DBOS system database (e.g., the DBOS system DB was reset independently, or retention GC'd the workflow). Recovery: executor.retry() from the last Helix checkpoint.
- DBOSPersistentSessionTerminatedError({ sessionId }): routing a message to a persistent session failed because its workflow terminated and a concurrent start attempt also failed (CAS conflict). Rare; typically a transient high-concurrency conflict. Recovery: retry execute() once.
See the Error Reference deep-dive for the full catalog.
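The "retry execute() once" recovery for DBOSPersistentSessionTerminatedError can be wrapped in a small helper. This is a sketch under stated assumptions: `retryOnce` is a hypothetical name, not part of the package, and it is written against a generic thunk and predicate so it does not depend on any library types.

```typescript
// Runs `run`. If it rejects with an error that `isRetryable` accepts, runs it
// exactly one more time. Any other error, or a second failure, propagates.
async function retryOnce<T>(
  run: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
): Promise<T> {
  try {
    return await run();
  } catch (err) {
    if (!isRetryable(err)) throw err;
    return run(); // single retry, no further recovery
  }
}

// Usage (not executed here; assumes `executor`, `agent`, `input`, and `sessionId` exist):
// const handle = await retryOnce(
//   () => executor.execute(agent, input, { sessionId }),
//   (err) => err instanceof DBOSPersistentSessionTerminatedError,
// );
```

Limiting the helper to one retry matches the recovery advice above; repeated failures more likely point to something other than a transient CAS conflict.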
Modes & Types
type DBOSAgentMode = 'standard' | 'persistent';
interface ToolRetryPolicy {
  retriesAllowed?: boolean; // default true
  maxAttempts?: number; // default 3
  initialIntervalSeconds?: number; // default 1.0
  backoffCoefficient?: number; // default 2.0
  maximumIntervalSeconds?: number; // default 60.0
}
CompletionReason is re-exported from @helix-agents/core. It is the user-facing reason a run halted (e.g. 'finish_tool', 'max_steps', 'stop_when', 'idle_timeout', 'shutdown', 'interrupted', 'aborted', 'failed'). The persistent-mode-specific reasons ('idle_timeout', 'shutdown') only arise on the DBOS runtime.
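To make the ToolRetryPolicy defaults concrete, the sketch below computes the wait before each retry. It assumes conventional exponential backoff (initial interval multiplied by the coefficient per retry, capped at the maximum) and that maxAttempts counts the first try. Both are assumptions of this example, not confirmed runtime behavior, and `retryDelaysSeconds` and `ResolvedRetryPolicy` are hypothetical names.

```typescript
// A policy with every field filled in (the documented defaults, resolved).
interface ResolvedRetryPolicy {
  retriesAllowed: boolean;
  maxAttempts: number;
  initialIntervalSeconds: number;
  backoffCoefficient: number;
  maximumIntervalSeconds: number;
}

const DEFAULTS: ResolvedRetryPolicy = {
  retriesAllowed: true,
  maxAttempts: 3,
  initialIntervalSeconds: 1.0,
  backoffCoefficient: 2.0,
  maximumIntervalSeconds: 60.0,
};

// Assumed formula: delay before retry n (0-based) is
// min(initial * coefficient^n, maximum).
function retryDelaysSeconds(policy: ResolvedRetryPolicy): number[] {
  if (!policy.retriesAllowed) return [];
  const delays: number[] = [];
  // Assumption: maxAttempts includes the first try, so there are maxAttempts - 1 retries.
  for (let n = 0; n < policy.maxAttempts - 1; n++) {
    const raw = policy.initialIntervalSeconds * Math.pow(policy.backoffCoefficient, n);
    delays.push(Math.min(raw, policy.maximumIntervalSeconds));
  }
  return delays;
}

console.log(retryDelaysSeconds(DEFAULTS)); // [ 1, 2 ]
console.log(retryDelaysSeconds({ ...DEFAULTS, maxAttempts: 9 })); // [ 1, 2, 4, 8, 16, 32, 60, 60 ]
```

Under these assumptions the default policy waits about 3 seconds in total across its two retries, and the 60-second cap only matters once maxAttempts is raised well above the default.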