Helix Agents Framework Concepts
This document is the canonical reference for Helix Agents framework concepts. It describes current behavior (post-A.2 / A.3); for the v6 → v7 migration delta, see ../upgrade-guides/v6-to-v7-stateless-suspension.md.
For agents: Cross-runtime work (HITL, sub-agents, state store semantics) usually requires fetching this file in addition to the top-level CLAUDE.md. The pointer table in CLAUDE.md lists this file under "framework concepts deep-dive."
Session — The primary unit of agent conversation state. A session contains all messages, custom state, and checkpoints for a conversation. Sessions are identified by sessionId, which is the primary key for all state operations. Multiple runs can occur within a single session (e.g., after interrupts or when continuing a conversation).
Run — A single execution within a session. Each time an agent executes (via execute() or resume()), a new run is created with a unique runId. Runs track execution metadata like turn number, step count, status (running/completed/failed/interrupted), timing, and startSequence (the stream position when the run started, used for filtering chunks in multi-run scenarios). Use getCurrentRun(sessionId) to get the active run and listRuns(sessionId) to see run history. As of v7, AgentResult.status may also be 'suspended_client_tool' | 'suspended_awaiting_children' | 'suspended_step_partial' for HITL agents that paused mid-run; exhaustive switch statements must handle these three additional cases. The result.suspended field carries the routing info (toolCallIds, children, stepId) needed to drive resume.
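As a consumer-side illustration, here is a minimal sketch of exhaustively handling the v7 status union; the import path, the terminal status values shown, and the exact shape of result.suspended are assumptions rather than a pinned API.

```typescript
import type { AgentResult } from '@helix-agents/core'; // assumed export location

// Hypothetical routing over the v7 AgentResult.status union.
function routeResult(result: AgentResult): void {
  switch (result.status) {
    case 'completed':
    case 'failed':
    case 'interrupted':
      break; // terminal for this run; nothing to resume
    case 'suspended_client_tool':
      // result.suspended carries the toolCallIds to hand to the client UI.
      console.log('awaiting client submissions', result.suspended);
      break;
    case 'suspended_awaiting_children':
      console.log('awaiting sub-agent children', result.suspended);
      break;
    case 'suspended_step_partial':
      console.log('step partially committed; resume() continues it', result.suspended);
      break;
    default: {
      // Exhaustiveness guard: compilation fails if a new status is added.
      const unhandled: never = result.status;
      throw new Error(`unhandled status: ${unhandled}`);
    }
  }
}
```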
Agent — Created with defineAgent(). Has a system prompt, tools, output schema (Zod), LLM config, and max steps. The outputSchema auto-injects a __finish__ tool for structured output.
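A minimal defineAgent sketch for orientation; the import path, the LLM-config shape, and any option names beyond those listed above are assumptions.

```typescript
import { z } from 'zod';
import { defineAgent } from '@helix-agents/core'; // assumed export location

// Hypothetical agent definition; option names mirror the concepts above.
const researchAgent = defineAgent({
  name: 'researcher',
  systemPrompt: 'Research the topic and report findings.',
  tools: [], // defineTool() instances go here
  // outputSchema auto-injects a __finish__ tool for structured output.
  outputSchema: z.object({
    summary: z.string(),
    sources: z.array(z.string()),
  }),
  llm: { model: 'gpt-4o' }, // assumed LLM-config shape
  maxSteps: 10,
});
```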
Agent registry replace API. Runtimes that resolve agents BY NAME (Temporal, Cloudflare Workflows) provide an AgentRegistry.replace(config) method for swapping the registered reference — used most commonly in tests that need per-call hooks on the same agent type. JS and DBOS read the agent reference inline and don't need this API. See ../runtimes/temporal.md for the full API description.
Tool — Created with defineTool(). Has a name, description, Zod parameters schema, and execute function. Tools receive a ToolContext with getState(), updateState() (Immer), emit(), and abortSignal.
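A minimal defineTool sketch showing the ToolContext surface; the positional (input, ctx) execute signature, the custom-state shape, and the emit payload are assumptions.

```typescript
import { z } from 'zod';
import { defineTool } from '@helix-agents/core'; // assumed export location

// Hypothetical tool that stores a bookmark in custom session state.
const addBookmark = defineTool({
  name: 'add_bookmark',
  description: 'Save a URL into session state.',
  parameters: z.object({ url: z.string().url() }),
  async execute(input: { url: string }, ctx) {
    if (ctx.abortSignal.aborted) return { saved: false };
    // updateState applies an Immer recipe to custom state.
    await ctx.updateState((state: { bookmarks?: string[] }) => {
      state.bookmarks ??= [];
      state.bookmarks.push(input.url);
    });
    ctx.emit({ type: 'bookmark_added', url: input.url }); // assumed custom-event shape
    return { saved: true };
  },
});
```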
Tool Execution Order — When the LLM returns multiple tool calls in one response, they execute in two phases: (1) regular tools run in parallel, their state changes become visible (in-memory for JS runtime, committed to store for Temporal/Cloudflare); (2) finishWith tools run sequentially, seeing the updated state from phase 1. This ensures finishWith tools always see the complete state from all other tools in the batch. In the JS runtime, sub-agents execute alongside regular tools in phase 1. In Temporal/Cloudflare, sub-agents execute after phase 2 as child workflows/instances. Sub-agents always execute even when a finishWith tool succeeds — completion is deferred until sub-agents finish. Companion tools have their own phase (existing behavior, unchanged). v7 wraps this same phase logic inside the new runStepIterator (the step-iterator emits per-phase StepOutcomes), but the observable execution order is unchanged.
Sub-Agents (Ephemeral) — Created with createSubAgentTool(). Parent agents can delegate to child agents. The child runs to completion and returns the result as a tool result. Child sessions are isolated but stream events flow to the parent's stream. Each sub-agent gets its own sessionId. This is the default mode (mode: 'ephemeral' on SubSessionRef). Dispatch failures (failed createSession, child workflow start failure, DO instance creation error) surface to the LLM via tool_error chunk + synthetic tool result message + subagent.dispatch_failed ERROR log. Consistent across runtime-js, runtime-temporal, and runtime-cloudflare DO; parent execution continues so siblings in the same dispatch batch are unaffected.
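A minimal ephemeral-delegation sketch; the option names passed to createSubAgentTool are assumptions beyond the default mode: 'ephemeral' described above.

```typescript
import { createSubAgentTool, defineAgent } from '@helix-agents/core'; // assumed export locations

declare const researchAgent: ReturnType<typeof defineAgent>; // child agent defined elsewhere

// Hypothetical parent-side tool: runs the child to completion and returns its
// output as the tool result; the child's stream chunks flow to the parent.
const delegateResearch = createSubAgentTool({
  name: 'delegate_research',
  description: 'Hand a research question to the researcher sub-agent.',
  agent: researchAgent,
  // mode: 'ephemeral' is the default: fresh child sessionId, isolated state.
});
```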
Persistent Sub-Agents — Configured via persistentAgents on AgentConfig. Unlike ephemeral sub-agents, persistent children can receive follow-up messages and maintain state across multiple interactions. Two modes: blocking (parent waits for child to complete) and non-blocking (parent continues immediately, gets completion notification later). Persistent children are managed through auto-injected companion tools, not createSubAgentTool(). Each persistent child gets a stable session ID: {parentSessionId}-agent-{name}. Children can be named explicitly or auto-named ({agentType}-{counter}). State tracking uses SubSessionRef with mode: 'persistent' and name fields. D1StateStore requires V4 migration for the mode and name columns on __agents_sub_session_refs.
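A hedged sketch of persistentAgents configuration; the per-entry shape (agent plus mode, keyed by name) is an assumption, and only the two modes and the naming rules come from the text above.

```typescript
import { defineAgent } from '@helix-agents/core'; // assumed export location

declare const writerAgent: ReturnType<typeof defineAgent>;
declare const monitorAgent: ReturnType<typeof defineAgent>;

// Hypothetical parent with two persistent children.
const coordinator = defineAgent({
  name: 'coordinator',
  systemPrompt: 'Coordinate long-running helpers via companion tools.',
  tools: [],
  persistentAgents: {
    // Explicit name "writer" gives the stable child session ID `${parentSessionId}-agent-writer`.
    writer: { agent: writerAgent, mode: 'blocking' },       // enables companion__waitForResult
    monitor: { agent: monitorAgent, mode: 'non-blocking' }, // completion notification arrives later
  },
});
```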
Companion Tools — Auto-injected into parent agents that have persistentAgents configured. Up to six tools prefixed with companion__: spawnAgent (create and start a child), sendMessage (interrupt/resume a running child with a new message), listChildren (list all persistent children), getChildStatus (check a specific child's status and output), terminateChild (kill a running child) — these five are always injected. The sixth tool, waitForResult (blocks until a child completes), is conditionally injected only when at least one persistent agent has mode: 'blocking' configured. Companion tools are handled separately from regular tools in the execution loop.
Remote Sub-Agents — Created with createRemoteSubAgentTool(). Parent agents can delegate to agents running on a separate HTTP service. Uses HttpRemoteAgentTransport for communication via HTTP + SSE. The remote service hosts agents using AgentServer from @helix-agents/agent-server. Remote sub-agents are first-class constructs across all runtimes (JS, Temporal, Cloudflare) with stream proxying, SubSessionRef tracking with remote metadata, deterministic session IDs, crash recovery via transport.getStatus(), and interrupt propagation via transport.interrupt(). Each runtime routes remote sub-agent calls through a dedicated execution path separate from regular tool calls.
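A hedged remote-delegation sketch; the import paths, the transport constructor options, and the option names passed to createRemoteSubAgentTool are assumptions.

```typescript
import {
  createRemoteSubAgentTool,
  HttpRemoteAgentTransport,
} from '@helix-agents/core'; // assumed export locations

// Hypothetical remote delegation to an AgentServer-hosted agent.
const transport = new HttpRemoteAgentTransport({
  baseUrl: 'https://agents.internal.example.com', // remote AgentServer (HTTP + SSE)
});

const delegateBilling = createRemoteSubAgentTool({
  name: 'delegate_billing',
  description: 'Delegate billing questions to the remote billing agent.',
  agentType: 'billing', // agent registered on the remote service
  transport,            // getStatus() drives crash recovery; interrupt() propagates interrupts
});
```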
Stateless Suspension Model (v7) — At every HTTP request boundary, the runtime is free to die. The state store is the only durable thing across requests. When an agent reaches a HITL boundary (client-executed tool, approval-gated tool, sub-agent wait), the runtime writes SessionState.suspensionContext and emits the relevant stream chunk, then returns from the run. There is no in-memory waiter, no setTimeout promise, no DO hibernation guard. Resumption is driven by executor.resume() reading durable suspension context — never by signaling an in-memory waiter. This unblocks long pauses (~80% wall-time reduction on multi-minute HITL waits) and makes deadline semantics deterministic (deadlines measure durable clock time, not in-memory promise lifetime). v7.0 ships this model on all 4 runtimes: JS, Cloudflare DO, Temporal, and Cloudflare Workflows. The runtime-dbos runtime added by main uses its own DBOS-native DBOS.recv/DBOS.send primitives for HITL suspension (Postgres-backed workflow replay) and is not part of the unified suspensionContext model — see the runtime-dbos package docs for its specifics.
Client-Executed Tools (v7) — Created with defineTool({ execute: 'client' }). When the LLM calls such a tool, the runtime writes a pending entry to the session's pendingClientToolCalls map (durable state), emits a tool_start stream chunk, and returns from the run with RunOutcome.kind = 'suspended_client_tool' (which surfaces as AgentResult.status = 'suspended_client_tool'). The runtime does not block in-memory. Consumers call executor.submitToolResult({ kind: 'client-tool-result', toolCallId, result | error }) (or the 'approval-response' variant for approval-gated tools — both share the same SubmitToolResult union); the submission writes the result to durable state and triggers a fresh run via executor.resume(). The canonical cross-runtime signal for "awaiting client submission" is pendingClientToolCalls map presence; session-level SessionStatus remains 'active'. Mixing execute: 'client' with finishWith: true is rejected at defineTool time, as is mixing execute: 'client' with requireApproval. The framework maintains SessionState.clientToolCallOwnership on root sessions to route submissions to the owning sub-agent; SessionState.rootSessionId points each sub-agent at its root. Submissions always go against the ROOT sessionId. Per-runtime status:
- runtime-js: durable state writes + executor.resume() on submission. No in-memory promise map. Process restart is safe — pending entries are recovered from the state store on the next request that touches the session. The runLoop polls stateStore.checkInterruptFlag (atomic check-and-clear) at the top of each step iteration, so durable interrupts written by other processes are observed immediately. This brings JS to parity with CF DO and CFW Workflows on cross-process interrupt semantics.
- runtime-cloudflare (DO path): durable state writes via DOStateStore. The hibernation guard is removed in v7; DOs are free to evict during HITL waits. Deadlines are enforced at request time via findExpiredPending (no alarm subscriber).
- runtime-temporal: Workflow exits cleanly on every HITL boundary (mirrors CFW Workflows). Suspension state is durable in the session store; the workflow returns AgentWorkflowResult { status: 'suspended_*' } and Temporal releases the workflow. executor.resume(sessionId) starts a NEW workflow instance with workflow ID ${prefix}__${agentType}__${sessionId}__resume-${N} (single-dash suffix; WorkflowIdReusePolicy.ALLOW_DUPLICATE). The new workflow's mode='resume' branch calls the applyResultsAndReload activity, which drains submitted client-tool results into messages, fires onMessage + afterTool hooks, synthesizes timeouts for expired deadlines, and drains completed sub-agent children via recordSubSessionResult. submitToolResult is durable-only — no Temporal signal is sent (the workflow has already exited). Sub-agents are child workflows started via wf.startChild; on parent suspension, in-flight children are marked failed: 'parent_suspended' (mitigation #3) and re-spawned via the __resume-N workflow ID convention on parent's resume (γ-cascade, spec §5.2). Approval-gated tools share the same suspension primitive (tool_approval_request chunk + 'suspended_client_tool' status); approve / deny submissions are routed via the same durable submitToolResult flow.
- runtime-cloudflare (Workflow path): Workflow returns early from runAgentWorkflow on HITL boundaries with status: 'suspended_*' and durable suspension state via the commitSuspendedStep activity. executor.resume() starts a new workflow instance with mode: 'resume' that drains submissions via applyResultsAndReload and continues. Sub-agents cascade up — child suspensions propagate to the parent's suspended_awaiting_children. γ-cascade re-spawn on parent resume (FU-A2-40, mirrors Temporal FU-A2-09): when the parent suspends mid-sub-agent dispatch, commitSuspendedStep marks each suspendedAwaitingChildren entry's child session as failed: 'parent_suspended'. On resume, applyResultsAndReload surfaces those children via childrenToRespawn; the workflow body re-dispatches each via workflowBinding.create({ id: 'agent__<type>__<id>__respawn-<attempt>' }), polls the child's durable state until terminal, and records the outcome via recordSubSessionResult. A drain-clear step then resets the parent's suspension discriminators when fully resolved. Eliminates v6's billable wall-time during HITL waits (~80% reduction on multi-minute approvals).
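To tie the client-executed-tool pieces above together, a minimal sketch of definition plus submission; the import paths, the AgentExecutor typing, and how the root sessionId is routed into submitToolResult are assumptions (the text above only pins the submission union itself).

```typescript
import { z } from 'zod';
import { defineTool, type AgentExecutor } from '@helix-agents/core'; // assumed exports

// Client-executed tool: no server-side body; the client does the work.
const pickFile = defineTool({
  name: 'pick_file',
  description: 'Ask the user to pick a file in the browser.',
  parameters: z.object({ accept: z.string() }),
  execute: 'client', // the run suspends with status 'suspended_client_tool'
});

// Hypothetical submission helper: a durable write only; resumption is a separate step.
async function submitPickedFile(
  executor: AgentExecutor,
  rootSessionId: string, // submissions always target the ROOT session
  toolCallId: string,
  path: string,
) {
  // How the root sessionId is passed (argument vs. field) is an assumption.
  await executor.submitToolResult(rootSessionId, {
    kind: 'client-tool-result',
    toolCallId,
    result: { path },
  });
}
```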
Hook firing on the timeout path is consistent across all 4 runtimes. When a client-tool deadline elapses, every runtime appends a synthetic tool_error message, emits a tool_end chunk, records usage with success: false, and fires onMessage + afterTool hooks with the timeout payload. CFW Workflows is the reference implementation; runtime-js (and CF DO via runtime-js) and runtime-temporal were brought to parity in commit 799aeea77.
Lifecycle hook firing order is canonical across all 4 stateless-suspension runtimes; DBOS fires the SAME sequence with one timing caveat. For the regular (and approval-gated approve) tool execution path, every runtime fires user-facing AgentHooks in the canonical sequence:
beforeTool → execute → onStateChange → onMessage → afterTool. onStateChange reflects the immediate state mutation from execute; onMessage surfaces the result-as-message; afterTool is universal cleanup with the full result payload. Pre-2026-05-02, each of runtime-js, runtime-temporal, and CFW Workflows fired in a different order (runtime-js fired onMessage AFTER afterTool; runtime-temporal fired onMessage BEFORE onStateChange); sub-projects #2 + #3 unified the order so portable hook code can rely on cross-runtime sequencing. Per-runtime regression guards live in:
- packages/runtime-js/src/__tests__/js-agent-executor-hooks.test.ts (regular path) and approve-path-hooks.test.ts (approve drain path)
- packages/runtime-temporal/src/__tests__/v7-activities-hooks.test.ts (regular path) and v7-approve-path-hooks.test.ts (approve drain path)
- packages/runtime-cloudflare/src/__tests__/approve-path-hooks-do.test.ts (DO approve drain path)
- packages/e2e/src/__tests__/approval-gate-hook-parity.integ.test.ts (cross-backend)
Implementation note: runtime-js executes phase-1 tools in PARALLEL via Promise.all. To preserve LLM-input ordering of state.messages while maintaining the canonical hook order, runServerTool (in packages/runtime-js/src/run-loop.ts) defers afterTool firing back to the iterator's collection loop in packages/core/src/orchestration/step-iterator.ts via ExecuteServerToolResult.deferredAfterTool. The iterator pushes the message → fires onMessage → fires the deferred afterTool per result, in input order. runtime-temporal and CFW Workflows execute tools sequentially so they fire all hooks inline inside their executeServerToolWithHooks / per-tool helpers without needing the deferred-payload indirection.
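For orientation, a hedged sketch of user-facing hooks that log the canonical order; the handler signatures and where the hooks object is attached are assumptions beyond the hook names themselves.

```typescript
import type { AgentHooks } from '@helix-agents/core'; // assumed export location

// Hypothetical hooks logging the canonical per-tool sequence.
const hooks: AgentHooks = {
  beforeTool: async () => console.log('1. beforeTool'),
  // 2. the tool's execute() runs here (not a hook)
  onStateChange: async () => console.log('3. onStateChange: immediate mutation from execute'),
  onMessage: async () => console.log('4. onMessage: result surfaced as a message'),
  afterTool: async () => console.log('5. afterTool: universal cleanup, full result payload'),
};
```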
DBOS hook firing order — known divergence (per second-round review #5/#8 finding P1.3). DBOS fires the same canonical hook SEQUENCE (beforeTool → execute → onStateChange → onMessage → afterTool) but with a timing caveat: hooks fire BEFORE the enclosing Promise.all-driven sub-agent dispatch resolves, while JS / Temporal / CF fire them AFTER the dispatch settles. Tracing pipelines that assume the post-Promise.all timing observe DBOS as an outlier. Functional behavior is identical (same payloads, same order); only the wall-clock arrival time within a step differs. This is a consequence of DBOS's inline DBOS.recv() model — see ../upgrade-guides/v6-to-v7-stateless-suspension.md "DBOS divergence" for the architectural rationale.
Hook firing parity table:
| Runtime | beforeTool | onStateChange | onMessage | afterTool | Promise.all timing |
|---|---|---|---|---|---|
| runtime-js | ✓ | ✓ | ✓ | ✓ deferred | After Promise.all |
| runtime-temporal | ✓ | ✓ | ✓ | ✓ inline | After Promise.all |
| CFW DO (via runtime-js) | ✓ | ✓ | ✓ | ✓ deferred | After Promise.all |
| CFW Workflows | ✓ | ✓ | ✓ | ✓ inline | After Promise.all |
| runtime-dbos | ✓ | ✓ | ✓ | ✓ inline | Before Promise.all ⚠️ |
onAgentSuspended / onAgentResumed:
| Runtime | client_tool reason | awaiting_children reason | Notes |
|---|---|---|---|
| runtime-js | ✓ | ✓ | Reference impl |
| runtime-temporal | ✓ | ✓ | Matches reference |
| CFW DO | ✓ | ✓ | Matches reference |
| CFW Workflows | ✓ | ✓ | Matches reference |
| runtime-dbos | ✓ | ✓ (post v7.0-final) | Earlier v7 versions skipped awaiting_children due to a guard bug; fixed per second-round review P1.1 |
Hook customState reconcile after preStep capture (commit 8654a2686) — Hooks that fire AFTER the iterator captures preStepCustomState (assistant onMessage, phase-1/phase-2 beforeTool/afterTool) call hookContext.updateState, which mutates nextState.customState in-place via Immer. They do NOT contribute to the staging-changes pipeline that commitStep promotes. After commitStep returns, the iterator reconciles by writing the freshest nextState.customState back to the store as a follow-up saveState IF it differs from the just-committed value. Hook-less steps still do exactly one durable write per step; steps that mutate via post-snapshot hooks do two (commit + reconcile). Best-effort: a follow-up save failure logs a warning but doesn't fail the step.
InMemoryStreamManager cursor rebase on cleanup (commit 65eaaf235) — InMemoryStreamManager tracks reader sequences as literal indices into the chunks array. cleanupToStep filters orphan-step chunks, shifting surviving chunks earlier. Active readers' currentSequence pointers are now rebased in cleanupToStep (via a Set<ReaderCursor> on the stream) so they observe chunks emitted just before cleanup (e.g. the run_interrupted boundary marker the runtime-js soft-interrupt path emits). Active createReader / createResumableReader consumers now correctly observe chunks that survived a cleanup — previously these were silently invisible to the reader.
D1 saveStateAndPromoteStaging atomicity (commit 7509872e3) — D1StateStore.saveStateAndPromoteStaging now correctly surfaces concurrent-CAS losses as StaleStateError (was sometimes D1StateError, breaking retry paths that gate on instanceof StaleStateError). UNIQUE-constraint violations on __agents_messages.(session_id, sequence) are caught and re-thrown as StaleStateError. The trailing DELETE on __agents_staging runs OUTSIDE the atomic batch — only after the version-pinned UPDATE confirms changes==1. Observable: rare network failures between successful main commit and staging DELETE leave a stale staging row that's harmless and self-healing (next stageChanges upserts the same key).
Driving the agent loop after submitToolResult: In v7, submitToolResult is a durable write only — it does NOT auto-resume the agent loop. After submission, consumers continue the loop in one of two ways:
- Use the framework's chat plumbing — handleChatStream (server) and useChat + useResumeClientTools (React) drive the resume internally. This is the recommended path for typical web app deployments.
- Call executor.resume(agent, sessionId) explicitly — returns a new handle observing the resumed run. Use this when calling the executor directly (custom server, integration tests). The pattern: await executor.execute(...) returns 'suspended_*' → await executor.submitToolResult(...) writes the durable result → const newHandle = await executor.resume(...) drives the loop forward → await newHandle.result() resolves with 'completed' (or whatever terminal state the resumed run reaches), as sketched below.
This separation is by-design for v7 stateless purity — submission and resumption can happen in different processes, with no in-memory bridge between them.
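A hedged sketch of the explicit-resume pattern from the second bullet above; the execute() arguments, the sessionId/suspended fields read off the result, and the submitToolResult routing argument are assumptions.

```typescript
import type { AgentExecutor } from '@helix-agents/core'; // assumed export location

// Hypothetical drive loop for a custom server or integration test.
async function driveHitl(executor: AgentExecutor, agent: unknown, sessionId: string) {
  const handle = await executor.execute(agent, { sessionId, messages: [] });
  const first = await handle.result();
  if (first.status !== 'suspended_client_tool') return first;

  const toolCallId = first.suspended.toolCallIds[0]; // routing info from result.suspended
  await executor.submitToolResult(sessionId, {
    kind: 'client-tool-result',
    toolCallId,
    result: { confirmed: true },
  });
  // Durable write only; nothing moves until the loop is driven forward.
  const newHandle = await executor.resume(agent, sessionId);
  return newHandle.result(); // resolves with 'completed' (or another terminal state)
}
```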
Structured Logger events: client_tool.suspended, client_tool.submitted, client_tool.timeout, client_tool.aborted, client_tool.ownership_write_failed/ownership_clear_failed/ownership_retry, client_tool.validation_failed. Records CLIENT_TOOL_WAIT_MS_METRIC per call (now measured from durable suspension write to durable submission write); aggregate via __agents_usage (Postgres/D1 column source_type = 'client_tool_wait_ms', with per-tool breakdown via source_name). See ../guide/client-executed-tools.md for the full guide and ../upgrade-guides/v6-to-v7-stateless-suspension.md for v6→v7 migration steps including the operator runbook for force-failing stuck calls.
Approval-Gated Tools (v7) — First-class HITL primitive on defineTool: pass requireApproval: true (always require approval) or requireApproval: (input, ctx) => boolean (function form, evaluated per-call). When the gate matches, the runtime emits a tool_approval_request stream chunk with the parsed input and suspends with 'suspended_client_tool' (the same primitive carries both client-tool and approval flows; routing happens off the kind field of the submission, not the stream-chunk type). Resume by calling executor.submitToolResult({ kind: 'approval-response', toolCallId, approved, reason? }). On approved: true, the original execute() runs normally with the original input. On approved: false, the runtime emits tool_error ('Tool call X was not approved by the user') and skips execute() entirely. The function form fails-closed: an exception inside the evaluator is treated as requireApproval = true (matches the Mastra precedent — fail safe by requiring approval rather than silently bypassing). requireApproval is mutually exclusive with execute: 'client' and finishWith: true; both combinations are rejected at defineTool time. All 4 runtimes support approval flows on the v7 stateless model: the runtime suspends durably, and submitToolResult({ kind: 'approval-response', ... }) triggers a fresh run via executor.resume() (or via the framework's chat plumbing). CFW Workflows now uses the same v7 stateless model — workflow exits on approval-gate match; resume drains the approve/deny submission via applyResultsAndReload.
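A hedged sketch of an approval-gated tool using the per-call function form; everything outside requireApproval and the submission union is an assumption.

```typescript
import { z } from 'zod';
import { defineTool } from '@helix-agents/core'; // assumed export location

// Hypothetical approval-gated tool.
const wireTransfer = defineTool({
  name: 'wire_transfer',
  description: 'Send a wire transfer.',
  parameters: z.object({ amountUsd: z.number(), toAccount: z.string() }),
  // Evaluated per call; throwing here fails closed (approval is required).
  requireApproval: (input: { amountUsd: number }) => input.amountUsd > 1_000,
  async execute(input: { amountUsd: number; toAccount: string }) {
    return { sent: true, ...input }; // runs only after approved: true
  },
});

// Approve / deny reuses the same durable submission flow:
// executor.submitToolResult(..., { kind: 'approval-response', toolCallId, approved: false, reason: 'too large' })
```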
Agent Server — The @helix-agents/agent-server package provides AgentServer for hosting agents over HTTP. Accepts any AgentExecutor implementation and exposes the following routes:
- Executor routes (always wired): POST /start, POST /resume, GET /sse, GET /status, POST /interrupt, POST /abort, POST /submit-tool-result, GET /workspaces.
- Chat handler routes (wired when chatHandler is configured): POST /chat, GET /chat/{sessionId}/stream, POST /chat/{sessionId}/submit-tool-result, POST /chat/{sessionId}/interrupt, POST /chat/{sessionId}/abort. These layer on top of the executor and provide the canonical chat-style flow used by useChat + useResumeClientTools.
Transport adapters: createHttpAdapter() (generic), createExpressAdapter() (Express). Tracks active execution handles in memory for interrupt/abort — these only work on the same server instance that started execution (handles are lost on restart; sessions remain recoverable via resume). Fail-closed auth: the constructor throws if neither authenticate hook nor explicit allowUnauthenticated: true is configured.
v7 removed the v6 INTERRUPT_NOT_LOCAL 503 — interrupts are now durable writes (via stateStore.setInterruptFlag) picked up by the runLoop at the next checkpoint, regardless of which process holds the in-memory handle. HTTP clients no longer need to retry against the "owning" server.
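A hedged hosting sketch using the Express adapter; the constructor option names (executor, authenticate, chatHandler) and the adapter's return value are assumptions beyond the names listed above.

```typescript
import express from 'express';
import { AgentServer, createExpressAdapter } from '@helix-agents/agent-server'; // assumed exports
import type { AgentExecutor } from '@helix-agents/core'; // assumed export location

declare const executor: AgentExecutor; // any executor implementation (JS, Temporal, DBOS, ...)

// Hypothetical setup; omitting both authenticate and allowUnauthenticated throws (fail-closed auth).
const server = new AgentServer({
  executor,
  authenticate: async (req: { headers: Record<string, string | undefined> }) =>
    req.headers['x-api-key'] === process.env.AGENT_API_KEY,
  // chatHandler: ...  // wires the /chat/* routes when configured
});

const app = express();
app.use('/agents', createExpressAdapter(server)); // assumed to return Express middleware
app.listen(3000);
```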
Runtime — Executes the agent loop. JSAgentExecutor runs in-process, TemporalAgentExecutor uses Temporal workflows for durability, DBOSAgentExecutor uses DBOS Transact (Postgres-backed workflow replay) for durability and supports the same client-executed-tools surface as Temporal and Cloudflare.
Workspace and HITL runtime support — Two orthogonal capabilities, with overlapping (but not identical) runtime support:
- Workspaces (agent.workspaces): runs on JS runtime and Cloudflare Durable Object runtime (via @helix-agents/agent-server). Temporal, CF Workflows, and DBOS do not support workspaces (Temporal and CF Workflows fail-fast at run-start; DBOS silently passes workspaces: undefined).
- HITL (client-executed tools, requireApproval): runs on all 5 runtimes — JS, Cloudflare Durable Object, Cloudflare Workflows, Temporal, and DBOS. The first 4 use the v7 stateless suspension model (durable-state-only suspension via SessionState.suspensionContext); DBOS uses its own DBOS-native DBOS.recv/DBOS.send primitives over Postgres-backed workflow replay (functionally equivalent but architecturally separate from the unified suspensionContext model).
| Runtime | Workspaces | HITL | Notes |
|---|---|---|---|
| JS (runtime-js) | Full | Full (v7 stateless) | All providers; recommended for dev + non-DO production. |
| CF Durable Objects (via agent-server) | Full | Full (v7 stateless) | All providers; recommended for CF production. |
| Cloudflare Workflows (runtime-cloudflare/src/workflow.ts) | Fail-fast | Full (v7 stateless) | All providers; recommended for Cloudflare Workflows production. Workspaces remain unsupported (run-start fail-fast). |
| Temporal (runtime-temporal) | Fail-fast | Full (v7 stateless) | All providers; recommended for Temporal-backed production. Workspaces remain unsupported (run-start fail-fast). |
| DBOS (runtime-dbos) | Unsupported | Full (DBOS-native) | Postgres-backed; recommended when consumers already use DBOS Transact. Workspaces silently unsupported (no fail-fast guard yet). |
State Store — Persists session state (messages, custom state, checkpoints). Uses SessionStateStore interface with sessionId as the primary key. Implementations: InMemoryStateStore for dev, RedisStateStore for prod, PostgresStateStore for prod (works across all runtimes including Cloudflare Workers via Neon/Hyperdrive). All implementations guarantee atomic createSession() — concurrent calls for the same sessionId result in exactly one winner (others throw). This is the foundation for preventing duplicate execution across all runtimes.
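A hedged illustration of the one-winner createSession guarantee; the constructor options and the createSession signature are assumptions.

```typescript
import { PostgresStateStore } from '@helix-agents/core'; // assumed export location

// Hypothetical concurrent-creation race against the same sessionId.
const store = new PostgresStateStore({ connectionString: process.env.DATABASE_URL! });

const results = await Promise.allSettled([
  store.createSession('session-123', { customState: {} }),
  store.createSession('session-123', { customState: {} }),
]);
// Exactly one call wins; the other throws, which is what prevents duplicate
// execution of the same session across runtimes.
console.log(results.map((r) => r.status)); // ['fulfilled', 'rejected'] in some order
```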
Stream Manager — Handles real-time streaming of agent events. Implementations: InMemoryStreamManager, RedisStreamManager.
LLM Adapter — Abstracts the LLM provider. VercelAIAdapter wraps the Vercel AI SDK. MockLLMAdapter for testing.
Checkpoint — Complete state snapshot saved after each step. Enables crash recovery, time-travel, and branching. Checkpoints are scoped to a session.
Lock Manager — Distributed coordination interface. Prevents concurrent execution of the same agent across processes. Implementations: NoOpLockManager, InMemoryLockManager, RedisLockManager, PostgresLockManager, DurableObjectLockManager.
Logger — All SDK components accept an optional Logger interface (info, warn, error, debug? methods) defined in core/src/types/logger.ts. Defaults to noopLogger (silent). Use consoleLogger for development. Compatible with pino, winston, and other structured logging libraries. Configured via constructor options on executors, state stores, adapters, and tracing hooks. Zero bare console.* calls exist in production source files — all logging goes through Logger.
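A hedged pino adapter sketch; the (message, meta?) method signatures are an assumption, and only the four method names come from the text above.

```typescript
import pino from 'pino';
import type { Logger } from '@helix-agents/core'; // interface defined in core/src/types/logger.ts

// Hypothetical wrapper keeping the Logger contract explicit over pino's level methods.
const base = pino({ level: 'info' });

const logger: Logger = {
  info: (msg, meta) => base.info(meta ?? {}, String(msg)),
  warn: (msg, meta) => base.warn(meta ?? {}, String(msg)),
  error: (msg, meta) => base.error(meta ?? {}, String(msg)),
  debug: (msg, meta) => base.debug(meta ?? {}, String(msg)), // debug is optional on Logger
};

// Passed via constructor options on executors, state stores, adapters, and
// tracing hooks, e.g. new JSAgentExecutor({ ...deps, logger }) (assumed option name).
```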
Tracing — @helix-agents/tracing-langfuse is the supported tracing adapter. As of v7, it seeds the Langfuse trace ID from sessionId (not runId) so that a single conversational session — which spans many runs once HITL boundaries are involved — appears as a single trace in the Langfuse UI. New onAgentResumed and onAgentSuspended hook handlers emit matching event spans inside the session-scoped trace, so you can visually see where the run paused and where it resumed. The legacy core/tracing/tracing-hooks.ts adapter is HITL-incompatible: it relies on an in-memory tracingStateMap that the stateless-suspension model cannot populate across process restarts, and v7 fail-fasts when requireApproval or client-executed tools are run with the legacy adapter. Use @helix-agents/tracing-langfuse (or implement the v7 hook interface in your own adapter) before upgrading.
Embedding Executor — Controls how vector embeddings are computed after a memory is saved. InlineEmbeddingExecutor (default) computes synchronously. BackgroundEmbeddingExecutor fires-and-forgets with maxConcurrency limit and shutdown() drain. Memories are saved with embeddingStatus: 'pending' and immediately FTS-searchable; the executor updates them to 'complete' once the embedding is computed. MemoryManager.processUnembeddedMemories() recovers orphaned pending memories (e.g., after embedding service failures). Configured via embeddingExecutor on MemoryConfig.
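A hedged wiring sketch for background embedding; only embeddingExecutor, maxConcurrency, and shutdown() are named in the text above, and the rest is an assumption.

```typescript
import { BackgroundEmbeddingExecutor } from '@helix-agents/core'; // assumed export location

// Hypothetical memory configuration using fire-and-forget embedding.
const embeddingExecutor = new BackgroundEmbeddingExecutor({ maxConcurrency: 4 });

const memoryConfig = {
  embeddingExecutor, // memories save as embeddingStatus: 'pending' and are FTS-searchable immediately
  // ...other MemoryConfig fields
};

// On graceful shutdown, drain in-flight embedding work; any memories still
// 'pending' can later be recovered with MemoryManager.processUnembeddedMemories().
await embeddingExecutor.shutdown();
```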