Sub-Agent Execution
This document explains how Helix Agents handles sub-agent (child agent) execution and the patterns involved.
Overview
Sub-agents enable:
- Task Delegation - Parent agents delegate specialized tasks
- Composition - Build complex agents from simpler ones
- Isolation - Sub-agents have their own state
- Reusability - Define once, use from multiple parents
Creating Sub-Agent Tools
Using createSubAgentTool
import { createSubAgentTool, defineAgent } from '@helix-agents/core';
import { z } from 'zod';
// First define the sub-agent with an outputSchema
const SummarizerAgent = defineAgent({
name: 'summarizer',
outputSchema: z.object({
summary: z.string(),
keyPoints: z.array(z.string()),
}),
// ... other config
});
// Then create a tool that invokes it
const summarizeTool = createSubAgentTool(
SummarizerAgent, // Full agent config (must have outputSchema)
z.object({
texts: z.array(z.string()),
maxLength: z.number().optional(),
}),
{ description: 'Summarize a list of texts' } // Optional
);
Tool Name Convention
Sub-agent tools use the subagent__ prefix internally:
// When LLM calls the tool, it uses 'summarize'
// Internally stored as 'subagent__summarizer'
const SUBAGENT_TOOL_PREFIX = 'subagent__';
Detecting Sub-Agent Tools
import { isSubAgentTool, SUBAGENT_TOOL_PREFIX } from '@helix-agents/core';
// Check if a tool call is for a sub-agent
if (isSubAgentTool(toolName)) {
// Extract agent type
const agentType = toolName.slice(SUBAGENT_TOOL_PREFIX.length);
}
Execution Flow
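Every step of this flow depends on telling sub-agent tool names apart from regular ones. As a minimal sketch, assuming the check is a plain prefix test (the real isSubAgentTool in @helix-agents/core may do more), the helpers below are hypothetical local stand-ins that show the round trip between an agent type and its internal tool name:

```typescript
// Hypothetical local stand-ins; the real helpers are exported by @helix-agents/core.
const SUBAGENT_TOOL_PREFIX = 'subagent__';

// True when the name carries the prefix and has a non-empty agent type after it
function isSubAgentToolName(toolName: string): boolean {
  return (
    toolName.startsWith(SUBAGENT_TOOL_PREFIX) &&
    toolName.length > SUBAGENT_TOOL_PREFIX.length
  );
}

// 'subagent__summarizer' -> 'summarizer'; regular tool names yield undefined
function toAgentType(toolName: string): string | undefined {
  return isSubAgentToolName(toolName)
    ? toolName.slice(SUBAGENT_TOOL_PREFIX.length)
    : undefined;
}

// 'summarizer' -> 'subagent__summarizer'
function toSubAgentToolName(agentType: string): string {
  return `${SUBAGENT_TOOL_PREFIX}${agentType}`;
}
```

Returning undefined for regular tools, rather than slicing unconditionally, keeps a caller from treating an arbitrary tool name as an agent type.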
1. Parent Agent Makes Tool Call
Parent Agent
│
├── LLM decides to use 'summarize' tool
│
▼
{ type: 'tool_calls', toolCalls: [
{ id: 's1', name: 'summarize', arguments: { texts: [...] } }
]}
2. Tool Call is Recognized as Sub-Agent
// In planStepProcessing
const subAgentCalls = toolCalls
.filter((tc) => isSubAgentTool(tc.name))
.map((tc) => ({
id: tc.id,
agentType: tc.name.slice(SUBAGENT_TOOL_PREFIX.length),
input: tc.arguments,
}));
3. Sub-Agent is Executed
graph TB
Parent["Parent Agent (paused)"]
Parent --> SubCreate["Sub-Agent Created"]
subgraph SubConfig [" "]
direction LR
C1["sessionId: unique ID"]
C2["streamId: linked to parent"]
C3["parentSessionId: parent's sessionId"]
C4["input: from tool arguments"]
end
SubCreate --> SubConfig
SubConfig --> ExecLoop["Sub-Agent Execution Loop"]
subgraph ExecSteps [" "]
direction TB
E1["Initialize state"]
E2["Call LLM"]
E3["Execute tools"]
E4["Complete with output"]
E1 --> E2 --> E3 --> E4
end
ExecLoop --> ExecSteps
4. Result Returned to Parent
Sub-Agent Complete
│
├── Output returned to parent
│
▼
Parent Agent (resumed)
│
└── Receives tool result with sub-agent output
State Isolation
Parent State
interface ParentState {
notes: Note[];
searchCount: number;
}
Sub-Agent State
interface SummarizerState {
texts: string[];
processedCount: number;
}
Sub-agents cannot directly modify parent state. Communication happens through:
- Input - Data passed when invoking sub-agent
- Output - Structured result returned on completion
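The input/output boundary can be illustrated with a small sketch. It is not how Helix Agents runs a sub-agent: runSummarizer and delegateSummary are hypothetical stand-ins, and the summarizing logic is a placeholder. The point is only that the child works on its own state and the parent folds the returned output into its state explicitly.

```typescript
interface Note { text: string }
interface ParentState { notes: Note[]; searchCount: number }
interface SummarizerInput { texts: string[] }
interface SummarizerOutput { summary: string; keyPoints: string[] }

// Hypothetical stand-in for a sub-agent run: it sees only its input
// and keeps its working state private.
function runSummarizer(input: SummarizerInput): SummarizerOutput {
  const state = { texts: [...input.texts], processedCount: 0 };
  const keyPoints: string[] = [];
  for (const text of state.texts) {
    keyPoints.push(text.slice(0, 20)); // placeholder for real summarization
    state.processedCount += 1;
  }
  return { summary: `Summarized ${state.processedCount} text(s)`, keyPoints };
}

// Parent side: derive the input from parent state, then merge the output
// into a new parent state. The child never receives the parent state itself.
function delegateSummary(parent: ParentState): ParentState {
  const input: SummarizerInput = { texts: parent.notes.map((n) => n.text) };
  const output = runSummarizer(input);
  return { ...parent, notes: [...parent.notes, { text: output.summary }] };
}
```

Because the only shared values are the input and the output, the same sub-agent can be reused from parents with entirely different state shapes.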
Stream Event Flow
subagent_start
Emitted when sub-agent begins:
{
type: 'subagent_start',
subAgentId: 'session-child-abc123', // Sub-agent's sessionId
agentType: 'summarizer',
input: { texts: ['text1', 'text2'] },
parentSessionId: 'session-parent-xyz789', // Parent's sessionId
timestamp: 1702329600000
}
Sub-Agent Events
Sub-agent emits its own events (text_delta, tool_start, etc.) with its own agentId (which is its sessionId):
{
type: 'text_delta',
delta: 'Summarizing...',
agentId: 'session-child-abc123', // Sub-agent's sessionId
agentType: 'summarizer',
timestamp: 1702329600100
}
subagent_end
Emitted when sub-agent completes:
{
type: 'subagent_end',
subAgentId: 'session-child-abc123', // Sub-agent's sessionId
agentType: 'summarizer',
result: { summary: 'The texts discuss...' },
success: true,
parentSessionId: 'session-parent-xyz789', // Parent's sessionId
timestamp: 1702329601000
}
Runtime Implementations
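Whichever runtime executes the sub-agent, its events arrive on the stream it shares with the parent, so a consumer usually has to separate them again. The following is a rough sketch of one way to do that. The StreamEvent type is a trimmed-down assumption based on the event examples above (timestamps and payload fields omitted), and both helper functions are hypothetical.

```typescript
// Simplified event shapes, assumed from the examples in this document
type StreamEvent =
  | { type: 'subagent_start'; subAgentId: string; agentType: string; parentSessionId: string }
  | { type: 'text_delta'; agentId: string; agentType: string; delta: string }
  | { type: 'subagent_end'; subAgentId: string; agentType: string; success: boolean; parentSessionId: string };

// Concatenate text deltas per agent, keyed by agentId (the agent's sessionId)
function collectTextByAgent(events: StreamEvent[]): Map<string, string> {
  const textByAgent = new Map<string, string>();
  for (const event of events) {
    if (event.type === 'text_delta') {
      textByAgent.set(event.agentId, (textByAgent.get(event.agentId) ?? '') + event.delta);
    }
  }
  return textByAgent;
}

// Sub-agents that have emitted subagent_start but not yet subagent_end
function runningSubAgents(events: StreamEvent[]): string[] {
  const running = new Set<string>();
  for (const event of events) {
    if (event.type === 'subagent_start') running.add(event.subAgentId);
    if (event.type === 'subagent_end') running.delete(event.subAgentId);
  }
  return Array.from(running);
}
```

A UI could use the second helper to show a spinner per active sub-agent and the first to render each agent's text in its own panel.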
JS Runtime
Sub-agents execute recursively in the same process:
// In JSAgentExecutor
for (const subAgentCall of plan.pendingSubAgentCalls) {
const subAgent = registry.get(subAgentCall.agentType);
// Execute sub-agent (recursive call)
const handle = await this.execute(
subAgent,
{
message: JSON.stringify(subAgentCall.input),
state: subAgentCall.input,
},
{
parentSessionId: state.sessionId, // Parent's session ID (primary key)
}
);
const result = await handle.result();
// Add result to parent's messages
state.messages.push(
createSubAgentResultMessage({
toolCallId: subAgentCall.id,
agentType: subAgentCall.agentType,
result: result.output,
success: result.status === 'completed',
})
);
}
Temporal Runtime
Sub-agents run as child workflows:
// In workflow
for (const subAgentCall of plan.pendingSubAgentCalls) {
const subSessionId = generateSubSessionId();
const childResult = await executeChild(agentWorkflow, {
workflowId: subSessionId,
args: [
{
agentType: subAgentCall.agentType,
sessionId: subSessionId,
streamId: parentStreamId, // Share stream
message: JSON.stringify(subAgentCall.input),
parentSessionId: input.sessionId,
},
],
parentClosePolicy: 'ABANDON',
});
// Record result
await activities.recordSubAgentResult({
parentSessionId: input.sessionId,
subAgentCall,
result: childResult,
});
}
Cloudflare Runtime
Sub-agents spawn as separate workflow instances:
// In workflow step
for (const subAgentCall of plan.pendingSubAgentCalls) {
const instance = await workflowBinding.create({
id: generateSubSessionId(),
params: {
agentType: subAgentCall.agentType,
message: JSON.stringify(subAgentCall.input),
parentSessionId: input.sessionId, // Parent's session ID (primary key)
},
});
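// Read the instance status. status() returns a snapshot and does not block,
// so the parent still waits for the child's completion event
// (see the waitForEvent race under Interrupt Propagation).
const result = await instance.status();
}
Message Recording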
Assistant Message
Tool calls including sub-agent calls are recorded:
{
role: 'assistant',
content: 'I will summarize these texts.',
toolCalls: [
{ id: 's1', name: 'subagent__summarizer', arguments: { texts: [...] } }
]
}
Tool Result Message
Sub-agent result is recorded as a tool result:
{
role: 'tool',
toolCallId: 's1',
toolName: 'subagent__summarizer',
content: '{"summary":"The texts discuss..."}'
}
Parallel Sub-Agents
Multiple sub-agents can run in parallel:
// LLM requests multiple sub-agents
{
type: 'tool_calls',
toolCalls: [],
subAgentCalls: [
{ id: 's1', agentType: 'summarizer', input: { texts: batch1 } },
{ id: 's2', agentType: 'summarizer', input: { texts: batch2 } },
{ id: 's3', agentType: 'analyzer', input: { data: {...} } },
]
}
The runtime executes them in parallel:
// JS Runtime
const results = await Promise.all(
subAgentCalls.map(call => executeSubAgent(call))
);
// Temporal Runtime
await Promise.all(
subAgentCalls.map(call => executeChild(agentWorkflow, { ... }))
);
Interrupt Propagation
When a parent agent is interrupted while sub-agents are running, the interrupt must propagate through the entire hierarchy to ensure responsive cancellation.
The Challenge
Without interrupt propagation, the parent would block on Promise.all() waiting for children to complete:
// Problematic pattern - parent blocked until ALL children finish
const results = await Promise.all(subAgentCalls.map((call) => executeSubAgent(call)));
// Interrupt signal cannot be processed here!
This causes unacceptable latency: users might wait 60+ seconds for an interrupt to take effect.
Solution: Racing Pattern
The runtimes use a racing pattern to enable sub-second interrupt response:
// Temporal: Race Promise.all against interrupt trigger
const raceResult = await Promise.race([
Promise.all(childPromises).then((results) => ({ type: 'completed', results })),
interruptTrigger.then((reason) => ({ type: 'interrupted', reason })),
]);
if (raceResult.type === 'interrupted') {
  // Signal all running children to stop, passing along the interrupt reason
  for (const childId of runningChildren) {
    await getExternalWorkflowHandle(childId).signal(INTERRUPT_SIGNAL_NAME, raceResult.reason);
  }
}
Per-Runtime Implementation
JS Runtime
Uses AbortSignal linked between parent and children:
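// Parent creates a controller linked to its own signal
const batchController = new AbortController();
if (parentAbortSignal.aborted) {
  // The 'abort' event will not fire again, so propagate immediately
  batchController.abort(parentAbortSignal.reason);
} else {
  parentAbortSignal.addEventListener('abort', () => batchController.abort(parentAbortSignal.reason), { once: true });
}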
// Children receive linked signal
const childHandle = await executor.execute(childAgent, input, {
abortSignal: batchController.signal,
});
Temporal Runtime
Uses Trigger primitive + external workflow handles:
// Platform adapter sets up interrupt trigger
const interruptTrigger = new Trigger<string>();
setHandler(interruptSignal, (reason) => {
interruptTrigger.resolve(reason); // Wake up immediately
});
// Workflow uses trigger in race
runAgentWorkflow(input, activities, {
interruptTrigger,
getExternalWorkflowHandle: (id) => getExternalWorkflowHandle(id),
});
Cloudflare Runtime
Uses an event-based approach for immediate interrupt response:
// Executor sends both flag and event for immediate wake-up
async interrupt(reason: string) {
// Set flag for persistence
await stateStore.setInterruptFlag(runId, reason);
// Send event for immediate wake-up
await instance.sendEvent({ type: `interrupt-${runId}`, payload: { reason } });
}
// Workflow races completion against interrupt event
const result = await Promise.race([
step.waitForEvent(`sub-agent-complete-${subSessionId}`, { timeout: maxWait })
.then(e => ({ type: 'complete', event: e })),
step.waitForEvent(`interrupt-${runId}`, { timeout: maxWait })
.then(e => ({ type: 'interrupt', reason: e.payload?.reason })),
]);
if (result.type === 'interrupt') {
// Propagate to children
for (const child of pendingChildren) {
await stateStore.setInterruptFlag(child.runId, result.reason);
}
throw new InterruptDetectedError(result.reason);
}
The event-based approach provides:
- Immediate response: Interrupt events win the race immediately (< 100ms)
- Pre-spawn check: Interrupts are also checked before spawning sub-agents
- No polling overhead: Pure event-driven detection
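The racing idea shared by the three runtimes can be shown without any runtime-specific API. The sketch below uses plain promises only. raceAgainstInterrupt and createTrigger are hypothetical names, and createTrigger is a simplified stand-in for primitives such as Temporal's Trigger or a Cloudflare event wait, not a reproduction of either.

```typescript
type RaceOutcome<T> =
  | { type: 'completed'; results: T[] }
  | { type: 'interrupted'; reason: string };

// Race a batch of child promises against an interrupt promise.
// Whichever settles first decides the outcome.
async function raceAgainstInterrupt<T>(
  children: Promise<T>[],
  interrupt: Promise<string>,
): Promise<RaceOutcome<T>> {
  return Promise.race([
    Promise.all(children).then((results): RaceOutcome<T> => ({ type: 'completed', results })),
    interrupt.then((reason): RaceOutcome<T> => ({ type: 'interrupted', reason })),
  ]);
}

// Hypothetical helper: a promise plus the function that resolves it.
// The Promise executor runs synchronously, so `fire` is assigned before return.
function createTrigger(): { promise: Promise<string>; fire: (reason: string) => void } {
  let fire: (reason: string) => void = () => {};
  const promise = new Promise<string>((resolve) => {
    fire = resolve;
  });
  return { promise, fire };
}
```

Losing the race does not cancel the children: they keep running until they are signalled separately, which is why each runtime follows the race with an explicit propagation step.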
Propagation Sequence
graph TB
User["User clicks 'Stop'"]
User --> Flag["Parent interrupt flag set"]
Flag --> Detect["Parent detects interrupt<br/>(immediate via event)"]
Detect --> Child1["Child 1: interrupt flag set +<br/>parent-interrupted event"]
Detect --> Child2["Child 2: interrupt flag set +<br/>parent-interrupted event"]
Detect --> Child3["Child 3: interrupt flag set +<br/>parent-interrupted event"]
Child1 --> Stop["Each child stops at next safe point"]
Child2 --> Stop
Child3 --> Stop
Stop --> Return["Parent returns { status: 'interrupted' }"]
Target Latency
| Runtime | Interrupt Detection | Child Signaling | Total |
|---|---|---|---|
| JS | Immediate | Immediate | < 100ms |
| Temporal | Immediate (Trigger) | < 100ms | < 500ms |
| Cloudflare | Immediate (event) | < 100ms | < 200ms |
Lifecycle Hook Guarantees
Sub-agents fire their own lifecycle hooks (onAgentStart, onAgentComplete, onAgentFail) independently from the parent. This is critical for tracing integrations (e.g., Langfuse) that need to emit root spans when a sub-agent completes.
The Problem
Sub-agents share the parent's stream. Without special handling, the sub-agent's onAgentComplete hook would either not fire (if stream finalization is skipped) or would close the parent's stream prematurely.
The Solution: skipStreamClose
All runtimes call endAgentStream for sub-agents with skipStreamClose: true. This fires onAgentComplete without closing the parent's stream:
// Root agent: fire hooks and close stream
await endAgentStream({ sessionId });
// Sub-agent: fire hooks but keep parent's stream open
await endAgentStream({ sessionId: subAgentSessionId, skipStreamClose: true });
This applies to both completion paths:
- Normal completion (__finish__ tool) - sub-agent hooks fire with skipStreamClose: true
- finishWith completion - sub-agent hooks fire with both finishWithOutput and skipStreamClose: true
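To make the ordering concrete, here is a minimal sketch of the behavior skipStreamClose is described as having. It is not the library's endAgentStream: the AgentHooks and SharedStream interfaces and the function signature are assumptions chosen only to show that the hook fires in both cases while the stream is closed in just one.

```typescript
interface EndAgentStreamOptions {
  sessionId: string;
  skipStreamClose?: boolean;
}

// Hypothetical interfaces, just enough to observe the two effects
interface AgentHooks {
  onAgentComplete(sessionId: string): void;
}
interface SharedStream {
  closed: boolean;
}

function endAgentStreamSketch(
  options: EndAgentStreamOptions,
  hooks: AgentHooks,
  stream: SharedStream,
): void {
  // Hooks fire for root agents and sub-agents alike
  hooks.onAgentComplete(options.sessionId);
  // Only a call without skipStreamClose closes the shared stream
  if (!options.skipStreamClose) {
    stream.closed = true;
  }
}
```

With this split, a tracing integration sees a completion hook for every sub-agent, and the parent can keep writing to the stream afterwards.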
Per-Runtime Behavior
| Runtime | Root Agent | Sub-Agent |
|---|---|---|
| JS | endAgentStream({ sessionId }) | endAgentStream({ sessionId, skipStreamClose: true }) |
| Temporal | endAgentStream({ sessionId }) | endAgentStream({ sessionId, skipStreamClose: true }) |
| Cloudflare | endAgentStream({ sessionId }) | endAgentStream({ sessionId, skipStreamClose: true }) |
All three runtimes follow the same pattern, ensuring hook behavior is consistent regardless of where the agent runs.
Error Handling
Sub-Agent Failure
If a sub-agent fails, the result indicates failure:
{
type: 'subagent_end',
subAgentId: 'session-child-abc123', // Sub-agent's sessionId
agentType: 'summarizer',
success: false,
error: 'Max steps exceeded',
parentSessionId: 'session-parent-xyz789', // Parent's sessionId
timestamp: 1702329601000
}
Parent Handling
Parent receives error as tool result:
{
role: 'tool',
toolCallId: 's1',
toolName: 'subagent__summarizer',
content: '{"error":"Max steps exceeded"}'
}
The LLM can then decide how to handle the failure.
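A sketch of how a sub-agent result might be turned into the tool result messages shown in this document, for both success and failure. The real createSubAgentResultMessage may differ. SubAgentResult, ToolResultMessage and toToolResultMessage are hypothetical, and the fallback error text is an assumption.

```typescript
interface SubAgentResult {
  status: 'completed' | 'failed';
  output?: unknown;
  error?: string;
}

interface ToolResultMessage {
  role: 'tool';
  toolCallId: string;
  toolName: string;
  content: string;
}

function toToolResultMessage(
  toolCallId: string,
  agentType: string,
  result: SubAgentResult,
): ToolResultMessage {
  // Success carries the structured output; failure carries an error object.
  // `?? null` keeps JSON.stringify from returning undefined for a missing output.
  const payload =
    result.status === 'completed'
      ? result.output ?? null
      : { error: result.error ?? 'Sub-agent failed' };
  return {
    role: 'tool',
    toolCallId,
    toolName: `subagent__${agentType}`,
    content: JSON.stringify(payload),
  };
}
```

Reporting failures as ordinary tool results, rather than throwing, is what lets the parent's LLM see the error and choose a fallback.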
State Reference Tracking
Parents track sub-agent references:
interface AgentState {
subSessionRefs: SubSessionRef[]; // Note: field is subSessionRefs, not subAgents
}
interface SubSessionRef {
id: string; // Sub-agent's runId
toolCallId: string; // Original tool call ID
agentType: string;
status: 'running' | 'completed' | 'failed';
output?: unknown;
error?: string;
}
This allows:
- Querying sub-agent status
- Retrieving sub-agent results
- Cleanup on parent completion
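The three uses listed above can be sketched as pure functions over the reference list. The SubSessionRef shape is copied from the interface in this section. The helper names are hypothetical, and a real state store would persist the updates rather than return new arrays.

```typescript
interface SubSessionRef {
  id: string;
  toolCallId: string;
  agentType: string;
  status: 'running' | 'completed' | 'failed';
  output?: unknown;
  error?: string;
}

// Record a completion without mutating the existing refs
function markCompleted(refs: SubSessionRef[], id: string, output: unknown): SubSessionRef[] {
  return refs.map((ref) =>
    ref.id === id ? { ...ref, status: 'completed' as const, output } : ref,
  );
}

// Query status: which sub-agents would still need cleanup if the parent finished now?
function stillRunning(refs: SubSessionRef[]): SubSessionRef[] {
  return refs.filter((ref) => ref.status === 'running');
}

// Retrieve the result that belongs to a given tool call, if it has completed
function outputFor(refs: SubSessionRef[], toolCallId: string): unknown {
  return refs.find((ref) => ref.toolCallId === toolCallId && ref.status === 'completed')?.output;
}
```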
Best Practices
1. Clear Input/Output Contracts
// Define clear schemas for sub-agent communication
const SummarizerInputSchema = z.object({
texts: z.array(z.string()),
maxLength: z.number().optional().default(500),
});
const SummarizerOutputSchema = z.object({
summary: z.string(),
keyPoints: z.array(z.string()),
});
2. Meaningful Agent Types
// Good: descriptive type names
agentType: 'code-reviewer';
agentType: 'data-analyzer';
agentType: 'email-composer';
// Bad: generic names
agentType: 'helper';
agentType: 'agent1';
3. Limit Nesting Depth
Avoid deep nesting of sub-agents:
Parent
└── Sub-Agent
└── Sub-Sub-Agent // Avoid this level
└── ... // Definitely avoid this
4. Handle Failures Gracefully
// In parent's system prompt
'If a sub-agent fails, try to complete the task yourself or report the failure.';
Testing
import { MockLLMAdapter } from '@helix-agents/core';
describe('SubAgent', () => {
it('executes sub-agent and returns result', async () => {
const mock = new MockLLMAdapter([
// Parent calls sub-agent
{
type: 'tool_calls',
toolCalls: [],
subAgentCalls: [
{
id: 's1',
agentType: 'summarizer',
input: { texts: ['text1'] },
},
],
},
// Parent finishes with sub-agent result
{
type: 'structured_output',
output: { result: 'Used summary: ...' },
},
]);
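// Register both agents (registry and executor are assumed to be wired to `mock` in test setup)
registry.register(ParentAgent);
registry.register(SummarizerAgent);
// execute() resolves to a handle; await its result for the final status
const handle = await executor.execute(ParentAgent, 'Summarize texts');
const result = await handle.result();
expect(result.status).toBe('completed');
});
});
Remote Sub-Agent Execution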
Remote sub-agents (createRemoteSubAgentTool()) follow a different execution path from local sub-agents. Instead of spawning an in-process or child workflow execution, remote sub-agents delegate to agents reachable via a RemoteAgentTransport — typically HttpRemoteAgentTransport for cross-service calls, or DOStubTransport for sibling Durable Object routing.
DO Runtime Transparent Rewriting
In the Cloudflare DO runtime, createSubAgentTool() tools are transparently rewritten to createRemoteSubAgentTool() at execution time when subAgentNamespace is configured. This means the three-way routing below still applies — by the time the executor sees the tools, local sub-agent tools have already been converted to remote sub-agent tools backed by DOStubTransport. See Sub-Agents in the DO Runtime.
Three-Way Tool Call Routing
When the LLM returns tool calls, all three runtimes partition them into three groups:
- Regular tools - Executed via the standard tool execution path
- Local sub-agent calls - Routed to child workflow / in-process execution
- Remote sub-agent calls - Routed to a dedicated executeRemoteSubAgentCall activity/step
Detection uses isRemoteSubAgentTool() which checks the _isRemoteSubAgent marker on the tool.
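The partitioning can be sketched as a single pass over the tool calls. This is an illustration under assumptions, not the runtimes' code: ToolCall, ToolDef and partitionToolCalls are hypothetical, local sub-agents are detected by the subagent__ prefix, and checking the remote marker before the prefix is a guess about precedence.

```typescript
interface ToolCall { id: string; name: string; arguments: unknown }
interface ToolDef { name: string; _isRemoteSubAgent?: boolean }

interface PartitionedCalls {
  regular: ToolCall[];
  localSubAgent: ToolCall[];
  remoteSubAgent: ToolCall[];
}

const SUBAGENT_TOOL_PREFIX = 'subagent__';

function partitionToolCalls(calls: ToolCall[], tools: Map<string, ToolDef>): PartitionedCalls {
  const result: PartitionedCalls = { regular: [], localSubAgent: [], remoteSubAgent: [] };
  for (const call of calls) {
    const tool = tools.get(call.name);
    if (tool?._isRemoteSubAgent === true) {
      // Assumption: the marker on the tool definition takes precedence
      result.remoteSubAgent.push(call);
    } else if (call.name.startsWith(SUBAGENT_TOOL_PREFIX)) {
      result.localSubAgent.push(call);
    } else {
      result.regular.push(call);
    }
  }
  return result;
}
```

Each call lands in exactly one group, so the three execution paths never process the same tool call twice.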
Execution Flow
- Generate deterministic session ID - {parentSessionId}-remote-{toolCallId} ensures idempotent restarts
- Register SubSessionRef - Tracks the remote session with remote: { streamId, lastSequence } metadata
- Call transport.start() - Sends POST /start to the remote agent server
- Consume transport.stream() - Reads SSE events, proxies chunks to the parent stream
- Handle completion - Updates SubSessionRef status, returns output as tool result
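The first two steps are what make a restart safe, and a short sketch shows why. The ID format follows the pattern given above. RemoteRef, registerRemoteRef, the in-memory Map and the initial lastSequence of 0 are assumptions for illustration.

```typescript
// Same parent session and tool call always produce the same remote session ID
function remoteSessionId(parentSessionId: string, toolCallId: string): string {
  return `${parentSessionId}-remote-${toolCallId}`;
}

interface RemoteRef {
  subSessionId: string;
  status: 'running' | 'completed' | 'failed';
  remote: { streamId: string; lastSequence: number };
}

// Idempotent registration: a retried step finds the existing ref
// instead of creating a second remote session for the same tool call.
function registerRemoteRef(
  refs: Map<string, RemoteRef>,
  parentSessionId: string,
  toolCallId: string,
  streamId: string,
): RemoteRef {
  const subSessionId = remoteSessionId(parentSessionId, toolCallId);
  const existing = refs.get(subSessionId);
  if (existing) return existing;
  const created: RemoteRef = {
    subSessionId,
    status: 'running',
    remote: { streamId, lastSequence: 0 }, // assumed starting sequence
  };
  refs.set(subSessionId, created);
  return created;
}
```

A random ID would defeat this: after a crash the runtime would have no way to tell that a remote session for the tool call already exists.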
Crash Recovery (Temporal and Cloudflare)
If the runtime crashes mid-execution:
- Check transport.getStatus() - Determine if the remote agent is still running, completed, or failed
- If completed - Return the output directly without re-executing
- If still running - Reconnect to transport.stream() with fromSequence to avoid duplicate events
- If failed - Return the error
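The recovery decision above is essentially a mapping from the remote status to one of three actions. The sketch below models it as a pure function. The RemoteStatus and RecoveryAction shapes are hypothetical (the actual return type of transport.getStatus() is not shown in this document), and passing lastSequence unchanged as fromSequence assumes the transport treats it as "resume after this sequence".

```typescript
// Assumed shape of a status lookup result
type RemoteStatus =
  | { state: 'completed'; output: unknown }
  | { state: 'failed'; error: string }
  | { state: 'running' };

type RecoveryAction =
  | { action: 'return_output'; output: unknown }
  | { action: 'return_error'; error: string }
  | { action: 'reconnect'; fromSequence: number };

function planRecovery(status: RemoteStatus, lastSequence: number): RecoveryAction {
  switch (status.state) {
    case 'completed':
      // The work already finished: reuse its output, never re-execute
      return { action: 'return_output', output: status.output };
    case 'failed':
      return { action: 'return_error', error: status.error };
    case 'running':
      // Resume the stream from the last recorded sequence to avoid duplicates
      return { action: 'reconnect', fromSequence: lastSequence };
  }
}
```

Keeping the decision separate from the I/O makes each branch easy to unit test without a remote server.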
Resume Reconnection (JS Runtime)
The JS runtime's reconcileRemoteSubAgents() handles resume after interrupts:
- Loads SubSessionRefs with remote metadata from the state store
- For each running remote sub-agent, checks transport.getStatus()
- Reconnects to streams or records completions/failures
- Updates tool result messages in the conversation history
See Also
- Remote Agents Guide — Full guide with setup and patterns
- API Reference — AgentServer and transport API
Persistent Sub-Agent Execution
Persistent sub-agents use a different execution model from ephemeral sub-agents. Instead of being invoked as tool calls that run to completion and return results, persistent children are long-lived agents managed through companion tools.
Companion Tool Architecture
When buildEffectiveTools() processes an agent with persistentAgents, it dynamically generates companion tools:
// In buildEffectiveTools (packages/core/src/orchestration/state-operations.ts)
if (config.persistentAgents && config.persistentAgents.length > 0) {
// Generate companion tools based on configured persistent agents
tools.push(createSpawnAgentTool(config.persistentAgents));
tools.push(createSendMessageTool());
tools.push(createListChildrenTool());
tools.push(createGetChildStatusTool());
tools.push(createTerminateChildTool());
// waitForResult only available if at least one blocking agent exists
if (config.persistentAgents.some((pa) => pa.mode === 'blocking')) {
tools.push(createWaitForResultTool());
}
}
Companion Tool Call Routing
Companion tool calls are handled separately from regular tool calls in all runtimes. The three-way tool routing becomes four-way:
- Regular tools -- Standard tool execution
- Local sub-agent calls -- Ephemeral child agent execution
- Remote sub-agent calls -- HTTP-based delegation
- Companion tool calls -- Persistent child management
Detection uses isCompanionTool() which checks the _isCompanionTool marker:
if (isCompanionTool(tool)) {
// Route to companion tool handler
return executeCompanionToolCall(tool, args, sessionState);
}
Execution Flow: Blocking Spawn
graph TB
Parent["Parent Agent"]
Parent --> Spawn["companion__spawnAgent called"]
Spawn --> CreateRef["Create SubSessionRef (mode: 'persistent')"]
CreateRef --> CreateChild["Initialize child agent session"]
CreateChild --> RunChild["Execute child agent loop"]
RunChild --> ChildComplete["Child calls __finish__"]
ChildComplete --> UpdateRef["Update SubSessionRef (status: 'completed')"]
UpdateRef --> ReturnResult["Return result to parent"]
ReturnResult --> ParentContinues["Parent continues execution"]
Execution Flow: Non-Blocking Spawn
graph TB
Parent["Parent Agent"]
Parent --> Spawn["companion__spawnAgent called"]
Spawn --> CreateRef["Create SubSessionRef (mode: 'persistent')"]
CreateRef --> StartChild["Start child agent (fire-and-forget)"]
StartChild --> ReturnImmediate["Return immediately to parent"]
ReturnImmediate --> ParentContinues["Parent continues execution"]
StartChild --> ChildRuns["Child runs concurrently"]
ChildRuns --> ChildComplete["Child completes later"]
ChildComplete --> UpdateRef["Update SubSessionRef"]
SubSessionRef with Persistent Mode
The SubSessionRef interface was extended to support persistent children:
interface SubSessionRef {
subSessionId: string;
agentType: string;
parentToolCallId: string;
status: 'running' | 'completed' | 'failed' | 'terminated';
startedAt: number;
completedAt?: number;
mode: 'ephemeral' | 'persistent'; // Added for persistent sub-agents
name?: string; // Added for persistent sub-agents
}
Per-Runtime Implementation
JS Runtime
Companion tool calls are executed via executeCompanionToolCall() in the JS executor. Blocking spawns run the child's full loop inline. Non-blocking spawns use Promise.resolve() to start the child without awaiting.
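The difference between the two spawn modes comes down to whether the parent awaits the child's promise. The following is a simplified sketch of that idea, not the JS executor's implementation: ChildRef, ChildRunner and spawnChild are hypothetical, and the child is modeled as any function that returns a promise.

```typescript
type ChildStatus = 'running' | 'completed' | 'failed';
interface ChildRef { name: string; status: ChildStatus; output?: unknown }
type ChildRunner = (name: string) => Promise<unknown>;

async function spawnChild(
  refs: Map<string, ChildRef>,
  name: string,
  run: ChildRunner,
  mode: 'blocking' | 'non-blocking',
): Promise<ChildRef> {
  const ref: ChildRef = { name, status: 'running' };
  refs.set(name, ref);
  // Start the child now. Both outcomes are handled here, so `settled`
  // never rejects and cannot become an unhandled rejection.
  const settled = run(name).then(
    (output) => {
      ref.status = 'completed';
      ref.output = output;
    },
    () => {
      ref.status = 'failed';
    },
  );
  if (mode === 'blocking') {
    await settled; // parent waits for the child's full loop
  }
  // Non-blocking: return immediately; the ref is updated when the child finishes
  return ref;
}
```

In the non-blocking case the caller gets back a ref that still reads 'running', and a later status lookup sees the final state once the child settles.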
Temporal Runtime
Companion tool calls are executed via an executeCompanionToolCall activity. Blocking spawns use executeChild() to run a child workflow. The parent workflow waits for the child to complete. After a blocking child completes, the workflow calls markPersistentChildStatus to update the SubSessionRef.
The Temporal runtime does NOT store companion tool results as separate ToolResultMessages in the state store. Results flow through the workflow execution context.
Cloudflare Runtime
Both the Workflows and DO runtimes handle companion tools through their respective execution paths. D1StateStore was updated with a V4 migration to add mode and name columns to the __agents_sub_session_refs table.
State Store Requirements
The persistent sub-agent feature requires state stores to support the mode and name fields on SubSessionRef:
| Store | Support | Notes |
|---|---|---|
| InMemoryStateStore | Yes | Fields stored in-memory |
| RedisStateStore | Yes | Fields serialized with JSON |
| D1StateStore | Yes | V4 migration adds columns |
| DOStateStore | Yes | SQLite schema includes fields |
See Also
- Sub-Agents Guide
- Remote Agents Guide
- Durable Objects Sub-Agents - Transparent sub-agent routing via DOStubTransport
- Step Processing
- Stream Protocol