Temporal Runtime
The Temporal runtime (@helix-agents/runtime-temporal) executes agents as durable Temporal workflows. This provides crash recovery, automatic retries, and production-grade reliability for long-running agent tasks.
When to Use
Good fit:
- Production workloads requiring reliability
- Long-running agents (hours or days)
- Agents that must survive process restarts
- Complex multi-agent orchestrations
- Operations requiring audit trails and observability
Not ideal for:
- Quick development iteration (infrastructure overhead)
- Simple, short-lived agents
- Cost-sensitive deployments without existing Temporal infrastructure
Prerequisites
You need a running Temporal server:
Option 1: Temporal Cloud (Recommended for production)
# Sign up at https://temporal.io/cloud
Option 2: Local development
# Using Docker
docker run -d --name temporal \
  -p 7233:7233 -p 8233:8233 \
  temporalio/auto-setup:latest
# Or using Temporal CLI
temporal server start-dev
Installation
npm install @helix-agents/runtime-temporal @helix-agents/store-redis @temporalio/client @temporalio/worker @temporalio/workflow @temporalio/activity
Architecture
┌──────────────────────────────────────────────────────────────┐
│  Your Application                                            │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  TemporalAgentExecutor                                 │  │
│  │  - Starts workflows                                    │  │
│  │  - Returns handles                                     │  │
│  │  - Reads from StateStore/StreamManager                 │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│  Temporal Server                                             │
│  - Persists workflow state                                   │
│  - Manages task queues                                       │
│  - Handles retries and timeouts                              │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│  Temporal Worker                                             │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Agent Workflow                                        │  │
│  │  - Orchestrates execution                              │  │
│  │  - Calls activities for LLM/tools                      │  │
│  └────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Activities                                            │  │
│  │  - LLM calls                                           │  │
│  │  - Tool execution                                      │  │
│  │  - State persistence                                   │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
Setup Guide
1. Create the Workflow
Define a workflow that wraps the agent execution:
// src/workflows/agent-workflow.ts
import { proxyActivities, defineSignal, setHandler } from '@temporalio/workflow';
import type { AgentWorkflowInput, AgentWorkflowResult } from '@helix-agents/runtime-temporal';
import type * as activities from '../activities';
// Proxy activities with timeouts
const { executeAgentStep, saveState, loadState } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes',
  retry: {
    maximumAttempts: 3,
    backoffCoefficient: 2,
  },
});
// Abort signal
export const abortSignal = defineSignal('abort');
// Note: initializeState and processStepResult are not defined in this snippet;
// they stand for helpers you supply yourself.
export async function agentWorkflow(input: AgentWorkflowInput): Promise<AgentWorkflowResult> {
  let aborted = false;
  setHandler(abortSignal, () => {
    aborted = true;
  });
  try {
    // Load or initialize state
    let state = await loadState(input.runId);
    if (!state) {
      state = await initializeState(input);
    }
    // Main execution loop: one activity call per agent step, state saved after each
    while (state.status === 'running' && !aborted) {
      const stepResult = await executeAgentStep(input.agentType, state);
      state = await processStepResult(state, stepResult);
      await saveState(state);
    }
    if (aborted) {
      return { status: 'failed', error: 'Workflow aborted' };
    }
    return {
      status: state.status === 'completed' ? 'completed' : 'failed',
      output: state.output,
      error: state.error,
    };
  } catch (error) {
    // Report the failure as a result instead of failing the workflow itself
    return {
      status: 'failed',
      error: error instanceof Error ? error.message : String(error),
    };
  }
}
2. Create Activities
Activities perform the actual work (LLM calls, tool execution):
// src/activities/agent-activities.ts
import { AgentRegistry, type AgentState } from '@helix-agents/runtime-temporal';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { RedisStateStore, RedisStreamManager } from '@helix-agents/store-redis';
// Note: redisClient, the agent definitions (ResearchAgent, AnalyzerAgent),
// executeStep, and StepResult are not defined in this snippet; define or
// import them in your own project.
const stateStore = new RedisStateStore(redisClient);
const streamManager = new RedisStreamManager(redisClient);
const llmAdapter = new VercelAIAdapter();
const registry = new AgentRegistry();
// Register your agents
registry.register(ResearchAgent);
registry.register(AnalyzerAgent);
export async function loadState(runId: string): Promise<AgentState<unknown, unknown> | null> {
  return stateStore.load(runId);
}
export async function saveState(state: AgentState<unknown, unknown>): Promise<void> {
  await stateStore.save(state);
}
export async function executeAgentStep(
  agentType: string,
  state: AgentState<unknown, unknown>
): Promise<StepResult<unknown>> {
  const agent = registry.get(agentType);
  if (!agent) {
    throw new Error(`Unknown agent type: ${agentType}`);
  }
  // Execute one LLM step
  return executeStep(agent, state, llmAdapter, streamManager);
}
3. Create the Worker
The worker processes workflows and activities:
// src/worker.ts
import { Worker, NativeConnection } from '@temporalio/worker';
import * as activities from './activities';
async function runWorker() {
  const connection = await NativeConnection.connect({
    address: process.env.TEMPORAL_ADDRESS ?? 'localhost:7233',
  });
  const worker = await Worker.create({
    connection,
    namespace: 'default',
    taskQueue: 'agent-tasks',
    workflowsPath: require.resolve('./workflows'),
    activities,
  });
  await worker.run();
}
runWorker().catch(console.error);
4. Create the Executor
The executor starts workflows and returns handles:
// src/executor.ts
import { Client, Connection } from '@temporalio/client';
import { TemporalAgentExecutor } from '@helix-agents/runtime-temporal';
import { RedisStateStore, RedisStreamManager } from '@helix-agents/store-redis';
// Note: redis (a connected Redis client) and wrapHandle (which adapts a Temporal
// workflow handle to the executor's interface) are not defined in this snippet.
async function createExecutor() {
  const connection = await Connection.connect({
    address: process.env.TEMPORAL_ADDRESS ?? 'localhost:7233',
  });
  const client = new Client({ connection });
  // Wrap Temporal client to match interface
  const temporalClientAdapter = {
    startWorkflow: async (name, options) => {
      const handle = await client.workflow.start(name, {
        workflowId: options.workflowId,
        taskQueue: options.taskQueue,
        args: options.args,
      });
      return wrapHandle(handle);
    },
    getHandle: (workflowId) => {
      return wrapHandle(client.workflow.getHandle(workflowId));
    },
  };
  return new TemporalAgentExecutor({
    client: temporalClientAdapter,
    stateStore: new RedisStateStore(redis),
    streamManager: new RedisStreamManager(redis),
    workflowName: 'agentWorkflow',
    taskQueue: 'agent-tasks',
  });
}
Using the Executor
Once set up, usage is identical to other runtimes:
const executor = await createExecutor();
// Execute agent
const handle = await executor.execute(ResearchAgent, 'Research quantum computing');
// Stream events
const stream = await handle.stream();
for await (const chunk of stream) {
  console.log(chunk);
}
// Get result
const result = await handle.result();
Agent Registry
Register agents so the worker can instantiate them:
import { AgentRegistry } from '@helix-agents/runtime-temporal';
const registry = new AgentRegistry();
// Register each agent type
registry.register(ResearchAgent); // name: 'researcher'
registry.register(AnalyzerAgent); // name: 'analyzer'
registry.register(SummarizerAgent); // name: 'summarizer'
// In activities, look up by type
export async function executeAgentStep(agentType: string, state) {
  const agent = registry.get(agentType); // Returns the agent config
  // ...
}
Sub-Agent Handling
Sub-agents execute as child workflows:
// In workflow
import { executeChild } from '@temporalio/workflow';
// When parent needs to execute sub-agent
// Derive the sub-agent run ID once so the run ID and workflow ID stay in sync
const subRunId = `${parentRunId}-sub-${callId}`;
const subAgentResult = await executeChild('agentWorkflow', {
  args: [
    {
      agentType: 'analyzer',
      runId: subRunId,
      streamId: parentStreamId, // Same stream for unified streaming
      message: inputMessage,
      parentAgentId: parentRunId,
    },
  ],
  workflowId: `agent__analyzer__${subRunId}`,
  taskQueue: 'agent-tasks',
});
Benefits of child workflows:
- Independent retry policies
- Separate timeouts
- Can be cancelled independently
- Full workflow history preserved
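The ID scheme in the snippet above can be captured in a small pure function. This is a minimal sketch, assuming the `${parentRunId}-sub-${callId}` and `agent__<type>__<runId>` patterns shown earlier; `buildSubAgentIds` and `SubAgentIds` are hypothetical names, not part of @helix-agents/runtime-temporal.

```typescript
// Hypothetical helper: derives the run ID and workflow ID for a sub-agent
// from values the parent workflow already has.
interface SubAgentIds {
  runId: string;
  workflowId: string;
}

function buildSubAgentIds(parentRunId: string, agentType: string, callId: string): SubAgentIds {
  const runId = `${parentRunId}-sub-${callId}`;
  return { runId, workflowId: `agent__${agentType}__${runId}` };
}

const ids = buildSubAgentIds('run-1', 'analyzer', 'call-7');
console.log(ids.workflowId); // agent__analyzer__run-1-sub-call-7
```

Because the IDs depend only on the function's inputs, the parent computes the same values on every replay.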
Activity Configuration
Configure timeouts and retries per activity:
const { executeAgentStep } = proxyActivities<typeof activities>({
  // How long the activity can run
  startToCloseTimeout: '10 minutes',
  // How long to wait for worker to start processing
  scheduleToStartTimeout: '1 minute',
  // Heartbeat timeout for long activities
  heartbeatTimeout: '30 seconds',
  // Retry configuration
  retry: {
    initialInterval: '1 second',
    backoffCoefficient: 2,
    maximumInterval: '1 minute',
    maximumAttempts: 5,
    nonRetryableErrorTypes: ['InvalidAgentError'],
  },
});
Crash Recovery
Temporal provides automatic crash recovery:
Worker 1 starts workflow
    │
    ├── Step 1 completes, state saved
    ├── Step 2 completes, state saved
    │
    └── Worker 1 crashes
            │
            ▼
Temporal detects failure
    │
    ├── Workflow task rescheduled
    │
    ▼
Worker 2 picks up
    │
    ├── Replays history (deterministic)
    ├── Continues from Step 3
    └── Completes normally
Key points:
- Workflow code must be deterministic
- State is reconstructed from event history
- Completed activities are not re-executed; their recorded results are read back from history
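These points can be illustrated with a toy replay loop. The sketch below is not how Temporal is implemented; `HistoryEvent`, `ReplayingRunner`, and `threeStepWorkflow` are hypothetical names, and activities are modelled as synchronous functions to keep the example short. It shows a second worker replaying two recorded steps from history and executing only the third.

```typescript
// One recorded activity completion.
type HistoryEvent = { activity: string; result: unknown };

class ReplayingRunner {
  private cursor = 0;
  private readonly history: HistoryEvent[];
  // Names of activities that were actually executed (not replayed) by this runner.
  readonly executed: string[] = [];

  constructor(history: HistoryEvent[]) {
    this.history = history;
  }

  callActivity<T>(name: string, run: () => T): T {
    const recorded = this.history[this.cursor];
    this.cursor += 1;
    if (recorded !== undefined) {
      // Replay: the code must ask for the same activity that history recorded.
      if (recorded.activity !== name) {
        throw new Error(`Replay mismatch: history has ${recorded.activity}, code asked for ${name}`);
      }
      return recorded.result as T;
    }
    // Past the end of history: run the activity for real and record its result.
    const result = run();
    this.history.push({ activity: name, result });
    this.executed.push(name);
    return result;
  }
}

// Deterministic "workflow": always requests the same activities in the same order.
function threeStepWorkflow(runner: ReplayingRunner): string[] {
  return ['step1', 'step2', 'step3'].map((step) =>
    runner.callActivity(step, () => `${step}-done`),
  );
}

// History left behind by a worker that crashed after step 2.
const history: HistoryEvent[] = [
  { activity: 'step1', result: 'step1-done' },
  { activity: 'step2', result: 'step2-done' },
];
const worker2 = new ReplayingRunner(history);
const output = threeStepWorkflow(worker2);
console.log(worker2.executed); // [ 'step3' ]
```

The mismatch check shows why determinism matters: if the workflow code asked for a different activity than the one recorded, replay could not line the code up with its history.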
Determinism Requirements
Workflow code must be deterministic:
// BAD - Non-deterministic
export async function agentWorkflow(input) {
  const timestamp = Date.now(); // Different on replay!
  const random = Math.random(); // Different on replay!
  const uuid = crypto.randomUUID(); // Different on replay!
}
// GOOD - Use Temporal APIs
import { sleep, uuid4, workflowInfo } from '@temporalio/workflow';
export async function agentWorkflow(input) {
  const info = workflowInfo();
  const timestamp = info.startTime; // Deterministic
  const id = uuid4(); // Deterministic (seeded)
  await sleep('5 seconds'); // Deterministic timer
}
Move non-deterministic operations to activities:
- LLM API calls
- Database queries
- External API calls
- Random number generation
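The "Deterministic (seeded)" comment in the GOOD example rests on a simple property: a generator that starts from a fixed seed yields the same sequence on every run, which is what replay needs. The sketch below demonstrates that property with mulberry32, a small public-domain PRNG chosen here purely for illustration; it is an assumption of this example and says nothing about which algorithm Temporal uses internally.

```typescript
// mulberry32: a compact seeded PRNG, used here only to illustrate the idea.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Scale the 32-bit result into [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two "runs" that start from the same seed: think original execution and replay.
const firstRun = mulberry32(42);
const replay = mulberry32(42);
const firstValues = [firstRun(), firstRun(), firstRun()];
const replayValues = [replay(), replay(), replay()];
console.log(firstValues.every((value, i) => value === replayValues[i])); // true
```

Values that cannot be derived from a seed or from workflow history, such as an LLM response or a database row, have no such guarantee, which is why they belong in activities.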
Observability
Temporal Web UI
Access at http://localhost:8233 (local) or via Temporal Cloud.
View:
- Workflow history and events
- Activity execution details
- Pending/failed workflows
- Search by workflow ID or type
Workflow Queries
Query running workflows:
// In workflow
import { defineQuery, setHandler } from '@temporalio/workflow';
export const getProgressQuery = defineQuery<{ stepCount: number; status: string }>('getProgress');
export async function agentWorkflow(input) {
  let progress = { stepCount: 0, status: 'running' };
  setHandler(getProgressQuery, () => progress);
  // Update progress during execution
  progress.stepCount++;
  // ...
}
// From client
const handle = client.workflow.getHandle(workflowId);
const progress = await handle.query(getProgressQuery);
Production Deployment
Worker Scaling
Run multiple workers for throughput:
# Scale horizontally
docker-compose scale worker=5
Workers pull from the same task queue; Temporal handles distribution.
Temporal Cloud
For production, use Temporal Cloud:
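import fs from 'node:fs';
import { Connection, Client } from '@temporalio/client';
const connection = await Connection.connect({
  // Namespace endpoints take the form <namespace>.<account-id>.tmprl.cloud:7233
  address: 'your-namespace.your-account-id.tmprl.cloud:7233',
  tls: {
    clientCertPair: {
      crt: fs.readFileSync('client.pem'),
      key: fs.readFileSync('client.key'),
    },
  },
});
// The client must also name the Cloud namespace it talks to
const client = new Client({ connection, namespace: 'your-namespace.your-account-id' });
Monitoring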
Set up metrics:
import { Runtime } from '@temporalio/worker';
Runtime.install({
  telemetryOptions: {
    metrics: {
      prometheus: { bindAddress: '0.0.0.0:9464' },
    },
  },
});
Limitations
Higher Latency
Each activity invocation is scheduled through the Temporal server, which adds a network round trip. Batch operations when possible.
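Batching can be as simple as grouping pending work so that one activity invocation handles several items. The sketch below assumes a hypothetical `chunk` helper and an arbitrary batch size of four; neither comes from the runtime.

```typescript
// Hypothetical helper: split pending work into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error(`Batch size must be a positive integer, got ${size}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Ten pending tool calls, four per activity invocation (both numbers are arbitrary).
const pendingCalls = Array.from({ length: 10 }, (_, i) => `tool-call-${i}`);
const batches = chunk(pendingCalls, 4);
console.log(batches.length); // 3
```

The trade-off is retry granularity: if one item in a batch fails, the whole batch activity is retried, so batch sizes are best kept modest for expensive or non-idempotent work.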
Determinism Constraints
Workflow code restrictions can be challenging. Move all I/O to activities.
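One way to live with these restrictions is to write the workflow's own logic as pure functions of their inputs, leaving every side effect to an activity. The reducer below is a hypothetical stand-in for the kind of `processStepResult` helper the setup guide calls; the `SketchState` and `SketchStepResult` shapes and the `maxSteps` limit are assumptions made for this example, not the library's types.

```typescript
type Status = 'running' | 'completed' | 'failed';

// Assumed, simplified state shape for illustration only.
interface SketchState {
  status: Status;
  stepCount: number;
  output?: string;
  error?: string;
}

// Assumed shape of what a step activity returns.
type SketchStepResult =
  | { kind: 'continue' }
  | { kind: 'final'; output: string }
  | { kind: 'error'; message: string };

// Pure function: no clock, no randomness, no I/O, so it is safe to replay.
function applyStepResult(state: SketchState, result: SketchStepResult, maxSteps = 10): SketchState {
  const stepCount = state.stepCount + 1;
  switch (result.kind) {
    case 'final':
      return { ...state, stepCount, status: 'completed', output: result.output };
    case 'error':
      return { ...state, stepCount, status: 'failed', error: result.message };
    case 'continue':
      return stepCount >= maxSteps
        ? { ...state, stepCount, status: 'failed', error: `Exceeded ${maxSteps} steps` }
        : { ...state, stepCount };
  }
}

const finished = applyStepResult({ status: 'running', stepCount: 0 }, { kind: 'final', output: 'done' });
console.log(finished.status); // completed
```

Because the function returns a new state instead of mutating its input, it can also be unit-tested without a Temporal server.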
Infrastructure Overhead
Requires running Temporal server and workers alongside your application.
Learning Curve
Temporal concepts (workflows, activities, replay, determinism) take time to learn before you can use the runtime effectively.
Best Practices
1. Keep Workflows Thin
Workflow code should only orchestrate - move logic to activities:
// Workflow just coordinates
export async function agentWorkflow(input) {
  const state = await loadState(input.runId);
  const result = await executeStep(state); // Activity does the work
  await saveState(state);
}
2. Appropriate Timeouts
Set realistic timeouts:
const { executeAgentStep } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes', // LLM calls can be slow
});
3. Heartbeat Long Activities
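For activities that run longer than about 30 seconds, implement heartbeating:
import { Context } from '@temporalio/activity';
// items and result are placeholders for the tool's own work list and return value
export async function executeLongTool(input: ToolInput): Promise<ToolResult> {
  for (const item of items) {
    await processItem(item);
    Context.current().heartbeat(); // Report progress
  }
  return result;
}
4. Use Continue-As-New for Long Histories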
Workflows that accumulate many history events should periodically continue as new, which starts a fresh run with an empty history:
import { continueAsNew, workflowInfo } from '@temporalio/workflow';
export async function agentWorkflow(input) {
  const info = workflowInfo();
  // After many steps, continue as new to reset history
  if (info.historyLength > 10000) {
    await continueAsNew<typeof agentWorkflow>(input);
  }
  // ...
}
Next Steps
- JavaScript Runtime - Simpler option for development
- Cloudflare Runtime - Edge deployment alternative
- Storage: Redis - Recommended store for Temporal