Temporal Runtime

The Temporal runtime (@helix-agents/runtime-temporal) executes agents as durable Temporal workflows. This provides crash recovery, automatic retries, and production-grade reliability for long-running agent tasks.

When to Use

Good fit:

Production workloads requiring reliability
Long-running agents (hours or days)
Agents that must survive process restarts
Complex multi-agent orchestrations
Operations requiring audit trails and observability

Not ideal for:

Quick development iteration (infrastructure overhead)
Simple, short-lived agents
Cost-sensitive deployments without existing Temporal infrastructure

Prerequisites

You need a running Temporal server:

Option 1: Temporal Cloud (Recommended for production)

bash

# Sign up at https://temporal.io/cloud

Option 2: Local development

bash

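# Using Docker (note: the auto-setup image expects a reachable database,
# for example the one provided by Temporal's docker-compose setup)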
docker run -d --name temporal \
  -p 7233:7233 -p 8233:8233 \
  temporalio/auto-setup:latest

# Or using Temporal CLI
temporal server start-dev

Installation

bash

npm install @helix-agents/runtime-temporal @helix-agents/store-redis \
  @temporalio/client @temporalio/worker @temporalio/workflow @temporalio/activity

Architecture

┌──────────────────────────────────────────────────────────────┐
│                      Your Application                         │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ TemporalAgentExecutor                                    │ │
│  │   - Starts workflows                                     │ │
│  │   - Returns handles                                      │ │
│  │   - Reads from StateStore/StreamManager                  │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│                    Temporal Server                            │
│  - Persists workflow state                                    │
│  - Manages task queues                                        │
│  - Handles retries and timeouts                               │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│                     Temporal Worker                           │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Agent Workflow                                           │ │
│  │   - Orchestrates execution                               │ │
│  │   - Calls activities for LLM/tools                       │ │
│  └─────────────────────────────────────────────────────────┘ │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Activities                                               │ │
│  │   - LLM calls                                            │ │
│  │   - Tool execution                                       │ │
│  │   - State persistence                                    │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

Setup Guide

1. Create the Workflow

Define a workflow that wraps the agent execution:

typescript

// src/workflows/agent-workflow.ts
import { proxyActivities, defineSignal, setHandler } from '@temporalio/workflow';
import type { AgentWorkflowInput, AgentWorkflowResult } from '@helix-agents/runtime-temporal';
import type * as activities from '../activities';

// Proxy activities with timeouts
const { executeAgentStep, saveState, loadState } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes',
  retry: {
    maximumAttempts: 3,
    backoffCoefficient: 2,
  },
});

// Abort signal
export const abortSignal = defineSignal('abort');

export async function agentWorkflow(input: AgentWorkflowInput): Promise<AgentWorkflowResult> {
  let aborted = false;
  setHandler(abortSignal, () => {
    aborted = true;
  });

  try {
    // Load or initialize state
    let state = await loadState(input.runId);
    if (!state) {
      state = await initializeState(input);
    }

    // Main execution loop
    while (state.status === 'running' && !aborted) {
      const stepResult = await executeAgentStep(input.agentType, state);
      state = await processStepResult(state, stepResult);
      await saveState(state);
    }

    if (aborted) {
      return { status: 'failed', error: 'Workflow aborted' };
    }

    return {
      status: state.status === 'completed' ? 'completed' : 'failed',
      output: state.output,
      error: state.error,
    };
  } catch (error) {
    return {
      status: 'failed',
      error: error instanceof Error ? error.message : String(error),
    };
  }
}
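
The workflow above calls initializeState and processStepResult without defining them. The following is a minimal sketch of what such helpers could look like, assuming they are pure functions (no I/O, clock, or randomness) so they are safe to call from workflow code. The SketchState and SketchStepResult shapes are hypothetical stand-ins; the real types come from @helix-agents/runtime-temporal and may differ.

```typescript
// Hypothetical shapes for illustration only; not the library's actual types.
type Status = 'running' | 'completed' | 'failed';

interface SketchState {
  runId: string;
  status: Status;
  stepCount: number;
  messages: string[];
  output?: unknown;
  error?: string;
}

interface SketchStepResult {
  message: string; // text produced by this step
  done: boolean; // true when the agent has finished
  output?: unknown;
  error?: string;
}

// Build the first state for a new run. Pure: same input, same output.
function initializeState(input: { runId: string; message: string }): SketchState {
  return { runId: input.runId, status: 'running', stepCount: 0, messages: [input.message] };
}

// Fold one step result into the state, returning a new object (no mutation).
function processStepResult(state: SketchState, step: SketchStepResult): SketchState {
  const next: SketchState = {
    ...state,
    stepCount: state.stepCount + 1,
    messages: [...state.messages, step.message],
  };
  if (step.error !== undefined) return { ...next, status: 'failed', error: step.error };
  if (step.done) return { ...next, status: 'completed', output: step.output };
  return next;
}
```

Because both helpers are pure, replaying the workflow produces the same sequence of states from the same recorded activity results.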

2. Create Activities

Activities perform the actual work (LLM calls, tool execution):

typescript

// src/activities/agent-activities.ts
import { AgentRegistry, type AgentState } from '@helix-agents/runtime-temporal';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { RedisStateStore, RedisStreamManager } from '@helix-agents/store-redis';

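// redisClient, ResearchAgent, AnalyzerAgent, StepResult, and executeStep are assumed
// to be defined or imported elsewhere in your project; they are omitted here for brevity.
const stateStore = new RedisStateStore(redisClient);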
const streamManager = new RedisStreamManager(redisClient);
const llmAdapter = new VercelAIAdapter();
const registry = new AgentRegistry();

// Register your agents
registry.register(ResearchAgent);
registry.register(AnalyzerAgent);

export async function loadState(runId: string): Promise<AgentState<unknown, unknown> | null> {
  return stateStore.load(runId);
}

export async function saveState(state: AgentState<unknown, unknown>): Promise<void> {
  await stateStore.save(state);
}

export async function executeAgentStep(
  agentType: string,
  state: AgentState<unknown, unknown>
): Promise<StepResult<unknown>> {
  const agent = registry.get(agentType);
  if (!agent) {
    throw new Error(`Unknown agent type: ${agentType}`);
  }

  // Execute one LLM step
  return executeStep(agent, state, llmAdapter, streamManager);
}

3. Create the Worker

The worker processes workflows and activities:

typescript

// src/worker.ts
import { Worker, NativeConnection } from '@temporalio/worker';
import * as activities from './activities';

async function runWorker() {
  const connection = await NativeConnection.connect({
    address: process.env.TEMPORAL_ADDRESS ?? 'localhost:7233',
  });

  const worker = await Worker.create({
    connection,
    namespace: 'default',
    taskQueue: 'agent-tasks',
    workflowsPath: require.resolve('./workflows'),
    activities,
  });

  await worker.run();
}

runWorker().catch(console.error);

4. Create the Executor

The executor starts workflows and returns handles:

typescript

// src/executor.ts
import { Client, Connection } from '@temporalio/client';
import { TemporalAgentExecutor } from '@helix-agents/runtime-temporal';
import { RedisStateStore, RedisStreamManager } from '@helix-agents/store-redis';

async function createExecutor() {
  const connection = await Connection.connect({
    address: process.env.TEMPORAL_ADDRESS ?? 'localhost:7233',
  });

  const client = new Client({ connection });

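  // Wrap the Temporal client to match the interface TemporalAgentExecutor expects.
  // wrapHandle and redis are assumed to be defined elsewhere in your project.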
  const temporalClientAdapter = {
    startWorkflow: async (name, options) => {
      const handle = await client.workflow.start(name, {
        workflowId: options.workflowId,
        taskQueue: options.taskQueue,
        args: options.args,
      });
      return wrapHandle(handle);
    },
    getHandle: (workflowId) => {
      return wrapHandle(client.workflow.getHandle(workflowId));
    },
  };

  return new TemporalAgentExecutor({
    client: temporalClientAdapter,
    stateStore: new RedisStateStore(redis),
    streamManager: new RedisStreamManager(redis),
    workflowName: 'agentWorkflow',
    taskQueue: 'agent-tasks',
  });
}
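
The executor code above relies on a wrapHandle helper that is not shown. A rough sketch follows, under the assumption that the executor only needs the workflow ID, the result, and a way to send the abort signal defined in the workflow. TemporalHandleLike and WrappedHandle are hypothetical names, and the interface that TemporalAgentExecutor actually expects may differ.

```typescript
// Structural view of the Temporal handle members this sketch relies on.
interface TemporalHandleLike {
  workflowId: string;
  result(): Promise<unknown>;
  signal(name: string, ...args: unknown[]): Promise<void>;
}

// Hypothetical shape of the handle consumed by the executor.
interface WrappedHandle {
  workflowId: string;
  result(): Promise<unknown>;
  abort(): Promise<void>;
}

function wrapHandle(handle: TemporalHandleLike): WrappedHandle {
  return {
    workflowId: handle.workflowId,
    result: () => handle.result(),
    // 'abort' matches the signal name passed to defineSignal in the workflow.
    abort: () => handle.signal('abort'),
  };
}
```

Keeping the wrapper this thin means the executor never depends on Temporal client types directly, which also makes it easy to substitute a fake handle in tests.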

Using the Executor

Once set up, usage is identical to other runtimes:

typescript

const executor = await createExecutor();

// Execute agent
const handle = await executor.execute(ResearchAgent, 'Research quantum computing');

// Stream events
const stream = await handle.stream();
for await (const chunk of stream) {
  console.log(chunk);
}

// Get result
const result = await handle.result();

Agent Registry

The registry maps agent type names to agent definitions, so an activity can resolve the agent from the agentType string that the workflow passes in:

typescript

import { AgentRegistry } from '@helix-agents/runtime-temporal';

const registry = new AgentRegistry();

// Register each agent type
registry.register(ResearchAgent); // name: 'researcher'
registry.register(AnalyzerAgent); // name: 'analyzer'
registry.register(SummarizerAgent); // name: 'summarizer'

// In activities, look up by type
export async function executeAgentStep(agentType: string, state) {
  const agent = registry.get(agentType); // Returns the agent config
  // ...
}
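
To make the lookup behavior concrete, here is a minimal sketch of a name-keyed registry. It is an illustration of the idea only, not the implementation of AgentRegistry; SketchRegistry, NamedAgent, and the choice to reject duplicate names are assumptions.

```typescript
// Any agent definition that exposes a unique name.
interface NamedAgent {
  name: string;
}

class SketchRegistry<A extends NamedAgent> {
  private readonly agents = new Map<string, A>();

  // Register an agent under its own name; duplicates are rejected (an assumption).
  register(agent: A): void {
    if (this.agents.has(agent.name)) {
      throw new Error(`Agent already registered: ${agent.name}`);
    }
    this.agents.set(agent.name, agent);
  }

  // Returns undefined for unknown names so callers can raise their own error.
  get(name: string): A | undefined {
    return this.agents.get(name);
  }
}

const sketchRegistry = new SketchRegistry<NamedAgent>();
sketchRegistry.register({ name: 'researcher' });
sketchRegistry.register({ name: 'analyzer' });
```

Only the string name crosses the workflow boundary, which keeps workflow inputs serializable while the full agent definition stays on the worker.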

Sub-Agent Handling

Sub-agents execute as child workflows:

typescript

// In workflow
import { executeChild } from '@temporalio/workflow';

// When parent needs to execute sub-agent
// Derive the sub-agent run ID once so runId and workflowId stay consistent
const subRunId = `${parentRunId}-sub-${callId}`;
const subAgentResult = await executeChild('agentWorkflow', {
  args: [
    {
      agentType: 'analyzer',
      runId: subRunId,
      streamId: parentStreamId, // Same stream for unified streaming
      message: inputMessage,
      parentAgentId: parentRunId,
    },
  ],
  workflowId: `agent__analyzer__${subRunId}`,
  taskQueue: 'agent-tasks',
});

Benefits of child workflows:

Independent retry policies
Separate timeouts
Can be cancelled independently
Full workflow history preserved
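
The ID scheme used in the child workflow call can be captured in a small helper. This is a sketch that follows the naming pattern shown above; buildSubAgentIds is a hypothetical name, not part of the library.

```typescript
interface SubAgentIds {
  runId: string;
  workflowId: string;
}

// Derive both IDs from values the parent already has, with no clock or randomness,
// so the same IDs are produced when the parent workflow is replayed.
function buildSubAgentIds(parentRunId: string, agentType: string, callId: string): SubAgentIds {
  const runId = `${parentRunId}-sub-${callId}`;
  return { runId, workflowId: `agent__${agentType}__${runId}` };
}

const subIds = buildSubAgentIds('run-1', 'analyzer', 'call-7');
console.log(subIds.workflowId); // agent__analyzer__run-1-sub-call-7
```

Deriving the child ID from the parent run ID and the call ID also makes it easy to find all sub-agent workflows of a run by prefix in the Temporal UI.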

Activity Configuration

Configure timeouts and retries per activity:

typescript

const { executeAgentStep } = proxyActivities<typeof activities>({
  // How long the activity can run
  startToCloseTimeout: '10 minutes',

  // How long to wait for worker to start processing
  scheduleToStartTimeout: '1 minute',

  // Heartbeat timeout for long activities
  heartbeatTimeout: '30 seconds',

  // Retry configuration
  retry: {
    initialInterval: '1 second',
    backoffCoefficient: 2,
    maximumInterval: '1 minute',
    maximumAttempts: 5,
    nonRetryableErrorTypes: ['InvalidAgentError'],
  },
});
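
To see what the retry settings above mean in practice, the sketch below computes the wait before each retry using the usual exponential backoff rule: the initial interval multiplied by the coefficient for each prior retry, capped at the maximum interval. The function and field names are illustrative, intervals are expressed in milliseconds, and the sketch assumes a finite, positive maximumAttempts.

```typescript
interface RetrySketch {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // total attempts, including the first one
}

// Delay before retry k (k = 0 for the first retry):
// min(initial * coefficient^k, maximum). There are maximumAttempts - 1 retries.
function retryDelaysMs(policy: RetrySketch): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    const uncapped = policy.initialIntervalMs * policy.backoffCoefficient ** retry;
    delays.push(Math.min(uncapped, policy.maximumIntervalMs));
  }
  return delays;
}

const retryDelays = retryDelaysMs({
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 60000,
  maximumAttempts: 5,
});
console.log(retryDelays); // [ 1000, 2000, 4000, 8000 ]
```

With five attempts the activity waits about 15 seconds in total before giving up, which is worth comparing against the timeouts configured for the same activity.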

Crash Recovery

Temporal provides automatic crash recovery:

Worker 1 starts workflow
    │
    ├── Step 1 completes, state saved
    ├── Step 2 completes, state saved
    │
    └── Worker 1 crashes
           │
           ▼
Temporal detects failure
    │
    ├── Workflow task rescheduled
    │
    ▼
Worker 2 picks up
    │
    ├── Replays history (deterministic)
    ├── Continues from Step 3
    └── Completes normally

Key points:

Workflow code must be deterministic
State is reconstructed from event history
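Completed activities are not re-executed on replay (their results are read from the event history); an activity that was in flight when the worker crashed is retried according to its retry policy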
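
The replay behavior described above can be simulated in a few lines. This toy model is not how Temporal is implemented; it only illustrates the idea that recorded results are reused and only unrecorded steps actually run. The names runWorkflow, ToyActivity, and crashAfter are made up for the sketch.

```typescript
type ToyActivity = () => string;

// Runs activities in order and records each result in `history`.
// Steps whose results are already recorded are skipped (replayed).
// `crashAfter` simulates a worker dying once that many results are recorded.
function runWorkflow(activities: ToyActivity[], history: string[], crashAfter = Infinity): string[] {
  for (let index = 0; index < activities.length; index++) {
    if (index < history.length) continue; // replay: reuse the recorded result
    if (history.length >= crashAfter) throw new Error('worker crashed');
    history.push(activities[index]()); // first execution: run and record
  }
  return history;
}

let activityCalls = 0;
const toySteps: ToyActivity[] = [1, 2, 3].map((n) => () => {
  activityCalls++;
  return `step ${n}`;
});

const eventHistory: string[] = []; // stands in for Temporal's persisted event history
try {
  runWorkflow(toySteps, eventHistory, 2); // worker 1 crashes after step 2
} catch {
  // worker 1 is gone, but the history survives
}
runWorkflow(toySteps, eventHistory); // worker 2 replays steps 1-2, then runs step 3
console.log(eventHistory, activityCalls);
```

Each activity body runs exactly once across both workers, which is why the workflow function itself must take the same path on every replay.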

Determinism Requirements

Workflow code must be deterministic:

typescript

// BAD - Non-deterministic
export async function agentWorkflow(input) {
  const timestamp = Date.now(); // Different on replay!
  const random = Math.random(); // Different on replay!
  const uuid = crypto.randomUUID(); // Different on replay!
}

// GOOD - Use Temporal APIs
import { sleep, uuid4, workflowInfo } from '@temporalio/workflow';

export async function agentWorkflow(input) {
  const info = workflowInfo();
  const timestamp = info.startTime; // Deterministic
  const id = uuid4(); // Deterministic (seeded)
  await sleep('5 seconds'); // Deterministic timer
}

Move non-deterministic operations to activities:

LLM API calls
Database queries
External API calls
Random number generation
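
The "Deterministic (seeded)" comment in the example above rests on a general property: a pseudo-random generator started from the same seed yields the same sequence every time. The sketch below demonstrates that property with a small, well-known generator (mulberry32). It is not the generator Temporal uses, and the seed value is arbitrary.

```typescript
// mulberry32: a tiny 32-bit seeded pseudo-random generator returning values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw `count` values from a generator created with `seed`.
function drawSequence(seed: number, count: number): number[] {
  const next = mulberry32(seed);
  return Array.from({ length: count }, () => next());
}

const firstRun = drawSequence(12345, 5); // original execution
const replayRun = drawSequence(12345, 5); // replay with the same seed
console.log(firstRun.every((value, i) => value === replayRun[i])); // true
```

Values that need true randomness or wall-clock time still belong in activities, where the result is recorded once and reused on replay.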

Observability

Temporal Web UI

Access at http://localhost:8233 (local) or via Temporal Cloud.

View:

Workflow history and events
Activity execution details
Pending/failed workflows
Search by workflow ID or type

Workflow Queries

Query running workflows:

typescript

// In workflow
import { defineQuery, setHandler } from '@temporalio/workflow';

export const getProgressQuery = defineQuery<{ stepCount: number; status: string }>('getProgress');

export async function agentWorkflow(input) {
  let progress = { stepCount: 0, status: 'running' };

  setHandler(getProgressQuery, () => progress);

  // Update progress during execution
  progress.stepCount++;
  // ...
}

// From client
const handle = client.workflow.getHandle(workflowId);
const progress = await handle.query(getProgressQuery);

Production Deployment

Worker Scaling

Run multiple workers for throughput:

bash

# Scale horizontally
docker compose up -d --scale worker=5

All workers poll the same task queue, and Temporal distributes tasks among them.

Temporal Cloud

For production, use Temporal Cloud:

typescript

import * as fs from 'node:fs';
import { Connection, Client } from '@temporalio/client';

const connection = await Connection.connect({
  address: 'your-namespace.tmprl.cloud:7233',
  tls: {
    clientCertPair: {
      crt: fs.readFileSync('client.pem'),
      key: fs.readFileSync('client.key'),
    },
  },
});

Monitoring

Set up metrics:

typescript

import { Runtime } from '@temporalio/worker';

Runtime.install({
  telemetryOptions: {
    metrics: {
      prometheus: { bindAddress: '0.0.0.0:9464' },
    },
  },
});

Limitations

Higher Latency

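Each activity invocation is a round trip through the Temporal server, which adds latency compared with an in-process call. Batch small operations into a single activity when possible.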
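
One way to batch is to split the work into fixed-size groups and invoke one activity per group rather than one per item. The sketch below shows only the grouping step; chunk is a hypothetical helper, and the commented usage line refers to a made-up processDocuments activity.

```typescript
// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical use inside a workflow: one activity call per batch instead of per item.
// for (const batch of chunk(documentIds, 25)) await processDocuments(batch);

const exampleBatches = chunk([1, 2, 3, 4, 5, 6, 7], 3);
console.log(exampleBatches.length); // 3
```

Larger batches reduce round trips but make each retry more expensive, since a failed activity is retried as a whole.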

Determinism Constraints

Workflow code restrictions can be challenging. Move all I/O to activities.

Infrastructure Overhead

Requires running Temporal server and workers alongside your application.

Learning Curve

Temporal concepts (workflows, activities, replay, determinism) take time to learn before a team can use them effectively.

Best Practices

1. Keep Workflows Thin

Workflow code should only orchestrate - move logic to activities:

typescript

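// Workflow just coordinates
export async function agentWorkflow(input) {
  let state = await loadState(input.runId);
  const stepResult = await executeAgentStep(input.agentType, state); // Activity does the work
  state = await processStepResult(state, stepResult); // Apply the result before saving
  await saveState(state);
}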

2. Appropriate Timeouts

Set realistic timeouts:

typescript

const { executeAgentStep } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes', // LLM calls can be slow
});

3. Heartbeat Long Activities

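For activities that can run longer than about 30 seconds, heartbeat from inside the activity and set heartbeatTimeout in the activity options so Temporal can detect a stalled worker: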

typescript

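import { Context } from '@temporalio/activity';

export async function executeLongTool(input: ToolInput): Promise<ToolResult> {
  // ToolInput, ToolResult, processItem, and the items/results fields are illustrative
  const results = [];
  for (const item of input.items) {
    results.push(await processItem(item));
    Context.current().heartbeat(); // Tell Temporal the activity is still alive
  }
  return { results };
}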

4. Use Continue-As-New for Long Histories

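Temporal limits the size of a workflow's event history, and long histories slow down replay. A workflow that accumulates many events should continue as new, which starts a fresh run with the same workflow ID and an empty history: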

typescript

import { continueAsNew, workflowInfo } from '@temporalio/workflow';

export async function agentWorkflow(input) {
  const info = workflowInfo();

  // After many steps, continue as new to reset history
  if (info.historyLength > 10000) {
    await continueAsNew<typeof agentWorkflow>(input);
  }
  // ...
}

Next Steps

JavaScript Runtime - Simpler option for development
Cloudflare Runtime - Edge deployment alternative
Storage: Redis - Recommended store for Temporal
