
Temporal Runtime

The Temporal runtime (@helix-agents/runtime-temporal) executes agents as durable Temporal workflows. This provides crash recovery, automatic retries, and production-grade reliability for long-running agent tasks.

When to Use

Good fit:

  • Production workloads requiring reliability
  • Long-running agents (hours or days)
  • Agents that must survive process restarts
  • Complex multi-agent orchestrations
  • Operations requiring audit trails and observability

Not ideal for:

  • Quick development iteration (infrastructure overhead)
  • Simple, short-lived agents
  • Cost-sensitive deployments without existing Temporal infrastructure

Prerequisites

You need a running Temporal server:

Option 1: Temporal Cloud (Recommended for production)

bash
# Sign up at https://temporal.io/cloud

Option 2: Local development

bash
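# Using the Temporal CLI (gRPC on localhost:7233, Web UI at http://localhost:8233)
temporal server start-dev

# Or using Docker (the auto-setup image also needs a database configured through
# environment variables; the Web UI runs in a separate temporalio/ui container)
docker run -d --name temporal \
  -p 7233:7233 \
  temporalio/auto-setup:latest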

Installation

bash
npm install @helix-agents/runtime-temporal @helix-agents/store-redis @helix-agents/llm-vercel @temporalio/client @temporalio/worker @temporalio/workflow @temporalio/activity

Architecture

┌──────────────────────────────────────────────────────────────┐
│                      Your Application                         │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ TemporalAgentExecutor                                    │ │
│  │   - Starts workflows                                     │ │
│  │   - Returns handles                                      │ │
│  │   - Reads from StateStore/StreamManager                  │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                    Temporal Server                            │
│  - Persists workflow state                                    │
│  - Manages task queues                                        │
│  - Handles retries and timeouts                               │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                     Temporal Worker                           │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Agent Workflow                                           │ │
│  │   - Orchestrates execution                               │ │
│  │   - Calls activities for LLM/tools                       │ │
│  └─────────────────────────────────────────────────────────┘ │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Activities                                               │ │
│  │   - LLM calls                                            │ │
│  │   - Tool execution                                       │ │
│  │   - State persistence                                    │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

Setup Guide

1. Create the Workflow

Define a workflow that wraps the agent execution:

typescript
// src/workflows/agent-workflow.ts
import { proxyActivities, defineSignal, setHandler } from '@temporalio/workflow';
import type { AgentWorkflowInput, AgentWorkflowResult } from '@helix-agents/runtime-temporal';
import type * as activities from '../activities';

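// Proxy activities with timeouts. initializeState and processStepResult are not
// shown in this guide; this sketch assumes ../activities exports them as well.
// (processStepResult could instead be a pure, deterministic helper in this file.)
const { executeAgentStep, saveState, loadState, initializeState, processStepResult } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '5 minutes',
    retry: {
      maximumAttempts: 3,
      backoffCoefficient: 2,
    },
  });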

// Abort signal
export const abortSignal = defineSignal('abort');

export async function agentWorkflow(input: AgentWorkflowInput): Promise<AgentWorkflowResult> {
  let aborted = false;
  setHandler(abortSignal, () => {
    aborted = true;
  });

  try {
    // Load or initialize state
    let state = await loadState(input.runId);
    if (!state) {
      state = await initializeState(input);
    }

    // Main execution loop
    while (state.status === 'running' && !aborted) {
      const stepResult = await executeAgentStep(input.agentType, state);
      state = await processStepResult(state, stepResult);
      await saveState(state);
    }

    if (aborted) {
      return { status: 'failed', error: 'Workflow aborted' };
    }

    return {
      status: state.status === 'completed' ? 'completed' : 'failed',
      output: state.output,
      error: state.error,
    };
  } catch (error) {
    return {
      status: 'failed',
      error: error instanceof Error ? error.message : String(error),
    };
  }
}
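The control flow of this loop can be exercised without a Temporal server. The sketch below is a plain TypeScript model, not part of @helix-agents/runtime-temporal: SketchState, runAgentLoop, and the injected step/save callbacks are hypothetical stand-ins for AgentState and the real activities. It makes two properties of the workflow visible: state is saved after every step, and an abort is only observed between steps, never in the middle of one.

```typescript
// Hypothetical, simplified state; the real AgentState carries much more.
interface SketchState {
  status: 'running' | 'completed' | 'failed';
  stepCount: number;
  output?: string;
}

interface SketchResult {
  status: 'completed' | 'failed';
  output?: string;
  error?: string;
}

type StepFn = (state: SketchState) => Promise<SketchState>;
type SaveFn = (state: SketchState) => Promise<void>;

// Same control flow as agentWorkflow: step, persist, repeat until the state
// leaves 'running' or an abort has been requested.
async function runAgentLoop(
  initial: SketchState,
  step: StepFn,
  save: SaveFn,
  isAborted: () => boolean,
): Promise<SketchResult> {
  let state = initial;
  while (state.status === 'running' && !isAborted()) {
    state = await step(state);
    await save(state); // persisted after every step, like saveState() in the workflow
  }
  if (isAborted()) {
    return { status: 'failed', error: 'Workflow aborted' };
  }
  return { status: state.status === 'completed' ? 'completed' : 'failed', output: state.output };
}

// Usage: a fake step that finishes on its third call.
const countingStep: StepFn = async (s): Promise<SketchState> => {
  const stepCount = s.stepCount + 1;
  return stepCount >= 3
    ? { status: 'completed', stepCount, output: 'done' }
    : { status: 'running', stepCount };
};
```

Injecting isAborted as a callback mirrors the signal handler flipping a flag: the loop only reads it at the top of each iteration.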

2. Create Activities

Activities perform the actual work (LLM calls, tool execution):

typescript
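// src/activities/agent-activities.ts
// (re-export from src/activities/index.ts so the worker and workflow imports resolve)
import { AgentRegistry, type AgentState } from '@helix-agents/runtime-temporal';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { RedisStateStore, RedisStreamManager } from '@helix-agents/store-redis';
// Not shown: imports for redisClient, ResearchAgent, AnalyzerAgent, executeStep and
// StepResult, which come from your own modules or the helix-agents packages.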

const stateStore = new RedisStateStore(redisClient);
const streamManager = new RedisStreamManager(redisClient);
const llmAdapter = new VercelAIAdapter();
const registry = new AgentRegistry();

// Register your agents
registry.register(ResearchAgent);
registry.register(AnalyzerAgent);

export async function loadState(runId: string): Promise<AgentState<unknown, unknown> | null> {
  return stateStore.load(runId);
}

export async function saveState(state: AgentState<unknown, unknown>): Promise<void> {
  await stateStore.save(state);
}

export async function executeAgentStep(
  agentType: string,
  state: AgentState<unknown, unknown>
): Promise<StepResult<unknown>> {
  const agent = registry.get(agentType);
  if (!agent) {
    throw new Error(`Unknown agent type: ${agentType}`);
  }

  // Execute one LLM step
  return executeStep(agent, state, llmAdapter, streamManager);
}
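For unit-testing activities without Redis, a stand-in with the same load(runId) and save(state) calls used above can replace RedisStateStore. This is a hypothetical sketch: StoredState assumes only that each state carries a runId, and the real store in @helix-agents/store-redis may expose more methods.

```typescript
// Hypothetical minimal shape: the store only relies on runId.
interface StoredState {
  runId: string;
  [key: string]: unknown;
}

// In-memory stand-in for a state store, keyed by runId (for tests only).
class InMemoryStateStore<S extends StoredState> {
  private readonly states = new Map<string, string>();

  // Values are stored as JSON strings, so each load returns a fresh copy and only
  // JSON-serializable data survives the round trip. This roughly mimics a
  // networked store (an assumption about how the Redis store behaves).
  async save(state: S): Promise<void> {
    this.states.set(state.runId, JSON.stringify(state));
  }

  async load(runId: string): Promise<S | null> {
    const json = this.states.get(runId);
    return json === undefined ? null : (JSON.parse(json) as S);
  }
}
```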

3. Create the Worker

The worker processes workflows and activities:

typescript
// src/worker.ts
import { Worker, NativeConnection } from '@temporalio/worker';
import * as activities from './activities';

async function runWorker() {
  const connection = await NativeConnection.connect({
    address: process.env.TEMPORAL_ADDRESS ?? 'localhost:7233',
  });

  const worker = await Worker.create({
    connection,
    namespace: 'default',
    taskQueue: 'agent-tasks',
    workflowsPath: require.resolve('./workflows'), // src/workflows/index.ts must export agentWorkflow
    activities,
  });

  await worker.run();
}

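runWorker().catch((err) => {
  console.error(err);
  process.exit(1); // Exit non-zero so a process supervisor can restart the worker
});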

4. Create the Executor

The executor starts workflows and returns handles:

typescript
// src/executor.ts
import { Client, Connection } from '@temporalio/client';
import { TemporalAgentExecutor } from '@helix-agents/runtime-temporal';
import { RedisStateStore, RedisStreamManager } from '@helix-agents/store-redis';

async function createExecutor() {
  const connection = await Connection.connect({
    address: process.env.TEMPORAL_ADDRESS ?? 'localhost:7233',
  });

  const client = new Client({ connection });

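  // Adapt the Temporal client to the interface TemporalAgentExecutor expects.
  // wrapHandle (not shown) converts a Temporal WorkflowHandle into the executor's
  // handle type; `redis` below is your Redis client instance (also not shown).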
  const temporalClientAdapter = {
    startWorkflow: async (name, options) => {
      const handle = await client.workflow.start(name, {
        workflowId: options.workflowId,
        taskQueue: options.taskQueue,
        args: options.args,
      });
      return wrapHandle(handle);
    },
    getHandle: (workflowId) => {
      return wrapHandle(client.workflow.getHandle(workflowId));
    },
  };

  return new TemporalAgentExecutor({
    client: temporalClientAdapter,
    stateStore: new RedisStateStore(redis),
    streamManager: new RedisStreamManager(redis),
    workflowName: 'agentWorkflow',
    taskQueue: 'agent-tasks',
  });
}

Using the Executor

Once set up, usage is identical to other runtimes:

typescript
const executor = await createExecutor();

// Execute agent
const handle = await executor.execute(ResearchAgent, 'Research quantum computing');

// Stream events
const stream = await handle.stream();
for await (const chunk of stream) {
  console.log(chunk);
}

// Get result
const result = await handle.result();

Agent Registry

Register agents so that activities running on any worker can look them up by name:

typescript
import { AgentRegistry } from '@helix-agents/runtime-temporal';

const registry = new AgentRegistry();

// Register each agent type
registry.register(ResearchAgent); // name: 'researcher'
registry.register(AnalyzerAgent); // name: 'analyzer'
registry.register(SummarizerAgent); // name: 'summarizer'

// In activities, look up by type
export async function executeAgentStep(agentType: string, state) {
  const agent = registry.get(agentType); // Returns the agent config
  // ...
}
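Conceptually, the registry is a lookup table from agent name to agent config. Because any worker may pick up any activity, every worker process has to register the same agents at startup. The sketch below illustrates that idea only; SimpleAgentRegistry is hypothetical, not the AgentRegistry implementation, and rejecting duplicate names is an assumed design choice.

```typescript
// Hypothetical minimal agent config: the lookup only needs a unique name.
interface AgentConfigLike {
  name: string;
}

// Name-keyed lookup table; an illustration, not the AgentRegistry implementation.
class SimpleAgentRegistry<A extends AgentConfigLike> {
  private readonly agents = new Map<string, A>();

  register(agent: A): void {
    // Duplicate names would make lookups ambiguous, so reject them up front.
    if (this.agents.has(agent.name)) {
      throw new Error(`Agent already registered: ${agent.name}`);
    }
    this.agents.set(agent.name, agent);
  }

  get(name: string): A | undefined {
    return this.agents.get(name);
  }
}

// Usage: every worker process registers the same agents at startup.
const registry = new SimpleAgentRegistry<AgentConfigLike>();
registry.register({ name: 'researcher' });
registry.register({ name: 'analyzer' });
console.log(registry.get('analyzer')?.name); // analyzer
```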

Sub-Agent Handling

Sub-agents execute as child workflows:

typescript
// In workflow
import { executeChild } from '@temporalio/workflow';

// When the parent needs to execute a sub-agent
const subRunId = `${parentRunId}-sub-${callId}`;
const subAgentResult = await executeChild('agentWorkflow', {
  args: [
    {
      agentType: 'analyzer',
      runId: subRunId,
      streamId: parentStreamId, // Same stream for unified streaming
      message: inputMessage,
      parentAgentId: parentRunId,
    },
  ],
  workflowId: `agent__analyzer__${subRunId}`,
  taskQueue: 'agent-tasks',
});

Benefits of child workflows:

  • Independent retry policies
  • Separate timeouts
  • Can be cancelled independently
  • Full workflow history preserved
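Child workflow IDs are best derived from values the parent already has, here the parent run ID and the tool call ID, so that a replay of the parent issues exactly the same child workflow ID. A small hypothetical helper can build both IDs in one place, following the runId and workflowId patterns in the example above:

```typescript
// Hypothetical helper: derives the sub-agent run ID and its Temporal workflow ID
// from values the parent already has, so both are stable across replays.
interface SubAgentIds {
  runId: string;
  workflowId: string;
}

function subAgentIds(agentType: string, parentRunId: string, callId: string): SubAgentIds {
  const runId = `${parentRunId}-sub-${callId}`;
  return { runId, workflowId: `agent__${agentType}__${runId}` };
}

const ids = subAgentIds('analyzer', 'run-123', 'call-7');
console.log(ids.workflowId); // agent__analyzer__run-123-sub-call-7
```

Because the IDs depend only on their inputs, the sub-agent's run can also be located later from the parent run ID and call ID alone.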

Activity Configuration

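Configure timeouts and retries for the activities created by a proxyActivities call (use separate calls when activities need different settings):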

typescript
const { executeAgentStep } = proxyActivities<typeof activities>({
  // How long the activity can run
  startToCloseTimeout: '10 minutes',

  // How long to wait for worker to start processing
  scheduleToStartTimeout: '1 minute',

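  // Heartbeat timeout: only set this if the activity calls heartbeat() at least this
  // often; an activity that never heartbeats times out after 30 seconds
  heartbeatTimeout: '30 seconds',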

  // Retry configuration
  retry: {
    initialInterval: '1 second',
    backoffCoefficient: 2,
    maximumInterval: '1 minute',
    maximumAttempts: 5,
    nonRetryableErrorTypes: ['InvalidAgentError'],
  },
});
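The retry block above describes exponential backoff: the first retry waits initialInterval, each later wait is multiplied by backoffCoefficient and capped at maximumInterval, and Temporal stops after maximumAttempts total attempts. The helper below, which is not an SDK API, computes that schedule in milliseconds so you can sanity-check a policy before deploying it:

```typescript
interface RetryPolicyMs {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // total attempts, including the first one
}

// Entry i is the wait after failed attempt i + 1, before the next attempt starts.
function retryDelays(policy: RetryPolicyMs): number[] {
  if (policy.maximumAttempts < 1) {
    // Temporal treats 0 as "unlimited"; this sketch only models a finite budget.
    throw new Error('maximumAttempts must be at least 1');
  }
  const delays: number[] = [];
  for (let failed = 1; failed < policy.maximumAttempts; failed++) {
    const raw = policy.initialIntervalMs * policy.backoffCoefficient ** (failed - 1);
    delays.push(Math.min(raw, policy.maximumIntervalMs));
  }
  return delays;
}

// The policy from the example: 1 second, x2, capped at 1 minute, 5 attempts.
console.log(
  retryDelays({ initialIntervalMs: 1_000, backoffCoefficient: 2, maximumIntervalMs: 60_000, maximumAttempts: 5 }),
); // [ 1000, 2000, 4000, 8000 ]
```

For that policy, a step that keeps failing is abandoned about 15 seconds (1 + 2 + 4 + 8) after its first failure, plus the time spent in the attempts themselves.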

Crash Recovery

Temporal provides automatic crash recovery:

Worker 1 starts workflow
    │
    ├── Step 1 completes, state saved
    ├── Step 2 completes, state saved
    │
    └── Worker 1 crashes
            │
            ▼
Temporal detects failure
    │
    └── Workflow task rescheduled
            │
            ▼
Worker 2 picks up
    │
    ├── Replays history (deterministic)
    ├── Continues from Step 3
    └── Completes normally

Key points:

  • Workflow code must be deterministic
  • State is reconstructed from event history
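  • Completed activities are not re-executed: their results are read back from the event history (an activity still running at the time of the crash is retried)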
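These points can be illustrated with a toy model of replay. It is a deliberate simplification, not how the Temporal SDK works internally: the history is a plain array of recorded activity results, and callActivity returns the recorded result when one exists instead of running the activity again.

```typescript
// Toy model of replay: NOT how the Temporal SDK is implemented.
class ReplayContext {
  private cursor = 0;
  executions = 0; // how many activity bodies actually ran in this execution

  // history: results of completed activity calls, in call order
  constructor(private readonly history: unknown[]) {}

  async callActivity<T>(activity: () => Promise<T>): Promise<T> {
    if (this.cursor < this.history.length) {
      // Replaying: this call completed before the crash, so reuse its recorded result.
      return this.history[this.cursor++] as T;
    }
    // New work: run the activity for real and record its result.
    const result = await activity();
    this.executions++;
    this.history.push(result);
    this.cursor++;
    return result;
  }
}

// A deterministic "workflow" with three steps, each an activity call.
async function threeStepWorkflow(ctx: ReplayContext): Promise<number[]> {
  const outputs: number[] = [];
  for (let step = 1; step <= 3; step++) {
    outputs.push(await ctx.callActivity(async () => step * 10));
  }
  return outputs;
}

// Worker 1 recorded steps 1 and 2 before crashing; Worker 2 replays that history.
const recovered = new ReplayContext([10, 20]);
threeStepWorkflow(recovered).then((outputs) => {
  console.log(outputs, recovered.executions); // [ 10, 20, 30 ] 1
});
```

The model also shows why determinism matters: if the workflow called its activities in a different order on replay, recorded results would be handed to the wrong calls.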

Determinism Requirements

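Workflow code must be deterministic: replaying the same event history must produce the same sequence of commands. The TypeScript SDK runs workflows in a sandbox that makes Date and Math.random() replay-safe, but explicit Temporal APIs are clearer: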

typescript
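// AVOID - relies on implicit sandbox behavior or is not replay-safe
export async function agentWorkflow(input) {
  const timestamp = Date.now(); // Sandbox-patched: current workflow task time, not wall-clock time
  const random = Math.random(); // Sandbox-patched: seeded, so replays see the same values
  const uuid = crypto.randomUUID(); // Do not use in workflows; call uuid4() instead
}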

// GOOD - Use Temporal APIs
import { sleep, uuid4, workflowInfo } from '@temporalio/workflow';

export async function agentWorkflow(input) {
  const info = workflowInfo();
  const timestamp = info.startTime; // Deterministic
  const id = uuid4(); // Deterministic (seeded)
  await sleep('5 seconds'); // Deterministic timer
}

Move non-deterministic operations to activities:

  • LLM API calls
  • Database queries
  • External API calls
  • Random number generation (the SDK's seeded uuid4() is the exception)
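Randomness can stay in workflow code only when it is seeded with a value that is identical on every replay, which is why uuid4() above is described as deterministic (seeded). The toy below illustrates the principle with two generators seeded from the same workflow ID. The FNV-1a hash and the mulberry32 generator are arbitrary illustrative choices, not what the Temporal SDK uses, and a real workflow should simply call the SDK's APIs.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: turns a string into a numeric seed.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// The original run and a later replay seed from the same workflow ID,
// so they draw exactly the same "random" values.
const workflowSeed = fnv1a('agent__researcher__run-123');
const original = mulberry32(workflowSeed);
const replay = mulberry32(workflowSeed);
const firstRun = [original(), original(), original()];
const replayRun = [replay(), replay(), replay()];
console.log(firstRun.every((value, i) => value === replayRun[i])); // true
```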

Observability

Temporal Web UI

Access it at http://localhost:8233 when running temporal server start-dev locally, or through the Temporal Cloud web console.

View:

  • Workflow history and events
  • Activity execution details
  • Pending/failed workflows
  • Search by workflow ID or type

Workflow Queries

Query running workflows:

typescript
// In workflow
import { defineQuery, setHandler } from '@temporalio/workflow';

export const getProgressQuery = defineQuery<{ stepCount: number; status: string }>('getProgress');

export async function agentWorkflow(input) {
  let progress = { stepCount: 0, status: 'running' };

  setHandler(getProgressQuery, () => progress);

  // Update progress during execution
  progress.stepCount++;
  // ...
}

// From client (import getProgressQuery from your workflow module)
const handle = client.workflow.getHandle(workflowId);
const progress = await handle.query(getProgressQuery);

Production Deployment

Worker Scaling

Run multiple workers for throughput:

bash
# Scale horizontally (assumes a Compose service named "worker")
docker compose up -d --scale worker=5

Workers poll the same task queue, and Temporal distributes tasks among them.

Temporal Cloud

For production, use Temporal Cloud:

typescript
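import { readFileSync } from 'node:fs';
import { Connection, Client } from '@temporalio/client';

const connection = await Connection.connect({
  // Temporal Cloud endpoints have the form <namespace>.<account-id>.tmprl.cloud:7233
  address: 'your-namespace.your-account-id.tmprl.cloud:7233',
  tls: {
    clientCertPair: {
      crt: readFileSync('client.pem'),
      key: readFileSync('client.key'),
    },
  },
});

// Cloud namespaces must be set explicitly (the client default is 'default')
const client = new Client({ connection, namespace: 'your-namespace.your-account-id' });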

Monitoring

Expose worker metrics to Prometheus:

typescript
import { Runtime } from '@temporalio/worker';

// Install the runtime once, before creating any NativeConnection or Worker
Runtime.install({
  telemetryOptions: {
    metrics: {
      prometheus: { bindAddress: '0.0.0.0:9464' },
    },
  },
});

Limitations

Higher Latency

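Each activity invocation is scheduled through the Temporal server and recorded in the workflow history, which adds latency compared with an in-process call. Batch small operations into fewer activities when possible.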
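One way to batch is to group many small, independent items, such as the URLs a tool must fetch, and hand each group to a single activity invocation. A generic helper for the grouping step is sketched below; the batch size is a tuning knob chosen for illustration, not a value from this library.

```typescript
// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 10 items in batches of 4 -> 3 activity invocations instead of 10.
const itemIds = Array.from({ length: 10 }, (_, i) => `item-${i + 1}`);
const batches = chunk(itemIds, 4);
console.log(batches.length); // 3
```

The trade-off is that a failed batch is retried as a unit, so the work inside each batch should be idempotent.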

Determinism Constraints

Workflow code restrictions can be challenging. Move all I/O to activities.

Infrastructure Overhead

Requires running Temporal server and workers alongside your application.

Learning Curve

Teams need a working understanding of Temporal concepts (workflows, activities, replay, determinism) before they can operate this runtime confidently.

Best Practices

1. Keep Workflows Thin

Workflow code should only orchestrate - move logic to activities:

typescript
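// The workflow only coordinates; each call below is a proxied activity
export async function agentWorkflow(input) {
  const state = await loadState(input.runId);
  const stepResult = await executeAgentStep(input.agentType, state); // Activity does the work
  const nextState = await processStepResult(state, stepResult);
  await saveState(nextState); // Save the updated state, not the pre-step copy
}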

2. Appropriate Timeouts

Set realistic timeouts:

typescript
const { executeAgentStep } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes', // LLM calls can be slow
});

3. Heartbeat Long Activities

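For activities that can run longer than about 30 seconds, call heartbeat() periodically and set a heartbeatTimeout when proxying them, so Temporal notices a stalled or crashed worker without waiting for the full startToCloseTimeout:

typescript
import { Context } from '@temporalio/activity';

// ToolInput, ToolResult, and processItem are placeholders for your own tool code
export async function executeLongTool(input: ToolInput): Promise<ToolResult> {
  const results: unknown[] = [];
  for (const item of input.items) {
    results.push(await processItem(item));
    Context.current().heartbeat(results.length); // Report progress: items done so far
  }
  return { results };
}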

4. Use Continue-As-New for Long Histories

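Temporal limits how large a workflow's event history can grow, so a long-running agent loop should periodically continue as new, which starts a fresh run of the same workflow with an empty history:

typescript
import { continueAsNew, workflowInfo } from '@temporalio/workflow';

export async function agentWorkflow(input) {
  // ... inside the main execution loop, after each step's state is saved:
  if (workflowInfo().historyLength > 10000) {
    // Restart with the same input; the new run picks up the persisted state
    // through loadState(input.runId)
    await continueAsNew<typeof agentWorkflow>(input);
  }
  // ...
}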

Next Steps

Released under the MIT License.