Error Handling

Helix Agents provides a unified error handling architecture that classifies errors at the backend, transports them through the stream protocol, and reconstructs them on the frontend. Every error — whether from an LLM provider, a tool, or the framework itself — flows through the same pipeline with typed codes, categories, and retry information.

Overview

The error system has three layers:

  1. Backend Classification (@helix-agents/core) — Converts any error into a typed HelixError with a code, category, and retryability flag
  2. LLM Error Mapping (@helix-agents/llm-vercel) — Maps Vercel AI SDK errors (rate limits, auth failures, timeouts) to HelixError
  3. Frontend Reconstruction (@helix-agents/ai-sdk) — Reconstructs typed errors from stream events for client-side handling

The end-to-end flow:

LLM throws → mapVercelError() classifies → runtime onError writes ErrorChunk
  → SSE transport → HelixStreamError.fromEvent() reconstructs on frontend

Error Classification

All errors are normalized into HelixError, which extends Error with structured metadata:

typescript
import { HelixError } from '@helix-agents/core';

const error = new HelixError({
  message: 'Rate limit exceeded',
  code: 'provider_rate_limited',
  retryable: true,
  statusCode: 429,
  cause: originalError,
});

error.code; // 'provider_rate_limited'
error.category; // 'provider' (derived from code prefix)
error.retryable; // true
error.statusCode; // 429
error.cause; // original error for debugging

The category is automatically derived from the error code prefix unless explicitly overridden. A code like provider_rate_limited maps to category provider, while tool_execution_failed maps to tool.
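The prefix-to-category rule can be illustrated with a small sketch (a hypothetical helper; the real derivation lives inside HelixError and may differ in detail):

```typescript
// Sketch of prefix-based category derivation (assumption: the actual
// HelixError internals may differ).
type ErrorCategory =
  | 'provider'
  | 'tool'
  | 'state'
  | 'transport'
  | 'validation'
  | 'framework';

const KNOWN_CATEGORIES: ErrorCategory[] = [
  'provider', 'tool', 'state', 'transport', 'validation', 'framework',
];

function categoryFromCode(code: string): ErrorCategory {
  // The category is the segment before the first underscore.
  const prefix = code.split('_')[0];
  return KNOWN_CATEGORIES.includes(prefix as ErrorCategory)
    ? (prefix as ErrorCategory)
    : 'framework'; // fallback for unrecognized prefixes (assumption)
}
```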

HelixError.isInstance()

Use HelixError.isInstance() to check if an error is a HelixError. This uses a Symbol.for marker rather than instanceof, so it works reliably across package versions and module boundaries:

typescript
import { HelixError } from '@helix-agents/core';

if (HelixError.isInstance(error)) {
  console.log(error.code); // typed ErrorCode
  console.log(error.retryable); // boolean
}
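The Symbol.for technique works because the global symbol registry returns the same symbol for the same key everywhere in a runtime. A minimal sketch (the marker key name here is an assumption, not the framework's actual key):

```typescript
// Sketch of cross-module instance checks via a global symbol marker.
// 'helix.error' is a hypothetical key, not the framework's actual one.
const MARKER: unique symbol = Symbol.for('helix.error');

class SketchError extends Error {
  readonly [MARKER] = true;

  static isInstance(value: unknown): value is SketchError {
    // Works even when two copies of the class come from different package
    // versions, because Symbol.for() returns the same symbol for the same
    // key across the whole runtime.
    return (
      typeof value === 'object' &&
      value !== null &&
      (value as Record<symbol, unknown>)[MARKER] === true
    );
  }
}
```

Unlike instanceof, this check does not require both sides to share the same class object.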

Utility Functions

Two utilities handle the common problem of extracting useful information from unknown error values:

typescript
import { extractErrorMessage, ensureError } from '@helix-agents/core';

// Safe message extraction from any value
extractErrorMessage(new Error('fail')); // 'fail'
extractErrorMessage({ message: 'fail' }); // 'fail'
extractErrorMessage('string error'); // 'string error'
extractErrorMessage({ foo: 'bar' }); // '{"foo":"bar"}'
extractErrorMessage(null); // 'null'

// Wrap non-Error values into proper Error instances
const err = ensureError('string error');
err instanceof Error; // true
err.message; // 'string error'
err.cause; // 'string error' (original value preserved)
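The documented behavior can be reproduced with a short sketch (hypothetical reimplementation; the shipped utilities may differ internally):

```typescript
// Sketch mirroring the documented behavior of extractErrorMessage and
// ensureError; not the framework's actual source.
function sketchExtractMessage(value: unknown): string {
  if (value instanceof Error) return value.message;
  if (typeof value === 'string') return value;
  if (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { message?: unknown }).message === 'string'
  ) {
    return (value as { message: string }).message;
  }
  try {
    return JSON.stringify(value) ?? String(value);
  } catch {
    return String(value); // e.g. circular structures
  }
}

function sketchEnsureError(value: unknown): Error {
  if (value instanceof Error) return value;
  const err = new Error(sketchExtractMessage(value));
  // Preserve the original value for debugging.
  (err as Error & { cause?: unknown }).cause = value;
  return err;
}
```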

classifyError()

The classifyError() function converts any error into a HelixError. It recognizes framework-specific error types and maps them to appropriate codes:

typescript
import { classifyError } from '@helix-agents/core';

const classified = classifyError(someUnknownError);
// Always returns a HelixError, never throws

Classification rules:

| Input | ErrorCode |
| --- | --- |
| Already a HelixError | Passed through unchanged |
| AbortError / DOMException('AbortError') | framework_cancelled |
| StaleStateError | state_concurrency_conflict |
| FencingTokenMismatchError | state_concurrency_conflict |
| ExecutorSupersededError | state_concurrency_conflict |
| AgentAlreadyRunningError | state_already_running |
| AgentNotResumableError | state_not_resumable |
| Any other Error | framework_internal_error |
| Non-Error value | framework_internal_error |

Error Codes

The ErrorCode type defines 21 specific codes organized by category:

Provider Errors

Errors from the LLM provider (OpenAI, Anthropic, etc.):

| Code | Description | Retryable |
| --- | --- | --- |
| provider_overloaded | Service at capacity (503, 529) | Yes |
| provider_rate_limited | Rate limit hit (429) | Yes |
| provider_auth_error | Invalid credentials (401, 403) | No |
| provider_content_filtered | Content policy violation | No |
| provider_refused | Request refused | No |
| provider_timeout | Request timed out (408) | Yes |
| provider_invalid_request | Malformed request (400, 422) | No |
| provider_error | Generic provider error (5xx) | Depends |

Tool Errors

Errors during tool execution:

| Code | Description |
| --- | --- |
| tool_input_invalid | LLM provided invalid arguments |
| tool_execution_failed | Tool's execute() threw |
| tool_not_found | Referenced tool doesn't exist |
| tool_timeout | Tool exceeded time limit |

State Errors

Errors from the state store or session management:

| Code | Description |
| --- | --- |
| state_concurrency_conflict | Concurrent modification detected |
| state_session_not_found | Session doesn't exist |
| state_already_running | Session has an active run |
| state_not_resumable | Session can't be resumed |

Transport Errors

| Code | Description |
| --- | --- |
| transport_error | Communication failure |

Validation Errors

| Code | Description |
| --- | --- |
| validation_error | Input validation failed |

Framework Errors

| Code | Description |
| --- | --- |
| framework_internal_error | Unexpected internal error |
| framework_not_supported | Feature not supported |
| framework_cancelled | Operation was cancelled |

Currently Produced vs Reserved Codes

The framework currently produces the following error codes:

Provider codes (via mapVercelError in llm-vercel):

  • provider_auth_error (401, 403)
  • provider_rate_limited (429)
  • provider_timeout (408)
  • provider_invalid_request (400, 422)
  • provider_overloaded (503, 529)
  • provider_error (other 5xx, no status code)

State codes (via classifyError in core):

  • state_concurrency_conflict (StaleStateError, FencingTokenMismatchError, ExecutorSupersededError)
  • state_already_running (AgentAlreadyRunningError)
  • state_not_resumable (AgentNotResumableError)

Framework codes (via classifyError in core):

  • framework_cancelled (AbortError)
  • framework_internal_error (generic Error, unknown values)

Reserved for future use (not currently produced):

  • provider_content_filtered - for content policy violations
  • provider_refused - for model refusals
  • tool_input_invalid - for Zod validation failures (currently uses generic error)
  • tool_execution_failed - for tool execution errors (currently uses generic error)
  • tool_not_found - for missing tools (currently uses generic error)
  • tool_timeout - for tool timeouts (currently uses generic error)
  • state_session_not_found - for missing sessions (currently throws, not classified)
  • transport_error - for network failures (currently uses generic error)
  • validation_error - for schema validation (currently uses generic error)
  • framework_not_supported - for unsupported features (currently not used)

LLM Error Mapping

The @helix-agents/llm-vercel package maps Vercel AI SDK errors to HelixError using mapVercelError(). This runs automatically when using the Vercel adapter -- you don't call it directly in most cases.

typescript
import { mapVercelError } from '@helix-agents/llm-vercel';

const classified = mapVercelError(vercelSdkError);
// Returns a HelixError with appropriate code and retryability

ApiCallError Mapping

The Vercel AI SDK throws ApiCallError for HTTP-level failures. These are mapped by status code:

| HTTP Status | ErrorCode |
| --- | --- |
| 401, 403 | provider_auth_error |
| 408 | provider_timeout |
| 429 | provider_rate_limited |
| 400, 422 | provider_invalid_request |
| 503, 529 | provider_overloaded |
| Other 5xx | provider_error |

The retryable flag is preserved from the Vercel SDK's own isRetryable property.
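The status-code dispatch in the table above can be sketched as a plain function (hypothetical helper; mapVercelError's actual internals may differ):

```typescript
// Sketch of the status-to-code mapping; mirrors the table above, not
// mapVercelError's actual source.
function codeForStatus(status: number | undefined): string {
  if (status === 401 || status === 403) return 'provider_auth_error';
  if (status === 408) return 'provider_timeout';
  if (status === 429) return 'provider_rate_limited';
  if (status === 400 || status === 422) return 'provider_invalid_request';
  if (status === 503 || status === 529) return 'provider_overloaded';
  // Other 5xx, or no status code at all: generic provider error.
  return 'provider_error';
}
```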

RetryError Mapping

When the Vercel SDK exhausts its retry budget, it throws a RetryError wrapping the last error. mapVercelError() unwraps it, classifies the inner error, and marks the result as non-retryable (since retries already failed):

typescript
// RetryError containing a 429 ApiCallError:
// → code: 'provider_rate_limited'
// → retryable: false (retries exhausted)
// → message: 'Failed after retries: Rate limited'

Fallback

If the error isn't an ApiCallError or RetryError, mapVercelError() falls back to classifyError() from core.

Runtime Error Handling

All three runtimes (JS, Temporal, Cloudflare) handle errors consistently through the LLM adapter's onError callback. When the LLM step produces an error, the runtime:

  1. Checks if the error is a HelixError via HelixError.isInstance()
  2. Writes an ErrorChunk to the stream with the classification data
  3. If unclassified, writes with recoverable: false and no code

typescript
// What every runtime does internally (simplified):
onError: async (error: Error) => {
  const isHelix = HelixError.isInstance(error);

  await writer.write({
    type: 'error',
    agentId: state.sessionId,
    agentType: state.agentType,
    timestamp: Date.now(),
    step: state.stepCount,
    error: error.message,
    ...(isHelix && { code: error.code }),
    recoverable: isHelix ? error.retryable : false,
  });
},

This means the frontend always receives structured error information when the error originated from a classified source (LLM provider, state store, etc.), and a safe fallback for unexpected errors.

Stream Protocol

Errors are transported as ErrorChunk events in the stream:

typescript
interface ErrorChunk {
  type: 'error';
  error: string; // Human-readable message
  code?: string; // ErrorCode (present if classified)
  recoverable: boolean; // Whether the client can retry
  agentId: string;
  agentType: string;
  timestamp: number;
  step: number;
}

The code field is only present when the error was classified as a HelixError. The recoverable field maps to retryable from the backend classification -- the wire protocol uses "recoverable" while the TypeScript API uses "retryable".

For SSE transport, these chunks are serialized as JSON data events alongside all other stream chunk types.
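On the wire, an error chunk and its client-side parsing look roughly like this (the payload values here are illustrative, not real framework output):

```typescript
// Illustrative SSE data line carrying an ErrorChunk (values are made up).
const sseLine =
  'data: {"type":"error","error":"Rate limit exceeded","code":"provider_rate_limited",' +
  '"recoverable":true,"agentId":"sess_1","agentType":"chat","timestamp":1700000000000,"step":3}';

interface ErrorChunk {
  type: 'error';
  error: string;
  code?: string;
  recoverable: boolean;
  agentId: string;
  agentType: string;
  timestamp: number;
  step: number;
}

// Strip the SSE "data: " prefix and parse the JSON payload.
const chunk = JSON.parse(sseLine.slice('data: '.length)) as ErrorChunk;
```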

Frontend Error Handling

The @helix-agents/ai-sdk package provides two error types for frontend use.

FrontendHandlerError

FrontendHandlerError represents HTTP-level errors from the FrontendHandler. These are thrown before the stream starts -- during request validation, session lookup, or stream creation.

typescript
import {
  FrontendHandlerError,
  ValidationError,
  StreamNotFoundError,
  StreamFailedError,
  ExecutionError,
} from '@helix-agents/ai-sdk';

Catch these in your API route handler:

typescript
export async function POST(req: Request) {
  try {
    return await handler.handleRequest(req);
  } catch (error) {
    if (error instanceof FrontendHandlerError) {
      return new Response(JSON.stringify({ error: error.message, code: error.code }), {
        status: error.statusCode,
      });
    }

    throw error;
  }
}

Subclasses and their HTTP status codes:

| Error Class | Code | Status |
| --- | --- | --- |
| ValidationError | VALIDATION_ERROR | 400 |
| StreamNotFoundError | STREAM_NOT_FOUND | 404 |
| StreamReaderError | STREAM_READER_ERROR | 404 |
| StreamFailedError | STREAM_FAILED | 410 |
| ConfigurationError | CONFIGURATION_ERROR | 501 |
| ExecutionError | EXECUTION_ERROR | 500 |
| StreamCreationError | STREAM_CREATION_ERROR | 500 |

HelixStreamError

HelixStreamError represents errors received through the stream itself. Use HelixStreamError.fromEvent() to reconstruct a typed error from a stream error event:

typescript
import { HelixStreamError } from '@helix-agents/ai-sdk';

const { data } = useChat({ api: '/api/chat' });

const errorEvent = data?.find((d) => d.type === 'error');
if (errorEvent) {
  const err = HelixStreamError.fromEvent(errorEvent);

  console.log(err.message); // 'Rate limit exceeded'
  console.log(err.code); // 'provider_rate_limited'
  console.log(err.retryable); // true
}

Retry Logic

Use the retryable flag to implement retry behavior:

typescript
const err = HelixStreamError.fromEvent(errorEvent);

if (err.retryable) {
  // Safe to retry with backoff
  await delay(2000);
  await submitMessage(lastMessage);
} else {
  // Show error to user, don't retry
  setError(err.message);
}

Code-Based Error Handling

Use the code field for specific error handling:

typescript
const err = HelixStreamError.fromEvent(errorEvent);

switch (err.code) {
  case 'provider_rate_limited':
    showToast('Too many requests. Retrying shortly...');
    await retryWithBackoff();
    break;

  case 'provider_auth_error':
    showToast('Authentication failed. Check your API key.');
    break;

  case 'provider_overloaded':
    showToast('Service is busy. Please wait...');
    await retryWithBackoff();
    break;

  default:
    showToast(`Error: ${err.message}`);
}
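The retryWithBackoff helper used above is not shipped by the framework; a minimal sketch with exponential backoff (hypothetical signature, adapt to your own submit function):

```typescript
// Hypothetical retry helper with exponential backoff; not part of
// @helix-agents.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: base, 2x, 4x, ... (skipped after the last attempt).
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

In a real client you would also cap the delay and bail out early on non-retryable codes.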

Temporal Serialization

The Temporal runtime serializes error information through DTOs so that it can cross activity/workflow boundaries. Two schemas carry the errorCode:

AgentStepResultSchema includes errorCode for step-level errors:

typescript
const AgentStepResultSchema = z.object({
  shouldContinue: z.boolean(),
  isComplete: z.boolean(),
  pendingToolCalls: z.array(/* ... */),
  error: z.string().optional(),
  recoverable: z.boolean().optional(),
  errorCode: z.string().optional(), // HelixError.code
  stopReason: z.custom<StopReason>(),
});

FailAgentStreamInputSchema includes errorCode for stream-level failures:

typescript
const FailAgentStreamInputSchema = z.object({
  sessionId: z.string().min(1),
  error: z.string(),
  errorCode: z.string().optional(), // HelixError.code
  startedAt: z.number().optional(),
});

This ensures that when a HelixError occurs inside a Temporal activity, its classification code survives serialization through the workflow boundary and reaches the stream with full type information.
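Projecting a classified error into the FailAgentStreamInput shape can be sketched as follows (hypothetical helper, not framework code):

```typescript
// Hypothetical projection of a classified error into the
// FailAgentStreamInput wire shape shown above.
interface FailAgentStreamInput {
  sessionId: string;
  error: string;
  errorCode?: string;
  startedAt?: number;
}

function toFailAgentStreamInput(
  sessionId: string,
  err: Error & { code?: string },
  startedAt?: number,
): FailAgentStreamInput {
  return {
    sessionId,
    error: err.message,
    // Only include errorCode when the error was classified.
    ...(err.code !== undefined && { errorCode: err.code }),
    ...(startedAt !== undefined && { startedAt }),
  };
}
```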
