Error Handling

Helix Agents provides a unified error handling architecture that classifies errors at the backend, transports them through the stream protocol, and reconstructs them on the frontend. Every error — whether from an LLM provider, a tool, or the framework itself — flows through the same pipeline with typed codes, categories, and retry information.

Overview

The error system has three layers:

  1. Backend Classification (@helix-agents/core) — Converts any error into a typed HelixError with a code, category, and retryability flag
  2. LLM Error Mapping (@helix-agents/llm-vercel) — Maps Vercel AI SDK errors (rate limits, auth failures, timeouts) to HelixError
  3. Frontend Reconstruction (@helix-agents/ai-sdk) — Reconstructs typed errors from stream events for client-side handling

The end-to-end flow:

LLM throws → mapVercelError() classifies → runtime onError writes ErrorChunk
  → SSE transport → HelixStreamError.fromEvent() reconstructs on frontend

Error Classification

All errors are normalized into HelixError, which extends Error with structured metadata:

typescript
import { HelixError } from '@helix-agents/core';

const error = new HelixError({
  message: 'Rate limit exceeded',
  code: 'provider_rate_limited',
  retryable: true,
  statusCode: 429,
  cause: originalError,
});

error.code; // 'provider_rate_limited'
error.category; // 'provider' (derived from code prefix)
error.retryable; // true
error.statusCode; // 429
error.cause; // original error for debugging

The category is automatically derived from the error code prefix unless explicitly overridden. A code like provider_rate_limited maps to category provider, while tool_execution_failed maps to tool.
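The prefix-to-category rule can be illustrated with a small sketch (a hypothetical helper; the real derivation lives inside HelixError and may differ in detail):

```typescript
// Sketch of prefix-based category derivation (assumption: the actual
// HelixError internals may differ).
type ErrorCategory =
  | 'provider'
  | 'tool'
  | 'state'
  | 'transport'
  | 'validation'
  | 'framework';

const KNOWN_CATEGORIES: ErrorCategory[] = [
  'provider', 'tool', 'state', 'transport', 'validation', 'framework',
];

function categoryFromCode(code: string): ErrorCategory {
  // The category is the segment before the first underscore.
  const prefix = code.split('_')[0];
  return KNOWN_CATEGORIES.includes(prefix as ErrorCategory)
    ? (prefix as ErrorCategory)
    : 'framework'; // fallback for unrecognized prefixes (assumption)
}
```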

HelixError.isInstance()

Use HelixError.isInstance() to check if an error is a HelixError. This uses a Symbol.for marker rather than instanceof, so it works reliably across package versions and module boundaries:

typescript
import { HelixError } from '@helix-agents/core';

if (HelixError.isInstance(error)) {
  console.log(error.code); // typed ErrorCode
  console.log(error.retryable); // boolean
}
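The Symbol.for technique works because the global symbol registry returns the same symbol for the same key everywhere in a runtime. A minimal sketch (the marker key name here is an assumption, not the framework's actual key):

```typescript
// Sketch of cross-module instance checks via a global symbol marker.
// 'helix.error' is a hypothetical key, not the framework's actual one.
const MARKER: unique symbol = Symbol.for('helix.error');

class SketchError extends Error {
  readonly [MARKER] = true;

  static isInstance(value: unknown): value is SketchError {
    // Works even when two copies of the class come from different package
    // versions, because Symbol.for() returns the same symbol for the same
    // key across the whole runtime.
    return (
      typeof value === 'object' &&
      value !== null &&
      (value as Record<symbol, unknown>)[MARKER] === true
    );
  }
}
```

Unlike instanceof, this check does not require both sides to share the same class object.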

Utility Functions

Two utilities handle the common problem of extracting useful information from unknown error values:

typescript
import { extractErrorMessage, ensureError } from '@helix-agents/core';

// Safe message extraction from any value
extractErrorMessage(new Error('fail')); // 'fail'
extractErrorMessage({ message: 'fail' }); // 'fail'
extractErrorMessage('string error'); // 'string error'
extractErrorMessage({ foo: 'bar' }); // '{"foo":"bar"}'
extractErrorMessage(null); // 'null'

// Wrap non-Error values into proper Error instances
const err = ensureError('string error');
err instanceof Error; // true
err.message; // 'string error'
err.cause; // 'string error' (original value preserved)
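The documented behavior can be reproduced with a short sketch (hypothetical reimplementation; the shipped utilities may differ internally):

```typescript
// Sketch mirroring the documented behavior of extractErrorMessage and
// ensureError; not the framework's actual source.
function sketchExtractMessage(value: unknown): string {
  if (value instanceof Error) return value.message;
  if (typeof value === 'string') return value;
  if (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { message?: unknown }).message === 'string'
  ) {
    return (value as { message: string }).message;
  }
  try {
    return JSON.stringify(value) ?? String(value);
  } catch {
    return String(value); // e.g. circular structures
  }
}

function sketchEnsureError(value: unknown): Error {
  if (value instanceof Error) return value;
  const err = new Error(sketchExtractMessage(value));
  // Preserve the original value for debugging.
  (err as Error & { cause?: unknown }).cause = value;
  return err;
}
```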

classifyError()

The classifyError() function converts any error into a HelixError. It recognizes framework-specific error types and maps them to appropriate codes:

typescript
import { classifyError } from '@helix-agents/core';

const classified = classifyError(someUnknownError);
// Always returns a HelixError, never throws

Classification rules:

| Input | ErrorCode |
| --- | --- |
| Already a HelixError | Passed through unchanged |
| AbortError / DOMException('AbortError') | framework_cancelled |
| StaleStateError | state_concurrency_conflict |
| FencingTokenMismatchError | state_concurrency_conflict |
| ExecutorSupersededError | state_concurrency_conflict |
| AgentAlreadyRunningError | state_already_running |
| AgentNotResumableError | state_not_resumable |
| Any other Error | framework_internal_error |
| Non-Error value | framework_internal_error |

Error Codes

The ErrorCode type defines 21 specific codes organized by category:

Provider Errors

Errors from the LLM provider (OpenAI, Anthropic, etc.):

| Code | Description | Retryable |
| --- | --- | --- |
| provider_overloaded | Service at capacity (503, 529) | Yes |
| provider_rate_limited | Rate limit hit (429) | Yes |
| provider_auth_error | Invalid credentials (401, 403) | No |
| provider_content_filtered | Content policy violation | No |
| provider_refused | Request refused | No |
| provider_timeout | Request timed out (408) | Yes |
| provider_invalid_request | Malformed request (400, 422) | No |
| provider_error | Generic provider error (5xx) | Depends |

Tool Errors

Errors during tool execution:

| Code | Description |
| --- | --- |
| tool_input_invalid | LLM provided invalid arguments |
| tool_execution_failed | Tool's execute() threw |
| tool_not_found | Referenced tool doesn't exist |
| tool_timeout | Tool exceeded time limit |

State Errors

Errors from the state store or session management:

| Code | Description |
| --- | --- |
| state_concurrency_conflict | Concurrent modification detected |
| state_session_not_found | Session doesn't exist |
| state_already_running | Session has an active run |
| state_not_resumable | Session can't be resumed |

Transport Errors

| Code | Description |
| --- | --- |
| transport_error | Communication failure |

Validation Errors

| Code | Description |
| --- | --- |
| validation_error | Input validation failed |

Framework Errors

| Code | Description |
| --- | --- |
| framework_internal_error | Unexpected internal error |
| framework_not_supported | Feature not supported |
| framework_cancelled | Operation was cancelled |

Currently Produced vs Reserved Codes

The framework currently produces the following error codes:

Provider codes (via mapVercelError in llm-vercel):

  • provider_auth_error (401, 403)
  • provider_rate_limited (429)
  • provider_timeout (408)
  • provider_invalid_request (400, 422)
  • provider_overloaded (503, 529)
  • provider_error (other 5xx, no status code)

State codes (via classifyError in core):

  • state_concurrency_conflict (StaleStateError, FencingTokenMismatchError, ExecutorSupersededError)
  • state_already_running (AgentAlreadyRunningError)
  • state_not_resumable (AgentNotResumableError)

Framework codes (via classifyError in core):

  • framework_cancelled (AbortError)
  • framework_internal_error (generic Error, unknown values)

Reserved for future use (not currently produced):

  • provider_content_filtered - for content policy violations
  • provider_refused - for model refusals
  • tool_input_invalid - for Zod validation failures (currently uses generic error)
  • tool_execution_failed - for tool execution errors (currently uses generic error)
  • tool_not_found - for missing tools (currently uses generic error)
  • tool_timeout - for tool timeouts (currently uses generic error)
  • state_session_not_found - for missing sessions (currently throws, not classified)
  • transport_error - for network failures (currently uses generic error)
  • validation_error - for schema validation (currently uses generic error)
  • framework_not_supported - for unsupported features (currently not used)

LLM Error Mapping

The @helix-agents/llm-vercel package maps Vercel AI SDK errors to HelixError using mapVercelError(). This runs automatically when using the Vercel adapter -- you don't call it directly in most cases.

typescript
import { mapVercelError } from '@helix-agents/llm-vercel';

const classified = mapVercelError(vercelSdkError);
// Returns a HelixError with appropriate code and retryability

ApiCallError Mapping

The Vercel AI SDK throws ApiCallError for HTTP-level failures. These are mapped by status code:

| HTTP Status | ErrorCode |
| --- | --- |
| 401, 403 | provider_auth_error |
| 408 | provider_timeout |
| 429 | provider_rate_limited |
| 400, 422 | provider_invalid_request |
| 503, 529 | provider_overloaded |
| Other 5xx | provider_error |

The retryable flag is preserved from the Vercel SDK's own isRetryable property.
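The status-code dispatch in the table above can be sketched as a plain function (hypothetical helper; mapVercelError's actual internals may differ):

```typescript
// Sketch of the status-to-code mapping; mirrors the table above, not
// mapVercelError's actual source.
function codeForStatus(status: number | undefined): string {
  if (status === 401 || status === 403) return 'provider_auth_error';
  if (status === 408) return 'provider_timeout';
  if (status === 429) return 'provider_rate_limited';
  if (status === 400 || status === 422) return 'provider_invalid_request';
  if (status === 503 || status === 529) return 'provider_overloaded';
  // Other 5xx, or no status code at all: generic provider error.
  return 'provider_error';
}
```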

RetryError Mapping

When the Vercel SDK exhausts its retry budget, it throws a RetryError wrapping the last error. mapVercelError() unwraps it, classifies the inner error, and marks the result as non-retryable (since retries already failed):

typescript
// RetryError containing a 429 ApiCallError:
// → code: 'provider_rate_limited'
// → retryable: false (retries exhausted)
// → message: 'Failed after retries: Rate limited'

Fallback

If the error isn't an ApiCallError or RetryError, mapVercelError() falls back to classifyError() from core.

Runtime Error Handling

All three runtimes (JS, Temporal, Cloudflare) handle errors consistently through the LLM adapter's onError callback. When the LLM step produces an error, the runtime:

  1. Checks if the error is a HelixError via HelixError.isInstance()
  2. Writes an ErrorChunk to the stream with the classification data
  3. If unclassified, writes with recoverable: false and no code

typescript
// What every runtime does internally (simplified):
onError: async (error: Error) => {
  const isHelix = HelixError.isInstance(error);

  await writer.write({
    type: 'error',
    agentId: state.sessionId,
    agentType: state.agentType,
    timestamp: Date.now(),
    step: state.stepCount,
    error: error.message,
    ...(isHelix && { code: error.code }),
    recoverable: isHelix ? error.retryable : false,
  });
},

This means the frontend always receives structured error information when the error originated from a classified source (LLM provider, state store, etc.), and a safe fallback for unexpected errors.

Stream Protocol

Errors are transported as ErrorChunk events in the stream:

typescript
interface ErrorChunk {
  type: 'error';
  error: string; // Human-readable message
  code?: string; // ErrorCode (present if classified)
  recoverable: boolean; // Whether the client can retry
  agentId: string;
  agentType: string;
  timestamp: number;
  step: number;
}

The code field is only present when the error was classified as a HelixError. The recoverable field maps to retryable from the backend classification -- the wire protocol uses "recoverable" while the TypeScript API uses "retryable".

For SSE transport, these chunks are serialized as JSON data events alongside all other stream chunk types.
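On the wire, an error chunk and its client-side parsing look roughly like this (the payload values here are illustrative, not real framework output):

```typescript
// Illustrative SSE data line carrying an ErrorChunk (values are made up).
const sseLine =
  'data: {"type":"error","error":"Rate limit exceeded","code":"provider_rate_limited",' +
  '"recoverable":true,"agentId":"sess_1","agentType":"chat","timestamp":1700000000000,"step":3}';

interface ErrorChunk {
  type: 'error';
  error: string;
  code?: string;
  recoverable: boolean;
  agentId: string;
  agentType: string;
  timestamp: number;
  step: number;
}

// Strip the SSE "data: " prefix and parse the JSON payload.
const chunk = JSON.parse(sseLine.slice('data: '.length)) as ErrorChunk;
```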

Frontend Error Handling

The @helix-agents/ai-sdk package provides two error types for frontend use.

FrontendHandlerError

FrontendHandlerError represents HTTP-level errors from the FrontendHandler. These are thrown before the stream starts -- during request validation, session lookup, or stream creation.

typescript
import {
  FrontendHandlerError,
  ValidationError,
  StreamNotFoundError,
  StreamFailedError,
  ExecutionError,
} from '@helix-agents/ai-sdk';

Catch these in your API route handler:

typescript
export async function POST(req: Request) {
  try {
    return await handler.handleRequest(req);
  } catch (error) {
    if (error instanceof FrontendHandlerError) {
      return new Response(JSON.stringify({ error: error.message, code: error.code }), {
        status: error.statusCode,
      });
    }

    throw error;
  }
}

Subclasses and their HTTP status codes:

| Error Class | Code | Status |
| --- | --- | --- |
| ValidationError | VALIDATION_ERROR | 400 |
| StreamNotFoundError | STREAM_NOT_FOUND | 404 |
| StreamReaderError | STREAM_READER_ERROR | 404 |
| StreamFailedError | STREAM_FAILED | 410 |
| ConfigurationError | CONFIGURATION_ERROR | 501 |
| ExecutionError | EXECUTION_ERROR | 500 |
| StreamCreationError | STREAM_CREATION_ERROR | 500 |

HelixStreamError

HelixStreamError represents errors received through the stream itself. Use HelixStreamError.fromEvent() to reconstruct a typed error from a stream error event:

typescript
import { HelixStreamError } from '@helix-agents/ai-sdk';

const { data } = useChat({ api: '/api/chat' });

const errorEvent = data?.find((d) => d.type === 'error');
if (errorEvent) {
  const err = HelixStreamError.fromEvent(errorEvent);

  console.log(err.message); // 'Rate limit exceeded'
  console.log(err.code); // 'provider_rate_limited'
  console.log(err.retryable); // true
}

Retry Logic

Use the retryable flag to implement retry behavior:

typescript
const err = HelixStreamError.fromEvent(errorEvent);

if (err.retryable) {
  // Safe to retry with backoff
  await delay(2000);
  await submitMessage(lastMessage);
} else {
  // Show error to user, don't retry
  setError(err.message);
}

Code-Based Error Handling

Use the code field for specific error handling:

typescript
const err = HelixStreamError.fromEvent(errorEvent);

switch (err.code) {
  case 'provider_rate_limited':
    showToast('Too many requests. Retrying shortly...');
    await retryWithBackoff();
    break;

  case 'provider_auth_error':
    showToast('Authentication failed. Check your API key.');
    break;

  case 'provider_overloaded':
    showToast('Service is busy. Please wait...');
    await retryWithBackoff();
    break;

  default:
    showToast(`Error: ${err.message}`);
}
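The retryWithBackoff helper used above is not shipped by the framework; a minimal sketch with exponential backoff (hypothetical signature, adapt to your own submit function):

```typescript
// Hypothetical retry helper with exponential backoff; not part of
// @helix-agents.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: base, 2x, 4x, ... (skipped after the last attempt).
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

In a real client you would also cap the delay and bail out early on non-retryable codes.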

Temporal Serialization

The Temporal runtime serializes error information through DTOs so that it can cross activity/workflow boundaries. Two schemas carry the errorCode:

AgentStepResultSchema includes errorCode for step-level errors:

typescript
const AgentStepResultSchema = z.object({
  shouldContinue: z.boolean(),
  isComplete: z.boolean(),
  pendingToolCalls: z.array(/* ... */),
  error: z.string().optional(),
  recoverable: z.boolean().optional(),
  errorCode: z.string().optional(), // HelixError.code
  stopReason: z.custom<StopReason>(),
});

FailAgentStreamInputSchema includes errorCode for stream-level failures:

typescript
const FailAgentStreamInputSchema = z.object({
  sessionId: z.string().min(1),
  error: z.string(),
  errorCode: z.string().optional(), // HelixError.code
  startedAt: z.number().optional(),
});

This ensures that when a HelixError occurs inside a Temporal activity, its classification code survives serialization through the workflow boundary and reaches the stream with full type information.
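Projecting a classified error into the FailAgentStreamInput shape can be sketched as follows (hypothetical helper, not framework code):

```typescript
// Hypothetical projection of a classified error into the
// FailAgentStreamInput wire shape shown above.
interface FailAgentStreamInput {
  sessionId: string;
  error: string;
  errorCode?: string;
  startedAt?: number;
}

function toFailAgentStreamInput(
  sessionId: string,
  err: Error & { code?: string },
  startedAt?: number,
): FailAgentStreamInput {
  return {
    sessionId,
    error: err.message,
    // Only include errorCode when the error was classified.
    ...(err.code !== undefined && { errorCode: err.code }),
    ...(startedAt !== undefined && { startedAt }),
  };
}
```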
