Error Handling
Helix Agents provides a unified error handling architecture that classifies errors at the backend, transports them through the stream protocol, and reconstructs them on the frontend. Every error — whether from an LLM provider, a tool, or the framework itself — flows through the same pipeline with typed codes, categories, and retry information.
Overview
The error system has three layers:
- Backend Classification (`@helix-agents/core`) — converts any error into a typed `HelixError` with a code, category, and retryability flag
- LLM Error Mapping (`@helix-agents/llm-vercel`) — maps Vercel AI SDK errors (rate limits, auth failures, timeouts) to `HelixError`
- Frontend Reconstruction (`@helix-agents/ai-sdk`) — reconstructs typed errors from stream events for client-side handling
The end-to-end flow:
```
LLM throws → mapVercelError() classifies → runtime onError writes ErrorChunk
  → SSE transport → HelixStreamError.fromEvent() reconstructs on frontend
```

Error Classification
All errors are normalized into HelixError, which extends Error with structured metadata:
```typescript
import { HelixError } from '@helix-agents/core';

const error = new HelixError({
  message: 'Rate limit exceeded',
  code: 'provider_rate_limited',
  retryable: true,
  statusCode: 429,
  cause: originalError,
});

error.code;       // 'provider_rate_limited'
error.category;   // 'provider' (derived from code prefix)
error.retryable;  // true
error.statusCode; // 429
error.cause;      // original error for debugging
```

The category is automatically derived from the error code prefix unless explicitly overridden. A code like `provider_rate_limited` maps to category `provider`, while `tool_execution_failed` maps to `tool`.
HelixError.isInstance()
Use HelixError.isInstance() to check whether an error is a HelixError. This uses a Symbol.for marker rather than instanceof, so it works reliably across package versions and module boundaries.
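Under the hood, this style of check relies on the global symbol registry rather than class identity. A minimal sketch of the pattern (an assumption about the mechanism, not the library's source; the registry key is hypothetical):

```typescript
// Sketch: a brand check using Symbol.for, which returns the SAME symbol
// for a given key from every copy of a module in the process.
const HELIX_ERROR_MARKER = Symbol.for('helix-agents.error'); // hypothetical key

class BrandedError extends Error {
  constructor(message: string) {
    super(message);
    // Brand the instance with the shared marker symbol.
    Object.defineProperty(this, HELIX_ERROR_MARKER, { value: true });
  }

  static isInstance(value: unknown): value is BrandedError {
    // Works even when `value` came from a different bundled copy of the
    // class, where `value instanceof BrandedError` would be false.
    return typeof value === 'object' && value !== null && HELIX_ERROR_MARKER in value;
  }
}
```

Because Symbol.for resolves through a process-wide registry, two duplicated copies of a package still agree on the marker, which is why a brand check is more robust than instanceof.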
```typescript
import { HelixError } from '@helix-agents/core';

if (HelixError.isInstance(error)) {
  console.log(error.code);      // typed ErrorCode
  console.log(error.retryable); // boolean
}
```

Utility Functions
Two utilities handle the common problem of extracting useful information from unknown error values.
```typescript
import { extractErrorMessage, ensureError } from '@helix-agents/core';

// Safe message extraction from any value
extractErrorMessage(new Error('fail'));   // 'fail'
extractErrorMessage({ message: 'fail' }); // 'fail'
extractErrorMessage('string error');      // 'string error'
extractErrorMessage({ foo: 'bar' });      // '{"foo":"bar"}'
extractErrorMessage(null);                // 'null'

// Wrap non-Error values into proper Error instances
const err = ensureError('string error');
err instanceof Error; // true
err.message;          // 'string error'
err.cause;            // 'string error' (original value preserved)
```

classifyError()
The classifyError() function converts any error into a HelixError. It recognizes framework-specific error types and maps them to appropriate codes:
```typescript
import { classifyError } from '@helix-agents/core';

const classified = classifyError(someUnknownError);
// Always returns a HelixError, never throws
```

Classification rules:
| Input | ErrorCode |
|---|---|
| Already a HelixError | Passed through unchanged |
| AbortError / DOMException('AbortError') | framework_cancelled |
| StaleStateError | state_concurrency_conflict |
| FencingTokenMismatchError | state_concurrency_conflict |
| ExecutorSupersededError | state_concurrency_conflict |
| AgentAlreadyRunningError | state_already_running |
| AgentNotResumableError | state_not_resumable |
| Any other Error | framework_internal_error |
| Non-Error value | framework_internal_error |
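The table can be sketched as a dispatch on the error's name (an illustration only; the real classifyError matches concrete error classes, passes HelixError instances through, and attaches retryability and cause):

```typescript
// Sketch of the classification rules above, keyed on error.name.
function classifyCode(error: unknown): string {
  if (!(error instanceof Error)) return 'framework_internal_error';
  switch (error.name) {
    case 'AbortError':
      return 'framework_cancelled';
    case 'StaleStateError':
    case 'FencingTokenMismatchError':
    case 'ExecutorSupersededError':
      return 'state_concurrency_conflict';
    case 'AgentAlreadyRunningError':
      return 'state_already_running';
    case 'AgentNotResumableError':
      return 'state_not_resumable';
    default:
      return 'framework_internal_error';
  }
}
```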
Error Codes
The ErrorCode type defines 21 specific codes organized by category:
Provider Errors
Errors from the LLM provider (OpenAI, Anthropic, etc.):
| Code | Description | Retryable |
|---|---|---|
| provider_overloaded | Service at capacity (503, 529) | Yes |
| provider_rate_limited | Rate limit hit (429) | Yes |
| provider_auth_error | Invalid credentials (401, 403) | No |
| provider_content_filtered | Content policy violation | No |
| provider_refused | Request refused | No |
| provider_timeout | Request timed out (408) | Yes |
| provider_invalid_request | Malformed request (400, 422) | No |
| provider_error | Generic provider error (5xx) | Depends |
Tool Errors
Errors during tool execution:
| Code | Description |
|---|---|
| tool_input_invalid | LLM provided invalid arguments |
| tool_execution_failed | Tool's execute() threw |
| tool_not_found | Referenced tool doesn't exist |
| tool_timeout | Tool exceeded time limit |
State Errors
Errors from the state store or session management:
| Code | Description |
|---|---|
| state_concurrency_conflict | Concurrent modification detected |
| state_session_not_found | Session doesn't exist |
| state_already_running | Session has an active run |
| state_not_resumable | Session can't be resumed |
Transport Errors
| Code | Description |
|---|---|
| transport_error | Communication failure |
Validation Errors
| Code | Description |
|---|---|
| validation_error | Input validation failed |
Framework Errors
| Code | Description |
|---|---|
| framework_internal_error | Unexpected internal error |
| framework_not_supported | Feature not supported |
| framework_cancelled | Operation was cancelled |
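Collected from the tables above, the union can be pictured as follows (a sketch of the shape, not the exported source), together with the prefix-based category derivation described earlier:

```typescript
// Sketch: the ErrorCode union implied by the tables above.
type ErrorCode =
  | 'provider_overloaded' | 'provider_rate_limited' | 'provider_auth_error'
  | 'provider_content_filtered' | 'provider_refused' | 'provider_timeout'
  | 'provider_invalid_request' | 'provider_error'
  | 'tool_input_invalid' | 'tool_execution_failed' | 'tool_not_found' | 'tool_timeout'
  | 'state_concurrency_conflict' | 'state_session_not_found'
  | 'state_already_running' | 'state_not_resumable'
  | 'transport_error'
  | 'validation_error'
  | 'framework_internal_error' | 'framework_not_supported' | 'framework_cancelled';

// Category derivation: everything before the first underscore.
const category = (code: ErrorCode): string => code.split('_')[0];
```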
Currently Produced vs Reserved Codes
The framework currently produces the following error codes:
Provider codes (via mapVercelError in llm-vercel):
- `provider_auth_error` (401, 403)
- `provider_rate_limited` (429)
- `provider_timeout` (408)
- `provider_invalid_request` (400, 422)
- `provider_overloaded` (503, 529)
- `provider_error` (other 5xx, no status code)
State codes (via classifyError in core):
- `state_concurrency_conflict` (StaleStateError, FencingTokenMismatchError, ExecutorSupersededError)
- `state_already_running` (AgentAlreadyRunningError)
- `state_not_resumable` (AgentNotResumableError)
Framework codes (via classifyError in core):
- `framework_cancelled` (AbortError)
- `framework_internal_error` (generic Error, unknown values)
Reserved for future use (not currently produced):
- `provider_content_filtered` - for content policy violations
- `provider_refused` - for model refusals
- `tool_input_invalid` - for Zod validation failures (currently uses generic error)
- `tool_execution_failed` - for tool execution errors (currently uses generic error)
- `tool_not_found` - for missing tools (currently uses generic error)
- `tool_timeout` - for tool timeouts (currently uses generic error)
- `state_session_not_found` - for missing sessions (currently throws, not classified)
- `transport_error` - for network failures (currently uses generic error)
- `validation_error` - for schema validation (currently uses generic error)
- `framework_not_supported` - for unsupported features (currently not used)
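Because reserved codes may start appearing in newer releases, client code should treat the code field as open-ended. A defensive sketch (an app-level helper, not a framework API):

```typescript
// Codes the tables above mark as always retryable; anything unknown
// defaults to "don't retry" unless the chunk itself says recoverable: true.
const RETRYABLE_CODES = new Set([
  'provider_overloaded',
  'provider_rate_limited',
  'provider_timeout',
]);

function isRetryableCode(code: string | undefined): boolean {
  return code !== undefined && RETRYABLE_CODES.has(code);
}
```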
LLM Error Mapping
The @helix-agents/llm-vercel package maps Vercel AI SDK errors to HelixError using mapVercelError(). This runs automatically when using the Vercel adapter -- you don't call it directly in most cases.
```typescript
import { mapVercelError } from '@helix-agents/llm-vercel';

const classified = mapVercelError(vercelSdkError);
// Returns a HelixError with appropriate code and retryability
```

ApiCallError Mapping
The Vercel AI SDK throws ApiCallError for HTTP-level failures. These are mapped by status code:
| HTTP Status | ErrorCode |
|---|---|
| 401, 403 | provider_auth_error |
| 408 | provider_timeout |
| 429 | provider_rate_limited |
| 400, 422 | provider_invalid_request |
| 503, 529 | provider_overloaded |
| Other 5xx | provider_error |
The retryable flag is preserved from the Vercel SDK's own isRetryable property.
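The status-code branch can be sketched as follows (illustrative only; the real mapVercelError also preserves isRetryable and handles RetryError unwrapping, as described below):

```typescript
// Sketch of the HTTP-status-to-ErrorCode mapping in the table above.
function mapStatusToCode(status?: number): string {
  if (status === 401 || status === 403) return 'provider_auth_error';
  if (status === 408) return 'provider_timeout';
  if (status === 429) return 'provider_rate_limited';
  if (status === 400 || status === 422) return 'provider_invalid_request';
  if (status === 503 || status === 529) return 'provider_overloaded';
  // Other 5xx, or no status code at all.
  return 'provider_error';
}
```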
RetryError Mapping
When the Vercel SDK exhausts its retry budget, it throws a RetryError wrapping the last error. mapVercelError() unwraps it, classifies the inner error, and marks the result as non-retryable (since retries already failed):
```typescript
// RetryError containing a 429 ApiCallError:
// → code: 'provider_rate_limited'
// → retryable: false (retries exhausted)
// → message: 'Failed after retries: Rate limited'
```

Fallback
If the error isn't an ApiCallError or RetryError, mapVercelError() falls back to classifyError() from core.
Runtime Error Handling
All three runtimes (JS, Temporal, Cloudflare) handle errors consistently through the LLM adapter's onError callback. When the LLM step produces an error, the runtime:
- Checks if the error is a `HelixError` via `HelixError.isInstance()`
- Writes an `ErrorChunk` to the stream with the classification data
- If unclassified, writes with `recoverable: false` and no code
```typescript
// What every runtime does internally (simplified):
onError: async (error: Error) => {
  const isHelix = HelixError.isInstance(error);
  await writer.write({
    type: 'error',
    agentId: state.sessionId,
    agentType: state.agentType,
    timestamp: Date.now(),
    step: state.stepCount,
    error: error.message,
    ...(isHelix && { code: error.code }),
    recoverable: isHelix ? error.retryable : false,
  });
},
```

This means the frontend always receives structured error information when the error originated from a classified source (LLM provider, state store, etc.), and a safe fallback for unexpected errors.
Stream Protocol
Errors are transported as ErrorChunk events in the stream:
```typescript
interface ErrorChunk {
  type: 'error';
  error: string;        // Human-readable message
  code?: string;        // ErrorCode (present if classified)
  recoverable: boolean; // Whether the client can retry
  agentId: string;
  agentType: string;
  timestamp: number;
  step: number;
}
```

The code field is only present when the error was classified as a HelixError. The recoverable field maps to retryable from the backend classification -- the wire protocol uses "recoverable" while the TypeScript API uses "retryable".
For SSE transport, these chunks are serialized as JSON data events alongside all other stream chunk types.
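On the wire, such a chunk might look like this (a sketch of standard SSE framing; the IDs and timestamp are hypothetical values for illustration):

```typescript
// Sketch: an ErrorChunk as it might appear in an SSE data event.
const chunk = {
  type: 'error',
  error: 'Rate limit exceeded',
  code: 'provider_rate_limited',
  recoverable: true,
  agentId: 'session-123',    // hypothetical
  agentType: 'chat-agent',   // hypothetical
  timestamp: 1700000000000,  // hypothetical
  step: 3,
};

// SSE frames a JSON payload as a `data:` line terminated by a blank line.
const sseFrame = `data: ${JSON.stringify(chunk)}\n\n`;
```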
Frontend Error Handling
The @helix-agents/ai-sdk package provides two error types for frontend use.
FrontendHandlerError
FrontendHandlerError represents HTTP-level errors from the FrontendHandler. These are thrown before the stream starts -- during request validation, session lookup, or stream creation.
```typescript
import {
  FrontendHandlerError,
  ValidationError,
  StreamNotFoundError,
  StreamFailedError,
  ExecutionError,
} from '@helix-agents/ai-sdk';
```

Catch these in your API route handler:
```typescript
export async function POST(req: Request) {
  try {
    return await handler.handleRequest(req);
  } catch (error) {
    if (error instanceof FrontendHandlerError) {
      return new Response(JSON.stringify({ error: error.message, code: error.code }), {
        status: error.statusCode,
      });
    }
    throw error;
  }
}
```

Subclasses and their HTTP status codes:
| Error Class | Code | Status |
|---|---|---|
| ValidationError | VALIDATION_ERROR | 400 |
| StreamNotFoundError | STREAM_NOT_FOUND | 404 |
| StreamReaderError | STREAM_READER_ERROR | 404 |
| StreamFailedError | STREAM_FAILED | 410 |
| ConfigurationError | CONFIGURATION_ERROR | 501 |
| ExecutionError | EXECUTION_ERROR | 500 |
| StreamCreationError | STREAM_CREATION_ERROR | 500 |
HelixStreamError
HelixStreamError represents errors received through the stream itself. Use HelixStreamError.fromEvent() to reconstruct a typed error from a stream error event:
```typescript
import { HelixStreamError } from '@helix-agents/ai-sdk';

const { data } = useChat({ api: '/api/chat' });
const errorEvent = data?.find((d) => d.type === 'error');

if (errorEvent) {
  const err = HelixStreamError.fromEvent(errorEvent);
  console.log(err.message);   // 'Rate limit exceeded'
  console.log(err.code);      // 'provider_rate_limited'
  console.log(err.retryable); // true
}
```

Retry Logic
Use the retryable flag to implement retry behavior.
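A generic backoff helper (application code, assumed here rather than provided by the framework) might look like:

```typescript
// Sketch: retry an async action with exponential backoff.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Wait 1s, 2s, 4s, ... between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```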
```typescript
// delay, submitMessage, lastMessage, and setError are app-defined helpers.
const err = HelixStreamError.fromEvent(errorEvent);

if (err.retryable) {
  // Safe to retry with backoff
  await delay(2000);
  await submitMessage(lastMessage);
} else {
  // Show error to user, don't retry
  setError(err.message);
}
```

Code-Based Error Handling
Use the code field for specific error handling:
```typescript
const err = HelixStreamError.fromEvent(errorEvent);

switch (err.code) {
  case 'provider_rate_limited':
    showToast('Too many requests. Retrying shortly...');
    await retryWithBackoff();
    break;
  case 'provider_auth_error':
    showToast('Authentication failed. Check your API key.');
    break;
  case 'provider_overloaded':
    showToast('Service is busy. Please wait...');
    await retryWithBackoff();
    break;
  default:
    showToast(`Error: ${err.message}`);
}
```

Temporal Serialization
The Temporal runtime serializes error information through DTOs to cross activity/workflow boundaries. Two schemas carry the errorCode:
AgentStepResultSchema includes errorCode for step-level errors:
```typescript
const AgentStepResultSchema = z.object({
  shouldContinue: z.boolean(),
  isComplete: z.boolean(),
  pendingToolCalls: z.array(/* ... */),
  error: z.string().optional(),
  recoverable: z.boolean().optional(),
  errorCode: z.string().optional(), // HelixError.code
  stopReason: z.custom<StopReason>(),
});
```

FailAgentStreamInputSchema includes errorCode for stream-level failures:
```typescript
const FailAgentStreamInputSchema = z.object({
  sessionId: z.string().min(1),
  error: z.string(),
  errorCode: z.string().optional(), // HelixError.code
  startedAt: z.number().optional(),
});
```

This ensures that when a HelixError occurs inside a Temporal activity, its classification code survives serialization through the workflow boundary and reaches the stream with full type information.
See Also
- Stream Protocol — Wire format for all stream chunks including errors
- @helix-agents/core Reference — Full API reference for error utilities
- @helix-agents/ai-sdk Reference — Frontend error classes and stream handling
- @helix-agents/llm-vercel Reference — LLM adapter and error mapping