
LLM Integration Overview

LLM adapters connect Helix Agents to language model providers. The adapter interface abstracts provider-specific APIs, allowing you to switch models without changing agent code.

The LLMAdapter Interface

Every LLM adapter implements the LLMAdapter interface:

typescript
interface LLMAdapter {
  generateStep(input: LLMGenerateInput): Promise<StepResult<unknown>>;
}

The adapter is responsible for the following (see the sketch after this list):

  1. Message Conversion - Translating framework messages to provider format
  2. Tool Conversion - Converting Zod schemas to provider tool format
  3. Streaming - Invoking callbacks for real-time chunk delivery
  4. Response Parsing - Converting provider responses to StepResult
  5. Stop Reason Mapping - Normalizing provider finish reasons
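
A custom adapter is a single class implementing this one method. The sketch below shows the overall shape; the import path and the toProviderRequest, callProvider, and mapStopReason helpers are hypothetical placeholders, not framework APIs.

typescript
// Sketch of a custom adapter. The import path and the helper functions
// (toProviderRequest, callProvider, mapStopReason) are hypothetical.
import type { LLMAdapter, LLMGenerateInput, StepResult } from '@helix-agents/core';

class MyProviderAdapter implements LLMAdapter {
  async generateStep(input: LLMGenerateInput): Promise<StepResult<unknown>> {
    try {
      // 1-2. Convert framework messages and tool schemas to the provider's wire format
      const request = toProviderRequest(input.messages, input.tools, input.config);

      // 3. Stream the response, forwarding text chunks to the framework callbacks
      const response = await callProvider(request, {
        signal: input.abortSignal,
        onText: (delta) => input.callbacks?.onTextDelta?.(delta),
      });

      // 4-5. Parse the provider response into a StepResult with a normalized stop reason
      return {
        type: 'text',
        content: response.text,
        shouldStop: true,
        stopReason: mapStopReason(response.finishReason),
      };
    } catch (error) {
      return {
        type: 'error',
        error: error instanceof Error ? error : new Error(String(error)),
        shouldStop: true,
        stopReason: 'error',
      };
    }
  }
}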

Available Adapters

| Adapter         | Package                  | Providers                                            |
| --------------- | ------------------------ | ---------------------------------------------------- |
| VercelAIAdapter | @helix-agents/llm-vercel | OpenAI, Anthropic, Google, Cohere, Mistral, and more |

The Vercel AI SDK adapter is the recommended choice for most applications, as it supports a wide range of providers through a unified interface.
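
A minimal setup sketch, assuming the adapter class is exported from the package listed above and that the relevant Vercel AI SDK provider package is installed:

typescript
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { anthropic } from '@ai-sdk/anthropic';

// One adapter instance works with any AI SDK model; the concrete
// model is selected per agent through llmConfig.model.
const adapter = new VercelAIAdapter();

const llmConfig = {
  model: anthropic('claude-sonnet-4-20250514'),
};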

LLMGenerateInput

The input passed to generateStep():

typescript
interface LLMGenerateInput {
  // Conversation history (including system prompt)
  messages: Message[];

  // Available tools (framework format, adapter converts)
  tools: Tool[];

  // LLM configuration
  config: LLMConfig;

  // Optional cancellation
  abortSignal?: AbortSignal;

  // Streaming callbacks
  callbacks?: LLMStreamCallbacks;

  // Agent context
  agentId: string;
  agentType: string;
}
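
The runtime builds this object for you on each step. If you invoke an adapter directly (for example in a test), the call might look like this sketch, assuming an adapter plus messages, tools, and config are already in scope:

typescript
const controller = new AbortController();

const result = await adapter.generateStep({
  messages,                       // conversation history, system prompt included
  tools,                          // framework-format tools; the adapter converts them
  config,                         // LLMConfig, described below
  abortSignal: controller.signal, // lets the caller cancel generation
  callbacks: {
    onTextDelta: (delta) => process.stdout.write(delta),
  },
  agentId: 'agent-123',
  agentType: 'my-agent',
});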

LLMConfig

LLM configuration options:

typescript
interface LLMConfig {
  // Required: The language model to use
  model: LanguageModel;

  // Generation parameters
  maxOutputTokens?: number;
  temperature?: number; // 0-2
  topP?: number;
  topK?: number;
  presencePenalty?: number;
  frequencyPenalty?: number;

  // Control parameters
  stopSequences?: string[];
  seed?: number; // For deterministic outputs
  maxRetries?: number; // Default: 3

  // Prompt caching
  caching?: 'auto' | false;

  // HTTP configuration
  headers?: Record<string, string>;

  // Provider-specific options
  providerOptions?: Record<string, Record<string, JSONValue>>;
}
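
A configuration might look like the following sketch (the values are illustrative, not recommendations; the openai helper comes from the AI SDK's @ai-sdk/openai package):

typescript
import { openai } from '@ai-sdk/openai';

const config: LLMConfig = {
  model: openai('gpt-4o'),
  maxOutputTokens: 1024,
  temperature: 0.2,
  stopSequences: ['\n\nObservation:'],
  seed: 42,          // reproducible sampling where the provider supports it
  maxRetries: 3,
  caching: 'auto',
  headers: { 'x-request-source': 'docs-example' },
};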

Prompt Caching

Enable automatic prompt caching to reduce cost and latency for repeated conversations:

typescript
const agent = defineAgent({
  name: 'my-agent',
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: {
    model: anthropic('claude-sonnet-4-20250514'),
    caching: 'auto', // Enable automatic cache breakpoints
  },
});

When caching is 'auto', the framework automatically applies provider-specific caching optimizations before each LLM call. Different providers use different mechanisms:

| Provider      | Mechanism                                  | What the Framework Does                                                   |
| ------------- | ------------------------------------------ | ------------------------------------------------------------------------- |
| Anthropic     | Explicit cache_control markers             | Places breakpoints on system prompt, last tool, and conversation boundary |
| OpenAI        | Implicit prefix caching with routing hints | Sets promptCacheKey from sessionId for cache affinity                     |
| Google Gemini | Automatic prefix caching                   | No action needed (caching is built-in)                                    |
| xAI/Grok      | Implicit with conversation routing         | Sets x-grok-conv-id header from sessionId                                 |

Cache token usage is tracked through the standard usage tracking system and Langfuse tracing.

See the Vercel AI SDK Adapter - Prompt Caching section for detailed provider examples.

Provider Options

Enable provider-specific features:

typescript
// OpenAI reasoning summaries (o1, o3, o4-mini)
const reasoningConfig: LLMConfig = {
  model: openai('o1'),
  providerOptions: {
    openai: {
      reasoningSummary: 'detailed',
      reasoningEffort: 'high',
    },
  },
};

// Anthropic extended thinking (Claude)
const thinkingConfig: LLMConfig = {
  model: anthropic('claude-sonnet-4-20250514'),
  providerOptions: {
    anthropic: {
      thinking: { type: 'enabled', budgetTokens: 10000 },
    },
  },
};

Streaming Callbacks

Real-time streaming during generation:

typescript
interface LLMStreamCallbacks {
  // Called for each text token
  onTextDelta?: (delta: string) => void;

  // Called for thinking/reasoning content
  onThinking?: (content: string, isComplete: boolean) => void;

  // Called when LLM requests a tool
  onToolCall?: (toolCall: ParsedToolCall) => void;

  // Called on generation errors
  onError?: (error: Error) => void;
}

These callbacks enable real-time UI updates as the model generates content.
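
A minimal console renderer might wire them up like this (sketch):

typescript
const callbacks: LLMStreamCallbacks = {
  onTextDelta: (delta) => process.stdout.write(delta),
  onThinking: (content, isComplete) => {
    if (isComplete) console.log('\n[reasoning finished]');
  },
  onToolCall: (toolCall) => console.log('\n[tool call requested]', toolCall),
  onError: (error) => console.error('generation failed:', error),
};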

StepResult

The response from generateStep():

typescript
type StepResult<TOutput> =
  | TextStepResult
  | ToolCallsStepResult
  | StructuredOutputStepResult<TOutput>
  | ErrorStepResult;

Text Response

typescript
interface TextStepResult {
  type: 'text';
  content: string;
  thinking?: ThinkingContent;
  shouldStop: boolean;
  stopReason: StopReason;
}

Tool Calls

typescript
interface ToolCallsStepResult {
  type: 'tool_calls';
  toolCalls: ParsedToolCall[];
  subAgentCalls: ParsedSubAgentCall[];
  content?: string; // Text accompanying tool calls
  thinking?: ThinkingContent;
  shouldStop: false;
  stopReason: StopReason;
}

Structured Output

typescript
interface StructuredOutputStepResult<TOutput> {
  type: 'structured_output';
  output: TOutput; // Validated by outputSchema
  thinking?: ThinkingContent;
  shouldStop: true;
  stopReason: StopReason;
}

Error

typescript
interface ErrorStepResult {
  type: 'error';
  error: Error;
  shouldStop: boolean;
  stopReason: StopReason;
}
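
Because StepResult is a discriminated union on the type field, callers can narrow it with a switch. A sketch:

typescript
function describeStep(result: StepResult<unknown>): string {
  switch (result.type) {
    case 'text':
      return `text (${result.stopReason}): ${result.content}`;
    case 'tool_calls':
      return `requested ${result.toolCalls.length} tool call(s)`;
    case 'structured_output':
      return `structured output: ${JSON.stringify(result.output)}`;
    case 'error':
      return `error (${result.stopReason}): ${result.error.message}`;
  }
}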

Stop Reason Normalization

Different LLM providers return different finish reasons. The framework normalizes them:

typescript
type StopReason =
  | 'end_turn' // Natural completion
  | 'tool_use' // Tool calls requested (not a stop)
  | 'max_tokens' // Token limit reached (FAILS agent)
  | 'content_filter' // Safety filter (FAILS agent)
  | 'refusal' // Model refused (FAILS agent)
  | 'stop_sequence' // Stop sequence hit (completes normally)
  | 'error' // Generation error (FAILS agent)
  | 'unknown'; // Unrecognized (FAILS agent)

Provider Mappings

| Framework      | OpenAI         | Anthropic  | Vercel AI SDK  |
| -------------- | -------------- | ---------- | -------------- |
| end_turn       | stop           | end_turn   | stop           |
| tool_use       | tool_calls     | tool_use   | tool-calls     |
| max_tokens     | length         | max_tokens | length         |
| content_filter | content_filter | -          | content-filter |
| refusal        | -              | refusal    | -              |

Terminal vs Continuation

Stop reasons determine agent behavior:

typescript
// Terminal conditions (agent fails)
function isErrorStopReason(reason: StopReason): boolean {
  return (
    reason === 'max_tokens' ||
    reason === 'content_filter' ||
    reason === 'refusal' ||
    reason === 'error' ||
    reason === 'unknown'
  );
}

// Recoverable errors (retryable for outputSchema agents)
function isRecoverableErrorStopReason(reason: StopReason): boolean {
  return reason === 'max_tokens';
}

// Continue execution (not a real stop)
// tool_use -> execute tools, then continue

// Normal completion
// end_turn, stop_sequence -> agent completes successfully

Completion retry for max_tokens: When an agent has an outputSchema and the LLM returns text with a max_tokens stop reason, the framework treats this as a recoverable error. Instead of failing immediately, it appends a correction message instructing the model to call the completion tool directly, then retries the step. This handles the common case where the model writes a long text response instead of calling __finish__, causing the response to be truncated. Non-recoverable errors (content_filter, refusal, error, unknown) still fail immediately.
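
In pseudocode, the retry amounts to roughly the following sketch (the exact correction wording and message role are framework internals, shown here only to illustrate the flow):

typescript
// Illustrative only, not the framework's actual implementation
if (agent.outputSchema && result.type === 'text' && result.stopReason === 'max_tokens') {
  input.messages.push({
    role: 'user', // role is an assumption for illustration
    content: 'Your response was truncated. Call the completion tool directly instead of writing a long text answer.',
  });
  result = await adapter.generateStep(input);
}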

Thinking/Reasoning Content

Some models provide reasoning traces:

typescript
interface ThinkingContent {
  content: string; // The reasoning text
}

Supported providers:

  • Anthropic Claude - Extended thinking via providerOptions.anthropic.thinking
  • OpenAI o-series - Reasoning summaries via providerOptions.openai.reasoningSummary

Thinking content is (see the example after this list):

  • Streamed via onThinking callback
  • Stored in StepResult.thinking
  • Persisted in message history (if your state store supports it)
  • Streamed as thinking chunks to clients
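
For example, after a step completes you might surface the trace alongside the answer (sketch):

typescript
if (result.type === 'text' && result.thinking) {
  console.log('model reasoning:', result.thinking.content);
  console.log('final answer:', result.content);
}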

Composability

The adapter interface enables composability:

typescript
// Development: Use mock adapter
const mockAdapter = new MockLLMAdapter({
  responses: [{ type: 'text', content: 'Test response' }],
});

// Production: Use Vercel adapter
const vercelAdapter = new VercelAIAdapter();

// Both work with any runtime: pass whichever adapter fits the environment
const executor = new JSAgentExecutor(stateStore, streamManager, vercelAdapter);

You can swap adapters without changing agent definitions or runtime code.
