
Vercel AI SDK Adapter

The Vercel AI SDK adapter (@helix-agents/llm-vercel) connects Helix Agents to any LLM provider supported by the Vercel AI SDK. This is the recommended adapter for most applications.

When to Use

Good fit:

  • Production applications
  • Multiple provider support needed
  • Streaming responses required
  • Using OpenAI, Anthropic, Google, or other major providers

Not ideal for:

  • Unit testing (use MockLLMAdapter instead)
  • Custom or private LLM APIs that have no Vercel AI SDK provider

Installation

bash
npm install @helix-agents/llm-vercel ai

Also install provider packages for your chosen models:

bash
# OpenAI
npm install @ai-sdk/openai

# Anthropic
npm install @ai-sdk/anthropic

# Google
npm install @ai-sdk/google

Basic Usage

typescript
import { defineAgent } from '@helix-agents/core';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { JSAgentExecutor } from '@helix-agents/runtime-js';
import { InMemoryStateStore, InMemoryStreamManager } from '@helix-agents/store-memory';
import { openai } from '@ai-sdk/openai';

// Create adapter
const adapter = new VercelAIAdapter();

// Create executor
const executor = new JSAgentExecutor(
  new InMemoryStateStore(),
  new InMemoryStreamManager(),
  adapter
);

// Define agent with Vercel AI SDK model
const agent = defineAgent({
  name: 'assistant',
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: {
    model: openai('gpt-4o'),
    temperature: 0.7,
  },
});

Supported Providers

The Vercel AI SDK supports many providers:

| Provider | Package | Example Model |
| --- | --- | --- |
| OpenAI | @ai-sdk/openai | openai('gpt-4o') |
| Anthropic | @ai-sdk/anthropic | anthropic('claude-sonnet-4-20250514') |
| Google | @ai-sdk/google | google('gemini-1.5-pro') |
| Cohere | @ai-sdk/cohere | cohere('command-r-plus') |
| Mistral | @ai-sdk/mistral | mistral('mistral-large-latest') |
| Amazon Bedrock | @ai-sdk/amazon-bedrock | Various models |
| Azure OpenAI | @ai-sdk/azure | Azure-hosted models |

See the Vercel AI SDK documentation for the full list.

Configuration

Model Configuration

typescript
import { defineAgent, openaiCache } from '@helix-agents/core';
import { openai } from '@ai-sdk/openai';

const agent = defineAgent({
  name: 'my-agent',
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: {
    // Required: the model to use
    model: openai('gpt-4o'),

    // Generation parameters (supported ranges vary by provider)
    temperature: 0.7, // Higher = more varied output; OpenAI accepts 0-2
    maxOutputTokens: 4096, // Maximum tokens to generate
    topP: 0.95, // Nucleus sampling; usually set this or temperature, not both
    topK: 40, // Top-k sampling; not supported by every provider (e.g. OpenAI)

    // Penalties
    presencePenalty: 0, // Discourage revisiting topics already mentioned
    frequencyPenalty: 0, // Discourage repeating the same tokens

    // Control
    stopSequences: ['END'], // Stop generation at these sequences
    seed: 12345, // Best-effort reproducibility, where the provider supports it

    // Reliability
    maxRetries: 3, // Retry on transient failures

    // Prompt caching is opt-in. Supply the provider-specific strategy from
    // `@helix-agents/core` that matches your model (here: OpenAI).
    cache: openaiCache(),
  },
});

Provider-Specific Options

Important: Reasoning features require AI SDK provider packages v3+:

  • @ai-sdk/openai@^3.0.0
  • @ai-sdk/anthropic@^3.0.0

Earlier v2.x versions report specificationVersion: "v2", which puts AI SDK v6 into a compatibility mode that strips reasoning features.

Enable features specific to certain providers:

typescript
import { defineAgent } from '@helix-agents/core';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

// OpenAI o-series reasoning
const reasoningAgent = defineAgent({
  name: 'reasoning-agent',
  systemPrompt: 'Solve complex problems step by step.',
  llmConfig: {
    model: openai('o1'),
    providerOptions: {
      openai: {
        reasoningSummary: 'detailed',
        reasoningEffort: 'high',
      },
    },
  },
});

// Anthropic extended thinking
const thinkingAgent = defineAgent({
  name: 'thinking-agent',
  systemPrompt: 'Think through problems carefully.',
  llmConfig: {
    model: anthropic('claude-sonnet-4-20250514'),
    providerOptions: {
      anthropic: {
        thinking: {
          type: 'enabled',
          budgetTokens: 10000,
        },
      },
    },
  },
});

Dynamic Configuration

Override LLM config based on agent state:

typescript
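import { defineAgent } from '@helix-agents/core';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const agent = defineAgent({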
  name: 'adaptive-agent',
  stateSchema: z.object({
    complexity: z.enum(['simple', 'complex']),
    stepCount: z.number(),
  }),
  llmConfig: {
    model: openai('gpt-4o-mini'),
    temperature: 0.5,
  },
  llmConfigOverride: (customState, stepCount) => {
    // Use more powerful model for complex tasks
    if (customState.complexity === 'complex') {
      return {
        model: openai('gpt-4o'),
        temperature: 0.2,
        maxOutputTokens: 8192,
      };
    }

    // Increase temperature over time for variety
    if (stepCount > 5) {
      return { temperature: 0.8 };
    }

    return {};
  },
});
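The override returns a partial config rather than a full one. The sketch below shows one way such a partial could be resolved against the base llmConfig, assuming a shallow merge in which returned keys win and omitted keys fall back to the base. That merge rule is an assumption, not the framework's documented behavior, and models are plain strings here so the sketch runs without a provider package; LLMConfig, resolveConfig, and the other names are hypothetical.

```typescript
// Simplified stand-ins: a real llmConfig holds a provider model object, not a string.
interface LLMConfig {
  model: string;
  temperature?: number;
  maxOutputTokens?: number;
}

interface AdaptiveState {
  complexity: 'simple' | 'complex';
}

type ConfigOverride = (state: AdaptiveState, stepCount: number) => Partial<LLMConfig>;

const baseConfig: LLMConfig = { model: 'gpt-4o-mini', temperature: 0.5 };

// Same decision logic as the adaptive agent above.
const adaptiveOverride: ConfigOverride = (state, stepCount) => {
  if (state.complexity === 'complex') {
    return { model: 'gpt-4o', temperature: 0.2, maxOutputTokens: 8192 };
  }
  if (stepCount > 5) {
    return { temperature: 0.8 };
  }
  return {};
};

// Assumed semantics: shallow merge, override keys win, omitted keys keep base values.
function resolveConfig(
  base: LLMConfig,
  overrideFn: ConfigOverride,
  state: AdaptiveState,
  stepCount: number,
): LLMConfig {
  return { ...base, ...overrideFn(state, stepCount) };
}

console.log(resolveConfig(baseConfig, adaptiveOverride, { complexity: 'simple' }, 7));
// { model: 'gpt-4o-mini', temperature: 0.8 }
```

Under this reading, returning {} leaves the base config untouched, and returning only { temperature } keeps the cheaper model while changing sampling.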

Prompt Caching

Prompt caching reduces cost and latency by reusing cached prompt prefixes across LLM calls. It is opt-in and provider-specific: you choose a cache strategy that matches your model and set it on llmConfig.cache. When cache is unset, no caching is applied. The framework performs no provider detection — pick the helper that matches your model.

typescript
import { defineAgent, anthropicCache } from '@helix-agents/core';
import { anthropic } from '@ai-sdk/anthropic';

const agent = defineAgent({
  name: 'cached-agent',
  systemPrompt: 'You are a helpful assistant with detailed instructions...',
  llmConfig: {
    model: anthropic('claude-sonnet-4-20250514'),
    cache: anthropicCache({ ttl: '1h' }),
  },
});

Shipped strategies

| Helper | Provider | What it does |
| --- | --- | --- |
| anthropicCache({ ttl }) | Anthropic (Claude) | Places cache_control markers on the system prompt, the tool definitions, and a rolling pair of conversation breakpoints. ttl is '<N>m' / '<N>h' (default '1h'), passed through to the provider for validation. |
| openaiCache() | OpenAI (GPT-4o, o-series, …) | Sets providerOptions.openai.promptCacheKey from the session ID, giving repeated requests in a session cache affinity. |
| xaiCache() | xAI / Grok | Sets the x-grok-conv-id header from the session ID for conversation-level cache routing. |

Google / Gemini needs no helper — it uses implicit prefix caching server-side, so there is nothing to annotate. There is intentionally no googleCache().

Each helper handles only its own provider, so the helper you choose must match the model you configured.

How anthropicCache places breakpoints

anthropicCache({ ttl }) marks, up to Anthropic's four-breakpoint limit:

  1. The last system message (caches the system prompt).
  2. The last tool definition (caches the tool schema).
  3. The end of the most recent turn — including tool-result turns — so the next step reads the entire prior prefix from cache.
  4. The end of the previous turn — a rolling second anchor. Anthropic only looks back ~20 content blocks from a breakpoint to find a prior cache entry, so a single tool-heavy turn (many parallel tool calls + results) could push the latest-turn breakpoint out of range and silently re-process the whole history. The previous-turn anchor sits where the prior request's breakpoint landed, keeping the history cached regardless of how large the latest turn is.
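The four placement rules above can be sketched as index selection over a message list. This is a simplified illustration, not anthropicCache's actual code: SimpleMessage, placeBreakpoints, and the rule that a turn starts at the most recent assistant message (so the prior request ended on the message just before it) are all assumptions made for the sketch.

```typescript
// Hypothetical, simplified message shape; the real strategy receives a CacheRequest.
type Role = 'system' | 'user' | 'assistant' | 'tool';
interface SimpleMessage {
  role: Role;
}

interface Breakpoints {
  systemIndex: number | null; // 1. last system message
  toolDefIndex: number | null; // 2. last tool definition
  latestTurnEnd: number | null; // 3. end of the most recent turn
  previousTurnEnd: number | null; // 4. rolling anchor at the end of the previous turn
}

function lastIndexWhere<T>(items: T[], pred: (item: T) => boolean): number {
  for (let i = items.length - 1; i >= 0; i--) {
    if (pred(items[i])) return i;
  }
  return -1;
}

function placeBreakpoints(messages: SimpleMessage[], toolCount: number): Breakpoints {
  const sys = lastIndexWhere(messages, (m) => m.role === 'system');
  const latest = messages.length - 1;
  // Assumption: the latest turn begins at the last assistant message, so the
  // previous request's final message sits immediately before it.
  const lastAssistant = lastIndexWhere(messages, (m) => m.role === 'assistant');
  const previous = lastAssistant > 0 ? lastAssistant - 1 : -1;
  return {
    systemIndex: sys >= 0 ? sys : null,
    toolDefIndex: toolCount > 0 ? toolCount - 1 : null,
    // Skip turn anchors that would duplicate the system breakpoint.
    latestTurnEnd: latest > sys ? latest : null,
    previousTurnEnd: previous > sys && previous < latest ? previous : null,
  };
}
```

For [system, user, assistant, tool, tool] with three tool definitions, this marks index 0, tool definition 2, index 4 (latest turn), and index 1 (previous turn). On the next step, after another assistant message and tool result, the previous-turn anchor moves to index 4, exactly where this request's latest-turn breakpoint sat, no matter how many tool results the latest turn adds.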

Composing strategies

cache also accepts an array of strategies, applied in order — useful for layering a provider strategy with your own:

typescript
llmConfig: {
  model: anthropic('claude-sonnet-4-20250514'),
  cache: [anthropicCache(), myCustomStrategy],
}

Writing a custom strategy

A CacheStrategy is a pure function (CacheRequest) => CacheResult. It can annotate messages or tools (returned in messages / tools) or add request-level options (providerOptions / headers). The CacheStrategy, CacheRequest, and CacheResult types, along with applyCacheStrategies (the provider-agnostic function the runtimes use to fold a list of strategies into a single request), are exported from @helix-agents/core:

typescript
import type { CacheStrategy } from '@helix-agents/core';

const tagConversation: CacheStrategy = ({ context }) => ({
  headers: { 'x-my-conv-id': context.sessionId },
});
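The in-order composition described under Composing strategies can be pictured as a left-to-right fold. The sketch below uses hypothetical stand-in types (SketchRequest, SketchResult, SketchStrategy) instead of the real CacheRequest and CacheResult, and its merge rules (later strategies see earlier message edits; headers and per-provider options merge with later values winning) are assumptions, not applyCacheStrategies' documented behavior.

```typescript
// Hypothetical stand-ins; the real types in @helix-agents/core carry more fields (e.g. tools).
interface SketchMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  cacheHint?: boolean;
}
interface SketchRequest {
  messages: SketchMessage[];
  context: { sessionId: string };
}
interface SketchResult {
  messages?: SketchMessage[];
  headers?: Record<string, string>;
  providerOptions?: Record<string, Record<string, unknown>>;
}
type SketchStrategy = (req: SketchRequest) => SketchResult;

interface Folded {
  messages: SketchMessage[];
  headers: Record<string, string>;
  providerOptions: Record<string, Record<string, unknown>>;
}

// Apply strategies left to right: each one sees the messages produced so far;
// headers and per-provider options are merged, later strategies winning on conflicts.
function foldStrategies(req: SketchRequest, strategies: SketchStrategy[]): Folded {
  let messages = req.messages;
  const headers: Record<string, string> = {};
  const providerOptions: Record<string, Record<string, unknown>> = {};
  for (const strategy of strategies) {
    const result = strategy({ ...req, messages });
    if (result.messages) messages = result.messages;
    Object.assign(headers, result.headers);
    for (const [provider, opts] of Object.entries(result.providerOptions ?? {})) {
      providerOptions[provider] = { ...providerOptions[provider], ...opts };
    }
  }
  return { messages, headers, providerOptions };
}

// Two example strategies: one annotates the last message, one adds a header.
const markLast: SketchStrategy = ({ messages }) => ({
  messages: messages.map((m, i) => (i === messages.length - 1 ? { ...m, cacheHint: true } : m)),
});
const tagConv: SketchStrategy = ({ context }) => ({
  headers: { 'x-my-conv-id': context.sessionId },
});

const folded = foldStrategies(
  { messages: [{ role: 'user', content: 'Hi' }], context: { sessionId: 's-1' } },
  [markLast, tagConv],
);
console.log(folded.headers); // { 'x-my-conv-id': 's-1' }
```

Keeping each strategy pure is what makes this kind of ordering predictable: a strategy never mutates the request, it only returns what it wants to add.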

Cache Token Tracking

Cache hit/miss metrics flow through the standard token usage pipeline:

typescript
// Inside defineAgent({ ... }): afterLLMCall receives per-call usage
hooks: {
  afterLLMCall: (payload, ctx) => {
    if (payload.usage) {
      console.log(`Prompt tokens: ${payload.usage.promptTokens}`);
      console.log(`Cached tokens: ${payload.usage.cachedTokens}`);      // Cache hits
      console.log(`Cache writes: ${payload.usage.cacheWriteTokens}`);   // New cache entries
    }
  },
}

Cache tokens also appear in:

  • Stream chunks: step_end chunks include cachedTokens and cacheWriteTokens in their usage field
  • Usage tracking: The TokenCounts rollup includes cached and cacheWrite fields
  • Langfuse tracing: Mapped to cache_read_input_tokens and cache_creation_input_tokens
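Because every step_end chunk carries its own usage, per-run cache totals can be computed by summing those chunks. The sketch below assumes a simplified chunk shape built from the field names above (StepEndChunk, CacheRollup, and rollUpCacheUsage are hypothetical) and works on an already-collected list. It deliberately does not compute a hit ratio, since whether promptTokens already includes cached tokens is not specified here.

```typescript
// Hypothetical, simplified chunk shapes based on the fields named above.
interface StepEndChunk {
  type: 'step_end';
  usage?: {
    promptTokens?: number;
    cachedTokens?: number;
    cacheWriteTokens?: number;
  };
}
type Chunk = StepEndChunk | { type: 'text_delta'; delta: string };

interface CacheRollup {
  prompt: number;
  cached: number;
  cacheWrite: number;
}

// Sum cache-related usage across every step_end chunk of a run; other chunks are skipped.
function rollUpCacheUsage(chunks: Iterable<Chunk>): CacheRollup {
  const total: CacheRollup = { prompt: 0, cached: 0, cacheWrite: 0 };
  for (const chunk of chunks) {
    if (chunk.type !== 'step_end' || !chunk.usage) continue;
    total.prompt += chunk.usage.promptTokens ?? 0;
    total.cached += chunk.usage.cachedTokens ?? 0;
    total.cacheWrite += chunk.usage.cacheWriteTokens ?? 0;
  }
  return total;
}
```

With a live stream, the same accumulation can run inside the for await loop instead of over a collected list.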

Streaming

The adapter supports real-time streaming:

typescript
// Streaming happens automatically in execute()
const handle = await executor.execute(agent, 'Research AI agents');

// Get the stream
const stream = await handle.stream();
if (stream) {
  for await (const chunk of stream) {
    switch (chunk.type) {
      case 'text_delta':
        process.stdout.write(chunk.delta);
        break;
      case 'thinking':
        console.log('[Thinking]', chunk.content);
        break;
      case 'tool_start':
        console.log(`[Tool: ${chunk.toolName}]`);
        break;
    }
  }
}

Chunk Mapping

The adapter maps Vercel AI SDK stream parts to framework chunks:

| Vercel AI SDK | Framework | Notes |
| --- | --- | --- |
| text-delta | text_delta | Generated text tokens |
| reasoning-delta | thinking | Reasoning/thinking content |
| tool-input-start | tool_start | Tool call begins |
| tool-call | tool_start | Complete tool call |
| tool-result | tool_end | Tool result |
| error | error | Generation error |
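The table reads as a single switch over stream-part types. The sketch below follows it literally (both tool-input-start and tool-call yield tool_start) and drops parts with no row. The part field names (text, id, toolCallId, input, output, error) are modeled on AI SDK 5+ stream parts as an assumption, and the framework-side fields other than delta, content, and toolName are hypothetical; this is not the adapter's source.

```typescript
// Hypothetical local shapes, modeled on AI SDK 5+ stream parts (an assumption).
type SdkPart =
  | { type: 'text-delta'; text: string }
  | { type: 'reasoning-delta'; text: string }
  | { type: 'tool-input-start'; id: string; toolName: string }
  | { type: 'tool-call'; toolCallId: string; toolName: string; input: unknown }
  | { type: 'tool-result'; toolCallId: string; toolName: string; output: unknown }
  | { type: 'error'; error: unknown }
  | { type: 'finish' };

// Framework chunks; delta, content, and toolName match this page, the rest is hypothetical.
type FrameworkChunk =
  | { type: 'text_delta'; delta: string }
  | { type: 'thinking'; content: string }
  | { type: 'tool_start'; toolCallId: string; toolName: string }
  | { type: 'tool_end'; toolCallId: string; toolName: string; result: unknown }
  | { type: 'error'; message: string };

function mapPart(part: SdkPart): FrameworkChunk | null {
  switch (part.type) {
    case 'text-delta':
      return { type: 'text_delta', delta: part.text };
    case 'reasoning-delta':
      return { type: 'thinking', content: part.text };
    case 'tool-input-start':
      return { type: 'tool_start', toolCallId: part.id, toolName: part.toolName };
    case 'tool-call':
      return { type: 'tool_start', toolCallId: part.toolCallId, toolName: part.toolName };
    case 'tool-result':
      return { type: 'tool_end', toolCallId: part.toolCallId, toolName: part.toolName, result: part.output };
    case 'error':
      return { type: 'error', message: part.error instanceof Error ? part.error.message : String(part.error) };
    default:
      // Parts without a row in the table (e.g. finish) produce no chunk in this sketch.
      return null;
  }
}
```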

Thinking/Reasoning Content

Both Anthropic and OpenAI support reasoning features:

Anthropic Extended Thinking

typescript
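import { defineAgent } from '@helix-agents/core';
import { anthropic } from '@ai-sdk/anthropic';

const agent = defineAgent({
  name: 'claude-thinker',
  llmConfig: {
    model: anthropic('claude-sonnet-4-20250514'),
    providerOptions: {
      anthropic: {
        thinking: {
          type: 'enabled',
          budgetTokens: 10000, // Token budget for thinking
        },
      },
    },
  },
});

// Thinking content streams via 'thinking' chunks
// (executor is the JSAgentExecutor created in Basic Usage)
const handle = await executor.execute(agent, 'Explain why the sky is blue.');
const stream = await handle.stream();
if (stream) {
  for await (const chunk of stream) {
    if (chunk.type === 'thinking') {
      console.log('[Claude thinking...]', chunk.content);
    }
  }
}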

OpenAI Reasoning

typescript
import { defineAgent } from '@helix-agents/core';
import { openai } from '@ai-sdk/openai';

const agent = defineAgent({
  name: 'o1-reasoner',
  llmConfig: {
    model: openai('o1'),
    providerOptions: {
      openai: {
        reasoningSummary: 'detailed', // or 'concise'
        reasoningEffort: 'high', // or 'medium', 'low'
      },
    },
  },
});

Message Conversion

The adapter converts framework messages to Vercel AI SDK format:

Framework → Vercel AI SDK

typescript
// Framework format
const messages: Message[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello' },
  {
    role: 'assistant',
    content: 'I will search for that.',
    toolCalls: [{ id: 'tc1', name: 'search', arguments: { q: 'test' } }],
  },
  {
    role: 'tool',
    toolCallId: 'tc1',
    toolName: 'search',
    content: JSON.stringify({ results: [] }),
  },
];

// Automatically converted to Vercel AI SDK ModelMessage[]

The conversion handles:

  • System, user, and assistant messages
  • Tool calls in assistant messages
  • Tool results in tool messages
  • Mixed text + tool call content
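The conversion cases listed above can be sketched as one function per message role. The target shapes mirror the AI SDK (v5+) ModelMessage format as an assumption (text and tool-call parts in assistant messages, tool-result parts in tool messages), and passing the tool message's JSON string through as a text output is an assumed choice; the real adapter may encode results differently. FrameworkMessage, ModelMessageLike, and toModelMessage are hypothetical names.

```typescript
// Hypothetical framework message shape, mirroring the example above.
interface ToolCall {
  id: string;
  name: string;
  arguments: Record<string, unknown>;
}
type FrameworkMessage =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: string; toolCalls?: ToolCall[] }
  | { role: 'tool'; toolCallId: string; toolName: string; content: string };

// Local mirror of the AI SDK (v5+) ModelMessage shapes this sketch emits (assumed).
type AssistantPart =
  | { type: 'text'; text: string }
  | { type: 'tool-call'; toolCallId: string; toolName: string; input: unknown };
type ToolResultPart = {
  type: 'tool-result';
  toolCallId: string;
  toolName: string;
  output: { type: 'text'; value: string };
};
type ModelMessageLike =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: AssistantPart[] }
  | { role: 'tool'; content: ToolResultPart[] };

function toModelMessage(msg: FrameworkMessage): ModelMessageLike {
  switch (msg.role) {
    case 'system':
    case 'user':
      return { role: msg.role, content: msg.content };
    case 'assistant':
      // Mixed content: optional text part first, then one part per tool call.
      return {
        role: 'assistant',
        content: [
          ...(msg.content ? [{ type: 'text' as const, text: msg.content }] : []),
          ...(msg.toolCalls ?? []).map((tc) => ({
            type: 'tool-call' as const,
            toolCallId: tc.id,
            toolName: tc.name,
            input: tc.arguments,
          })),
        ],
      };
    case 'tool':
      return {
        role: 'tool',
        content: [
          {
            type: 'tool-result',
            toolCallId: msg.toolCallId,
            toolName: msg.toolName,
            output: { type: 'text', value: msg.content },
          },
        ],
      };
  }
}
```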

The adapter also coerces non-object tool-call inputs to objects — at ingestion, in stream-chunk mapping, and during message conversion (which heals any string arguments already persisted in durable stores before this guarantee existed). This complements the runtime-agnostic coercion in core's planStepProcessing(); together they ensure a tool input is never replayed as a non-object tool_use.input (which the provider rejects). Structured output is left to core's schema-aware repair, so non-object outputSchemas are preserved. See Robust tool inputs.
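The coercion guarantee can be illustrated with a small normalizer. This is a hypothetical sketch, not the adapter's code: it passes plain objects through, parses strings that contain a JSON object, and, as an assumed fallback, turns everything else (null, arrays, numbers, unparseable strings) into an empty object so the provider always receives an object input.

```typescript
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical normalizer: always hand the provider a plain object as tool input.
function coerceToolInput(raw: unknown): Record<string, unknown> {
  if (isPlainObject(raw)) return raw;
  if (typeof raw === 'string') {
    const trimmed = raw.trim();
    if (trimmed === '') return {};
    try {
      // Heals inputs persisted as JSON strings, e.g. '{"q":"test"}'.
      const parsed: unknown = JSON.parse(trimmed);
      if (isPlainObject(parsed)) return parsed;
    } catch {
      // Not valid JSON; fall through to the fallback below.
    }
  }
  // Assumed fallback: anything that is not an object becomes {}.
  return {};
}
```

The empty-object fallback trades away malformed data in exchange for a request the provider will accept; an implementation that needs to keep that data could wrap it instead.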

Tool Conversion

Framework tools (with Zod schemas) are converted to Vercel AI SDK tools:

typescript
import { defineTool } from '@helix-agents/core';
import { z } from 'zod';

// Framework tool
const searchTool = defineTool({
  name: 'search',
  description: 'Search the web',
  inputSchema: z.object({
    query: z.string(),
    limit: z.number().optional(),
  }),
  execute: async (input, ctx) => {
    // ...
  },
});

// Automatically converted to Vercel AI SDK tool format
// The Zod schema is passed through directly (AI SDK 5.x and later accept Zod schemas)

Error Handling

The adapter reports LLM failures as results rather than throwing them:

typescript
const adapter = new VercelAIAdapter({
  logger: console, // Optional: log warnings
});

// Errors are returned as ErrorStepResult, not thrown
const result = await adapter.generateStep(input);

if (result.type === 'error') {
  console.error('LLM error:', result.error.message);
  // Framework handles this appropriately
}

Retry Configuration

Configure retries for transient failures:

typescript
const agent = defineAgent({
  name: 'resilient-agent',
  llmConfig: {
    model: openai('gpt-4o'),
    maxRetries: 5, // Retry up to 5 times on transient errors
  },
});

Logger Integration

Pass a custom logger for debug output:

typescript
import { VercelAIAdapter } from '@helix-agents/llm-vercel';

const logger = {
  debug: (msg: string) => console.debug(`[DEBUG] ${msg}`),
  info: (msg: string) => console.info(`[INFO] ${msg}`),
  warn: (msg: string) => console.warn(`[WARN] ${msg}`),
  error: (msg: string) => console.error(`[ERROR] ${msg}`),
};

const adapter = new VercelAIAdapter({ logger });

Complete Example

typescript
import { defineAgent, defineTool } from '@helix-agents/core';
import { JSAgentExecutor } from '@helix-agents/runtime-js';
import { InMemoryStateStore, InMemoryStreamManager } from '@helix-agents/store-memory';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Create adapter
const adapter = new VercelAIAdapter();

// Define tool
const searchTool = defineTool({
  name: 'web_search',
  description: 'Search the web for information',
  inputSchema: z.object({
    query: z.string().describe('Search query'),
  }),
  outputSchema: z.object({
    results: z.array(z.string()),
  }),
  execute: async (input) => {
    // Simulate search
    return { results: [`Result for: ${input.query}`] };
  },
});

// Define agent
const ResearchAgent = defineAgent({
  name: 'researcher',
  description: 'Researches topics using web search',
  systemPrompt: `You are a research assistant.
Use the web_search tool to find information.
Summarize your findings clearly.`,
  tools: [searchTool],
  outputSchema: z.object({
    summary: z.string(),
    sources: z.array(z.string()),
  }),
  llmConfig: {
    model: openai('gpt-4o'),
    temperature: 0.3,
    maxOutputTokens: 2048,
  },
});

// Create executor
const executor = new JSAgentExecutor(
  new InMemoryStateStore(),
  new InMemoryStreamManager(),
  adapter
);

// Execute
async function main() {
  const handle = await executor.execute(
    ResearchAgent,
    'What are the latest developments in AI agents?'
  );

  // Stream output
  const stream = await handle.stream();
  if (stream) {
    for await (const chunk of stream) {
      if (chunk.type === 'text_delta') {
        process.stdout.write(chunk.delta);
      }
    }
  }

  // Get result
  const result = await handle.result();
  console.log('\n\nResult:', result.output);
}

main().catch(console.error);

Limitations

Model-Specific Features

Not all features work with all models:

  • Thinking/reasoning: Only Anthropic Claude and OpenAI o-series
  • Tool calling: Most models, but check provider docs
  • JSON mode: Provider-specific implementation

Token Counting

The adapter doesn't estimate token counts before a request is sent; use provider SDKs directly for token estimation. Actual usage, including cache tokens, is reported after each call.

Image/Multimodal

The framework supports file uploads (images, PDFs, etc.) via the files field in AgentInput. Files are converted to ContentPart[] alongside the text message and passed to the LLM. This works across all runtimes (JS, Temporal, Cloudflare).

typescript
await executor.execute(
  agent,
  {
    message: 'Describe this image',
    files: [
      {
        data: base64EncodedData,
        mediaType: 'image/png',
        filename: 'screenshot.png', // optional
      },
    ],
  },
  { sessionId }
);
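Each files entry needs base64 data and a media type. The sketch below builds one from raw bytes and, separately, from a path on disk using Node's fs and path modules. The helper names and the extension-to-media-type table are hypothetical and illustrative only; which media types a given model accepts depends on the provider.

```typescript
import { readFileSync } from 'node:fs';
import { basename, extname } from 'node:path';

// Illustrative extension table; not an exhaustive or authoritative list.
const MEDIA_TYPES: Record<string, string> = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
  '.webp': 'image/webp',
  '.pdf': 'application/pdf',
};

// Mirrors the { data, mediaType, filename } entries shown above.
interface FileAttachment {
  data: string; // base64-encoded file contents
  mediaType: string;
  filename?: string;
}

// Pure part: derive the media type from the extension and base64-encode the bytes.
function toAttachment(path: string, bytes: Uint8Array): FileAttachment {
  const mediaType = MEDIA_TYPES[extname(path).toLowerCase()];
  if (!mediaType) {
    throw new Error(`Unsupported file extension: ${path}`);
  }
  return {
    data: Buffer.from(bytes).toString('base64'),
    mediaType,
    filename: basename(path),
  };
}

// Thin I/O wrapper: read the file synchronously and build the attachment.
function loadAttachment(path: string): FileAttachment {
  return toAttachment(path, readFileSync(path));
}
```

With this helper, the call above could pass files: [loadAttachment('./screenshot.png')] instead of assembling the entry by hand.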


Released under the MIT License.