Vercel AI SDK Adapter
The Vercel AI SDK adapter (@helix-agents/llm-vercel) connects Helix Agents to any LLM provider supported by the Vercel AI SDK. This is the recommended adapter for most applications.
When to Use
Good fit:
- Production applications
- Multiple provider support needed
- Streaming responses required
- Using OpenAI, Anthropic, Google, or other major providers
Not ideal for:
- Unit testing (use MockLLMAdapter instead)
- Custom/private LLM APIs not in Vercel AI SDK
Installation
```bash
npm install @helix-agents/llm-vercel ai
```
Also install provider packages for your chosen models:
```bash
# OpenAI
npm install @ai-sdk/openai

# Anthropic
npm install @ai-sdk/anthropic

# Google
npm install @ai-sdk/google
```
Basic Usage
```typescript
import { defineAgent } from '@helix-agents/core';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { JSAgentExecutor } from '@helix-agents/runtime-js';
import { InMemoryStateStore, InMemoryStreamManager } from '@helix-agents/store-memory';
import { openai } from '@ai-sdk/openai';

// Create adapter
const adapter = new VercelAIAdapter();

// Create executor
const executor = new JSAgentExecutor(
  new InMemoryStateStore(),
  new InMemoryStreamManager(),
  adapter
);

// Define agent with Vercel AI SDK model
const agent = defineAgent({
  name: 'assistant',
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: {
    model: openai('gpt-4o'),
    temperature: 0.7,
  },
});
```
Supported Providers
The Vercel AI SDK supports many providers:
| Provider | Package | Example Model |
|---|---|---|
| OpenAI | @ai-sdk/openai | openai('gpt-4o') |
| Anthropic | @ai-sdk/anthropic | anthropic('claude-sonnet-4-20250514') |
| Google | @ai-sdk/google | google('gemini-1.5-pro') |
| Cohere | @ai-sdk/cohere | cohere('command-r-plus') |
| Mistral | @ai-sdk/mistral | mistral('mistral-large-latest') |
| Amazon Bedrock | @ai-sdk/amazon-bedrock | Various models |
| Azure OpenAI | @ai-sdk/azure | Azure-hosted models |
See the Vercel AI SDK documentation for the full list.
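Because every provider exposes the same model-factory interface, switching providers is a one-line change to `llmConfig.model`. A sketch (model IDs are the examples from the table above):

```typescript
import { anthropic } from '@ai-sdk/anthropic';
import { mistral } from '@ai-sdk/mistral';

// Same agent shape, different provider: only `model` changes.
const claudeConfig = { model: anthropic('claude-sonnet-4-20250514'), temperature: 0.7 };
const mistralConfig = { model: mistral('mistral-large-latest'), temperature: 0.7 };
```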
Configuration
Model Configuration
```typescript
const agent = defineAgent({
  name: 'my-agent',
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: {
    // Required: The model to use
    model: openai('gpt-4o'),

    // Generation parameters
    temperature: 0.7,       // 0-2, higher = more creative
    maxOutputTokens: 4096,  // Maximum tokens to generate
    topP: 0.95,             // Nucleus sampling
    topK: 40,               // Top-k sampling

    // Penalties
    presencePenalty: 0,     // Reduce repetition of topics
    frequencyPenalty: 0,    // Reduce repetition of tokens

    // Control
    stopSequences: ['END'], // Stop generation at these sequences
    seed: 12345,            // For deterministic outputs

    // Reliability
    maxRetries: 3,          // Retry on transient failures

    // Prompt caching (automatic provider-specific optimization)
    caching: 'auto',
  },
});
```
Provider-Specific Options
Important: Reasoning features require AI SDK provider packages v3+ (`@ai-sdk/openai@^3.0.0`, `@ai-sdk/anthropic@^3.0.0`). Earlier v2.x versions use `specificationVersion: "v2"`, which triggers compatibility mode in AI SDK v6, stripping reasoning features.
Enable features specific to certain providers:
```typescript
// OpenAI o-series reasoning
const reasoningAgent = defineAgent({
  name: 'reasoning-agent',
  systemPrompt: 'Solve complex problems step by step.',
  llmConfig: {
    model: openai('o1'),
    providerOptions: {
      openai: {
        reasoningSummary: 'detailed',
        reasoningEffort: 'high',
      },
    },
  },
});

// Anthropic extended thinking
const thinkingAgent = defineAgent({
  name: 'thinking-agent',
  systemPrompt: 'Think through problems carefully.',
  llmConfig: {
    model: anthropic('claude-sonnet-4-20250514'),
    providerOptions: {
      anthropic: {
        thinking: {
          type: 'enabled',
          budgetTokens: 10000,
        },
      },
    },
  },
});
```
Dynamic Configuration
Override LLM config based on agent state:
```typescript
const agent = defineAgent({
  name: 'adaptive-agent',
  stateSchema: z.object({
    complexity: z.enum(['simple', 'complex']),
    stepCount: z.number(),
  }),
  llmConfig: {
    model: openai('gpt-4o-mini'),
    temperature: 0.5,
  },
  llmConfigOverride: (customState, stepCount) => {
    // Use more powerful model for complex tasks
    if (customState.complexity === 'complex') {
      return {
        model: openai('gpt-4o'),
        temperature: 0.2,
        maxOutputTokens: 8192,
      };
    }
    // Increase temperature over time for variety
    if (stepCount > 5) {
      return { temperature: 0.8 };
    }
    return {};
  },
});
```
Prompt Caching
Prompt caching reduces cost and latency by reusing cached prompt prefixes across LLM calls. Set `caching: 'auto'` in your agent's `llmConfig` to enable automatic caching:
```typescript
const agent = defineAgent({
  name: 'cached-agent',
  systemPrompt: 'You are a helpful assistant with detailed instructions...',
  llmConfig: {
    model: anthropic('claude-sonnet-4-20250514'),
    caching: 'auto',
  },
});
```
The framework automatically detects the provider and applies the appropriate caching strategy. No provider-specific code is needed in your agent definition.
How It Works
When `caching: 'auto'` is set, the framework calls `applyCacheBreakpoints()` before each LLM call. This pure function inspects the model's provider metadata and applies provider-specific optimizations:
Anthropic (Claude) - Places `cache_control: { type: 'ephemeral' }` markers on:
- The last system message (caches the system prompt)
- The last tool definition (caches the tool schema)
- The conversation boundary (caches older conversation history)
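`applyCacheBreakpoints()` is internal to the framework, but the Anthropic branch described above amounts to something like the following sketch (simplified types and a hypothetical helper name, shown only to illustrate the annotation):

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;
interface Msg { role: string; content: string; providerOptions?: ProviderOptions }

// Illustrative sketch: mark the last system message as a cache
// breakpoint so the prefix up to it can be reused on later calls.
function annotateAnthropic(messages: Msg[]): Msg[] {
  const lastSystem = [...messages].reverse().find((m) => m.role === 'system');
  if (lastSystem) {
    lastSystem.providerOptions = {
      ...lastSystem.providerOptions,
      anthropic: { cacheControl: { type: 'ephemeral' } },
    };
  }
  return messages;
}
```
The real function also annotates the last tool definition and the conversation boundary, as listed above.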
```typescript
// What happens automatically under the hood for Anthropic:
// messages[0].providerOptions = { anthropic: { cacheControl: { type: 'ephemeral' } } }
// tools[lastIndex].providerOptions = { anthropic: { cacheControl: { type: 'ephemeral' } } }
```
OpenAI (GPT-4o, o1, o3, etc.) - Sets a `promptCacheKey` provider option derived from the session ID, enabling cache affinity for repeated conversations within the same session:
```typescript
// What happens automatically for OpenAI:
// providerOptions: { openai: { promptCacheKey: sessionId } }
```
Google Gemini - No action needed. Gemini uses automatic prefix caching built into the API. The framework detects Google/Vertex providers and skips annotation.
xAI/Grok - Sets the `x-grok-conv-id` header from the session ID for conversation-level cache routing:
```typescript
// What happens automatically for xAI:
// headers: { 'x-grok-conv-id': sessionId }
```
Cache Token Tracking
Cache hit/miss metrics flow through the standard token usage pipeline:
```typescript
// In afterLLMCall hook
hooks: {
  afterLLMCall: (payload, ctx) => {
    if (payload.usage) {
      console.log(`Prompt tokens: ${payload.usage.promptTokens}`);
      console.log(`Cached tokens: ${payload.usage.cachedTokens}`);    // Cache hits
      console.log(`Cache writes: ${payload.usage.cacheWriteTokens}`); // New cache entries
    }
  },
}
```
Cache tokens also appear in:
- Stream chunks: `step_end` chunks include `cachedTokens` and `cacheWriteTokens` in their usage field
- Usage tracking: The `TokenCounts` rollup includes `cached` and `cacheWrite` fields
- Langfuse tracing: Mapped to `cache_read_input_tokens` and `cache_creation_input_tokens`
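As an illustration of that rollup, cache fields accumulate alongside normal prompt/completion counts. This is a simplified sketch, not the framework's implementation; the usage and rollup field names follow the list above:

```typescript
interface StepUsage {
  promptTokens: number;
  completionTokens: number;
  cachedTokens?: number;
  cacheWriteTokens?: number;
}
interface TokenCounts { prompt: number; completion: number; cached: number; cacheWrite: number }

// Fold per-step usage into a session-level rollup; cache fields
// default to 0 when a provider reports no cache activity.
function rollup(steps: StepUsage[]): TokenCounts {
  return steps.reduce(
    (acc, u) => ({
      prompt: acc.prompt + u.promptTokens,
      completion: acc.completion + u.completionTokens,
      cached: acc.cached + (u.cachedTokens ?? 0),
      cacheWrite: acc.cacheWrite + (u.cacheWriteTokens ?? 0),
    }),
    { prompt: 0, completion: 0, cached: 0, cacheWrite: 0 },
  );
}
```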
Custom Cache Control
For advanced use cases, you can set `providerOptions` directly on messages, content parts, and tools instead of using `caching: 'auto'`:
```typescript
// Manual Anthropic cache control on a specific message
const messages: Message[] = [
  {
    role: 'system',
    content: 'Expensive system prompt...',
    providerOptions: {
      anthropic: { cacheControl: { type: 'ephemeral' } },
    },
  },
  { role: 'user', content: 'Hello' },
];
```
When using manual `providerOptions`, omit `caching: 'auto'` to avoid the framework overwriting your markers.
Streaming
The adapter supports real-time streaming:
```typescript
// Streaming happens automatically in execute()
const handle = await executor.execute(agent, 'Research AI agents');

// Get the stream
const stream = await handle.stream();
if (stream) {
  for await (const chunk of stream) {
    switch (chunk.type) {
      case 'text_delta':
        process.stdout.write(chunk.delta);
        break;
      case 'thinking':
        console.log('[Thinking]', chunk.content);
        break;
      case 'tool_start':
        console.log(`[Tool: ${chunk.toolName}]`);
        break;
    }
  }
}
```
Chunk Mapping
The adapter maps Vercel AI SDK stream parts to framework chunks:
| Vercel AI SDK | Framework | Notes |
|---|---|---|
| text-delta | text_delta | Generated text tokens |
| reasoning-delta | thinking | Reasoning/thinking content |
| tool-input-start | tool_start | Tool call begins |
| tool-call | tool_start | Complete tool call |
| tool-result | tool_end | Tool result |
| error | error | Generation error |
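The table above can be sketched as a simple lookup. This is illustrative only; the real adapter also carries payload fields (text deltas, tool names, arguments) along with each chunk:

```typescript
// Illustrative mapping from AI SDK stream-part types to framework chunk types.
const chunkTypeMap: Record<string, string> = {
  'text-delta': 'text_delta',
  'reasoning-delta': 'thinking',
  'tool-input-start': 'tool_start',
  'tool-call': 'tool_start',
  'tool-result': 'tool_end',
  error: 'error',
};

// Returns undefined for stream parts the adapter does not surface.
function mapChunkType(part: string): string | undefined {
  return chunkTypeMap[part];
}
```
Note that both `tool-input-start` and `tool-call` map to `tool_start`: the former signals the call beginning during streaming, the latter carries the complete call.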
Thinking/Reasoning Content
Both Anthropic and OpenAI support reasoning features:
Anthropic Extended Thinking
```typescript
const agent = defineAgent({
  name: 'claude-thinker',
  llmConfig: {
    model: anthropic('claude-sonnet-4-20250514'),
    providerOptions: {
      anthropic: {
        thinking: {
          type: 'enabled',
          budgetTokens: 10000, // Token budget for thinking
        },
      },
    },
  },
});

// Thinking content streams via 'thinking' chunks
for await (const chunk of stream) {
  if (chunk.type === 'thinking') {
    console.log('[Claude thinking...]', chunk.content);
  }
}
```
OpenAI Reasoning
```typescript
const agent = defineAgent({
  name: 'o1-reasoner',
  llmConfig: {
    model: openai('o1'),
    providerOptions: {
      openai: {
        reasoningSummary: 'detailed', // or 'concise'
        reasoningEffort: 'high',      // or 'medium', 'low'
      },
    },
  },
});
```
Message Conversion
The adapter converts framework messages to Vercel AI SDK format:
Framework → Vercel AI SDK
```typescript
// Framework format
const messages: Message[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello' },
  {
    role: 'assistant',
    content: 'I will search for that.',
    toolCalls: [{ id: 'tc1', name: 'search', arguments: { q: 'test' } }],
  },
  {
    role: 'tool',
    toolCallId: 'tc1',
    toolName: 'search',
    content: JSON.stringify({ results: [] }),
  },
];

// Automatically converted to Vercel AI SDK ModelMessage[]
```
The conversion handles:
- System, user, and assistant messages
- Tool calls in assistant messages
- Tool results in tool messages
- Mixed text + tool call content
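The mixed text + tool call case can be sketched as follows. This is a simplified illustration, not the adapter's actual code: the content-part field names follow common AI SDK shapes (`toolCallId`, `toolName`, `args`), and exact shapes vary across SDK versions:

```typescript
interface FrameworkToolCall { id: string; name: string; arguments: Record<string, unknown> }
interface FrameworkAssistantMsg { role: 'assistant'; content: string; toolCalls?: FrameworkToolCall[] }

// Split a framework assistant message into text and tool-call content parts.
function toModelMessage(msg: FrameworkAssistantMsg) {
  const parts: unknown[] = [];
  if (msg.content) parts.push({ type: 'text', text: msg.content });
  for (const tc of msg.toolCalls ?? []) {
    parts.push({ type: 'tool-call', toolCallId: tc.id, toolName: tc.name, args: tc.arguments });
  }
  return { role: 'assistant', content: parts };
}
```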
Tool Conversion
Framework tools (with Zod schemas) are converted to Vercel AI SDK tools:
```typescript
// Framework tool
const searchTool = defineTool({
  name: 'search',
  description: 'Search the web',
  inputSchema: z.object({
    query: z.string(),
    limit: z.number().optional(),
  }),
  execute: async (input, ctx) => {
    // ...
  },
});

// Automatically converted to Vercel AI SDK tool format
// The Zod schema is passed directly (AI SDK 5.x supports Zod)
```
Error Handling
The adapter handles errors gracefully:
```typescript
const adapter = new VercelAIAdapter({
  logger: console, // Optional: log warnings
});

// Errors are returned as ErrorStepResult, not thrown
const result = await adapter.generateStep(input);
if (result.type === 'error') {
  console.error('LLM error:', result.error.message);
  // Framework handles this appropriately
}
```
Retry Configuration
Configure retries for transient failures:
```typescript
const agent = defineAgent({
  llmConfig: {
    model: openai('gpt-4o'),
    maxRetries: 5, // Retry up to 5 times on transient errors
  },
});
```
Logger Integration
Pass a custom logger for debug output:
```typescript
import { VercelAIAdapter } from '@helix-agents/llm-vercel';

const logger = {
  debug: (msg: string) => console.debug(`[DEBUG] ${msg}`),
  info: (msg: string) => console.info(`[INFO] ${msg}`),
  warn: (msg: string) => console.warn(`[WARN] ${msg}`),
  error: (msg: string) => console.error(`[ERROR] ${msg}`),
};

const adapter = new VercelAIAdapter({ logger });
```
Complete Example
```typescript
import { defineAgent, defineTool } from '@helix-agents/core';
import { JSAgentExecutor } from '@helix-agents/runtime-js';
import { InMemoryStateStore, InMemoryStreamManager } from '@helix-agents/store-memory';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Create adapter
const adapter = new VercelAIAdapter();

// Define tool
const searchTool = defineTool({
  name: 'web_search',
  description: 'Search the web for information',
  inputSchema: z.object({
    query: z.string().describe('Search query'),
  }),
  outputSchema: z.object({
    results: z.array(z.string()),
  }),
  execute: async (input) => {
    // Simulate search
    return { results: [`Result for: ${input.query}`] };
  },
});

// Define agent
const ResearchAgent = defineAgent({
  name: 'researcher',
  description: 'Researches topics using web search',
  systemPrompt: `You are a research assistant.
Use the web_search tool to find information.
Summarize your findings clearly.`,
  tools: [searchTool],
  outputSchema: z.object({
    summary: z.string(),
    sources: z.array(z.string()),
  }),
  llmConfig: {
    model: openai('gpt-4o'),
    temperature: 0.3,
    maxOutputTokens: 2048,
  },
});

// Create executor
const executor = new JSAgentExecutor(
  new InMemoryStateStore(),
  new InMemoryStreamManager(),
  adapter
);

// Execute
async function main() {
  const handle = await executor.execute(
    ResearchAgent,
    'What are the latest developments in AI agents?'
  );

  // Stream output
  const stream = await handle.stream();
  if (stream) {
    for await (const chunk of stream) {
      if (chunk.type === 'text_delta') {
        process.stdout.write(chunk.delta);
      }
    }
  }

  // Get result
  const result = await handle.result();
  console.log('\n\nResult:', result.output);
}

main();
```
Limitations
Model-Specific Features
Not all features work with all models:
- Thinking/reasoning: Only Anthropic Claude and OpenAI o-series
- Tool calling: Most models, but check provider docs
- JSON mode: Provider-specific implementation
Token Counting
The adapter doesn't provide token counting. Use provider SDKs directly for token estimation.
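If you only need a rough estimate (for budgeting, not billing), a widely used heuristic is about 4 characters per token for English prose; use a provider tokenizer (e.g. tiktoken for OpenAI models) when accuracy matters. A minimal sketch of the heuristic:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// Not accurate for code, non-English text, or exact billing.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```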
Image/Multimodal
The framework supports file uploads (images, PDFs, etc.) via the `files` field in `AgentInput`. Files are converted to `ContentPart[]` alongside the text message and passed to the LLM. This works across all runtimes (JS, Temporal, Cloudflare).
```typescript
await executor.execute(
  agent,
  {
    message: 'Describe this image',
    files: [
      {
        data: base64EncodedData,
        mediaType: 'image/png',
        filename: 'screenshot.png', // optional
      },
    ],
  },
  { sessionId }
);
```
Next Steps
- LLM Overview - Understanding the adapter interface
- Custom Adapters - Building your own adapter
- Streaming - Real-time streaming deep dive