# Tracing & Observability
Tracing provides visibility into agent execution for debugging, performance analysis, and cost tracking. Helix Agents integrates with Langfuse for comprehensive LLM observability.
## Overview
Tracing captures:
- Agent Runs - Full execution lifecycle with timing and status
- LLM Calls - Model, tokens, latency, prompts and responses
- Token Tracking - Standard, reasoning, and cached token usage
- Tool Executions - Arguments, results, and timing
- Sub-Agent Calls - Nested traces with parent-child relationships
- Metadata - User attribution, session grouping, custom tags
### Why Trace?
- Debugging - Understand why an agent behaved a certain way
- Performance - Identify slow LLM calls or inefficient tool usage
- Cost Tracking - Monitor token usage across users and features
- Quality - Evaluate agent outputs and improve prompts
- Compliance - Audit trail of LLM interactions
## Runtime Compatibility
Tracing is fully stateless — no shared in-memory state between hooks. This means it works identically across all runtimes:
- **JS Runtime** (`JSAgentExecutor`) — hooks run in-process
- **Temporal Runtime** (`TemporalAgentExecutor`) — hooks run as activities, potentially on different worker pods
- **Cloudflare Runtime** — hooks run in Workers/Durable Objects
No configuration changes are needed when switching runtimes. Define hooks once, use them everywhere.
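The stateless contract is easy to picture: a hook computes everything from the context it is handed, never from module-level or closure state. A minimal sketch (the `HookContext` shape here is illustrative, not the SDK's actual type):

```typescript
// Illustrative sketch: a stateless hook keeps no module-level or closure
// state; everything it needs arrives in the call's context argument, so two
// invocations on different worker pods behave identically.
interface HookContext {
  runId: string;
  stepIndex: number;
  metadata: Record<string, string>;
}

// Output depends only on the input context — safe to run on any pod.
function buildSpanName(ctx: HookContext): string {
  return `${ctx.runId}:step-${ctx.stepIndex}`;
}

const ctx: HookContext = { runId: 'run-1', stepIndex: 2, metadata: {} };
console.log(buildSpanName(ctx)); // "run-1:step-2"
```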
## Quick Start
### 1. Install the Package

```bash
npm install @helix-agents/tracing-langfuse @langfuse/tracing @langfuse/otel @opentelemetry/api @opentelemetry/sdk-trace-base
```

### 2. Set Up Langfuse
Create a Langfuse account and get your API keys:
```bash
# .env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
```

### 3. Add Hooks to Your Agent
```typescript
import { createLangfuseHooks } from '@helix-agents/tracing-langfuse';
import { defineAgent, JSAgentExecutor } from '@helix-agents/sdk';

// Create hooks (auto-reads credentials from env)
const { hooks, flush } = createLangfuseHooks();

// Use with agent
const agent = defineAgent({
  name: 'my-agent',
  hooks,
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: { model: { provider: 'openai', name: 'gpt-4o' } },
});

// Execute
const executor = new JSAgentExecutor(); // constructor options omitted
const handle = await executor.execute(agent, 'Hello!');
const result = await handle.result;

// Flush in serverless (optional in long-running processes)
await flush();
```

### 4. View Traces in Langfuse
Open your Langfuse dashboard to see:
- Trace timeline with all observations
- Token usage and costs
- Latency breakdown
- Error details
## Configuration
### Basic Options
```typescript
const { hooks } = createLangfuseHooks({
  // Credentials (optional if using env vars)
  publicKey: 'pk-lf-...',
  secretKey: 'sk-lf-...',
  baseUrl: 'https://cloud.langfuse.com', // or self-hosted URL

  // Version tag for filtering
  release: '1.0.0',

  // Default tags for all traces
  defaultTags: ['production', 'v2'],

  // Default metadata for all traces
  defaultMetadata: {
    service: 'chat-api',
    team: 'platform',
  },

  // Step grouping
  groupByStep: true, // Group observations by agent step (default: true)

  // Environment label
  environment: 'production',

  // Debug logging
  debug: false,
});
```

### Data Capture Options
Control what data is sent to Langfuse:
```typescript
const { hooks } = createLangfuseHooks({
  // Agent state snapshots (may be large)
  includeState: false,

  // Full conversation messages (may contain PII)
  includeMessages: false,

  // Tool arguments (default: true)
  includeToolArgs: true,

  // Tool results (may be large)
  includeToolResults: false,

  // LLM prompts (default: true)
  includeGenerationInput: true,

  // LLM responses (default: true)
  includeGenerationOutput: true,
});
```

> **Privacy:** For production systems handling PII, consider disabling `includeMessages`, `includeGenerationInput`, and `includeGenerationOutput` to avoid logging sensitive user data.
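If you do need message content in traces, one mitigation is to scrub obvious identifiers before they reach execution metadata. A minimal sketch (this helper is illustrative, not part of the package, and real PII scrubbing needs a far more thorough approach):

```typescript
// Illustrative helper (not part of @helix-agents/tracing-langfuse): scrub
// email addresses from free-form text before it reaches trace metadata.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function redact(text: string): string {
  return text.replace(EMAIL_RE, '[REDACTED]');
}

console.log(redact('Contact alice@example.com for access'));
// "Contact [REDACTED] for access"
```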
## Metadata & Tagging
Metadata enables filtering and attribution in Langfuse.
### Passing Metadata at Execution
```typescript
await executor.execute(agent, input, {
  // User attribution
  userId: 'user-123',

  // Session grouping (e.g., conversation threads)
  sessionId: 'conversation-456',

  // Tags for filtering
  tags: ['premium', 'mobile'],

  // Custom key-value metadata
  metadata: {
    environment: 'production',
    region: 'us-west-2',
    feature: 'chat',
  },
});
```

### Using the Context Builder
For better ergonomics, use the fluent builder:
```typescript
import { tracingContext } from '@helix-agents/tracing-langfuse';

const context = tracingContext()
  .user('user-123')
  .session('conversation-456')
  .tags('premium', 'mobile')
  .environment('production')
  .version('1.0.0')
  .metadata('region', 'us-west-2')
  .build();

await executor.execute(agent, input, context);
```

### Typed Metadata
For common metadata patterns, use typed interfaces:
```typescript
import { createTracingMetadata } from '@helix-agents/tracing-langfuse';

const metadata = createTracingMetadata({
  environment: 'production',
  version: '1.0.0',
  service: 'chat-api',
  region: 'us-west-2',
  tier: 'premium',
  source: 'mobile',
});

await executor.execute(agent, input, { metadata });
```

## Token Tracking
Token usage is tracked automatically on every LLM generation. Beyond standard input/output tokens, the integration captures extended token types:
- **Reasoning tokens** - Used by models with chain-of-thought reasoning (OpenAI o1/o3, Claude with extended thinking)
- **Cached tokens** - Served from prompt cache (Anthropic prompt caching, OpenAI cached context)
- **Cache write tokens** - Tokens written to create new cache entries (Anthropic `cache_creation_input_tokens`)
These are sent to Langfuse in the v4 `usageDetails` format:

| Framework field | Langfuse field | Description |
|---|---|---|
| `promptTokens` | `input` | Input tokens |
| `completionTokens` | `output` | Output tokens |
| `totalTokens` | `total` | Total tokens |
| `reasoningTokens` | `reasoning_tokens` | Reasoning/thinking tokens |
| `cachedTokens` | `cache_read_input_tokens` | Tokens served from cache |
| `cacheWriteTokens` | `cache_creation_input_tokens` | Tokens written to cache |
No configuration is needed — when your LLM reports these token types in its usage response, they automatically appear in Langfuse. This enables accurate cost tracking for reasoning models and prompt caching.
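To make the table concrete, here is the mapping written out as a plain function. This is only a sketch of the correspondence; the package performs the conversion internally:

```typescript
// Sketch of the table's field mapping (framework usage -> Langfuse v4
// usageDetails). Optional extended token types are included only when the
// LLM actually reports them.
interface FrameworkUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  reasoningTokens?: number;
  cachedTokens?: number;
  cacheWriteTokens?: number;
}

function toUsageDetails(u: FrameworkUsage): Record<string, number> {
  const details: Record<string, number> = {
    input: u.promptTokens,
    output: u.completionTokens,
    total: u.totalTokens,
  };
  if (u.reasoningTokens !== undefined) details.reasoning_tokens = u.reasoningTokens;
  if (u.cachedTokens !== undefined) details.cache_read_input_tokens = u.cachedTokens;
  if (u.cacheWriteTokens !== undefined) details.cache_creation_input_tokens = u.cacheWriteTokens;
  return details;
}

const d = toUsageDetails({ promptTokens: 1000, completionTokens: 200, totalTokens: 1200, cachedTokens: 800 });
console.log(d); // { input: 1000, output: 200, total: 1200, cache_read_input_tokens: 800 }
```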
> **Prompt Caching:** Enable prompt caching with `caching: 'auto'` in your agent's `llmConfig`. Cache read tokens reduce cost (typically a 90% discount), while cache write tokens carry a small surcharge on the first request. See the Prompt Caching guide for details.
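To see why distinguishing these token types matters for cost tracking, here is a rough estimate function. The 90% read discount and 25% write surcharge are illustrative figures only; check your provider's actual pricing:

```typescript
// Rough input-cost sketch with illustrative multipliers: cached reads billed
// at 10% of the base input rate, cache writes at 125%. Verify against your
// provider's pricing before relying on these numbers.
function estimateInputCost(opts: {
  inputTokens: number;      // uncached input tokens
  cacheReadTokens: number;  // tokens served from cache
  cacheWriteTokens: number; // tokens written to cache
  pricePerToken: number;    // base input price per token
}): number {
  const { inputTokens, cacheReadTokens, cacheWriteTokens, pricePerToken } = opts;
  return (
    inputTokens * pricePerToken +
    cacheReadTokens * pricePerToken * 0.1 +  // ~90% discount on cache reads
    cacheWriteTokens * pricePerToken * 1.25  // small surcharge on cache writes
  );
}

// A 10k-token prompt at an assumed $3 per 1M input tokens:
const price = 3 / 1_000_000;
const firstCall = estimateInputCost({ inputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 10_000, pricePerToken: price });
const laterCall = estimateInputCost({ inputTokens: 0, cacheReadTokens: 10_000, cacheWriteTokens: 0, pricePerToken: price });
// firstCall ≈ $0.0375 (cache write), laterCall ≈ $0.003 (cache read)
```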
## Step Grouping

The `groupByStep` option (default: `true`) groups observations under step spans, making it easy to see what happened in each iteration of the agent loop:
```typescript
const { hooks } = createLangfuseHooks({
  groupByStep: true, // default
});
```

With `groupByStep: true`:
```mermaid
graph TB
  subgraph Trace ["trace: my-agent"]
    subgraph Step1 ["span: step-1"]
      G1["generation: llm.generation<br/><i>model: gpt-4o, tokens: 1234</i>"]
      T1["span: tool:search"]
      T2["span: tool:calculate"]
    end
    subgraph Step2 ["span: step-2"]
      G2["generation: llm.generation"]
      SA["span: agent:sub-agent"]
    end
  end
```

With `groupByStep: false`, all observations are flat under the root trace:
```mermaid
graph TB
  subgraph Trace ["trace: my-agent"]
    G1["generation: llm.generation"]
    T1["span: tool:search"]
    T2["span: tool:calculate"]
    G2["generation: llm.generation"]
    SA["span: agent:sub-agent"]
  end
  G1 --> T1 --> T2 --> G2 --> SA
```

Step grouping is useful for multi-step agents where you want to clearly see the boundary between each LLM call and its resulting tool executions.
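Conceptually, step grouping buckets the flat observation stream by step index. A sketch of the idea (not the package's internals):

```typescript
// Illustrative sketch of step grouping: bucket a flat observation list by
// its step index, mirroring the span hierarchy shown above.
interface Observation {
  name: string;
  step: number;
}

function bucketByStep(observations: Observation[]): Map<number, Observation[]> {
  const groups = new Map<number, Observation[]>();
  for (const obs of observations) {
    const bucket = groups.get(obs.step) ?? [];
    bucket.push(obs);
    groups.set(obs.step, bucket);
  }
  return groups;
}

const flat: Observation[] = [
  { name: 'llm.generation', step: 1 },
  { name: 'tool:search', step: 1 },
  { name: 'llm.generation', step: 2 },
];
const grouped = bucketByStep(flat);
console.log(grouped.get(1)?.length); // 2 observations under step-1
```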
### Trace Hierarchy

- **Trace** - Root container, represents the full agent run
  - **Generation** - LLM call with model, tokens, timing
  - **Span** - Tool or sub-agent execution
### Trace Update Consolidation
Trace-level attributes (name, input, output, `userId`, `sessionId`, tags) are updated only on the root span of each agent trace. This avoids redundant `updateTrace` calls from child spans (generations, tool spans) that would send duplicate data to Langfuse. The result is cleaner traces and fewer API calls.

If you use the `onAgentTraceCreated` lifecycle hook to set trace metadata, those attributes apply to the root span and are inherited by all child observations in Langfuse.
## Lifecycle Hooks
Customize observations with lifecycle hooks:
### `onAgentTraceCreated`
Called when the root trace is created:
```typescript
const { hooks } = createLangfuseHooks({
  onAgentTraceCreated: ({ runId, agentName, hookContext, updateTrace }) => {
    // Add environment info
    updateTrace({
      metadata: {
        nodeVersion: process.version,
        environment: process.env.NODE_ENV,
      },
    });
  },
});
```

### `onGenerationCreated`
Called when an LLM generation is created. Use for logging, metrics, or side effects. The `updateGeneration` callback is a no-op in the current architecture; use `extractAttributes` for per-observation metadata instead.
```typescript
const { hooks } = createLangfuseHooks({
  onGenerationCreated: ({ model, observationId }) => {
    const provider = model?.includes('gpt') ? 'openai' : 'anthropic';
    console.log(`Generation ${observationId} using ${provider}`);
  },
});
```

### `onToolCreated`
Called when a tool span is created. Use for logging, metrics, or side effects. The `updateTool` callback is a no-op in the current architecture; use `extractAttributes` for per-observation metadata instead.
```typescript
const { hooks } = createLangfuseHooks({
  onToolCreated: ({ toolName, toolCallId }) => {
    const category = toolName.startsWith('db_') ? 'database' : 'external';
    console.log(`Tool ${toolCallId}: ${toolName} (${category})`);
  },
});
```

### `onObservationEnding`
Called before any observation ends:
```typescript
const { hooks } = createLangfuseHooks({
  onObservationEnding: ({ type, observationId, durationMs, success, error }) => {
    if (!success) {
      console.error(`${type} failed after ${durationMs}ms:`, error);
    }
  },
});
```

### Custom Attribute Extraction
Extract attributes from hook context for all observations:
```typescript
const { hooks } = createLangfuseHooks({
  extractAttributes: (context) => ({
    stepCount: String(context.stepCount),
    hasParent: String(!!context.parentSessionId),
    // Access execution metadata
    region: context.metadata?.region,
  }),
});
```

## Sub-Agent Tracing
Sub-agents automatically inherit tracing context:
```typescript
const researchAgent = defineAgent({
  name: 'researcher',
  // ... config
});

const orchestrator = defineAgent({
  name: 'orchestrator',
  hooks, // Langfuse hooks
  tools: [
    createSubAgentTool({
      name: 'research',
      agent: researchAgent,
      description: 'Delegate research tasks',
    }),
  ],
});
```

In Langfuse, you'll see:
```mermaid
graph TB
  subgraph Trace ["trace: orchestrator"]
    G1["generation: llm.generation"]
    subgraph SubAgent ["span: agent:researcher"]
      SG["generation: llm.generation"]
      ST["span: tool:search"]
    end
  end
  G1 --> SubAgent
  SG --> ST
```

Sub-agents inherit `userId`, `sessionId`, `tags`, and `metadata` from the parent.
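The inheritance rule can be pictured as a merge: the child keeps the parent's attribution and layers its own tags and metadata on top. A sketch (field names match the execution options above; the exact merge semantics shown are illustrative):

```typescript
// Illustrative sketch of sub-agent context inheritance: the child reuses the
// parent's user/session attribution and merges tags and metadata, with any
// child-specific values layered on top.
interface TraceContext {
  userId?: string;
  sessionId?: string;
  tags: string[];
  metadata: Record<string, string>;
}

function inheritContext(parentCtx: TraceContext, child: Partial<TraceContext> = {}): TraceContext {
  return {
    userId: child.userId ?? parentCtx.userId,
    sessionId: child.sessionId ?? parentCtx.sessionId,
    tags: [...new Set([...parentCtx.tags, ...(child.tags ?? [])])],
    metadata: { ...parentCtx.metadata, ...child.metadata },
  };
}

const parentCtx: TraceContext = {
  userId: 'user-123',
  sessionId: 'conversation-456',
  tags: ['premium'],
  metadata: { region: 'us-west-2' },
};
const sub = inheritContext(parentCtx, { tags: ['research'] });
// sub.userId === 'user-123'; sub.tags === ['premium', 'research']
```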
## Serverless Considerations
Langfuse batches events and sends them asynchronously. In serverless environments, flush before the function returns:
```typescript
// AWS Lambda / Vercel / Cloudflare Workers
export async function handler(event) {
  const { hooks, flush } = createLangfuseHooks();
  const agent = defineAgent({ hooks /* ... */ });
  const executor = new JSAgentExecutor(/* ... */);

  const handle = await executor.execute(agent, event.message);
  const result = await handle.result;

  // IMPORTANT: Flush before returning
  await flush();

  return { statusCode: 200, body: JSON.stringify(result) };
}
```

For graceful shutdown in long-running processes:
```typescript
const { hooks, shutdown } = createLangfuseHooks();

process.on('SIGTERM', async () => {
  await shutdown(); // Flushes and closes
  process.exit(0);
});
```

## OpenTelemetry Integration
The Langfuse integration uses the v4 SDK (`@langfuse/tracing` + `@langfuse/otel`), built on OpenTelemetry. The package creates a locally-owned `BasicTracerProvider` with a `LangfuseSpanProcessor` to export traces. This provider never touches global OTEL state, so it coexists safely with other OTEL integrations in the same process.
Configure the span processor via top-level options:
```typescript
const { hooks } = createLangfuseHooks({
  publicKey: 'pk-lf-...',
  secretKey: 'sk-lf-...',
  flushAt: 512,
  flushInterval: 5,
  exportMode: 'batched',
});
```

## Self-Hosted Langfuse
To use a self-hosted Langfuse instance:
```typescript
const { hooks } = createLangfuseHooks({
  baseUrl: 'https://langfuse.your-company.com',
  publicKey: 'pk-...',
  secretKey: 'sk-...',
});
```

Or via environment variables:
```bash
LANGFUSE_BASEURL=https://langfuse.your-company.com
LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...
```

## Troubleshooting
### Traces Not Appearing
- **Check credentials**: ensure `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` are set
- **Enable debug mode**: `createLangfuseHooks({ debug: true })`
- **Flush in serverless**: call `await flush()` before the function returns
- **Check network**: verify connectivity to `cloud.langfuse.com`
### Missing Metadata

Metadata must be passed at `execute()` time, not in the agent definition:
```typescript
// WRONG: Agent definition doesn't support execution metadata
const agent = defineAgent({
  metadata: { userId: '123' }, // This won't work!
});

// CORRECT: Pass at execution time
await executor.execute(agent, input, {
  userId: '123',
  metadata: { custom: 'value' },
});
```

### High Memory Usage
If tracing increases memory usage:
- Disable state capture: `includeState: false`
- Disable message capture: `includeMessages: false`
- Disable result capture: `includeToolResults: false`
## Next Steps
- API Reference - Full API documentation
- Hooks Guide - Learn about the hooks system
- Langfuse Docs - Langfuse platform documentation