Checkpoints

Checkpoints are complete snapshots of agent state saved after each step. They enable time-travel debugging, crash recovery, and branching execution.

What are Checkpoints?

A checkpoint contains:

  • Complete agent state (messages, custom state, step count)
  • Agent status at that point
  • Timestamp of creation
  • Unique versioned ID

Checkpoints are created automatically after each step completes. They're stored in your state store alongside the agent state.

Checkpoint IDs

Checkpoint IDs follow a versioned format:

cpv1-{sessionId}-s{stepCount}-t{timestamp}-{random6hex}

Example: cpv1-session-abc123-s5-t1703123456789-a1b2c3

The format includes:

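  • cpv1 - Version prefix for forward compatibility
  • {sessionId} - The session this checkpoint belongs to
  • s{stepCount} - Step count when the checkpoint was created (s5 in the example)
  • t{timestamp} - Creation time in milliseconds since the epoch (t1703123456789 in the example)
  • {random6hex} - Six random hexadecimal characters for uniqueness (a1b2c3 in the example)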
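
To make the format concrete, here is a minimal sketch of reading an ID back with a regular expression. It is not the library's parser; parseIdSketch and ParsedIdSketch are hypothetical names used only for illustration. It assumes the session ID may itself contain hyphens (as in the example above), so the session portion is matched greedily and the step, timestamp, and random fields are anchored to the end of the string.

```typescript
// Hypothetical shape for illustration; not a type exported by the library.
interface ParsedIdSketch {
  version: number;
  sessionId: string;
  stepCount: number;
  timestamp: number;
  random: string;
}

// cpv{version}-{sessionId}-s{stepCount}-t{timestamp}-{random6hex}
// The session ID group is greedy so hyphens inside it are preserved;
// the trailing fields are anchored to the end of the string.
const ID_PATTERN = /^cpv(\d+)-(.+)-s(\d+)-t(\d+)-([0-9a-f]{6})$/;

function parseIdSketch(id: string): ParsedIdSketch | null {
  const match = ID_PATTERN.exec(id);
  if (!match) return null;
  const [, version, sessionId, stepCount, timestamp, random] = match;
  return {
    version: Number(version),
    sessionId,
    stepCount: Number(stepCount),
    timestamp: Number(timestamp),
    random,
  };
}
```

Anchoring on the trailing fields matters because a naive split on hyphens would cut a session ID such as session-abc123 in two.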

Listing Checkpoints

Get all checkpoints for a session:

typescript
const checkpoints = await stateStore.listCheckpoints(sessionId);

for (const meta of checkpoints.items) {
  console.log(`Step ${meta.stepCount}: ${meta.id}`);
  console.log(`  Status: ${meta.status}`);
  console.log(`  Created: ${new Date(meta.timestamp)}`);
}

Pagination

For sessions with many steps, use pagination:

typescript
// First page
const page1 = await stateStore.listCheckpoints(sessionId, {
  limit: 10,
  offset: 0,
});

console.log(`Showing ${page1.items.length} of ${page1.total}`);

// Next page
if (page1.hasMore) {
  const page2 = await stateStore.listCheckpoints(sessionId, {
    limit: 10,
    offset: 10,
  });
}
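
When every checkpoint is needed, the two-page pattern above generalizes to a loop. The following is a minimal sketch, assuming the page shape used in the examples (items, total, hasMore) and limit/offset options. collectAllPages, PageSketch, and ListPage are hypothetical names, not part of the library.

```typescript
// Hypothetical minimal shapes mirroring the fields used in the examples above.
interface PageSketch<T> {
  items: T[];
  total: number;
  hasMore: boolean;
}

type ListPage<T> = (options: { limit: number; offset: number }) => Promise<PageSketch<T>>;

// Fetches pages one at a time, advancing the offset until hasMore is false.
async function collectAllPages<T>(listPage: ListPage<T>, limit = 10): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  for (;;) {
    const page = await listPage({ limit, offset });
    all.push(...page.items);
    // Stop when the store reports no further pages, or when a page comes back
    // empty (a guard against looping forever on an inconsistent hasMore).
    if (!page.hasMore || page.items.length === 0) break;
    offset += page.items.length;
  }
  return all;
}

// Usage sketch (assumes the listCheckpoints call shape shown above):
// const all = await collectAllPages((opts) => stateStore.listCheckpoints(sessionId, opts));
```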

Retrieving a Checkpoint

Get a specific checkpoint with full state:

typescript
// Get latest checkpoint
const latest = await stateStore.getLatestCheckpoint(sessionId);
if (latest) {
  console.log(`Latest at step ${latest.stepCount}`);
  console.log('Messages:', latest.state.messages.length);
  console.log('Custom state:', latest.state.customState);
}

// Get specific checkpoint
const checkpoint = await stateStore.getCheckpoint(checkpointId);
if (checkpoint) {
  console.log('State:', checkpoint.state);
}

Time-Travel

Resume from any checkpoint to "time-travel" to that state:

typescript
// List checkpoints
const checkpoints = await stateStore.listCheckpoints(sessionId);

// Pick an earlier checkpoint (e.g., step 3)
const targetCheckpoint = checkpoints.items.find((c) => c.stepCount === 3);

if (targetCheckpoint) {
  // Resume from that checkpoint
  const newHandle = await handle.resume({
    mode: 'from_checkpoint',
    checkpointId: targetCheckpoint.id,
  });

  // Agent continues from step 3
  const result = await newHandle.result();
}

Use Cases

  1. Debugging - Replay from a specific step to understand behavior
  2. Branching - Fork execution from a historical point
  3. Rollback - Undo recent steps if something went wrong
  4. What-if analysis - Try different inputs from the same state
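
For rollback and branching, an exact match on stepCount finds nothing when no checkpoint exists at the requested step. A common alternative is to pick the latest checkpoint at or before the target. The helper below is a hypothetical sketch (findAtOrBefore and StepMetaSketch are not library names) that only relies on the id and stepCount fields seen in the listing examples.

```typescript
// Only the fields this helper reads; names are hypothetical.
interface StepMetaSketch {
  id: string;
  stepCount: number;
}

// Returns the checkpoint with the highest stepCount that does not exceed
// targetStep, or undefined when every checkpoint is later than targetStep.
// Does not assume the input is sorted.
function findAtOrBefore<T extends StepMetaSketch>(
  items: readonly T[],
  targetStep: number,
): T | undefined {
  let best: T | undefined;
  for (const item of items) {
    if (item.stepCount > targetStep) continue;
    if (best === undefined || item.stepCount > best.stepCount) best = item;
  }
  return best;
}
```

The result's id can then be passed as checkpointId when resuming in from_checkpoint mode, as in the example above.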

Checkpoint Metadata

CheckpointMeta is a lightweight view for listing:

typescript
interface CheckpointMeta {
  id: string; // Checkpoint ID
  sessionId: string; // Session this checkpoint belongs to
  stepCount: number; // Step count when created
  timestamp: number; // Creation time (ms since epoch)
  status: AgentStatus; // Status at checkpoint time
}

The full Checkpoint includes the complete state plus recovery coordination fields:

typescript
interface Checkpoint<TState, TOutput> {
  id: string;
  sessionId: string; // Session this checkpoint belongs to
  stepCount: number;
  timestamp: number;
  state: AgentState<TState, TOutput>; // Full agent state
  messageCount: number; // Message count at checkpoint (for recovery coordination)
  streamSequence: number; // Stream sequence at checkpoint (for resumption)
}

Recovery Coordination Fields

The messageCount and streamSequence fields enable coordinated recovery after crashes or interrupts:

  • messageCount: Number of messages at this checkpoint. Used to truncate orphaned messages that were created after the checkpoint but before a crash.
  • streamSequence: Stream position at this checkpoint. Used to resume streaming from the correct position and clean up orphaned stream chunks.

These fields ensure that messages, stream chunks, and checkpoints stay synchronized during crash recovery. When resuming from a checkpoint, the runtime uses these values to:

  1. Truncate messages beyond messageCount (removing orphaned messages)
  2. Clean up stream chunks beyond the checkpoint's step (removing orphaned chunks)
  3. Resume streaming from the correct sequence position
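
The three steps above can be sketched as a pure function over in-memory arrays. This is an illustration under stated assumptions, not the runtime's implementation: ChunkSketch, RecoveryPointSketch, and reconcileSketch are hypothetical names, the step and sequence fields on a chunk are assumed, and treating the checkpoint's own step as retained is also an assumption.

```typescript
// Hypothetical shapes; the runtime's internal types are not shown in this document.
interface ChunkSketch {
  step: number; // step that produced the chunk (assumed field)
  sequence: number; // position in the stream (assumed field)
}

interface RecoveryPointSketch {
  stepCount: number;
  messageCount: number;
  streamSequence: number;
}

interface RecoveredSketch<M> {
  messages: M[];
  chunks: ChunkSketch[];
  resumeFromSequence: number;
}

function reconcileSketch<M>(
  cp: RecoveryPointSketch,
  messages: readonly M[],
  chunks: readonly ChunkSketch[],
): RecoveredSketch<M> {
  return {
    // 1. Drop messages written after the checkpoint was taken.
    messages: messages.slice(0, cp.messageCount),
    // 2. Drop chunks produced by steps after the checkpoint's step.
    chunks: chunks.filter((c) => c.step <= cp.stepCount),
    // 3. Streaming picks up from the recorded position.
    resumeFromSequence: cp.streamSequence,
  };
}
```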

Storage Considerations

Size

Each checkpoint stores the complete agent state, including all messages, so storage can grow significantly for agents with:

  • Long conversations
  • Large custom state
  • Many steps

Plan your retention accordingly.
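
Because every checkpoint holds the full message history, total checkpoint storage for a session grows roughly quadratically with step count. As a back-of-the-envelope sketch, assume each step adds a fixed number of messages of a fixed average size, and ignore custom state and store overhead. The checkpoint at step k then holds k times that amount, and the total over n steps is the per-step amount times n(n+1)/2. The function name and parameters below are hypothetical.

```typescript
// Rough estimate only: assumes a constant number of messages per step and a
// constant average message size, and ignores custom state and store overhead.
function estimateTotalCheckpointBytes(
  steps: number,
  messagesPerStep: number,
  avgMessageBytes: number,
): number {
  // Sum of (k * messagesPerStep * avgMessageBytes) for k = 1..steps
  return (messagesPerStep * avgMessageBytes * steps * (steps + 1)) / 2;
}
```

Under these assumptions, 100 steps with 2 messages per step at 1,000 bytes each gives 10,100,000 bytes across all checkpoints, while the latest checkpoint alone holds only 200,000 bytes.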

Retention

Configure TTL for automatic cleanup:

typescript
// Redis store with 7-day retention
const stateStore = new RedisStateStore({
  host: 'localhost',
  ttl: 86400 * 7, // 7 days in seconds
});

Cleanup

For manual cleanup, delete a session and its checkpoints:

typescript
// Delete a session's data (including checkpoints)
await stateStore.deleteSession(sessionId);

Checkpoint Parsing

Parse checkpoint IDs to extract components:

typescript
import { parseCheckpointId, generateCheckpointId } from '@helix-agents/core';

// Parse an existing ID
const parsed = parseCheckpointId('cpv1-session-123-s5-t1703123456789-a1b2c3');
if (parsed) {
  console.log(parsed.version); // 1
  console.log(parsed.sessionId); // 'session-123'
  console.log(parsed.stepCount); // 5
  console.log(parsed.timestamp); // 1703123456789
  console.log(parsed.random); // 'a1b2c3'
}

// Generate a new ID
const newId = generateCheckpointId('session-456', 10);
// Returns: 'cpv1-session-456-s10-t{timestamp}-{random}'

Stream Events

A checkpoint_created event is emitted on the stream each time a checkpoint is saved:

typescript
for await (const chunk of stream) {
  if (chunk.type === 'checkpoint_created') {
    console.log(`Checkpoint saved: ${chunk.checkpointId}`);
    console.log(`At step: ${chunk.stepCount}`);
  }
}
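
One way to use this event is to record checkpoint IDs as they arrive, so a later rollback does not need a separate listing call. The sketch below assumes only the type, checkpointId, and stepCount fields shown above; CheckpointCreatedSketch, isCheckpointCreated, and collectCheckpointIds are hypothetical names, and the real chunk types may carry more fields.

```typescript
// Hypothetical shape containing only the fields used in the example above.
interface CheckpointCreatedSketch {
  type: 'checkpoint_created';
  checkpointId: string;
  stepCount: number;
}

// Type guard that narrows a generic chunk to the checkpoint event.
function isCheckpointCreated(chunk: { type: string }): chunk is CheckpointCreatedSketch {
  return chunk.type === 'checkpoint_created';
}

// Consumes the stream and returns a map from step count to checkpoint ID.
async function collectCheckpointIds(
  stream: AsyncIterable<{ type: string }>,
): Promise<Map<number, string>> {
  const byStep = new Map<number, string>();
  for await (const chunk of stream) {
    if (isCheckpointCreated(chunk)) {
      byStep.set(chunk.stepCount, chunk.checkpointId);
    }
  }
  return byStep;
}
```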

StateStore Methods

Checkpoint-related StateStore methods:

Method                                 Description
getCheckpoint(checkpointId)            Get full checkpoint by ID
getLatestCheckpoint(sessionId)         Get most recent checkpoint
listCheckpoints(sessionId, options?)   List checkpoint metadata with pagination

These methods are implemented by all state stores (Memory, Redis, Cloudflare D1).

Best Practices

1. Use Checkpoints for Debugging

When an agent behaves unexpectedly:

typescript
// List checkpoints to find where things went wrong
const checkpoints = await stateStore.listCheckpoints(sessionId);

for (const cp of checkpoints.items) {
  const full = await stateStore.getCheckpoint(cp.id);
  console.log(`Step ${cp.stepCount}: ${full?.state.messages.length} messages`);
}

2. Implement Rollback UI

Let users undo agent actions:

typescript
async function rollbackToStep(sessionId: string, targetStep: number) {
  const checkpoints = await stateStore.listCheckpoints(sessionId);
  const target = checkpoints.items.find((c) => c.stepCount === targetStep);

  if (!target) {
    throw new Error(`No checkpoint at step ${targetStep}`);
  }

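  const handle = await executor.getHandle(agent, sessionId);
  if (!handle) {
    // Fail loudly instead of silently returning undefined
    throw new Error(`No handle found for session ${sessionId}`);
  }
  return handle.resume({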
    mode: 'from_checkpoint',
    checkpointId: target.id,
  });
}

3. Monitor Storage Growth

Track checkpoint storage for capacity planning:

typescript
const checkpoints = await stateStore.listCheckpoints(sessionId);
console.log(`Session has ${checkpoints.total} checkpoints`);

// For Redis, check memory usage
// For D1, check row counts

Released under the MIT License.