State Stores
This guide covers the SessionStateStore interface and the v7 contracts every implementation must satisfy. For end-user custom-state patterns (defining schemas, reading/writing state inside tools), see State Management.
Overview
The state store is the single durable surface for an agent's session data. v7's stateless-suspension redesign elevated this to a load-bearing role: every HITL pause writes its full continuation context into SessionState.suspensionContext so the next process to wake up on the session can pick up where the previous one left off, even across restarts and machine moves.
In v7, every in-tree state store implements the full SessionStateStore interface atomically. Third-party stores must do the same: the non-atomic fallback that earlier versions offered has been removed (see Implementing in a custom store).
What changed in v7
If you maintain a custom SessionStateStore implementation or operate sessions at the SQL level, these are the v7 changes that matter:
- New required method: saveStateAndPromoteStaging. Atomic write-and-promote that replaces the v6 two-call dance.
- Forward-only schema migrations for the in-tree SQL stores: Postgres V5, D1 V8, DO SQLite V4. They add a suspension_context column and indexes on pendingClientToolCalls for efficient expiration queries. See Storage migrations.
- compareAndSetStatus return shape. Old: Promise<boolean>. New: a discriminated union, { ok: true; newVersion } | { ok: false; currentStatus; currentVersion }. The single most commonly tripped v7 breaking change.
- New SessionState fields: suspendedAwaitingChildren, suspendedStepId, tracingContext, expiresAt. Custom stores must persist all of them, even if the columns store JSON.
- expiredSessionCleanup: operator-driven helper for reaping sessions whose expiresAt is in the past.
If you are upgrading from v6, read the v6 to v7 migration guide end-to-end before deploying. The rest of this page describes the v7 model.
saveStateAndPromoteStaging
SessionStateStore.saveStateAndPromoteStaging(sessionId, state, opts) atomically:
- Persists the full SessionState (messages, custom state, suspensionContext, all the v7 fields).
- Promotes any staged Immer patches into the canonical state.
- Bumps the session version.
In v7 this is the canonical write path used by the run loop after every step. The previous v6 flow — saveStaging followed by a separate commitStaging call — has a small window where a crash leaves staging written but unpromoted. The atomic primitive closes that window.
Implementing in a custom store
If you maintain a third-party SessionStateStore, you MUST implement saveStateAndPromoteStaging atomically — run both writes inside a single transaction (Postgres) or compare-and-swap (Redis/DO).
Earlier versions exported a defaultSaveStateAndPromoteStaging() helper from @helix-agents/core that performed the legacy two-call flow (non-atomic). That helper was removed in P3.R3-BC-FALLBACK: a sequential appendMessages → saveState → promoteStaging opens a small window where a crash between calls leaves staging written but unpromoted, which is exactly the corruption the atomic primitive was added to prevent. All five in-tree stores (memory, redis, postgres, D1, DO) implement the atomic version; custom stores must do the same.
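As a rough illustration of what "atomic" means here, the sketch below models a store in memory. SketchState, SketchMemoryStore, and stage are hypothetical names, staged patches are simplified to shallow merges rather than Immer patches, and the method is synchronous and ignores opts. The only point it makes is that the next record is built off to the side and committed in one step, so no reader can observe staging that is written but unpromoted.

```typescript
// Hypothetical, simplified shapes: the real SessionState carries many more fields.
interface SketchState {
  messages: string[];
  custom: Record<string, unknown>;
}

interface SketchRecord {
  state: SketchState;
  version: number;
}

class SketchMemoryStore {
  private records = new Map<string, SketchRecord>();
  private staging = new Map<string, Array<Record<string, unknown>>>();

  // Queue a patch for later promotion (stand-in for staged Immer patches).
  stage(sessionId: string, patch: Record<string, unknown>): void {
    const queue = this.staging.get(sessionId) ?? [];
    queue.push(patch);
    this.staging.set(sessionId, queue);
  }

  // Persist state, promote staged patches, and bump the version as one unit.
  saveStateAndPromoteStaging(sessionId: string, state: SketchState): number {
    const staged = this.staging.get(sessionId) ?? [];
    const previousVersion = this.records.get(sessionId)?.version ?? 0;

    // Build the next record without touching stored data;
    // a throw in this section leaves the store unchanged.
    let custom: Record<string, unknown> = { ...state.custom };
    for (const patch of staged) {
      custom = { ...custom, ...patch };
    }
    const next: SketchRecord = {
      state: { messages: [...state.messages], custom },
      version: previousVersion + 1,
    };

    // Commit point: both mutations run back to back with no await in between.
    this.records.set(sessionId, next);
    this.staging.delete(sessionId);
    return next.version;
  }

  load(sessionId: string): SketchRecord | undefined {
    return this.records.get(sessionId);
  }

  pendingPatchCount(sessionId: string): number {
    return this.staging.get(sessionId)?.length ?? 0;
  }
}
```

In a SQL-backed store the same shape would be one transaction wrapping the state write, the staging promotion, and the version bump.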
compareAndSetStatus returns an object
The status-CAS API changed in v7 to surface what the store saw, not just whether the swap succeeded:
    // v6
    const ok = await store.compareAndSetStatus(sessionId, ['active'], 'paused');
    if (ok) { ... }

    // v7
    const result = await store.compareAndSetStatus(
      sessionId,
      ['active'],
      'paused',
    );
    if (result.ok) {
      console.log('promoted to version', result.newVersion);
    } else {
      console.log(
        'lost CAS — store is at',
        result.currentStatus,
        'version',
        result.currentVersion,
      );
    }

Every call site in your codebase must be updated. The lossy boolean form is gone.
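For store authors, the result shape can be written down as a discriminated union. The sketch below is an in-memory stand-in, not the library's implementation: SketchStatus, SketchSession, and describeCas are hypothetical, and both the numeric version type and the bump-by-one on success are assumptions.

```typescript
// Hypothetical status set; the real union in the library may differ.
type SketchStatus = 'active' | 'paused' | 'completed' | 'failed';

// Result shape as described in this section; the number type is an assumption.
type CasResult =
  | { ok: true; newVersion: number }
  | { ok: false; currentStatus: SketchStatus; currentVersion: number };

interface SketchSession {
  status: SketchStatus;
  version: number;
}

// In-memory stand-in for a store-level compare-and-set on status.
function compareAndSetStatus(
  session: SketchSession,
  expected: SketchStatus[],
  next: SketchStatus,
): CasResult {
  const matches = expected.some((status) => status === session.status);
  if (!matches) {
    // Report what the store actually holds so the caller can decide what to do.
    return {
      ok: false,
      currentStatus: session.status,
      currentVersion: session.version,
    };
  }
  session.status = next;
  session.version += 1; // assumed: a successful swap bumps the version by one
  return { ok: true, newVersion: session.version };
}

// Narrowing on `ok` lets the compiler enforce which fields are readable.
function describeCas(result: CasResult): string {
  return result.ok
    ? `swapped, now at version ${result.newVersion}`
    : `lost CAS: store is ${result.currentStatus} at version ${result.currentVersion}`;
}
```

Because the failure branch carries currentStatus and currentVersion, a caller can tell "already paused by someone else" apart from "session has failed" without a second read.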
New SessionState fields
v7 adds four fields to SessionState. Custom stores that serialize state must round-trip all of them.
| Field | Type | Purpose |
|---|---|---|
| suspensionContext | SuspensionContext \| undefined | Continuation context for HITL pauses. Read on resume to restore loop. |
| suspendedAwaitingChildren | SuspendedChildWait[] \| undefined | Per-child waits for cascading sub-agent suspensions. |
| suspendedStepId | string \| undefined | The step ID a suspended_step_partial outcome is anchored to. |
| tracingContext | TracingContext \| undefined | Persisted Langfuse / OTel trace IDs so resume continues the same trace. |
| expiresAt | number \| undefined | Epoch ms TTL, read by expiredSessionCleanup. |
In Postgres, all four serialize into the state JSONB column (no schema migration needed beyond V5's suspension_context column for indexing). In D1 / DO SQLite they live in the state TEXT column.
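A round-trip check is a cheap way to confirm that a custom store preserves these fields. The sketch below assumes JSON serialization into a single column. The nested shapes inside SketchV7Fields are hypothetical placeholders; only the field names come from the table above.

```typescript
// Hypothetical stand-ins for the library's types; only the field names are from this guide.
interface SketchV7Fields {
  suspensionContext?: { stepId: string };
  suspendedAwaitingChildren?: Array<{ childSessionId: string }>;
  suspendedStepId?: string;
  tracingContext?: { traceId: string };
  expiresAt?: number; // epoch milliseconds
}

const V7_FIELD_NAMES: Array<keyof SketchV7Fields> = [
  'suspensionContext',
  'suspendedAwaitingChildren',
  'suspendedStepId',
  'tracingContext',
  'expiresAt',
];

// Serialize the way a TEXT or JSONB state column would hold the value.
function serializeState(state: SketchV7Fields): string {
  return JSON.stringify(state);
}

function deserializeState(raw: string): SketchV7Fields {
  return JSON.parse(raw) as SketchV7Fields;
}

// Return the names of fields whose values differ after the round trip.
function lostFields(
  original: SketchV7Fields,
  roundTripped: SketchV7Fields,
): string[] {
  return V7_FIELD_NAMES.filter(
    (name) =>
      JSON.stringify(original[name]) !== JSON.stringify(roundTripped[name]),
  );
}
```

JSON.stringify omits properties whose value is undefined, so an absent field and an explicitly undefined field compare as equal after the round trip, which matches the optional types in the table.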
Storage migrations
The SQL-backed in-tree stores each ship a forward migration in v7; Redis and the in-memory store need none. Apply migrations before rolling out new code: new code reading old data is fine, but old code reading new data is undefined behavior.
| Package | Migration | Notes |
|---|---|---|
| @helix-agents/store-postgres | V5 | Adds suspension_context JSONB, GIN index. |
| @helix-agents/store-cloudflare (D1) | V8 | Adds suspension_context TEXT, JSON-path index. |
| @helix-agents/store-cloudflare (DO) | V4 | Adds suspension_context TEXT to the DO SQLite. |
| @helix-agents/store-redis | (none) | RedisJSON path-set; version bump only. |
| @helix-agents/store-memory | (none) | In-memory; no migration needed. |
Verify the active migration version with:
    SELECT version FROM __agents_migrations
    ORDER BY version DESC
    LIMIT 1;

Postgres should show 5 or higher; D1 should show 8 or higher; the DO SQLite tier should show 4 or higher.
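A deploy script could gate the rollout on that query's result. The following is a sketch of such a gate, not something the packages provide: assertMigrated and the store-kind keys are hypothetical, and only the minimum version numbers come from this guide.

```typescript
// Minimum migration versions stated in this guide; the keys are hypothetical labels.
const MIN_MIGRATION_VERSION = {
  postgres: 5,
  d1: 8,
  doSqlite: 4,
} as const;

type SqlStoreKind = keyof typeof MIN_MIGRATION_VERSION;

// `observed` is the version returned by the migration query for that store.
function assertMigrated(kind: SqlStoreKind, observed: number): void {
  const required = MIN_MIGRATION_VERSION[kind];
  if (observed < required) {
    throw new Error(
      `${kind} is at migration V${observed}; v7 needs V${required} or higher`,
    );
  }
}
```

Calling assertMigrated('d1', 7) throws, while assertMigrated('postgres', 5) returns normally, so the check can sit in front of the step that rolls out v7 code.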
Forward-only
Rolling back from v7 to v6 after applying these migrations is unsafe by default. Sessions paused under v7 carry suspension context that v6 does not know how to read; resuming them under v6 silently loses the context. See the migration guide's rollback semantics for the recovery procedure.
Operator-driven session cleanup
v7 adds expiredSessionCleanup to @helix-agents/agent-server — a helper for reaping sessions whose expiresAt is in the past. The helper:
- Pages through stateStore.listSessions() (configurable page size, default 200).
- Loads each session and checks expiresAt.
- For each expired non-terminal session: enumerates owned workspace snapshots via the matching WorkspaceProvider's snapshot.list / snapshot.delete capability and deletes them (closes the R2 cost-amplification gap).
- CAS's the session status to 'failed' with reason 'session_expired'.

Per-session failures (load errors, snapshot errors, CAS conflicts) are logged but do not abort the loop.
The framework does not run this automatically. Wire it into a scheduled job (cron, Cloudflare Alarm, k8s CronJob).
    import { expiredSessionCleanup } from '@helix-agents/agent-server';

    // Scheduled handler (the shape Cloudflare Workers uses for Cron Triggers).
    // stateStore, workspaceProviders, and consoleLogger are assumed to be constructed elsewhere.
    export default {
      async scheduled(_event, env, _ctx) {
        const summary = await expiredSessionCleanup({
          stateStore,
          workspaceProviders, // Map<providerId, WorkspaceProvider>
          logger: consoleLogger,
        });
        console.log('cleanup summary', summary);
      },
    };

The returned summary ({ detected, marked, alreadyTerminal, snapshotsDeleted, errors }) gives operators a per-run observability handle.
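To make the control flow concrete, the loop described in this section can be sketched against in-memory data. This is an illustration under assumptions, not the helper's source: CleanupSession, cleanupExpired, and deleteSnapshot are hypothetical, the real helper is asynchronous and goes through the store and provider interfaces, and the meaning assigned to each summary counter (including errors as a count) is a guess based on the field names.

```typescript
// Hypothetical minimal session shape for the sketch.
interface CleanupSession {
  id: string;
  status: 'active' | 'paused' | 'completed' | 'failed';
  expiresAt?: number; // epoch milliseconds
  snapshotIds: string[];
}

// Field names from this guide; counter semantics are assumed.
interface CleanupSummary {
  detected: number;
  marked: number;
  alreadyTerminal: number;
  snapshotsDeleted: number;
  errors: number;
}

function cleanupExpired(
  sessions: CleanupSession[],
  now: number,
  deleteSnapshot: (snapshotId: string) => void,
  pageSize = 200,
): CleanupSummary {
  const summary: CleanupSummary = {
    detected: 0,
    marked: 0,
    alreadyTerminal: 0,
    snapshotsDeleted: 0,
    errors: 0,
  };
  // Page through the session list instead of processing it in one pass.
  for (let offset = 0; offset < sessions.length; offset += pageSize) {
    for (const session of sessions.slice(offset, offset + pageSize)) {
      // Skip sessions with no TTL or a TTL that has not passed yet.
      if (session.expiresAt === undefined || session.expiresAt >= now) continue;
      summary.detected += 1;
      if (session.status === 'completed' || session.status === 'failed') {
        summary.alreadyTerminal += 1;
        continue;
      }
      try {
        for (const snapshotId of session.snapshotIds) {
          deleteSnapshot(snapshotId);
          summary.snapshotsDeleted += 1;
        }
        // Stand-in for the CAS to 'failed' with reason 'session_expired'.
        session.status = 'failed';
        summary.marked += 1;
      } catch {
        // A per-session failure is counted and the loop moves on.
        summary.errors += 1;
      }
    }
  }
  return summary;
}
```

Snapshots are deleted before the status change in this sketch, so a session whose snapshot deletion fails stays non-terminal and is picked up again on the next run.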
See also
- State Management — defining and using custom state
- Checkpoints — how snapshots layer on top of state
- v6 to v7 migration guide
- Storage Overview — reference for each store implementation