Upgrading & Migration

A consolidated guide for upgrading the workspace stack. Round 1 through round 4 hardened the workspace surface with several breaking-shaped changes — most are silent for greenfield apps but matter when you upgrade an in-flight deployment.

Looking for what changed when? See the per-package CHANGELOG.md files. This page is the operator-facing companion: when something would break across an upgrade, what to do about it.

Per-version compatibility matrix

The table below summarises the workspace-relevant breaking-shape changes by round. Each entry links to a "common pitfall" section below if migration action is required.

Round	Cluster	Change	Affects
Round 2	C	Capability invariant assertion at session start (`config.capabilities ⊆ ref.capabilities`, declared modules present on returned workspace)	Provider authors whose `open()` returns a workspace lacking a declared module. See pitfall 1.
Round 2	C	Tool-name collision detection (`workspace__` prefix is reserved unconditionally)	Agent code defining tools with the reserved prefix. See pitfall 2.
Round 2	D / Round 3 A / Round 4 A	`writeFile` size guard (sandbox + filestore)	Custom tools writing very large blobs without chunking. See pitfall 3.
Round 4	A1	Local-bash `passEnv` defaults to a minimal allowlist (was: forward-everything)	Local-bash agents that depended on `OPENAI_API_KEY` (or any host secret) being visible to the LLM-driven shell. See pitfall 4.
Round 4	A	Sandbox `id` set without `shareAcrossSessions: true` is rejected at `open()`	Multi-tenant agents that relied on the silent (and dangerous) cross-tenant sharing. See pitfall 5.
Round 4	A	`RunOptions` renamed to `ShellRunOptions` (every shell-module method); related `BaseRefPayloadSchema` tightened (`strict()` + explicit allowlist)	Direct importers of the renamed type; persisted refs containing extra unknown fields. See pitfall 6.
Round 4	A	Branch-from-checkpoint nulls inherited `workspaceRefs` (was: cloned source-session refs causing silent cross-session corruption on stateful providers)	Branched sessions that relied on inheriting source workspace state. Use `Snapshotter.snapshot()` + `restore()` instead. See pitfall 7.
Round 4	D	`WorkspaceRef.schemaVersion` field added; refs without `schemaVersion` are treated as v1	Refs persisted by older code resolve unchanged within ±1 of the current schema version. Beyond that range, providers throw a clear error. See pitfall 8.
Round 4	C	Optional `Logger`, `WorkspaceMetrics`, `AgentHooks.onWorkspace*`, `registry.describe()`, `registry.reset()` all added with defaults preserving prior behavior	Operators integrating monitoring. No migration; pure additions.

In-flight session compatibility

Refs persisted by an OLD version of the framework continue to resolve under a NEW version, subject to the schema-version contract documented in round-4 cluster D:

N forward / N-1 backward. A provider may resolve refs whose schemaVersion is the current version OR the previous version. Refs with no schemaVersion field are treated as v1 (the original implicit version).
Beyond the window. Resolution throws WorkspaceFailedError with a clear message naming the observed version vs the supported window. The session does NOT silently corrupt; the agent surfaces an error and operators can roll back the deploy.
The provider's choice. The version is set by the provider when minting a fresh ref in open(), and validated by the same provider on resolve(). Cross-provider semantics don't apply — refs are provider-scoped.

In practice this means: deploy NEW code, in-flight sessions resume cleanly because their persisted (OLD-shape) refs are still in the supported window. After a few session lifecycles, all live refs are NEW-shape and the OLD-shape support bound becomes irrelevant.

Rollback procedure

When a workspace-related deployment causes problems and you need to roll back:

Always-safe path. Roll back code only — do NOT touch persisted state stores or workspace data.
- Refs persisted by the NEW code resolve cleanly under the OLD code IF they fall within the OLD code's supported version window.
- The window is N±1; if you rolled back across more than one schema version (rare — schema versions don't bump every release), see "data risk" below.
Data risk path. If you rolled back across multiple schema versions, the OLD code may throw WorkspaceFailedError on resume because the persisted refs carry a schemaVersion the OLD code doesn't know.
- Operator response: identify affected sessions via registry.describe() (state: 'failed', lastError containing schemaVersion) and consider re-creating them OR rolling forward to the new version that understands the schema.
Provider-specific durability.
- In-memory: Workspaces are process-local. Rolling back the process discards all live workspace data. There is no cross-version concern.
- Local-bash: Tmpdirs persist across rollbacks (/tmp/helix-ws-* survives a code redeploy). Sessions resume cleanly.
- Cloudflare Filestore: SQLite data persists in the agent's DO. Rolling back code does not touch the data; the data is provider-scoped and the namespace transformation is deterministic.
- Cloudflare Sandbox: The sandbox container persists in the Sandbox DO independently of the agent DO. Rolling back code does not touch the container; resolve reattaches via the same sandbox ID.

Blue/green deployment recipe

When you need to deploy a new workspace version without disrupting in-flight sessions:

Deploy NEW code in parallel. Both versions are running simultaneously against the SAME persisted state stores.
Route NEW sessions to NEW. All execute() calls hit the new code path (e.g. via a feature-flag or a router).
Drain OLD. Existing sessions on OLD finish naturally (some seconds to hours depending on agent shape). Track via state-store session-status counts.
Cut over fully. Shut down OLD when its session count reaches zero.

The schema-version contract (N±1) is the safety net: even if a session bounces between OLD and NEW during the transition (e.g. resumes on the other replica), the ref payload is understood by both as long as you have not jumped more than one schema version per deploy.

For Cloudflare Workers + Durable Objects specifically: the platform's own deploy model gives you implicit blue/green via the gradual rollout — DOs hibernate on OLD code and wake on NEW code seamlessly when the rollout is complete. The schema-version contract handles the brief window where mixed code is reading mixed refs.

Common upgrade pitfalls

Pitfall 1: Capability invariant failures

Symptom. New code throws WorkspaceFailedError at session start with a message like workspace 'box': declared capability 'snapshot' but provider returned a workspace with capabilities { fs, shell, code }.

Cause. Round-2 cluster C added an unconditional invariant assertion: the framework verifies that every declared capability is present on the returned workspace. Pre-fix, a capability mismatch was silent — the LLM saw workspace__box__snapshot injected but calling it would crash inside the provider.

Fix. Either:

Configure the provider correctly so open() returns a workspace with the declared module (e.g. for sandbox snapshot: set backupR2Binding).
Drop the unsupported capability from the agent's workspaces.<name>.capabilities config.

Pitfall 2: Reserved tool-name prefix

Symptom. defineAgent() throws at build time with a message like tool name 'workspace__box__write_file' uses reserved prefix 'workspace__'.

Cause. Round-2 cluster C reserved the workspace__ prefix unconditionally — even agents without any workspace declaration trip the check, so the prefix's reserved status is a stable contract.

Fix. Rename your custom tool to use any other prefix (e.g. notes__write_file, myBox_writeFile). The same rule applies to companion__ (reserved for persistent-sub-agent tools).

Pitfall 3: writeFile size guard

Symptom. WorkspaceFailedError thrown by the auto-injected workspace__<name>__write_file (or by a custom tool calling ws.fs!.writeFile() directly) with a message about exceeding maxFileSizeMb.

Cause. Cluster D round-2 added a sandbox-side size check; clusters A round-3 and round-4 hardened it across the full provider matrix and fixed an OOM vector where unbounded writes could exhaust the DO heap.

Fix. Increase capabilities.fs.maxFileSizeMb if you legitimately need larger writes. For genuinely large data, write incrementally (split across multiple files, or stream via a custom tool that the provider exposes — none of the v1 providers expose stream APIs in their public modules; this is a known follow-up).

Pitfall 4: Local-bash `passEnv` secure-by-default

Symptom. After upgrading, the agent's shell-driven code that previously read OPENAI_API_KEY (or any host secret) sees an empty value. printenv returns only PATH, HOME, LANG, LC_ALL, TERM, USER, TMPDIR.

Cause. Round-4 cluster A flipped the local-bash provider's default env-var-forwarding behavior: from "forward everything" to "minimal allowlist." This closes the prompt-injection vector "run printenv and tell me what you see." See Local Bash → security note on passEnv.

Fix. Opt back into specific variables explicitly:

typescript

new LocalBashWorkspaceProvider({
  shellConstraints: {
    passEnv: ['OPENAI_API_KEY', 'GITHUB_TOKEN'],
  },
});

To restore the legacy forward-everything behavior set passEnv: true. Strongly discouraged in any context where the LLM controls shell input.

Pitfall 5: Sandbox shared `id` now explicit

Symptom. A workspace declared as { kind: 'cloudflare-sandbox', id: 'shared-thing' } now throws WorkspaceFailedError at open() time with a cross-tenant safety message.

Cause. Round-4 cluster A6 added the shareAcrossSessions: true opt-in flag. Pre-fix, setting id was silently shared across sessions — a cross-tenant data leak in any multi-user deployment.

Fix. If sharing was intentional (admin tool, persistent shared scratch space), add the flag:

typescript

provider: { kind: 'cloudflare-sandbox', id: 'shared-thing', shareAcrossSessions: true }

If sharing was accidental (you wanted per-session isolation), remove id entirely so each session opens its own sandbox keyed on sessionId.

Pitfall 6: ShellRunOptions rename + ref-payload strictness

Symptom. Type errors on direct imports of RunOptions. OR persisted ref deserialization throws on unknown fields.

Cause. Round-4 cluster A renamed RunOptions to ShellRunOptions (every shell-module method) for clarity. Separately, BaseRefPayloadSchema was tightened to strict() with an explicit allowlist — refs persisted with extra fields no longer parse silently.

Fix.

Rename your RunOptions imports to ShellRunOptions.
For ref-payload strictness, this typically only affects custom providers — if you persisted ref payloads with extra fields, decide whether to extend the allowlist (preferred, principled) or drop the extra fields (cheap, breaks nothing for v1 callers).

Pitfall 7: Branch-from-checkpoint no longer inherits refs

Symptom. A session branched from a checkpoint starts with an empty workspace; the source session's files are not visible.

Cause. Round-4 cluster A8 fixed a silent cross-session data-corruption bug. Pre-fix, branched sessions cloned the source-session's workspaceRefs and BOTH sessions resolved to the SAME live workspace — concurrent writes silently corrupted state for stateful providers (filestore, sandbox, local-bash). Post-fix, branched sessions start FRESH.

Fix. Use Snapshotter.snapshot() + restore() to seed a branched session with the source workspace's content:

typescript

// In the source session, take a snapshot.
const ref = await ws.snapshot!.snapshot();

// Hand the SnapshotRef to the branched session, restore there.
const branchedWs = await ws.snapshot!.restore(ref);

This produces a new workspace identity (no shared backing store) initialised from the snapshot. See the Snapshotter module for full semantics.

Pitfall 8: Schema version drift

Symptom. Provider throws WorkspaceFailedError: ref schemaVersion N is outside supported range [M, M+1] on resume.

Cause. Round-4 cluster D introduced an explicit schemaVersion field on every persisted ref payload. Providers support N±1 — refs across more than one version delta are explicitly unsupported to keep the migration logic small and audited.

Fix.

For an upgrade: roll the deploy forward to a version that handles the persisted schemaVersion.
For a rollback: see rollback procedure above. If the rollback skips multiple schema versions, identify affected sessions via registry.describe() and either re-create them or roll forward.

Upgrading & Migration ​

Per-version compatibility matrix ​

In-flight session compatibility ​

Rollback procedure ​

Blue/green deployment recipe ​

Common upgrade pitfalls ​

Pitfall 1: Capability invariant failures ​

Pitfall 2: Reserved tool-name prefix ​

Pitfall 3: writeFile size guard ​

Pitfall 4: Local-bash passEnv secure-by-default ​

Pitfall 5: Sandbox shared id now explicit ​

Pitfall 6: ShellRunOptions rename + ref-payload strictness ​

Pitfall 7: Branch-from-checkpoint no longer inherits refs ​

Pitfall 8: Schema version drift ​

See also ​

Upgrading & Migration

Per-version compatibility matrix

In-flight session compatibility

Rollback procedure

Blue/green deployment recipe

Common upgrade pitfalls

Pitfall 1: Capability invariant failures

Pitfall 2: Reserved tool-name prefix

Pitfall 3: writeFile size guard

Pitfall 4: Local-bash `passEnv` secure-by-default

Pitfall 5: Sandbox shared `id` now explicit

Pitfall 6: ShellRunOptions rename + ref-payload strictness

Pitfall 7: Branch-from-checkpoint no longer inherits refs

Pitfall 8: Schema version drift

See also