Upgrading & Migration
A consolidated guide for upgrading the workspace stack. Round 1 through round 4 hardened the workspace surface with several breaking-shaped changes — most are silent for greenfield apps but matter when you upgrade an in-flight deployment.
Looking for what changed when? See the per-package
CHANGELOG.mdfiles. This page is the operator-facing companion: when something would break across an upgrade, what to do about it.
Per-version compatibility matrix
The table below summarises the workspace-relevant breaking-shape changes by round. Each entry links to a "common pitfall" section below if migration action is required.
| Round | Cluster | Change | Affects |
|---|---|---|---|
| Round 2 | C | Capability invariant assertion at session start (config.capabilities ⊆ ref.capabilities, declared modules present on returned workspace) | Provider authors whose open() returns a workspace lacking a declared module. See pitfall 1. |
| Round 2 | C | Tool-name collision detection (workspace__ prefix is reserved unconditionally) | Agent code defining tools with the reserved prefix. See pitfall 2. |
| Round 2 | D / Round 3 A / Round 4 A | writeFile size guard (sandbox + filestore) | Custom tools writing very large blobs without chunking. See pitfall 3. |
| Round 4 | A1 | Local-bash passEnv defaults to a minimal allowlist (was: forward-everything) | Local-bash agents that depended on OPENAI_API_KEY (or any host secret) being visible to the LLM-driven shell. See pitfall 4. |
| Round 4 | A | Sandbox id set without shareAcrossSessions: true is rejected at open() | Multi-tenant agents that relied on the silent (and dangerous) cross-tenant sharing. See pitfall 5. |
| Round 4 | A | RunOptions renamed to ShellRunOptions (every shell-module method); related BaseRefPayloadSchema tightened (strict() + explicit allowlist) | Direct importers of the renamed type; persisted refs containing extra unknown fields. See pitfall 6. |
| Round 4 | A | Branch-from-checkpoint nulls inherited workspaceRefs (was: cloned source-session refs causing silent cross-session corruption on stateful providers) | Branched sessions that relied on inheriting source workspace state. Use Snapshotter.snapshot() + restore() instead. See pitfall 7. |
| Round 4 | D | WorkspaceRef.schemaVersion field added; refs without schemaVersion are treated as v1 | Refs persisted by older code resolve unchanged within ±1 of the current schema version. Beyond that range, providers throw a clear error. See pitfall 8. |
| Round 4 | C | Optional Logger, WorkspaceMetrics, AgentHooks.onWorkspace*, registry.describe(), registry.reset() all added with defaults preserving prior behavior | Operators integrating monitoring. No migration; pure additions. |
In-flight session compatibility
Refs persisted by an OLD version of the framework continue to resolve under a NEW version, subject to the schema-version contract documented in round-4 cluster D:
- N forward / N-1 backward. A provider may resolve refs whose
schemaVersionis the current version OR the previous version. Refs with noschemaVersionfield are treated as v1 (the original implicit version). - Beyond the window. Resolution throws
WorkspaceFailedErrorwith a clear message naming the observed version vs the supported window. The session does NOT silently corrupt; the agent surfaces an error and operators can roll back the deploy. - The provider's choice. The version is set by the provider when minting a fresh ref in
open(), and validated by the same provider onresolve(). Cross-provider semantics don't apply — refs are provider-scoped.
In practice this means: deploy NEW code, in-flight sessions resume cleanly because their persisted (OLD-shape) refs are still in the supported window. After a few session lifecycles, all live refs are NEW-shape and the OLD-shape support bound becomes irrelevant.
Rollback procedure
When a workspace-related deployment causes problems and you need to roll back:
- Always-safe path. Roll back code only — do NOT touch persisted state stores or workspace data.
- Refs persisted by the NEW code resolve cleanly under the OLD code IF they fall within the OLD code's supported version window.
- The window is N±1; if you rolled back across more than one schema version (rare — schema versions don't bump every release), see "data risk" below.
- Data risk path. If you rolled back across multiple schema versions, the OLD code may throw
WorkspaceFailedErroron resume because the persisted refs carry a schemaVersion the OLD code doesn't know.- Operator response: identify affected sessions via
registry.describe()(state: 'failed',lastErrorcontainingschemaVersion) and consider re-creating them OR rolling forward to the new version that understands the schema.
- Operator response: identify affected sessions via
- Provider-specific durability.
- In-memory: Workspaces are process-local. Rolling back the process discards all live workspace data. There is no cross-version concern.
- Local-bash: Tmpdirs persist across rollbacks (
/tmp/helix-ws-*survives a code redeploy). Sessions resume cleanly. - Cloudflare Filestore: SQLite data persists in the agent's DO. Rolling back code does not touch the data; the data is provider-scoped and the namespace transformation is deterministic.
- Cloudflare Sandbox: The sandbox container persists in the Sandbox DO independently of the agent DO. Rolling back code does not touch the container; resolve reattaches via the same sandbox ID.
Blue/green deployment recipe
When you need to deploy a new workspace version without disrupting in-flight sessions:
- Deploy NEW code in parallel. Both versions are running simultaneously against the SAME persisted state stores.
- Route NEW sessions to NEW. All
execute()calls hit the new code path (e.g. via a feature-flag or a router). - Drain OLD. Existing sessions on OLD finish naturally (some seconds to hours depending on agent shape). Track via state-store session-status counts.
- Cut over fully. Shut down OLD when its session count reaches zero.
The schema-version contract (N±1) is the safety net: even if a session bounces between OLD and NEW during the transition (e.g. resumes on the other replica), the ref payload is understood by both as long as you have not jumped more than one schema version per deploy.
For Cloudflare Workers + Durable Objects specifically: the platform's own deploy model gives you implicit blue/green via the gradual rollout — DOs hibernate on OLD code and wake on NEW code seamlessly when the rollout is complete. The schema-version contract handles the brief window where mixed code is reading mixed refs.
Common upgrade pitfalls
Pitfall 1: Capability invariant failures
Symptom. New code throws WorkspaceFailedError at session start with a message like workspace 'box': declared capability 'snapshot' but provider returned a workspace with capabilities { fs, shell, code }.
Cause. Round-2 cluster C added an unconditional invariant assertion: the framework verifies that every declared capability is present on the returned workspace. Pre-fix, a capability mismatch was silent — the LLM saw workspace__box__snapshot injected but calling it would crash inside the provider.
Fix. Either:
- Configure the provider correctly so
open()returns a workspace with the declared module (e.g. for sandbox snapshot: setbackupR2Binding). - Drop the unsupported capability from the agent's
workspaces.<name>.capabilitiesconfig.
Pitfall 2: Reserved tool-name prefix
Symptom. defineAgent() throws at build time with a message like tool name 'workspace__box__write_file' uses reserved prefix 'workspace__'.
Cause. Round-2 cluster C reserved the workspace__ prefix unconditionally — even agents without any workspace declaration trip the check, so the prefix's reserved status is a stable contract.
Fix. Rename your custom tool to use any other prefix (e.g. notes__write_file, myBox_writeFile). The same rule applies to companion__ (reserved for persistent-sub-agent tools).
Pitfall 3: writeFile size guard
Symptom. WorkspaceFailedError thrown by the auto-injected workspace__<name>__write_file (or by a custom tool calling ws.fs!.writeFile() directly) with a message about exceeding maxFileSizeMb.
Cause. Cluster D round-2 added a sandbox-side size check; clusters A round-3 and round-4 hardened it across the full provider matrix and fixed an OOM vector where unbounded writes could exhaust the DO heap.
Fix. Increase capabilities.fs.maxFileSizeMb if you legitimately need larger writes. For genuinely large data, write incrementally (split across multiple files, or stream via a custom tool that the provider exposes — none of the v1 providers expose stream APIs in their public modules; this is a known follow-up).
Pitfall 4: Local-bash passEnv secure-by-default
Symptom. After upgrading, the agent's shell-driven code that previously read OPENAI_API_KEY (or any host secret) sees an empty value. printenv returns only PATH, HOME, LANG, LC_ALL, TERM, USER, TMPDIR.
Cause. Round-4 cluster A flipped the local-bash provider's default env-var-forwarding behavior: from "forward everything" to "minimal allowlist." This closes the prompt-injection vector "run printenv and tell me what you see." See Local Bash → security note on passEnv.
Fix. Opt back into specific variables explicitly:
new LocalBashWorkspaceProvider({
shellConstraints: {
passEnv: ['OPENAI_API_KEY', 'GITHUB_TOKEN'],
},
});To restore the legacy forward-everything behavior set passEnv: true. Strongly discouraged in any context where the LLM controls shell input.
Pitfall 5: Sandbox shared id now explicit
Symptom. A workspace declared as { kind: 'cloudflare-sandbox', id: 'shared-thing' } now throws WorkspaceFailedError at open() time with a cross-tenant safety message.
Cause. Round-4 cluster A6 added the shareAcrossSessions: true opt-in flag. Pre-fix, setting id was silently shared across sessions — a cross-tenant data leak in any multi-user deployment.
Fix. If sharing was intentional (admin tool, persistent shared scratch space), add the flag:
provider: { kind: 'cloudflare-sandbox', id: 'shared-thing', shareAcrossSessions: true }If sharing was accidental (you wanted per-session isolation), remove id entirely so each session opens its own sandbox keyed on sessionId.
Pitfall 6: ShellRunOptions rename + ref-payload strictness
Symptom. Type errors on direct imports of RunOptions. OR persisted ref deserialization throws on unknown fields.
Cause. Round-4 cluster A renamed RunOptions to ShellRunOptions (every shell-module method) for clarity. Separately, BaseRefPayloadSchema was tightened to strict() with an explicit allowlist — refs persisted with extra fields no longer parse silently.
Fix.
- Rename your
RunOptionsimports toShellRunOptions. - For ref-payload strictness, this typically only affects custom providers — if you persisted ref payloads with extra fields, decide whether to extend the allowlist (preferred, principled) or drop the extra fields (cheap, breaks nothing for v1 callers).
Pitfall 7: Branch-from-checkpoint no longer inherits refs
Symptom. A session branched from a checkpoint starts with an empty workspace; the source session's files are not visible.
Cause. Round-4 cluster A8 fixed a silent cross-session data-corruption bug. Pre-fix, branched sessions cloned the source-session's workspaceRefs and BOTH sessions resolved to the SAME live workspace — concurrent writes silently corrupted state for stateful providers (filestore, sandbox, local-bash). Post-fix, branched sessions start FRESH.
Fix. Use Snapshotter.snapshot() + restore() to seed a branched session with the source workspace's content:
// In the source session, take a snapshot.
const ref = await ws.snapshot!.snapshot();
// Hand the SnapshotRef to the branched session, restore there.
const branchedWs = await ws.snapshot!.restore(ref);This produces a new workspace identity (no shared backing store) initialised from the snapshot. See the Snapshotter module for full semantics.
Pitfall 8: Schema version drift
Symptom. Provider throws WorkspaceFailedError: ref schemaVersion N is outside supported range [M, M+1] on resume.
Cause. Round-4 cluster D introduced an explicit schemaVersion field on every persisted ref payload. Providers support N±1 — refs across more than one version delta are explicitly unsupported to keep the migration logic small and audited.
Fix.
- For an upgrade: roll the deploy forward to a version that handles the persisted schemaVersion.
- For a rollback: see rollback procedure above. If the rollback skips multiple schema versions, identify affected sessions via
registry.describe()and either re-create them or roll forward.
See also
- Per-version per-package CHANGELOG entries for full per-release notes.
- Workspaces overview — operations section for the metrics + hooks + healthz surfaces.
- Workspaces runbook for incident-response procedures.
- Building a Provider for provider-author concerns (schemaVersion contract, ref-payload allowlist).