Cloudflare Sandbox Workspace
The CloudflareSandboxWorkspace is the full-featured Cloudflare provider — a real Linux container (Workers Container, Firecracker microVM) backing a complete fs + shell + code + snapshot workspace surface. Use this when your agent needs to execute untrusted code, run shell commands, or snapshot state.
When to use
- Agents that need to execute code (Python, JavaScript) the LLM produces.
- Agents that need a real shell (`grep`, `find`, `git`, `npm install`).
- Agents that need to snapshot state for branch/restore patterns.
- Any production workload where untrusted input might end up in shell commands or code.
If you only need durable file storage (no shell, no code), use the lighter Cloudflare Filestore instead.
Capabilities supported
| Capability | Supported |
|---|---|
| fs | ✅ |
| shell | ✅ (with real-time stdout/stderr streaming) |
| code | ✅ (Python, JavaScript; optional persistent contexts) |
| snapshot | ✅ (R2-backed; restore creates a NEW sandbox) |
All four capabilities are supported, so capability-mismatch errors are unlikely on this provider — but the same WorkspaceFailedError at session start applies if your config declares a capability the provider hasn't been configured for (e.g. snapshot without backupR2Binding). See the error-model table on the workspaces overview.
Optional peer dependency
@cloudflare/sandbox is heavy (Cloudflare Containers + sandbox runtime). It's an OPTIONAL peer dep on @helix-agents/runtime-cloudflare — users who only want the filestore workspace pay zero install cost.
```sh
npm install @cloudflare/sandbox@0.8.11
```

The version is pinned EXACTLY (no caret) because the package is currently experimental and its API has been moving throughout 0.x. Bump consciously when you need a new version.
Wrangler setup
The sandbox lives in its OWN Durable Object (separate from the agent DO). You declare the Sandbox DO + the container binding + the agent DO + (optionally) an R2 bucket for backups.
```toml
# Agent DO (your AgentServer subclass)
[[durable_objects.bindings]]
name = "AGENTS"
class_name = "MyAgentServer"

# Sandbox DO (re-exported from @cloudflare/sandbox)
[[durable_objects.bindings]]
name = "SANDBOX"
class_name = "Sandbox"

# Container binding (Workers Container)
[[containers]]
class_name = "Sandbox"
image = "./Dockerfile" # User-supplied; copy from the @cloudflare/sandbox repo
max_instances = 5

[[migrations]]
tag = "v1"
new_sqlite_classes = ["MyAgentServer", "Sandbox"]

# Optional: R2 bucket for snapshot/restore
[[r2_buckets]]
binding = "BACKUPS"
bucket_name = "my-sandbox-backups"
```

The Dockerfile must be supplied by you — copy the reference from the @cloudflare/sandbox repo. It supports several variants (default, python, opencode, desktop).
Worker re-export
Wrangler needs the Sandbox class to be exported from your Worker entry:
```typescript
// worker.ts
export { Sandbox } from '@cloudflare/sandbox';
export { MyAgentServer } from './my-agent-server.js';
```

If you want preview URLs from inside the sandbox, also call `proxyToSandbox` in your fetch handler — see the Cloudflare Sandbox SDK docs for details.
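A minimal fetch-handler sketch of that wiring follows. The `proxyToSandbox` call shape is an assumption drawn from the Cloudflare Sandbox SDK docs (resolving to a `Response` for preview-URL traffic and `null` otherwise); verify against your installed version.

```typescript
// worker.ts — sketch: route preview-URL traffic into the sandbox first,
// then fall through to your own routing. The Env shape below is assumed.
import { proxyToSandbox } from '@cloudflare/sandbox';

export { Sandbox } from '@cloudflare/sandbox';
export { MyAgentServer } from './my-agent-server.js';

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // proxyToSandbox answers only for requests targeting a sandbox
    // preview URL; everything else falls through.
    const proxied = await proxyToSandbox(request, env);
    if (proxied) return proxied;
    return new Response('not found', { status: 404 }); // your normal routing here
  },
};
```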
Provider config
```typescript
interface CloudflareSandboxWorkspaceConfig {
  kind: 'cloudflare-sandbox';

  /**
   * Override the sandbox ID. Defaults to the session ID.
   *
   * SECURITY (round-4 A6): an explicit `id` shared across sessions causes
   * EVERY session opening this workspace to attach to the SAME container.
   * That's a silent cross-tenant data leak — session A's writes are visible
   * to session B's reads. Setting `id` REQUIRES the companion
   * `shareAcrossSessions: true` flag; otherwise `open()` throws.
   * See "Cross-session sharing" below.
   */
  id?: string;

  /**
   * Opt-in acknowledgement that an explicit `id` is intentionally shared
   * across sessions. Default `false`. When `id` is set and this flag is
   * not `true`, `open()` throws with a clear cross-tenant warning.
   */
  shareAcrossSessions?: boolean;

  /** R2 binding name for backups. Required if capabilities.snapshot is true. */
  backupR2Binding?: string;

  /** Hostname for preview URLs (reserved; not surfaced in v1 modules). */
  hostname?: string;

  /** When true, close() calls sandbox.destroy(). Default: false (relies on sleepAfter). */
  destroyOnClose?: boolean;

  /**
   * Idle timeout. When set, forwarded to @cloudflare/sandbox's sleepAfter.
   * When unset, the provider does NOT pass the option through and the bundled
   * @cloudflare/sandbox SDK applies its own default (currently 10 minutes for
   * the bundled version — check the sandbox SDK release notes if the precise
   * default matters to you).
   */
  sleepAfter?: string | number;

  /** Working directory inside the container. Default: '/workspace'. */
  workspaceDir?: string;

  /** Directory snapshot() archives. Defaults to workspaceDir. */
  snapshotDir?: string;

  /** Env vars forwarded into the container. */
  envVars?: Record<string, string>;

  /** Languages exposed by the code interpreter. Default: ['python', 'javascript']. */
  codeLanguages?: readonly string[];

  /** Whether the code interpreter supports persistent contexts. Default: false. */
  codeStateful?: boolean;
}
```

Provider wiring
```typescript
import {
  AgentRegistry,
  createAgentServer,
  CloudflareSandboxWorkspaceProvider,
} from '@helix-agents/runtime-cloudflare';
import type { WorkspaceProvider } from '@helix-agents/core';
import type { Sandbox } from '@cloudflare/sandbox';

export { Sandbox } from '@cloudflare/sandbox';

interface Env {
  AGENTS: DurableObjectNamespace;
  SANDBOX: DurableObjectNamespace<Sandbox>;
  BACKUPS?: R2Bucket;
  OPENAI_API_KEY: string;
}

export const MyAgentServer = createAgentServer<Env>({
  llmAdapter: (env) => /* ... */,
  agents: registry,
  workspaceProviders: (env) =>
    new Map<string, WorkspaceProvider>([
      [
        'cloudflare-sandbox',
        new CloudflareSandboxWorkspaceProvider({
          namespace: env.SANDBOX,
          backupBuckets: env.BACKUPS ? { BACKUPS: env.BACKUPS } : undefined,
        }),
      ],
    ]),
});
```

The provider takes `{ namespace, backupBuckets?, shellConstraints?, maxGlobalConcurrentOpens? }`:

- `namespace` points at the Sandbox DO.
- `backupBuckets` is a name → bucket map used to resolve `config.backupR2Binding` at `open()` time.
- `shellConstraints` carries the SAME allowlist + maxDuration policy as the agent's per-workspace `capabilities.shell` config (round-4 A2: defense in depth, so direct `ws.shell.run()` calls from custom user tools honor the same policy as the auto-injected `run` tool).
- `maxGlobalConcurrentOpens` (round-5 B2) bounds concurrent `open()` and `resolve()` calls into THIS provider across ALL sessions sharing it.
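As a sketch, wiring the optional policy knobs might look like the following. The field names inside `shellConstraints` (`allowlist`, `maxDuration`) are assumptions inferred from the description above; verify them against the exported `CloudflareSandboxWorkspaceProviderOptions` type.

```typescript
// Hedged sketch: provider options with a defense-in-depth shell policy and
// a global open bound. shellConstraints field names are assumed, not
// confirmed — check the provider's exported types before copying.
new CloudflareSandboxWorkspaceProvider({
  namespace: env.SANDBOX,
  backupBuckets: env.BACKUPS ? { BACKUPS: env.BACKUPS } : undefined,
  shellConstraints: {
    allowlist: ['ls', 'cat', 'grep', 'git'], // commands direct ws.shell.run() may invoke
    maxDuration: 30_000, // per-command cap, in ms (assumed unit)
  },
  maxGlobalConcurrentOpens: 5, // match the [[containers]] max_instances
});
```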
Tuning concurrency (round-5 B2)
Two layered semaphores bound concurrent opens:
- Per-session — `WorkspaceRegistryDeps.maxConcurrentOpens` (or `JSAgentExecutor`/`DurableObjectAgentConfig.workspaceMaxConcurrentOpens`). Bounds opens for ONE session — useful when an agent declares many workspaces against the same provider.
- Per-tenant / per-process — `CloudflareSandboxWorkspaceProviderOptions.maxGlobalConcurrentOpens`. Bounds opens ACROSS all sessions sharing the provider instance. Set it to match the Sandbox DO binding's `max_instances` (often 50). Without it, a tenant with 1000 active sessions × `maxConcurrentOpens: 5` collectively fires up to 5000 concurrent opens — far past the binding's quota, triggering cascading transient errors and amplifying load via retries.
Set both. The per-session limit ensures fairness across workspaces in one session; the per-tenant limit prevents tenant-wide back-pressure failures when many sessions burst-open simultaneously (notably during a CF deployment rollout — every DO's first agent op resolves all its workspace refs).
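The bounding mechanism both knobs rely on can be illustrated in isolation. This is NOT the framework's internal implementation, just a self-contained sketch of a counting semaphore gating async operations:

```typescript
// Illustrative counting semaphore: at most `limit` operations in flight.
// Both workspaceMaxConcurrentOpens and maxGlobalConcurrentOpens funnel
// opens/resolves through something shaped like this.
class Semaphore {
  private queue: Array<() => void> = [];
  private inFlight = 0;
  constructor(private readonly limit: number) {}

  async acquire(): Promise<void> {
    if (this.inFlight < this.limit) {
      this.inFlight++;
      return;
    }
    // No slot free: park until release() hands us one directly.
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }

  release(): void {
    const next = this.queue.shift();
    if (next) next(); // transfer the slot to a waiter without freeing it
    else this.inFlight--;
  }
}

// Run fn under a permit, releasing it even on failure.
async function withPermit<T>(sem: Semaphore, fn: () => Promise<T>): Promise<T> {
  await sem.acquire();
  try {
    return await fn();
  } finally {
    sem.release();
  }
}
```

With a limit of 3, firing ten `withPermit` calls concurrently keeps at most three bodies running at once; the rest queue in FIFO order.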
Cross-session sharing (round-4 A6 — cross-tenant footgun)
By default each session opens its own sandbox (sandbox ID = sessionId). Setting config.id to a fixed value means EVERY session opening this workspace attaches to the SAME container — fine for an admin tool that wants persistent state across users, but a silent cross-tenant data leak in any multi-user deployment.
The shareAcrossSessions: true flag is required to opt in:
```typescript
// REJECTED at open() — shared id without the explicit flag.
{ kind: 'cloudflare-sandbox', id: 'shared-sandbox' }

// ACCEPTED — explicit acknowledgement that this is intentional.
{ kind: 'cloudflare-sandbox', id: 'shared-sandbox', shareAcrossSessions: true }
```

When the flag is set, `open()` ALSO emits a `logger.warn('cloudflare-sandbox: shared workspace id detected', ...)` for the audit trail. Pre-fix, cross-session sharing was silent — code review couldn't catch the misconfig because it was indistinguishable from the default. Post-fix, the explicit opt-in pattern makes accidental tenancy bleeds impossible.
The same pattern applies to CloudflareFileStoreWorkspaceConfig.namespace (see cloudflare-filestore.md).
Lifecycle
- `open()` — calls `getSandbox(namespace, sandboxId)` to obtain a stub. Configures `sleepAfter` and `envVars` if specified. Constructs all four module adapters (fs, shell, code, snapshot) regardless of capabilities — declared capabilities drive tool injection, not module construction.
- `resolve()` — re-attaches via `getSandbox(namespace, sandboxId)`. The Sandbox DO is INDEPENDENT of the agent DO, so sessions survive agent-DO hibernation cleanly: the agent wakes, calls `resolve()`, and gets a stub to the same persistent sandbox.
- `close()` — by default a no-op (the sandbox idle-shuts-down via `sleepAfter`). With `destroyOnClose: true`, calls `sandbox.destroy()` to permanently tear the container down (one-shot agent-run pattern).
Cost notes
- Container cold start is ~2–3 seconds on first request (Firecracker microVM boot).
- `sleepAfter` controls when the container suspends after idle. Default `'10m'`. Lower values save money; higher values reduce cold-start latency.
- `destroyOnClose: true` kills the container at session end. Use for one-shot workloads (agent runs once, never resumed). Default `false` (preserve the container for fast resume).
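For a one-shot workload, those knobs combine into a config like the following sketch (whether `CloudflareSandboxWorkspaceConfig` is exported from this package path is an assumption; `satisfies` just type-checks the literal against the interface shown earlier):

```typescript
import type { CloudflareSandboxWorkspaceConfig } from '@helix-agents/runtime-cloudflare';

// One-shot agent run: destroy the container at session end, with a short
// idle window as a backstop in case close() never fires.
const oneShotBox = {
  kind: 'cloudflare-sandbox',
  destroyOnClose: true,
  sleepAfter: '2m',
} satisfies CloudflareSandboxWorkspaceConfig;
```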
Concurrency: max_instances interaction (round-4 cluster C)
The max_instances knob on the [[containers]] Wrangler binding caps how many container instances a single Sandbox DO can hold concurrently. With max_instances = 5, an agent that declares 100 workspaces and runs workspaceOpenStrategy: 'eager' will hit cascading failures — the registry's openAll() fires 100 concurrent opens via Promise.all, but only 5 can land at once.
The fix: set workspaceMaxConcurrentOpens on the executor to match the binding's max_instances:
```typescript
export const MyAgentServer = createAgentServer<Env>({
  workspaceProviders: (env, ctx) => new Map([[/* ... */]]),
  workspaceMaxConcurrentOpens: 5, // match max_instances
});
```

The registry then funnels opens through a semaphore — at most N in flight at any moment. This keeps an agent declaring many workspaces from cascade-failing the binding's quota.
Snapshot semantics
snapshot() calls sandbox.createBackup({ dir: snapshotDir }) which archives the directory to R2.
restore(ref) and branch(ref) create a NEW sandbox ID ({originId}-restored-{shortId} or -branch-), obtain a stub to that new sandbox, and call restoreBackup on it. Both return a fresh WorkspaceRef pointing at the new sandbox — the Snapshotter module treats snapshots as forks rather than mutations.
The original sandbox is unchanged after a restore/branch. See the Snapshotter module for the full semantics.
backupR2Binding is required when declaring capabilities.snapshot: true. Without it, snapshot() throws at call time.
Code interpreter
Two modes:
- Stateless (`codeStateful: false`, default): each `runCode` call is independent. The LLM sees `workspace__<name>__run_code(language, code)`.
- Stateful (`codeStateful: true`): persistent Jupyter-style contexts. The LLM also sees `create_code_context`, `run_in_code_context`, and `delete_code_context` tools — variables persist across `run_in_code_context` calls within a context.
codeLanguages declares which languages the LLM may request. Default: ['python', 'javascript']. The container image must support whatever languages you declare.
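Declaring a Python-only stateful interpreter, for example, uses only fields from the config interface above (the type-import path is an assumption):

```typescript
import type { CloudflareSandboxWorkspaceConfig } from '@helix-agents/runtime-cloudflare';

// Stateful Python-only interpreter: the container image must ship Python,
// and the LLM gains the create/run-in/delete context tools.
const config: CloudflareSandboxWorkspaceConfig = {
  kind: 'cloudflare-sandbox',
  codeLanguages: ['python'],
  codeStateful: true,
};
```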
Auto-injected tools
All four module surfaces:
- fs: `read_file`, `write_file`, `edit_file`, `ls`, `glob`, `grep`, `stat`, `mkdir`, `rm` — see the FileSystem module.
- shell: `run` — see the Shell module.
- code: `run_code`, plus `create_code_context` / `run_in_code_context` / `delete_code_context` when `codeStateful: true` — see the CodeInterpreter module.
- snapshot: `snapshot`, `restore`, `branch` — see the Snapshotter module.
Observability
The provider accepts an optional Logger from @helix-agents/core so workspace-side events surface in your logging pipeline:
```typescript
import { consoleLogger } from '@helix-agents/core';

new CloudflareSandboxWorkspaceProvider({
  namespace: env.SANDBOX,
  backupBuckets: env.BACKUPS ? { BACKUPS: env.BACKUPS } : undefined,
  logger: consoleLogger, // pino, winston, or any { info, warn, error } shape
});
```

Defaults to silent (`noopLogger`). The provider currently emits info/warn entries during sandbox lifecycle transitions and is wired so future security-boundary additions surface without an API change.
Using the workspace from a custom tool
```typescript
import { defineTool } from '@helix-agents/core';
import { z } from 'zod';

const runPython = defineTool({
  name: 'count_lines',
  parameters: z.object({ path: z.string() }),
  execute: async (input, ctx) => {
    const ws = await ctx.workspaces!.get('box');
    if (!ws.code) throw new Error('box workspace requires code capability');
    const result = await ws.code.runCode(
      'python',
      `print(sum(1 for _ in open(${JSON.stringify(input.path)})))`,
    );
    return { exitCode: result.exitCode, outputs: result.outputs };
  },
});
```

See the shared pattern on the overview page — the `await` on `get()` is required, and the `!` non-null assertion on `ctx.workspaces` is appropriate when the agent declares a workspace.
Inspecting a workspace
The container's filesystem is queryable via the shell, so the simplest path is a custom debug tool:
```typescript
const inspect = defineTool({
  name: 'inspect_workspace',
  parameters: z.object({ path: z.string().default('/workspace') }),
  execute: async (input, ctx) => {
    const ws = await ctx.workspaces!.get('box');
    const result = await ws.shell!.run(`ls -la ${input.path}`);
    return { listing: new TextDecoder().decode(result.stdout) };
  },
});
```

There is no out-of-band path to peek at a running container's filesystem — the sandbox lives inside the Sandbox DO; `ws.shell.run('ls /workspace')` (or any other shell command) is the supported inspection surface.
Mid-run inspection (active sessions)
Inspecting an ACTIVE session needs care: the agent may be writing while you read.
- Recommended for active sessions. The custom debug-tool path above (in-agent) — the read happens inside the same step the agent owns, so there is no race.
- From operator code with binding access. If you have direct access to the Sandbox DO namespace (operator console, admin endpoint), you can call the underlying Sandbox DO directly:

  ```typescript
  const sandbox = getSandbox(env.SANDBOX, sandboxId);
  // Pick the read-only RPC the @cloudflare/sandbox SDK exposes
  // for shell-like inspection (e.g. an `exec`-style or `run`-style call).
  const result = await sandbox.run('ls -la /workspace');
  ```

  Use a read-only command (`ls`, `cat`, `grep`) for safety — write operations would race the agent.
- Container-state safety. Even read-only ops cause the container to wake from `sleepAfter` hibernation, briefly changing the state visible to the agent. For most workloads this is fine; for cost-sensitive flows where every wake matters, prefer the in-agent debug tool.
- After-completion inspection. Either approach is safe; the agent has stopped writing.
Capacity & performance
These are approximate ranges; benchmark for your workload.
| Dimension | Approximate range | Notes |
|---|---|---|
| Container cold start | ~2–3 s | Firecracker microVM boot. First request after wake. |
| Warm fs/shell op latency | single-digit to tens of ms (~50 ms) | Round-trip into the Sandbox DO + container. |
| Code interpreter runCode | Varies | Dominated by language runtime startup unless codeStateful: true keeps a context warm. |
| Concurrent containers per Sandbox DO | max_instances (Wrangler binding cap; commonly 5) | The hard upper bound. |
| Concurrent workspaces per agent | Effectively unbounded (logical) | The constraint is max_instances, not the agent layer. |
| Snapshot size | R2-limited | createBackup archives snapshotDir; archive size depends on workload. |
max_instances is the real bound. When an agent declares many workspaces, the Sandbox DO binding's max_instances caps how many containers can co-exist. Coordinate with Cluster C's workspaceMaxConcurrentOpens — see the next section.
Path scoping
All workspace fs operations are scoped to workspaceDir (default /workspace). Round-4 cluster A enforces this scoping in the FileSystem adapter:
- All FS methods require paths inside `workspaceDir`.
- Out-of-scope paths throw `WorkspaceFailedError("path X is outside workspace root Y")`.
- Path normalization: `..` segments and symlinks are resolved before the scope check.
- Custom tools using `ws.fs!.readFile()` (etc.) get the same scoping — the adapter is the same instance.
The shell capability does NOT enforce path scoping (a shell command is the user's escape hatch). Combining shell: true with untrusted input means the sandbox boundary is your security boundary; the workspaceDir scope is the FS boundary, not the container boundary.
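A custom tool can lean on that adapter-level scoping instead of re-validating paths itself. A sketch following the `defineTool` pattern shown earlier; the `WorkspaceFailedError` export location and the `readFile` return shape are assumptions:

```typescript
import { defineTool, WorkspaceFailedError } from '@helix-agents/core';
import { z } from 'zod';

// Sketch: rely on the FS adapter's scope check rather than hand-rolling
// path validation. A path like '../../etc/passwd' normalizes outside
// /workspace and throws WorkspaceFailedError before any container I/O.
const readInWorkspace = defineTool({
  name: 'read_in_workspace',
  parameters: z.object({ path: z.string() }),
  execute: async (input, ctx) => {
    const ws = await ctx.workspaces!.get('box');
    try {
      return { content: await ws.fs!.readFile(input.path) };
    } catch (err) {
      if (err instanceof WorkspaceFailedError) {
        return { error: `out of scope: ${input.path}` }; // surfaced to the LLM
      }
      throw err;
    }
  },
});
```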
Restart behavior
When an agent DO restarts (deployment rollout, eviction, code reload), the workspace is re-attached lazily on the first agent operation that triggers provider.resolve() for each persisted ref. For an agent DO with N persisted sandbox refs, the first operation post-restart issues up to N parallel getSandbox(NS, sandboxId) RPCs.
Thundering-herd risk during rollout. A platform-wide deployment rollout simultaneously restarts many agent DOs; each DO's first operation initiates its own resolve burst. Multiplied across many DOs, this is a classic thundering-herd against the Sandbox DO namespace.
The per-resolve cost is meaningful here (potentially a Sandbox DO wake), so the herd amplitude matters.
Recommended mitigation.
- Set `workspaceMaxConcurrentOpens` to the binding's `max_instances` (e.g. 5). The registry funnels resolves through a semaphore — at most N concurrent per DO. This caps the per-DO burst and gives the upstream Sandbox DO time to absorb each resolve.
- Combine with `WorkspaceMetrics` to alert on resolve-latency spikes during rollout windows.
Filed as follow-up: registry-side jitter on the first lazy resolve after recovery to spread the per-DO burst across a few hundred ms — would smooth the rollout further without operator action.
Operator visibility into hibernated containers (round-5 D12)
The framework does NOT enumerate hibernated containers. registry.describe() only surfaces the workspaces declared by an ACTIVE session — once a session ends and the registry closes, the underlying Sandbox DO may continue to hold a hibernated container (subject to sleepAfter) but the framework has no view of it.
Where to look. Use Cloudflare-side observability for hibernated container counts:
- Cloudflare dashboard. The Workers Container view shows DO instance counts and per-instance state. Hibernated containers count toward your `max_instances` budget until they're destroyed.
- DO RPC (advanced). If you have direct access to the Sandbox DO namespace, calling `getSandbox(NS, sandboxId).status()` (or whatever read-only RPC the SDK version exposes) returns the container's current state. Loop over the namespace's known IDs to enumerate.
- `destroyOnClose: true` is the only framework-side knob that ensures containers don't persist past session end — set it for one-shot agent runs.
Filed as known follow-up: a registry-level `listHibernatedContainers()` helper that surfaces orphaned containers. Until it lands, operator visibility into hibernated containers lives in the CF dashboard, not the framework.
Limitations
- Workflows runtime not supported. Workspaces require the DO runtime (`createAgentServer`); the Workflows runtime now fails fast at agent registration when workspaces are declared. See the Workflows runtime page.
- Cross-DO sharing not supported. Sandboxes are session-scoped; one session = one sandbox.
- Reserved modules absent. `Desktop`, `Git`, `Net` are reserved in core types but not implemented in v1. The underlying sandbox library supports them; integration is deferred.
- Hibernated container enumeration not surfaced. See "Operator visibility into hibernated containers" above.