
Cloudflare Sandbox Workspace

The CloudflareSandboxWorkspace is the full-featured Cloudflare provider — a real Linux container (Workers Container, Firecracker microVM) backing a complete fs + shell + code + snapshot workspace surface. Use this when your agent needs to execute untrusted code, run shell commands, or snapshot state.

When to use

  • Agents that need to execute code (Python, JavaScript) the LLM produces.
  • Agents that need a real shell (grep, find, git, npm install).
  • Agents that need to snapshot state for branch/restore patterns.
  • Any production workload where untrusted input might end up in shell commands or code.

If you only need durable file storage (no shell, no code), use the lighter Cloudflare Filestore instead.

Capabilities supported

| Capability | Supported |
| --- | --- |
| fs | ✅ |
| shell | ✅ (with real-time stdout/stderr streaming) |
| code | ✅ (Python, JavaScript; optional persistent contexts) |
| snapshot | ✅ (R2-backed; restore creates NEW sandbox) |

All four capabilities are supported, so capability-mismatch errors are unlikely on this provider — but the same WorkspaceFailedError at session start applies if your config declares a capability the provider hasn't been configured for (e.g. snapshot without backupR2Binding). See the error-model table on the workspaces overview.

Optional peer dependency

@cloudflare/sandbox is heavy (Cloudflare Containers + sandbox runtime). It's an OPTIONAL peer dep on @helix-agents/runtime-cloudflare — users who only want the filestore workspace pay zero install cost.

bash
npm install @cloudflare/sandbox@0.8.11

The version is pinned EXACTLY (no caret) because the package is currently experimental and the API has been moving in 0.x. Bump consciously when you need a new version.

Wrangler setup

The sandbox lives in its OWN Durable Object (separate from the agent DO). You declare the Sandbox DO + the container binding + the agent DO + (optionally) an R2 bucket for backups.

toml
# Agent DO (your AgentServer subclass)
[[durable_objects.bindings]]
name = "AGENTS"
class_name = "MyAgentServer"

# Sandbox DO (re-exported from @cloudflare/sandbox)
[[durable_objects.bindings]]
name = "SANDBOX"
class_name = "Sandbox"

# Container binding (Workers Container)
[[containers]]
class_name = "Sandbox"
image = "./Dockerfile"        # User-supplied; copy from @cloudflare/sandbox repo
max_instances = 5

[[migrations]]
tag = "v1"
new_sqlite_classes = ["MyAgentServer", "Sandbox"]

# Optional: R2 bucket for snapshot/restore
[[r2_buckets]]
binding = "BACKUPS"
bucket_name = "my-sandbox-backups"

The Dockerfile must be supplied by you — copy the reference from the @cloudflare/sandbox repo. It supports several variants (default, python, opencode, desktop).

Worker re-export

Wrangler needs the Sandbox class to be exported from your Worker entry:

typescript
// worker.ts
export { Sandbox } from '@cloudflare/sandbox';
export { MyAgentServer } from './my-agent-server.js';

If you want preview URLs from inside the sandbox, also call proxyToSandbox in your fetch handler — see the Cloudflare Sandbox SDK docs for details.

Provider config

typescript
interface CloudflareSandboxWorkspaceConfig {
  kind: 'cloudflare-sandbox';
  /**
   * Override the sandbox ID. Defaults to the session ID.
   *
   * SECURITY (round-4 A6): an explicit `id` shared across sessions causes
   * EVERY session opening this workspace to attach to the SAME container.
   * That's a silent cross-tenant data leak — session A's writes are visible
   * to session B's reads. Setting `id` REQUIRES the companion `shareAcrossSessions: true`
   * flag; otherwise `open()` throws. See "Cross-session sharing" below.
   */
  id?: string;
  /**
   * Opt-in acknowledgement that an explicit `id` is intentionally shared
   * across sessions. Default `false`. When `id` is set and this flag is
   * not `true`, `open()` throws with a clear cross-tenant warning.
   */
  shareAcrossSessions?: boolean;
  /** R2 binding name for backups. Required if capabilities.snapshot is true. */
  backupR2Binding?: string;
  /** Hostname for preview URLs (reserved; not surfaced in v1 modules). */
  hostname?: string;
  /** When true, close() calls sandbox.destroy(). Default: false (relies on sleepAfter). */
  destroyOnClose?: boolean;
  /**
   * Idle timeout. When set, forwarded to @cloudflare/sandbox's sleepAfter.
   * When unset, the provider does NOT pass the option through and the bundled
   * @cloudflare/sandbox SDK applies its own default (currently 10 minutes for
   * the bundled version — check the sandbox SDK release notes if the precise
   * default matters to you).
   */
  sleepAfter?: string | number;
  /** Working directory inside the container. Default: '/workspace'. */
  workspaceDir?: string;
  /** Directory snapshot() archives. Defaults to workspaceDir. */
  snapshotDir?: string;
  /** Env vars forwarded into the container. */
  envVars?: Record<string, string>;
  /** Languages exposed by the code interpreter. Default: ['python', 'javascript']. */
  codeLanguages?: readonly string[];
  /** Whether the code interpreter supports persistent contexts. Default: false. */
  codeStateful?: boolean;
}
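
To make the shape concrete, here is a hypothetical config literal for a snapshot-capable, stateful-code workspace. The field names come from the interface above; the concrete values (bucket binding, idle timeout) are illustrative only:

```typescript
// Illustrative config literal. Field names per CloudflareSandboxWorkspaceConfig;
// the values are example choices, not recommendations.
const boxConfig = {
  kind: 'cloudflare-sandbox' as const,
  backupR2Binding: 'BACKUPS', // required because snapshot will be declared
  sleepAfter: '5m',           // suspend sooner than the SDK default
  workspaceDir: '/workspace',
  codeLanguages: ['python'] as const,
  codeStateful: true,         // expose persistent code contexts
};
```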

Provider wiring

typescript
import {
  AgentRegistry,
  createAgentServer,
  CloudflareSandboxWorkspaceProvider,
} from '@helix-agents/runtime-cloudflare';
import type { WorkspaceProvider } from '@helix-agents/core';
import type { Sandbox } from '@cloudflare/sandbox';
export { Sandbox } from '@cloudflare/sandbox';

interface Env {
  AGENTS: DurableObjectNamespace;
  SANDBOX: DurableObjectNamespace<Sandbox>;
  BACKUPS?: R2Bucket;
  OPENAI_API_KEY: string;
}

export const MyAgentServer = createAgentServer<Env>({
  llmAdapter: (env) => /* ... */,
  agents: registry,
  workspaceProviders: (env) =>
    new Map<string, WorkspaceProvider>([
      [
        'cloudflare-sandbox',
        new CloudflareSandboxWorkspaceProvider({
          namespace: env.SANDBOX,
          backupBuckets: env.BACKUPS ? { BACKUPS: env.BACKUPS } : undefined,
        }),
      ],
    ]),
});

The provider takes { namespace, backupBuckets?, shellConstraints?, maxGlobalConcurrentOpens? }:

  • namespace points at the Sandbox DO.
  • backupBuckets is a name → bucket map used to resolve config.backupR2Binding at open() time.
  • shellConstraints carries the SAME allowlist + maxDuration policy as the agent's per-workspace capabilities.shell config (round-4 A2: defense-in-depth so direct ws.shell.run() calls from custom user tools honor the same policy as the auto-injected run tool).
  • maxGlobalConcurrentOpens (round-5 B2) bounds concurrent open() and resolve() calls into THIS provider across ALL sessions sharing it.

Tuning concurrency (round-5 B2)

Two layered semaphores bound concurrent opens:

  • Per-session: WorkspaceRegistryDeps.maxConcurrentOpens (or JSAgentExecutor / DurableObjectAgentConfig.workspaceMaxConcurrentOpens). Bounds opens for ONE session — useful when an agent declares many workspaces against the same provider.
  • Per-tenant / per-process: CloudflareSandboxWorkspaceProviderOptions.maxGlobalConcurrentOpens. Bounds opens ACROSS all sessions sharing the provider instance. Set to match the Sandbox DO binding's max_instances (often 50). Without it, a tenant with 1000 active sessions × maxConcurrentOpens: 5 collectively fires up to 5000 concurrent opens — far past the binding's quota, triggering cascading transient errors and amplifying load via retries.

Set both. The per-session limit ensures fairness across workspaces in one session; the per-tenant limit prevents tenant-wide back-pressure failures when many sessions burst-open simultaneously (notably during a CF deployment rollout — every DO's first agent op resolves all its workspace refs).

Cross-session sharing (round-4 A6 — cross-tenant footgun)

By default each session opens its own sandbox (sandbox ID = sessionId). Setting config.id to a fixed value means EVERY session opening this workspace attaches to the SAME container — fine for an admin tool that wants persistent state across users, but a silent cross-tenant data leak in any multi-user deployment.

The shareAcrossSessions: true flag is required to opt in:

typescript
// REJECTED at open() — shared id without the explicit flag.
{ kind: 'cloudflare-sandbox', id: 'shared-sandbox' }

// ACCEPTED — explicit acknowledgement that this is intentional.
{ kind: 'cloudflare-sandbox', id: 'shared-sandbox', shareAcrossSessions: true }

When the flag is set, open() ALSO emits a logger.warn('cloudflare-sandbox: shared workspace id detected', ...) for the audit trail. Pre-fix, the cross-session sharing was silent — code review couldn't catch the misconfig because it was indistinguishable from the default. Post-fix, the explicit-opt-in pattern makes accidental tenancy bleeds impossible.
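
The guard can be pictured as a small pure function. This is an assumption about the internal shape (the real check lives inside the provider's open()), but it models the documented rule exactly:

```typescript
// Sketch of the documented open()-time guard: an explicit `id` without
// shareAcrossSessions: true is rejected before any sandbox attach.
function resolveSandboxId(
  sessionId: string,
  config: { id?: string; shareAcrossSessions?: boolean },
): string {
  if (config.id !== undefined && config.shareAcrossSessions !== true) {
    throw new Error(
      'cloudflare-sandbox: explicit `id` is shared across sessions; ' +
        'set shareAcrossSessions: true to acknowledge the cross-tenant risk',
    );
  }
  // Default: one sandbox per session.
  return config.id ?? sessionId;
}
```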

The same pattern applies to CloudflareFileStoreWorkspaceConfig.namespace (see cloudflare-filestore.md).

Lifecycle

  • open() — calls getSandbox(namespace, sandboxId) to obtain a stub. Configures sleepAfter and envVars if specified. Constructs all four module adapters (fs, shell, code, snapshot) regardless of capabilities — declared capabilities drive tool injection, not module construction.
  • resolve() — re-attaches via getSandbox(namespace, sandboxId). Sandbox DO is INDEPENDENT of the agent DO, so sessions survive agent-DO hibernation cleanly: agent wakes, calls resolve(), gets a stub to the same persistent sandbox.
  • close() — by default a no-op (the sandbox idle-shuts-down via sleepAfter). With destroyOnClose: true, calls sandbox.destroy() to permanently tear the container down (one-shot agent run pattern).

Cost notes

  • Container cold start is ~2–3 seconds on first request (Firecracker microVM boot).
  • sleepAfter controls when the container suspends after idle. Default '10m'. Lower values save money; higher values reduce cold-start latency.
  • destroyOnClose: true kills the container at session end. Use for one-shot workloads (agent runs once, never resumed). Default false (preserve container for fast resume).
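
sleepAfter accepts either a duration string or a raw number. The @cloudflare/sandbox SDK does its own parsing; this hypothetical helper only illustrates the two accepted shapes and the arithmetic behind the idle-cost trade-off:

```typescript
// Hypothetical duration parser — illustrative only; the sandbox SDK
// parses sleepAfter itself.
function sleepAfterMs(value: string | number): number {
  if (typeof value === 'number') return value; // raw milliseconds
  const match = /^(\d+)(ms|s|m|h)$/.exec(value);
  if (!match) throw new Error(`unrecognized duration: ${value}`);
  const units: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 };
  return Number(match[1]) * units[match[2]];
}
```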

Concurrency: max_instances interaction (round-4 cluster C)

The max_instances knob on the [[containers]] Wrangler binding caps how many container instances a single Sandbox DO can hold concurrently. With max_instances = 5, an agent that declares 100 workspaces and runs workspaceOpenStrategy: 'eager' will hit cascading failures — the registry's openAll() fires 100 concurrent opens via Promise.all, but only 5 can land at once.

The fix: set workspaceMaxConcurrentOpens on the executor to match the binding's max_instances:

typescript
export const MyAgentServer = createAgentServer<Env>({
  workspaceProviders: (env, ctx) => new Map([[/* ... */]]),
  workspaceMaxConcurrentOpens: 5, // match max_instances
});

The registry then funnels opens through a semaphore — at most N in flight at any moment. This keeps an agent declaring many workspaces from cascade-failing the binding's quota.
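
A minimal counting semaphore captures the bounding behavior; the registry's real implementation will differ in detail, but the invariant is the same — at most N tasks in flight:

```typescript
// Sketch of the semaphore the registry funnels opens through.
class OpenSemaphore {
  private inFlight = 0;
  private waiters: Array<() => void> = [];
  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Wait (and re-check) until a slot is free.
    while (this.inFlight >= this.limit) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.inFlight++;
    try {
      return await task();
    } finally {
      this.inFlight--;
      this.waiters.shift()?.(); // wake one waiter, which re-checks the limit
    }
  }
}
```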

Snapshot semantics

snapshot() calls sandbox.createBackup({ dir: snapshotDir }) which archives the directory to R2.

restore(ref) and branch(ref) create a NEW sandbox ID ({originId}-restored-{shortId} or -branch-), obtain a stub to that new sandbox, and call restoreBackup on it. Both return a fresh WorkspaceRef pointing at the new sandbox — the Snapshotter module treats snapshots as forks rather than mutations.

The original sandbox is unchanged after a restore/branch. See the Snapshotter module for the full semantics.

backupR2Binding is required when declaring capabilities.snapshot: true. Without it, snapshot() throws at call time.
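
The fork-id shape can be sketched as follows. The {originId}-restored-{shortId} / -branch- pattern is documented above; the shortId generator here is an assumption:

```typescript
// Illustrative fork-id builder; the real shortId generation is internal.
function forkSandboxId(originId: string, mode: 'restored' | 'branch'): string {
  const shortId = Math.random().toString(36).slice(2, 8).padEnd(6, '0');
  return `${originId}-${mode}-${shortId}`;
}
```

Because restore/branch always mint a new ID, nothing the restored sandbox does can mutate the original — the fork semantics fall out of the naming.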

Code interpreter

Two modes:

  • Stateless (codeStateful: false, default): each runCode call is independent. The LLM sees workspace__<name>__run_code(language, code).
  • Stateful (codeStateful: true): persistent Jupyter-style contexts. The LLM sees create_code_context, run_in_code_context, delete_code_context tools too — variables persist across run_in_code_context calls within a context.

codeLanguages declares which languages the LLM may request. Default: ['python', 'javascript']. The container image must support whatever languages you declare.
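
For reference, the tool-name shapes the LLM sees in each mode can be sketched as below. Whether the context tools carry the workspace__<name>__ prefix is not spelled out here; they are shown unprefixed, as in the text above:

```typescript
// Sketch of the per-mode tool surface; the injection logic itself is
// internal to the framework.
function codeToolNames(workspaceName: string, stateful: boolean): string[] {
  const base = [`workspace__${workspaceName}__run_code`];
  return stateful
    ? [...base, 'create_code_context', 'run_in_code_context', 'delete_code_context']
    : base;
}
```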

Auto-injected tools

All four module surfaces (fs, shell, code, snapshot) contribute auto-injected tools, gated by the capabilities the agent declares: shell injects the run tool (honoring shellConstraints), and code injects workspace__<name>__run_code plus the context tools when codeStateful: true. See the individual module pages for the full per-capability tool list.

Observability

The provider accepts an optional Logger from @helix-agents/core so workspace-side events surface in your logging pipeline:

typescript
import { consoleLogger } from '@helix-agents/core';

new CloudflareSandboxWorkspaceProvider({
  namespace: env.SANDBOX,
  backupBuckets: env.BACKUPS ? { BACKUPS: env.BACKUPS } : undefined,
  logger: consoleLogger,  // pino, winston, or any { info, warn, error } shape
});

Defaults to silent (noopLogger). The provider currently emits info/warn entries during sandbox lifecycle transitions and is wired so future security-boundary additions surface without an API change.
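
Because any { info, warn, error } shape qualifies, a minimal capture logger (handy in tests) can stand in for pino or winston. This is a sketch, not part of the framework:

```typescript
type LogEntry = { level: 'info' | 'warn' | 'error'; msg: string };

// Collects log calls into an array instead of printing them.
function makeCaptureLogger() {
  const entries: LogEntry[] = [];
  const log =
    (level: LogEntry['level']) =>
    (msg: string, ..._meta: unknown[]): void => {
      entries.push({ level, msg });
    };
  return {
    entries,
    logger: { info: log('info'), warn: log('warn'), error: log('error') },
  };
}
```

Pass capture.logger where the provider expects a Logger, then assert on capture.entries — e.g. to verify the shared-workspace-id warning fires in your integration tests.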

Using the workspace from a custom tool

typescript
import { defineTool } from '@helix-agents/core';
import { z } from 'zod';

const runPython = defineTool({
  name: 'count_lines',
  parameters: z.object({ path: z.string() }),
  execute: async (input, ctx) => {
    const ws = await ctx.workspaces!.get('box');
    if (!ws.code) throw new Error('box workspace requires code capability');
    const result = await ws.code.runCode(
      'python',
      `print(sum(1 for _ in open(${JSON.stringify(input.path)})))`,
    );
    return { exitCode: result.exitCode, outputs: result.outputs };
  },
});

See the shared pattern on the overview page: await on get() is required, and the ! non-null assertion on ctx.workspaces is appropriate when the agent declares a workspace.

Inspecting a workspace

The container's filesystem is queryable via the shell, so the simplest path is a custom debug tool:

typescript
const inspect = defineTool({
  name: 'inspect_workspace',
  parameters: z.object({ path: z.string().default('/workspace') }),
  execute: async (input, ctx) => {
    const ws = await ctx.workspaces!.get('box');
    const result = await ws.shell!.run(`ls -la ${input.path}`);
    return { listing: new TextDecoder().decode(result.stdout) };
  },
});

There is no out-of-band path to peek at a running container's filesystem — the sandbox lives inside the Sandbox DO; ws.shell.run('ls /workspace') (or any other shell command) is the supported inspection surface.

Mid-run inspection (active sessions)

Inspecting an ACTIVE session needs care: the agent may be writing while you read.

  • Recommended for active sessions. The custom debug-tool path above (in-agent) — the read happens inside the same step the agent owns, so no race.
  • From operator code with binding access. If you have direct access to the Sandbox DO namespace (operator console, admin endpoint), you can call the underlying Sandbox DO directly:
    typescript
    const sandbox = getSandbox(env.SANDBOX, sandboxId);
    // Pick the read-only RPC the @cloudflare/sandbox SDK exposes
    // for shell-like inspection (e.g. an `exec`-style or `run`-style call).
    const result = await sandbox.run('ls -la /workspace');
    Use a read-only command (ls, cat, grep) for safety — write operations would race the agent.
  • Container-state safety. Even read-only ops cause the container to wake from sleepAfter hibernation, briefly changing its state visible to the agent. For most workloads this is fine; for cost-sensitive flows where every wake matters, prefer the in-agent debug tool.
  • For after-completion inspection. Either approach is safe; the agent has stopped writing.

Capacity & performance

These are approximate ranges; benchmark for your workload.

| Dimension | Approximate range | Notes |
| --- | --- | --- |
| Container cold start | ~2–3 s | Firecracker microVM boot. First request after wake. |
| Warm fs/shell op latency | single-digit to tens of ms (~50 ms budget) | Round-trip into the Sandbox DO + container. |
| Code interpreter runCode | Varies | Dominated by language runtime startup unless codeStateful: true keeps a context warm. |
| Concurrent containers per Sandbox DO | max_instances | Wrangler binding cap; commonly 5. The hard upper bound. |
| Concurrent workspaces per agent | Effectively unbounded (logical) | The constraint is max_instances, not the agent layer. |
| Snapshot size | R2-limited | createBackup archives snapshotDir; archive size depends on workload. |

max_instances is the real bound. When an agent declares many workspaces, the Sandbox DO binding's max_instances caps how many containers can co-exist. Coordinate with Cluster C's workspaceMaxConcurrentOpens — see the next section.

Path scoping

All workspace fs operations are scoped to workspaceDir (default /workspace). Round-4 cluster A enforces this scoping in the FileSystem adapter:

  • All FS methods require paths inside workspaceDir.
  • Out-of-scope paths throw WorkspaceFailedError("path X is outside workspace root Y").
  • Path normalization: .. segments and symlinks are resolved before the scope check.
  • Custom tools using ws.fs!.readFile() (etc.) get the same scoping — the adapter is the same instance.

The shell capability does NOT enforce path scoping (a shell command is the user's escape hatch). Combining shell: true with untrusted input means the sandbox boundary is your security boundary; the workspaceDir scope is the FS boundary, not the container boundary.
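
The normalize-then-check order can be sketched with POSIX path resolution. This is an assumption about the adapter's internals, and note that real symlink resolution has to happen container-side — a pure path check like this one cannot do it:

```typescript
import * as path from 'node:path';

// Sketch of the FS scope check: resolve `..` segments first, then verify
// the result is under the workspace root. Error message per the docs above.
function assertInsideWorkspace(requested: string, root = '/workspace'): string {
  const resolved = path.posix.resolve(root, requested); // normalizes `..`
  if (resolved !== root && !resolved.startsWith(root + '/')) {
    throw new Error(`path ${requested} is outside workspace root ${root}`);
  }
  return resolved;
}
```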

Restart behavior

When an agent DO restarts (deployment rollout, eviction, code reload), the workspace is re-attached lazily on the first agent operation that triggers provider.resolve() for each persisted ref. For an agent DO with N persisted sandbox refs, the first operation post-restart issues up to N parallel getSandbox(NS, sandboxId) RPCs.

Thundering-herd risk during rollout. A platform-wide deployment rollout simultaneously restarts many agent DOs; each DO's first operation initiates its own resolve burst. Multiplied across many DOs, this is a classic thundering-herd against the Sandbox DO namespace.

The per-resolve cost is meaningful here (potentially a Sandbox DO wake), so the herd amplitude matters.

Recommended mitigation.

  • Set workspaceMaxConcurrentOpens to the binding's max_instances (e.g. 5). The registry funnels resolves through a semaphore — at most N concurrent per DO. This caps the per-DO burst and gives the upstream Sandbox DO time to absorb each resolve.
  • Combine with WorkspaceMetrics to alert on resolve-latency spikes during rollout windows.

Filed as follow-up: registry-side jitter on the first lazy resolve after recovery to spread the per-DO burst across a few hundred ms — would smooth the rollout further without operator action.
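
The filed follow-up could look roughly like this; the function names and the 300 ms ceiling are hypothetical:

```typescript
// Hypothetical jitter helper: delay the first lazy resolve by a small
// random amount to de-synchronize per-DO resolve bursts during rollout.
function resolveJitterMs(maxJitterMs = 300): number {
  return Math.floor(Math.random() * maxJitterMs);
}

async function withStartupJitter<T>(
  resolveRef: () => Promise<T>,
  maxJitterMs = 300,
): Promise<T> {
  await new Promise((r) => setTimeout(r, resolveJitterMs(maxJitterMs)));
  return resolveRef();
}
```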

Operator visibility into hibernated containers (round-5 D12)

The framework does NOT enumerate hibernated containers. registry.describe() only surfaces the workspaces declared by an ACTIVE session — once a session ends and the registry closes, the underlying Sandbox DO may continue to hold a hibernated container (subject to sleepAfter) but the framework has no view of it.

Where to look. Use Cloudflare-side observability for hibernated container counts:

  • Cloudflare dashboard. The Workers Container view shows DO instance counts and per-instance state. Hibernated containers count toward your max_instances budget until they're destroyed.
  • DO RPC (advanced). If you have direct access to the Sandbox DO namespace, calling getSandbox(NS, sandboxId).status() (or whatever read-only RPC the SDK version exposes) returns the container's current state. Loop the namespace's known IDs to enumerate.
  • destroyOnClose: true is the only framework-side knob that ensures containers don't persist past session end — set it for one-shot agent runs.

Filed as known follow-up: a registry-level listHibernatedContainers() helper that surfaces orphaned containers. Until it lands, operator visibility into hibernated containers lives in the CF dashboard, not the framework.

Limitations

  • Workflows runtime not supported. Workspaces require the DO runtime (createAgentServer); the Workflows runtime now fails fast at agent registration when workspaces are declared. See the Workflows runtime page.
  • Cross-DO sharing not supported. Sandboxes are session-scoped; one session = one sandbox.
  • Reserved modules absent. Desktop, Git, Net are reserved in core types but not implemented in v1. The underlying @cloudflare/sandbox SDK supports them; framework integration is deferred.
  • Hibernated container enumeration not surfaced. See Operator visibility into hibernated containers above.

Source

Released under the MIT License.