
Skills

Skills give an agent a library of specialized capabilities — workflows, runbooks, reference protocols — without paying for all of them on every request. They implement Anthropic's Agent Skills pattern (3-level progressive disclosure) on Helix's append-only, cacheable substrate.

Why skills

A capable agent often needs many specialized playbooks: "how to process a PDF", "how to run the deploy runbook", "the schema for the billing API". Stuffing every playbook into the system prompt works, but it costs tokens on every single request — even the turns where none of them are relevant — and it bloats the context the model has to reason over.

Skills solve this by disclosing capability progressively:

  • The model always sees a small catalog — just each skill's name + description (tens of tokens each).
  • It loads a skill's full instructions only when a task actually matches.
  • It reads a skill's bundled resource files only when those instructions point at them.

Because every load is append-only (catalog in the cached system-prompt prefix; bodies and files arrive as tool results), enabling skills does not invalidate prompt caching. You get a large capability library at near-zero idle token cost.
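To make the idle-cost claim concrete, here is a back-of-the-envelope sketch. The token figures are hypothetical placeholders, not measurements; only the shape of the comparison matters.

```typescript
// Hypothetical sizes for illustration only; real figures depend on your skills and tokenizer.
interface LibraryShape {
  skillCount: number;
  catalogTokensPerSkill: number; // name + description
  bodyTokensPerSkill: number; // full instructions
}

// Tokens paid on every request when every body lives in the system prompt.
function stuffedPromptTokens(lib: LibraryShape): number {
  return lib.skillCount * lib.bodyTokensPerSkill;
}

// Tokens paid on a request where no skill is loaded: the catalog only.
function idleCatalogTokens(lib: LibraryShape): number {
  return lib.skillCount * lib.catalogTokensPerSkill;
}

const exampleLibrary: LibraryShape = {
  skillCount: 20,
  catalogTokensPerSkill: 50,
  bodyTokensPerSkill: 2000,
};

console.log(stuffedPromptTokens(exampleLibrary)); // 40000
console.log(idleCatalogTokens(exampleLibrary)); // 1000
```

Under these assumed sizes, the always-resident cost drops by a factor of 40, and a body's cost is paid only on the turns that load it.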

Skills disclose content; they do not execute code. Level-3 "scripts" are readable text — running them is delegated to the agent's own shell/workspace tools. This keeps the feature decoupled from any execution environment, so it works on every runtime.

The 3-level progressive-disclosure model

mermaid
graph TB
    L1["Level 1 — Catalog (always resident)<br/>name + description for every skill<br/>rendered into the system prompt"]
    L2["Level 2 — Body (loaded on demand)<br/>full skill instructions<br/>returned by load_skill"]
    L3["Level 3 — Resource files (read on demand)<br/>references/ scripts/ assets/<br/>returned by read_skill_file"]
    L1 -->|"model matches a task<br/>calls load_skill(name)"| L2
    L2 -->|"body points at a file<br/>calls read_skill_file(skill, path)"| L3

  1. Level 1 — catalog (always resident). Every skill's name + description is rendered into a ## Skills section appended to the system prompt. The catalog is deterministically sorted by name so it is byte-stable within a session — it lives in the cached prefix and never invalidates it. The catalog carries a load-bearing guardrail ("A Skill is NOT a tool — call load_skill"), because models otherwise try to invoke skill names as if they were tools.
  2. Level 2 — body (loaded on demand). The full skill body is loaded by the auto-injected load_skill tool, which returns the body as the tool result — the model acts on it in the same continuation, with no wasted round-trip. Because tool results only ever append to history, this is append-only and cache-safe. load_skill also emits an informational skill_loaded custom stream event.
  3. Level 3 — resource files (read on demand). Bundled reference/script/asset files are read by the auto-injected read_skill_file tool, which supports optional startLine/endLine ranges and a path-traversal guard.
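The byte-stability property of the Level-1 catalog can be illustrated with a small sketch. The function name, the line layout, and the guardrail wording below are assumptions made for illustration; the framework's actual rendering is not shown in this guide. The point is only that sorting by name makes the output independent of input order.

```typescript
interface CatalogEntry {
  name: string;
  description: string;
}

// Hypothetical renderer: the real section layout may differ.
function renderCatalogSketch(entries: CatalogEntry[]): string {
  // Sort a copy with plain code-unit comparison so the result does not depend
  // on locale settings or on the order the skills were declared in.
  const sorted = [...entries].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
  );
  const lines = sorted.map((e) => `- ${e.name}: ${e.description}`);
  // Paraphrased guardrail line (assumed wording).
  const guardrail = 'A Skill is NOT a tool. Call load_skill to use one.';
  return ['## Skills', guardrail, ...lines].join('\n');
}

const pdfEntry: CatalogEntry = {
  name: 'pdf-processing',
  description: 'Extract text and tables from PDFs. Use when working with PDF files.',
};
const deployEntry: CatalogEntry = {
  name: 'deploy-runbook',
  description: 'Production deploy + rollback procedure. Use when deploying a release.',
};

const catalogA = renderCatalogSketch([pdfEntry, deployEntry]);
const catalogB = renderCatalogSketch([deployEntry, pdfEntry]);
console.log(catalogA === catalogB); // true
```

Because both declaration orders render to the same string, a fragment built this way can sit in the cached prefix without changing between requests.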

Providers

Skills resolve to plain data behind a small async interface (SkillProvider). Two providers ship in v1.

| Provider | Package | Backing | Runs on |
| --- | --- | --- | --- |
| inCodeSkillProvider | @helix-agents/core | TypeScript data bundled with the agent | Everywhere (Workers-safe) |
| fileSystemSkillProvider | @helix-agents/skill-fs | SKILL.md directories on disk (Anthropic format) | Node only |

inCodeSkillProvider (the "plain data" mode)

Skills are TypeScript data bundled with the agent. Dependency-free and Workers-safe (no node:fs). Use this on Cloudflare Workers, or anywhere you want skills version-controlled alongside your agent code. You rarely call inCodeSkillProvider directly — passing an array of skill definitions to skills does it for you (see below).

fileSystemSkillProvider (Node only)

Reads Anthropic-format SKILL.md directories from disk. Node only (uses node:fs/promises + yaml); not usable on Cloudflare Workers. Install the package:

bash
npm install @helix-agents/skill-fs

See the @helix-agents/skill-fs reference for options and behavior.

Want to author skills as remote packages but still ship them through the Workers-safe in-code provider? See Loading remote skill packages (build-time bake) below.

Loading remote skill packages (build-time bake)

The two providers above cover skills bundled in your own code and skills read from local disk. To pull in skills published as remote packages — a git repo, or a Claude plugin marketplace (e.g. hyperframes) — use the build-time baker, @helix-agents/skill-cli.

The model is resolve + pin at build time, then ship through the in-code provider:

  1. You declare remote sources, pinned to a version, in a helix.skills.json manifest.
  2. helix-skills sync fetches the selected skills and writes a generated TypeScript module exporting SkillDefinition[] — plus a committed lockfile with a per-skill sha256 integrity hash.
  3. You import { skills } from that module and pass it to defineAgent({ skills }).

Because the output is plain in-code data, there is zero runtime fetch and no node:fs — the baked skills run everywhere, including Cloudflare Workers. The CLI itself is Node-only and runs only at build time; it is never imported by your agent.

bash
npm i -D @helix-agents/skill-cli

helix.skills.json:

json
{
  "skills": {
    "hyperframes": {
      "type": "git",
      "url": "https://github.com/heygen-com/hyperframes.git",
      "version": "0.6.70",
      "include": ["hyperframes", "hyperframes-media"]
    }
  }
}

bash
npx helix-skills sync          # bake → src/skills.generated.ts + helix.skills.lock
npx helix-skills sync --check  # CI: fail if the lockfile would change

typescript
import { defineAgent } from '@helix-agents/core';
import { skills } from './src/skills.generated';

const agent = defineAgent({
  name: 'video-agent',
  systemPrompt: 'You are a video-composition assistant.',
  llmConfig: { model },
  skills, // baked in-code skills — zero runtime fetch, Workers-safe
});

Recommended policy: gitignore the generated src/skills.generated.ts (it is a regenerable build artifact), commit helix.skills.lock, and run helix-skills sync --check in CI so manifest/lockfile drift fails the build. The manifest also supports claude-marketplace sources and version / ref / sha pinning — see the @helix-agents/skill-cli reference for the full manifest + lockfile schemas, the programmatic API (bakeSkills, parseManifest, GitRepoSource, ClaudeMarketplaceSource), and the v1 limitations.
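Conceptually, a drift check compares the integrity hashes in the committed lockfile against those of a fresh bake. The sketch below assumes a simplified lockfile shape (skill name mapped to a sha256 hex digest) and a hypothetical helper name; the real schema and the actual behavior of sync --check are documented in the @helix-agents/skill-cli reference.

```typescript
// Assumed, simplified shape: skill name -> sha256 hex digest of the baked skill.
type IntegrityMap = Record<string, string>;

// Returns the sorted names whose hash was added, removed, or changed.
function findDriftSketch(committed: IntegrityMap, fresh: IntegrityMap): string[] {
  const names = Array.from(
    new Set(Object.keys(committed).concat(Object.keys(fresh))),
  );
  return names.filter((name) => committed[name] !== fresh[name]).sort();
}

// Placeholder digests, shortened for readability.
const committedLock: IntegrityMap = { hyperframes: 'aaa111' };
const freshBake: IntegrityMap = { hyperframes: 'bbb222', 'hyperframes-media': 'ccc333' };

const drifted = findDriftSketch(committedLock, freshBake);
// drifted is ['hyperframes', 'hyperframes-media']; a CI check would fail here.
console.log(drifted.length > 0 ? 'drift detected' : 'lockfile up to date');
```

Failing the build on any non-empty result is what keeps the manifest, the lockfile, and the generated module from silently diverging.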

Defining skills

Set AgentConfig.skills to either a SkillProvider or an array of in-code SkillDefinitions. The array form is sugar for inCodeSkillProvider.

In-code (array sugar)

typescript
import { defineAgent } from '@helix-agents/core';

const agent = defineAgent({
  name: 'assistant',
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: { model },
  skills: [
    {
      name: 'pdf-processing',
      description:
        'Extract text and tables from PDFs, fill forms, merge documents. Use when working with PDF files.',
      body: '# PDF processing\n…full instructions…',
    },
  ],
});

When skills is present, the framework appends the catalog to the system prompt and auto-injects load_skill + read_skill_file into the tool list. An empty/unset skills is a total no-op.

defineSkill (validation helper)

defineSkill(def) validates a skill definition (name rules, non-empty body) and returns it unchanged. Useful for defining skills in their own modules with eager validation:

typescript
import { readFileSync } from 'node:fs'; // used by the lazy loader below; Node only
import { defineAgent, defineSkill } from '@helix-agents/core';

export const deployRunbook = defineSkill({
  name: 'deploy-runbook',
  description:
    'Step-by-step production deploy + rollback procedure. Use when deploying or rolling back a release.',
  body: '# Deploy runbook\n1. …',
  resources: {
    'references/rollback.md': '# Rollback\n…',
    // A lazy loader is also allowed (string | () => string | Promise<string>):
    'scripts/healthcheck.sh': () => readFileSync('./hc.sh', 'utf8'),
  },
});

const agent = defineAgent({ /* … */ skills: [deployRunbook] });

defineAgent() also validates the array form at build time — it throws on an invalid name/description/body or a duplicate skill name.

Filesystem (one line)

typescript
import { defineAgent } from '@helix-agents/core';
import { fileSystemSkillProvider } from '@helix-agents/skill-fs';

const agent = defineAgent({
  // …
  skills: fileSystemSkillProvider({ roots: ['./skills'] }),
});

The SKILL.md format

fileSystemSkillProvider scans each root for <root>/<skill-name>/SKILL.md. Each SKILL.md is YAML frontmatter + a markdown body, optionally accompanied by references/, scripts/, and assets/ files in the same directory.

skills/
└── pdf-processing/
    ├── SKILL.md
    ├── references/
    │   └── forms.md
    └── scripts/
        └── extract.py

markdown
---
name: pdf-processing
description: Extract text and tables from PDFs, fill forms, merge documents. Use when working with PDF files.
license: MIT
---

# PDF processing

Full instructions for working with PDFs…

Frontmatter fields:

| Field | Required | Notes |
| --- | --- | --- |
| name | Yes | Lowercase a-z / 0-9 with single hyphens; ≤64 chars; must equal the directory name. |
| description | Yes | ≤1024 chars. Triggers-only; see writing good descriptions. |
| license | No | Free-form string. |
| compatibility | No | Free-form string (≤500 chars). |
| metadata | No | Record<string, string>. |
| allowed-tools | No | Open-standard field; parsed and carried in v1 but not enforced. |

Any file under the skill directory other than SKILL.md is surfaced as a Level-3 resource (skill-relative path). See the @helix-agents/skill-fs reference for the resource read behavior (binary refusal, 64 KB cap, line ranges, traversal guard).
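To show how a SKILL.md file divides into frontmatter and body, here is a deliberately simplified sketch. It handles only flat key: value lines and is not a YAML parser (the real provider uses the yaml package); the function names are hypothetical.

```typescript
interface ParsedSkillFile {
  frontmatter: Record<string, string>;
  body: string;
}

// Simplified on purpose: flat `key: value` lines only, no nested YAML.
function splitSkillFileSketch(source: string): ParsedSkillFile {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) {
    throw new Error('SKILL.md must start with a --- frontmatter block');
  }
  const [, rawFrontmatter = '', rawBody = ''] = match;
  const frontmatter: Record<string, string> = {};
  for (const line of rawFrontmatter.split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip lines that are not key: value pairs
    frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { frontmatter, body: rawBody.trim() };
}

// The filesystem provider requires the frontmatter name to equal the directory name.
function nameMatchesDirectory(parsed: ParsedSkillFile, directoryName: string): boolean {
  return parsed.frontmatter['name'] === directoryName;
}

const sampleSkillFile = [
  '---',
  'name: pdf-processing',
  'description: Extract text and tables from PDFs. Use when working with PDF files.',
  'license: MIT',
  '---',
  '',
  '# PDF processing',
  '',
  'Full instructions for working with PDFs.',
].join('\n');

const parsedSample = splitSkillFileSketch(sampleSkillFile);
console.log(parsedSample.frontmatter['name']); // pdf-processing
console.log(nameMatchesDirectory(parsedSample, 'pdf-processing')); // true
```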

The auto-injected tools

When an agent declares skills, two tools are auto-injected. Their names are reserved — user tools cannot shadow them, and skill names use [a-z0-9-] (no underscores) so they can never collide.

load_skill

load_skill({ name: "pdf-processing" })

Returns the full skill body as the tool result, wrapped in a <skill name="…">…</skill> block (plus a <skill_resources> listing if the skill bundles files). Emits an informational skill_loaded custom stream event ({ name }, surfaced as a data-skill_loaded AI-SDK event) that consumers MAY render. An unknown name returns a not-found message listing the available skills (it does not throw).
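The shape of a load_skill result can be sketched as follows. The skill and skill_resources wrappers come from the description above; the exact layout inside them, the not-found wording, and the function name are assumptions for illustration only.

```typescript
interface SkillRecord {
  name: string;
  body: string;
  resourcePaths?: string[];
}

// Hypothetical formatter; the real tool's exact output may differ.
function loadSkillResultSketch(skills: SkillRecord[], name: string): string {
  const skill = skills.find((s) => s.name === name);
  if (!skill) {
    // Unknown names produce a message instead of throwing, so the model can recover.
    const available = skills.map((s) => s.name).sort().join(', ');
    return `Skill "${name}" not found. Available skills: ${available}`;
  }
  const parts = [`<skill name="${skill.name}">`, skill.body, '</skill>'];
  if (skill.resourcePaths && skill.resourcePaths.length > 0) {
    parts.push('<skill_resources>', ...skill.resourcePaths, '</skill_resources>');
  }
  return parts.join('\n');
}

const demoSkills: SkillRecord[] = [
  {
    name: 'pdf-processing',
    body: '# PDF processing',
    resourcePaths: ['references/forms.md', 'scripts/extract.py'],
  },
  { name: 'deploy-runbook', body: '# Deploy runbook' },
];

console.log(loadSkillResultSketch(demoSkills, 'pdf-processing'));
console.log(loadSkillResultSketch(demoSkills, 'unknown-skill'));
```

Returning a message that lists the valid names gives the model enough information to retry with a correct name in the same conversation.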

read_skill_file

read_skill_file({ skill: "pdf-processing", path: "references/forms.md", startLine: 1, endLine: 40 })

Returns the file contents wrapped in a <skill_file … untrusted="true"> block with a note instructing the model to treat the content as untrusted data. startLine/endLine are optional, 1-indexed, inclusive.
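The 1-indexed, inclusive range semantics are easy to get off by one, so here is a minimal sketch of the slicing. The function name is hypothetical, and clamping out-of-range bounds is an assumption; the real tool may handle those cases differently.

```typescript
// 1-indexed and inclusive; omitted bounds default to the first and last line.
function sliceLinesSketch(text: string, startLine?: number, endLine?: number): string {
  const lines = text.split('\n');
  const start = Math.max(1, startLine ?? 1);
  const end = Math.min(lines.length, endLine ?? lines.length);
  if (start > end) return ''; // empty or fully out-of-range request
  // slice() is 0-indexed with an exclusive end, hence start - 1 and end.
  return lines.slice(start - 1, end).join('\n');
}

const fourLines = 'alpha\nbeta\ngamma\ndelta';
console.log(sliceLinesSketch(fourLines, 2, 3)); // beta and gamma, on two lines
console.log(sliceLinesSketch(fourLines, 3)); // gamma and delta, on two lines
```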

Preloaded skills

Some skills are relevant on every turn (a house style guide, a domain glossary, the one runbook this agent exists to run). For those, skip lazy loading and inject the body up front with preloadSkills.

AgentConfig.preloadSkills?: string[] injects the named skills' full bodies into the system prompt on every step — always in context, no load_skill call needed. The bodies render as an ### Active Skills (already loaded) block inside the same deterministically-sorted, cache-stable fragment as the catalog, so the prefix stays byte-stable per session.

typescript
defineAgent({
  // …
  skills: [deployRunbook, pdfProcessing],
  preloadSkills: ['deploy-runbook'], // body always in context
});

Behavior:

  • Preloaded skills also appear in the loadable catalog marked loaded="true" — a static marker (cache-safe, decided at config time) that tells the model not to reload them, while load_skill remains a recovery path (e.g. if history compaction later drops the system-prompt-injected body, load_skill can re-fetch it).
  • Each name must resolve in the agent's provider. Unknown names warn-and-skip at resolution time — one bad name never crashes the agent or breaks the rest. (defineAgent() additionally throws at build time if a preloadSkills name isn't among the in-code skills array, or if a name is duplicated.)
  • Sub-agents do NOT inherit a parent's preloadSkills (nor its skills) — a sub-agent uses skills only if its own config declares them.
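The warn-and-skip rule for unknown preload names can be sketched like this. The function name, the warning text, and the sorting of resolved names are illustrative assumptions, not the framework's actual implementation.

```typescript
interface PreloadResolution {
  resolved: string[];
  warnings: string[];
}

// Hypothetical resolver: unknown names are reported and skipped, never thrown.
function resolvePreloadsSketch(
  preloadNames: string[],
  availableNames: string[],
): PreloadResolution {
  const available = new Set(availableNames);
  const resolved: string[] = [];
  const warnings: string[] = [];
  for (const name of preloadNames) {
    if (available.has(name)) {
      resolved.push(name);
    } else {
      warnings.push(`preloadSkills: unknown skill "${name}" skipped`);
    }
  }
  // Assumed: sort so the rendered block does not depend on configuration order.
  resolved.sort();
  return { resolved, warnings };
}

const preloadDemo = resolvePreloadsSketch(
  ['deploy-runbook', 'missing-skill'],
  ['pdf-processing', 'deploy-runbook'],
);
console.log(preloadDemo.resolved); // contains only 'deploy-runbook'
console.log(preloadDemo.warnings.length); // 1
```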

When to preload vs lazy-load. Preload a skill when it is relevant to (nearly) every turn and the body is small enough to justify always-on cost. Lazy-load when relevance is occasional — the catalog entry is cheap, and the model loads the body only when a task matches.

Writing good descriptions

The catalog description is the only thing the model sees before deciding to load a skill, so it is selection metadata, not documentation. Write it as a trigger, not a summary of the workflow.

  • What + "Use when…". State what the skill does, then the conditions that should trigger it. Example: "Extract text and tables from PDFs, fill forms, merge documents. Use when working with PDF files."
  • Triggers only. Do NOT summarize the step-by-step workflow — that belongs in the body, which the model gets after loading.
  • Third person. Describe the skill, not the model ("Extracts…", not "You should extract…").
  • Name rules. Skill name is lowercase a-z/0-9 with single hyphens between segments (no leading/trailing/double hyphen, no underscores), ≤64 chars, and must not contain the words anthropic/claude. For the filesystem provider, the name must equal the skill's directory name.
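The name rules above translate into a short validation sketch. The function name is hypothetical, and treating anthropic/claude as a substring check is an assumption; defineSkill and defineAgent perform the real validation.

```typescript
// One or more lowercase alphanumeric segments joined by single hyphens.
const SKILL_NAME_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
const RESERVED_WORDS = ['anthropic', 'claude'];

// Returns a list of problems; an empty list means the name passes this sketch.
function validateSkillNameSketch(name: string): string[] {
  const problems: string[] = [];
  if (name.length === 0 || name.length > 64) {
    problems.push('name must be 1-64 characters');
  }
  if (!SKILL_NAME_PATTERN.test(name)) {
    problems.push('name must be lowercase a-z/0-9 segments joined by single hyphens');
  }
  for (const word of RESERVED_WORDS) {
    if (name.indexOf(word) !== -1) {
      problems.push(`name must not contain "${word}"`);
    }
  }
  return problems;
}

console.log(validateSkillNameSketch('pdf-processing').length); // 0
console.log(validateSkillNameSketch('load_skill').length); // 1
```

The second call fails because of the underscore, which is also why a skill name can never collide with the reserved tool names.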

Cross-runtime support

All skills logic lives in shared core, plus a one-line per-run catalog-resolution hook at each runtime's message-build call site (the same place memory retrieval resolves — where IO / non-determinism is allowed). The two tools therefore work on all runtimes; the catalog is threaded on JS / Temporal / Cloudflare / DBOS.

| Runtime | In-code provider | Filesystem provider |
| --- | --- | --- |
| JS | ✅ | ✅ |
| Temporal | ✅ | ✅ (resolve catalog + tool reads in activities) |
| DBOS | ✅ | ✅ (resolve in steps) |
| Cloudflare DO / Workflows | ✅ | ❌ (no node:fs; use the in-code provider) |

fileSystemSkillProvider works wherever node:fs exists — i.e. NOT Cloudflare Workers; use the in-code provider there (or a future workspace/D1-backed provider). On Temporal/DBOS, filesystem IO must run where IO is allowed (the per-step activity/step), never in workflow code; in-code catalogs are deterministic data and need no special handling.

Cache behavior

Skill loading is purely additive by construction:

  • The Level-1 catalog is a stable, deterministically-sorted system-prompt fragment that is NEVER annotated with per-skill loaded-state (no "✓ loaded" marks at runtime) — annotating it would make the cached prefix volatile and bust the cache on every load. "Already loaded" handling lives entirely in the load_skill result, never in the catalog. (Preloaded skills' loaded="true" marker is static — decided at config time — so it does not break stability.)
  • Level-2 bodies and Level-3 resources arrive append-only as tool results, covered by the existing rolling cache breakpoints. There are zero changes to the cache strategy or the LLM adapter.

The only inherited caveat (identical to memory injection): on the turn a body lands, the rolling latest-turn breakpoint sits on/after it, so that one breakpoint won't cache-hit across that turn boundary; the system/tools/previous-turn breakpoints still do.

Limitations (v1)

  1. Re-loading a skill returns the body again. ToolContext exposes no transcript access, so the load_skill tool cannot dedup on its own — v1 returns the body on every call (correct + cache-safe; rarely triggered because the model sees its own prior load_skill results). collectLoadedSkillNames(messages) ships as the building block for programmatic dedup, but the dispatch-layer short-circuit is deferred.
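  2. The filesystem traversal guard is lexical. It rejects paths that resolve outside the skill directory, but it does NOT resolve symlinks, so a symlink inside a skill directory that points elsewhere would pass the check. This is safe for operator-provisioned skill directories, whose contents are trusted.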
  3. The filesystem provider's staleness re-scan detects root-entry add/remove, not in-place edits. Editing an existing skill's files in place won't be picked up until a restart or a touch of the root directory.
  4. No token budget. There is no cap on catalog size or preloaded-body size (future work).
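To illustrate what "lexical" means for the traversal guard in limitation 2, the sketch below normalizes path segments as plain strings and never touches the filesystem. It is a hypothetical stand-in, not the package's actual guard, but it shows why symlinks are invisible to this kind of check.

```typescript
// Purely lexical: resolves "." and ".." by string manipulation only,
// so a symlink that points outside the skill directory is never noticed.
function isInsideSkillDirSketch(relativePath: string): boolean {
  // Reject absolute paths (POSIX root or Windows drive prefix).
  if (/^(?:[\\\/]|[A-Za-z]:[\\\/])/.test(relativePath)) return false;
  const stack: string[] = [];
  for (const segment of relativePath.split(/[\\\/]+/)) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') {
      if (stack.length === 0) return false; // would climb above the skill directory
      stack.pop();
    } else {
      stack.push(segment);
    }
  }
  return stack.length > 0; // an empty path does not name a file
}

console.log(isInsideSkillDirSketch('references/forms.md')); // true
console.log(isInsideSkillDirSketch('references/../../etc/passwd')); // false
```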

Released under the MIT License.