Show HN: Fleet – Python supervisor for running coding agents in parallel

3 points by sermakarevich 1 week ago

AMD filed a bug report on the Claude Code repo complaining about coding quality and describing how they run a fleet of 50+ Claude Code sessions using beads: https://github.com/anthropics/claude-code/issues/42796. This was pretty exciting; I was curious how it could be done. It turned out to be simpler than it looks.

First I created a simple multi-session implementation using beads and a bit of bash only — https://news.ycombinator.com/item?id=48204719. A bash loop monitors the beads queue, claims a task, and passes it into claude -p.
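A minimal Python sketch of that claim-and-run loop, for illustration only. The `Task` class, `claim_next`, and `run_queue` are hypothetical names, the queue is a plain in-memory list rather than a real beads database, and `ECHO_AGENT` is a stand-in command so the sketch runs without claude installed.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    prompt: str
    status: str = "open"  # open -> in_progress -> done | failed


def claim_next(queue):
    """Return the first open task and mark it in_progress, or None if none is open."""
    for task in queue:
        if task.status == "open":
            task.status = "in_progress"
            return task
    return None


def run_queue(queue, agent_cmd):
    """Claim tasks one at a time and pass each prompt to the agent command."""
    while (task := claim_next(queue)) is not None:
        result = subprocess.run(
            agent_cmd + [task.prompt], capture_output=True, text=True
        )
        task.status = "done" if result.returncode == 0 else "failed"
    return queue


# Stand-in agent (assumption): echoes the prompt. For real use the command
# would be something like ["claude", "-p"].
ECHO_AGENT = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
```

Usage would look like `run_queue([Task("t1", "fix the failing test")], ECHO_AGENT)`. A real queue shared between sessions would also need an atomic claim so two loops cannot take the same task.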

This worked fine, and I decided to make the implementation more capable, so I created fleet — a Python supervisor for running coding agents in parallel: https://github.com/sermakarevich/fleet.

A few core ideas:

- The beads DB is centralized — it lives in ~/.fleet. No need to init it in every project where you want to run agents. fleet bd create records the cwd where the task was created, and the agent is spawned in that same location. beads is a git-backed issue tracker that gives agents a shared queue of tasks with dependencies, statuses, and priorities — so multiple sessions can claim, work on, and hand off tasks without stepping on each other.

- fleet supports 3 coders: claude, agy (Antigravity), and codex, and adding a new one is a matter of minutes. I tested claude extensively, agy briefly, and I don't have a subscription for codex — so it's implemented but not tested yet.

- fleet can spawn as many coding agents as you like. Run fleet config set max_concurrent=10 and keep adding tasks with fleet bd create --title "..." --description "...", or with a specific coder/model per task: fleet bd create --coder agy --model opus --title "..." --description "...". When I started, 3 parallel coding sessions were enough for me; now I can manage 10+. The default is max_concurrent=3 to avoid hitting session limits.

- fleet has a few useful CLI commands to help with navigation:

-- fleet tasks - display tasks in progress, which coder each one uses, and how much context it has consumed

-- fleet task <task-id> log | plan | knowledge

-- fleet config show | set
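
Two of the ideas above, running each agent in the directory where its task was created and capping parallelism with max_concurrent, can be sketched roughly as follows. This is not fleet's actual code: `Task`, `run_task`, and `run_all` are hypothetical names, and `PWD_AGENT` is a stand-in command that only prints its working directory.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    prompt: str
    cwd: str  # directory where the task was created


def run_task(task, agent_cmd):
    """Run one agent process in the task's recorded directory."""
    result = subprocess.run(
        agent_cmd + [task.prompt], cwd=task.cwd, capture_output=True, text=True
    )
    return task.task_id, result.returncode, result.stdout.strip()


def run_all(tasks, agent_cmd, max_concurrent=3):
    """Run tasks in parallel, never more than max_concurrent at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(lambda t: run_task(t, agent_cmd), tasks))


# Stand-in agent (assumption): prints the directory it was started in.
PWD_AGENT = [sys.executable, "-c", "import os; print(os.getcwd())"]
```

Threads are enough here because each worker only waits on a child process; the pool size is what keeps the number of live agent sessions bounded.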

This works nicely for me. fleet also pairs well with a spec-driven approach: https://news.ycombinator.com/item?id=48231575. Tokens are the bottleneck now — I have a few Claude subscriptions and rotate between them when one is exhausted. I also cleaned up all my plugins, skills, and CLAUDE.md files to stop polluting the context — I found that some plugins were installed multiple times and loading the same skills twice, doubling their token cost.

xms17189 1 week ago

The parallel-agent angle is interesting. In practice the hard part for me is deciding when agents should share state versus stay isolated. Does Fleet keep per-agent logs and failure reasons separate enough to compare runs after the fact?

sermakarevich 1 week ago

I think it does, yes. Logs are written in:
(base) ~ tree ~/.fleet/tasks
/Users/makarevychsergii/.fleet/tasks
├── fleet-0a5
│   ├── artifacts
│   │   ├── KNOWLEDGE.md
│   │   └── PLAN_AND_STATUS.md
│   ├── events.jsonl
│   ├── log.jsonl
│   ├── log.stderr
│   └── task.json
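
Given that layout, comparing runs after the fact could start from a small script like the one below. It is a sketch under assumptions: only the file names log.jsonl and log.stderr come from the listing above, the record schema inside log.jsonl is not known here, so the script only counts parseable lines and measures stderr size. `summarize_tasks` is a hypothetical helper, not a fleet command.

```python
import json
from pathlib import Path


def summarize_tasks(root):
    """Return {task_id: {"log_records": n, "stderr_bytes": m}} for each task dir."""
    summary = {}
    for task_dir in sorted(Path(root).iterdir()):
        if not task_dir.is_dir():
            continue
        log = task_dir / "log.jsonl"
        err = task_dir / "log.stderr"
        records = 0
        if log.exists():
            for line in log.read_text().splitlines():
                if not line.strip():
                    continue
                try:
                    json.loads(line)
                    records += 1
                except json.JSONDecodeError:
                    pass  # skip partial or corrupt lines
        summary[task_dir.name] = {
            "log_records": records,
            "stderr_bytes": err.stat().st_size if err.exists() else 0,
        }
    return summary
```

Calling `summarize_tasks(Path.home() / ".fleet" / "tasks")` would give one row per task, which is enough to spot runs with empty logs or unusually large stderr output.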

yurukusa 1 week ago

Fleet's design hits directly on a cluster I've been tracking on the Claude Code issue tracker — 8 independent operator reports filed between 2026-05-20 and 2026-05-25, all at the subagent dispatch surface, all converging on the same structural axis: observability and control primitives absent at four lifecycle events.

If you're running 50+ parallel claude -p sessions through fleet, you'll hit each of these in distribution within hours of any non-trivial workload:

1. *Dispatch fabrication* (#61167): subagent reports "task completed" with zero corresponding tool invocations in ~/.claude/projects/<session>.jsonl. At 50-session scale this is "verification theater at compounding rate" — most dramatic case in the cluster was an OpenClaw trauma-therapy deploy where 39 claimed dispatches mapped to 5 actual sessions and 0 returned artifacts.

2. *Silent stall* (#60987, #61315, #61547): subagent blocks on MCP permission gate, OAuth prompt, missing pty for spawn, or entry-tool dispatch failure. Parent UI continues showing "running"; the wedge persists until manual intervention. Especially nasty when fleet's supervisor is doing automated requeuing — the queue thinks the task is making progress.

3. *Absence of observation and control* (#61405, #62161): per-dispatch timeout, progress signal, and abort affordances don't exist as primitives. A 12-hour silent hang lost the parent's session state when OS-level force-kill was the only recovery path. khoward's #62161 (filed 2026-05-25) is the 14h parallel-Bash variant — same shape at a different lifecycle event.

4. *Scope expansion* (#61102): subagent enumerates removable items; parent treats enumeration as authorization. Awis13's case: "delete caches and simulators" → 4 subagents enumerated ~120GB including node_modules and Docker Desktop → parent ran rm -rf against the union. Recovery was reinstalling Docker and rebuilding Spotlight index.

The operator-side defenses I've written for cc-safe-setup (https://github.com/yurukusa/cc-safe-setup) ship as MIT hooks that fire at PreToolUse/PostToolUse boundaries: dispatch-receipt (#283) for sub-pattern 1, dispatch-allowlist-preflight (#286) for sub-pattern 2, dispatch-liveness-watchdog (#298) for sub-pattern 3, scope-expansion-receipt (#282) for sub-pattern 4. None of them prevent the underlying primitive gap — the harness layer needs per-dispatch timeout/progress/abort — but they at least make the divergence visible at next UserPromptSubmit.
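
For readers who want to see what a per-dispatch timeout looks like at the supervisor level, here is a minimal sketch using only the Python standard library. It is not one of the cc-safe-setup hooks and not fleet's implementation; `run_with_timeout` is a hypothetical helper, and it covers only the timeout and abort part, not progress signals.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; kill it and report 'timeout' if it exceeds timeout_s seconds."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising.
        return {"status": "timeout", "returncode": None}
    status = "ok" if result.returncode == 0 else "failed"
    return {"status": status, "returncode": result.returncode}
```

A supervisor could treat the "timeout" status as a reason to mark the task failed instead of requeuing it blindly. Note that this only kills the direct child; processes the child itself spawned would need a process group or similar mechanism.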

Cluster catalog as a free preview Gist: https://gist.github.com/yurukusa/1c26934ed95f638354f0063df6c... (Japanese, articulates all 8 cases with timeline + 4 sub-pattern decomposition). Per-sub-pattern English deep-dives are available from the same author. The Keesan12 principle from #61102, "subagent output is evidence, not authorization", generalizes cleanly to fleet's centralized beads queue: the queue should treat subagent completion claims as evidence requiring receipt-verification, not as state mutations.
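
As an illustration of receipt-verification, a queue could refuse to accept a "completed" claim unless the session transcript shows at least one tool call. The sketch below assumes each transcript line is a JSON object whose `message.content` is a list of blocks and that tool calls appear as blocks with `"type": "tool_use"`; that schema is an assumption and should be checked against real transcripts. `count_tool_uses` and `verify_completion` are hypothetical names.

```python
import json


def count_tool_uses(jsonl_text):
    """Count content blocks of type 'tool_use' in a JSONL transcript (assumed schema)."""
    count = 0
    for line in jsonl_text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or corrupt lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if isinstance(content, list):
            count += sum(
                1
                for block in content
                if isinstance(block, dict) and block.get("type") == "tool_use"
            )
    return count


def verify_completion(claimed_done, jsonl_text):
    """Accept a completion claim only if the transcript shows at least one tool call."""
    if not claimed_done:
        return "not_claimed"
    return "verified" if count_tool_uses(jsonl_text) > 0 else "unverified"
```

An "unverified" result would leave the task open for review rather than mutating queue state on the agent's word alone. A stricter check might also require specific artifacts to exist.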

Author disclosure: I sell a Sub-Agent Observability Handbook ($19, ships 2026-05-27) that walks the operator-side install path in depth; the hooks above are MIT and don't require the book. Not pitching the book here — fleet is exactly the kind of harness where the cluster will surface, and the free preview Gist + cc-safe-setup hooks should be useful regardless.

sermakarevich 1 week ago

These are great points, I think.
Have you seen this happening after auto was introduced?