Claude Code Mastery, Part 7 of 12
Multi-Agent Pipelines
Chaining sub-agents, running them in parallel, and the patterns for 'review-while-coding' without losing your mind. Where Claude Code starts to feel like a small engineering org.
Multi-agent is the buzzword everyone slaps on a slide. It also happens to be where Claude Code gets genuinely interesting — when used surgically.
The shape that works: a small pipeline of bounded sub-agents, each doing one thing, with an explicit handoff. The shape that fails: "swarm of agents debating the architecture."
Let's get tactical.
The three patterns that actually ship
1. Linear pipeline (the bread and butter)
test-writer → test-fixer → code-reviewer → release-bot
Each step has one input and one output. Failures stop the pipeline. This is 80% of what teams use.
2. Fan-out / fan-in
When a task is naturally parallel — translating 5 files, generating tests for 12 modules, scanning logs from 8 services — fan it out.
            ┌─ translator(es) ─┐
            ├─ translator(fr) ─┤
spawner ──> ├─ translator(ar) ─┤ ──> merger
            ├─ translator(pt) ─┤
            └─ translator(de) ─┘
You spawn N specialised sub-agents in parallel, then a merger sub-agent reconciles the outputs (deduplicates, picks the highest-confidence variant, writes a single PR).
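A minimal sketch of the shape, using only the standard library. `translate` is a hypothetical stand-in for spawning one translator sub-agent; the merge step is deterministic on purpose.

```python
# Fan-out / fan-in: independent tasks run in parallel, then a
# deterministic merger reconciles the outputs.
from concurrent.futures import ThreadPoolExecutor

def translate(lang: str) -> tuple[str, str]:
    """Hypothetical placeholder for a translator sub-agent on one language."""
    return (lang, f"README.{lang}.md")

langs = ["es", "fr", "ar", "pt", "de"]

# Fan out: one worker per independent task.
with ThreadPoolExecutor(max_workers=len(langs)) as pool:
    outputs = list(pool.map(translate, langs))

# Fan in: deterministic merge (sort, then deduplicate by language).
merged = {lang: path for lang, path in sorted(outputs)}
```

Note that the merger has no judgment calls in it. The moment the merge needs taste, it stops being a merger and becomes a human decision.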
3. Critic loop
writer ↔ critic
Writer produces. Critic scores against a rubric. Writer revises. Stop when the critic's score reaches a threshold, or after N rounds.
This pattern shines for:
- Documentation rewrites.
- Migration plans.
- Refactor proposals where "is this clean?" is the gate.
The critic must be a different sub-agent from the writer. Same agent self-criticising is theatre.
Where multi-agent stops paying off
After a year, here is my honest take on the diminishing returns:
- 2 agents: Big leap. Writer + reviewer is a real win.
- 3-4 agents: Useful for clear pipelines (test-writer → fixer → reviewer).
- 5+ agents: Marginal at best. Coordination cost > delegation gain.
- "Swarm of 10 agents debating": A demo, not a workflow.
If your pipeline has more than 4 sub-agents, ask whether half of them could be regular shell commands or Makefile targets.
Concrete: a "PR factory" pipeline
Goal: take a Linear ticket, ship a PR.
1. ticket-reader → parses the Linear ticket, outputs a /feature prompt
2. implementer → writes the code
3. test-writer → writes / updates tests
4. test-fixer → if any test fails, fix the code (not the test)
5. code-reviewer → reviews the diff, verdict: SHIP / FIX-FIRST / REWRITE
6. release-bot → drafts PR description + changelog entry
7. (human) → reviews the diff and pushes
Each step is a sub-agent in .claude/agents/. The human step at the end is non-negotiable — that is where git push happens.
Run-time on a typical feature: 4-12 minutes wall clock. Human review at the end: 5-15 minutes. End-to-end: roughly a feature per hour, sustainably.
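The gate between steps 5 and 6 is the part worth getting right. A sketch, with `review_diff` as a hypothetical stand-in for the code-reviewer sub-agent's verdict:

```python
# Review gate: the pipeline only proceeds to drafting a PR when the
# reviewer's verdict is SHIP; anything else is surfaced to the human.

def review_diff(diff: str) -> str:
    """Hypothetical code-reviewer verdict: SHIP / FIX-FIRST / REWRITE."""
    return "SHIP" if "tests green" in diff else "FIX-FIRST"

def gate(diff: str) -> str:
    verdict = review_diff(diff)
    if verdict != "SHIP":
        return f"stopped: reviewer said {verdict}, human input needed"
    # Only now does release-bot run. The push itself never happens here.
    return "drafting PR description (push stays with the human)"

outcome = gate("feature diff, tests green")
```

The design choice: the pipeline fails closed. Anything short of SHIP stops the machine and hands the wheel back to you.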
How to actually invoke a pipeline
Two flavours.
Manual stepping (recommended at first)
> /agents implementer
> Goal: ... Constraints: ... DoD: ... Files: ...
# wait, review
> /agents test-writer
> Write tests for the new code.
# wait, review
> /agents code-reviewer
> Review the diff.
You stay in control. Slow but safe.
Orchestrated via a slash command
.claude/commands/pr-factory.md:
1. Spawn `implementer` with the user-provided goal.
2. Wait for completion. If implementer fails, abort.
3. Spawn `test-writer` on the diff.
4. Spawn `test-fixer` until tests pass or 3 retries are reached.
5. Spawn `code-reviewer` on the final diff.
6. If verdict != SHIP, surface to user and stop.
7. Otherwise spawn `release-bot` for PR description.
8. Print a one-line summary and stop. Never push.
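Step 4's retry logic is the only loop in that command, and it is worth being explicit about. A sketch, where `run_tests` and `spawn_fixer` are hypothetical stand-ins for running the suite and invoking the `test-fixer` sub-agent:

```python
# Bounded retry: keep spawning test-fixer until the suite is green or
# the retry budget is exhausted, then report honestly either way.

attempts_used = 0

def run_tests() -> bool:
    """Hypothetical placeholder: run the suite, True when green.
    (Toy behaviour: passes after two fix attempts.)"""
    return attempts_used >= 2

def spawn_fixer() -> None:
    """Hypothetical placeholder: one test-fixer run (fix code, not tests)."""
    global attempts_used
    attempts_used += 1

def fix_until_green(max_retries: int = 3) -> bool:
    for _ in range(max_retries):
        if run_tests():
            return True
        spawn_fixer()
    return run_tests()  # final check after the last attempt

green = fix_until_green()
```

The hard cap is the safety property: a fixer that cannot converge in three attempts is a signal for a human, not a reason to keep burning tokens.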
Then:
> /pr-factory
> Goal: <fill> Constraints: <fill> DoD: <fill> Files: <fill>
You hit one command. The pipeline runs. The push is still a human keystroke.
Parallel patterns — when to use them
Run agents in parallel when:
- The work is independent (translating 5 files, summarising 8 PRs).
- You can write a deterministic merger (concat, dedup, pick-highest-score).
- You can budget the cost (parallel = more API calls, faster wall clock).
Avoid parallel when:
- Tasks have dependencies (test-fixer needs the implementer's diff).
- The merge step is fuzzy ("which architecture do we like more?"). That is not a merge — that is a human decision.
The single most useful trick: explicit handoffs
Each sub-agent ends its turn by emitting a structured handoff:
status: ok | needs-human | failed
artifacts:
- path: src/cache.ts
- path: tests/cache.test.ts
notes: "Implemented LRU + TTL. All tests green."
next: test-writer | code-reviewer | done
The next agent reads this handoff and knows exactly where it is. No re-deriving context, no "wait, what was the goal again?"
Standardising the handoff is the single highest-leverage thing you can do once you have 3+ sub-agents. It is the multi-agent equivalent of a well-typed function signature.
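One way to make the handoff a typed contract rather than free text is a small validated structure. The field names follow the handoff example above; the class itself and its validation are an illustrative assumption, not something Claude Code provides.

```python
# A typed handoff: every sub-agent emits one of these, and the next
# sub-agent parses it instead of re-deriving context from scratch.
from dataclasses import dataclass, field

VALID_STATUS = {"ok", "needs-human", "failed"}

@dataclass
class Handoff:
    status: str
    artifacts: list[str] = field(default_factory=list)
    notes: str = ""
    next: str = "done"

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUS:
            raise ValueError(f"unknown status: {self.status!r}")

h = Handoff(
    status="ok",
    artifacts=["src/cache.ts", "tests/cache.test.ts"],
    notes="Implemented LRU + TTL. All tests green.",
    next="test-writer",
)
```

Rejecting malformed handoffs at the boundary is the whole point: a sub-agent that emits `status: maybe?` fails loudly at the handoff instead of silently three stages later.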
Next article: Building Complete Features — taking everything from this and Articles 3-7 and walking through a real ticket-to-PR session, command by command.
Series — Claude Code Mastery
- Part 01: Claude Code vs ChatGPT vs Copilot vs Agents. Most developers are using the wrong AI tool for the wrong job. Here is why — and what to do instead.
- Part 02: Installation + The Antigravity Workflow. Installing Claude Code is a 30-second job. Setting up the workflow that makes the agent feel like it's doing the heavy lifting — that's the part nobody writes about.
- Part 03: Writing Prompts That Work. "Make it better" is not a prompt. "Refactor this for performance" is not a prompt. Here is the four-part structure that makes Claude Code actually finish what you asked.
- Part 04: Slash Commands — Building a Project from A to Z. /init, /agents, /compact and your own custom commands. The toolkit that lets you go from empty folder to running app without leaving the Claude prompt.
- Part 05: Sub-Agents — The 11 Specialized Experts Inside Claude Code. Slash commands reuse prompts. Sub-agents reuse whole personas — code-reviewer, test-writer, migration-runner. Here is the team you should have on day one.
- Part 06: Production Codebase Safety. Permissions, guardrails, and what not to automate. The unsexy article that decides whether Claude Code becomes infrastructure or becomes the reason you got paged at 2 AM.
- Part 07: Multi-Agent Pipelines (you are here). Chaining sub-agents, running them in parallel, and the patterns for 'review-while-coding' without losing your mind. Where Claude Code starts to feel like a small engineering org.
- Part 08: Building Complete Features. From Linear ticket to merged PR with Claude Code. A real, honest walk-through — what the prompt looked like, what the agent got right, what I caught in review.
- Part 09: Testing and Debugging. Letting Claude Code own the entire test loop. Including the parts that make engineers nervous: regressions, flakies, integration tests, and the stack-trace whisperer.
- Part 10: Team Workflows. How engineering teams are actually integrating Claude Code today. The shared .claude/ folder, the review rituals, and the anti-patterns I keep seeing in the wild.
- Part 11: Advanced Patterns — Hooks, MCP Servers, Custom Tools, System Prompts. Once you've outgrown the defaults: hooks for deterministic side effects, MCP servers for org-specific data, custom tools, and system-prompt surgery.
- Part 12: The Future of Agentic Development. Where this is going in 2026 and beyond. What I'd bet on, what I would not, and the line where I get sceptical of the hype.