I built four AI orchestrators in a month. Claude Code is already good enough to obsolete every one of them.

BTT, FSD, ASL, and finally spec/aspect-driven development. Each one had less scaffolding than the last, until the scaffolding was gone. Not because I got clever. Because the agent underneath stopped needing a harness.

Why build orchestrators at all

There’s a claim going around that a $500k/year engineer should be burning $250k/year in tokens. I don’t know if the exact number is right, but the shape of it is. If AI can do most of the typing, you should be using a lot of it. At the time I started, Claude Code on its own couldn’t spend that — one loop, one agent, one task at a time. If I wanted to spend seriously on tokens, I needed orchestration.

But tokens on what? How? You still have to do software engineering. Tradeoffs. System design. Maintenance — whatever that means in an agentic world. Orchestration isn’t the goal; it’s the vehicle. The goal is: keep doing engineering, with more leverage than one person could ever have alone.

Each orchestrator was an attempt to answer “what does leverage actually look like?”

BTT: the flexible workflow engine

I’ve written BTT up in detail in earlier posts, so briefly: it’s a workflow engine for AI agents. You describe workflows in YAML — steps, transitions, feedback loops. The simplest useful shape is “implement → review → done, or loop back on revise.” State carries across steps. The reviewer sees prior attempts.
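The earlier posts have the real format; as an illustrative sketch only (the keys and step names here are made up, not BTT's actual schema), that simplest shape might read:

```yaml
# Illustrative sketch only — not BTT's real schema.
steps:
  implement:
    prompt: "Implement the task described in {task}."
    next: review
  review:
    prompt: "Review the diff from {implement}. Prior attempts: {history}."
    transitions:
      approve: done       # reviewer signs off
      revise: implement   # loop back; state carries prior attempts forward
```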

BTT’s virtue was flexibility. Anything you could express as a graph of steps, you could run. Its vice was that flexibility wasn’t the thing doing the work.

FSD: the full closed loop

FSD is still my favorite of the four. The human provides a vision. The orchestrator breaks it into engineering tasks, writes prompts for the agents that will run them, and kicks off a set of loops:

  • Implementer loop — takes tasks, implements them, merges them.
  • Infrastructure loop — on merge, builds and deploys to staging. On QA pass, promotes to prod. Watches prod logs for failures.
  • QA loop — on staging deploy, runs tests against staging and signals pass or fail.
  • Triager — on prod failure or QA failure, decomposes the problem back into engineering tasks.

All of that sits behind a web UI where humans can drop in new features. Incoming merges interrupt the pipeline, pause what’s in flight, and restart the right steps. Full closed loop — vision in one end, deployed code out the other, with feedback making its way back through triage.
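That routing between loops can be sketched as ordinary code, which is also why it was a maintenance burden. A minimal Go sketch, where the event names and the happy-path QA result are my assumptions rather than FSD's actual implementation:

```go
package main

import "fmt"

// Event is one signal flowing between the FSD loops (names are illustrative).
type Event struct {
	Kind string // "merge", "staged", "qa-pass", "qa-fail", "prod-failure", ...
	Info string
}

// route models the hardcoded routing between loops: each incoming event
// fans out to whatever the next loop should do about it.
func route(ev Event) []Event {
	switch ev.Kind {
	case "merge":
		return []Event{{"staged", ev.Info}} // infra loop: build + deploy to staging
	case "staged":
		return []Event{{"qa-pass", ev.Info}} // QA loop (assuming a pass here)
	case "qa-pass":
		return []Event{{"promoted", ev.Info}} // infra loop: promote to prod
	case "qa-fail", "prod-failure":
		return []Event{{"task", ev.Info}} // triager: back into engineering tasks
	}
	return nil // terminal event, nothing more to do
}

func main() {
	queue := []Event{{"merge", "feature-x"}}
	for len(queue) > 0 {
		ev := queue[0]
		queue = queue[1:]
		fmt.Println(ev.Kind, ev.Info)
		queue = append(queue, route(ev)...)
	}
}
```

Every arm of that switch is exactly the kind of code I could never prove was earning its keep.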

I still think FSD could be great with a few tweaks. What pushed me off it was that the workflows were hardcoded. Every piece of routing between loops was code I had to maintain. And I couldn’t answer the question “does the hardcoding of this particular workflow shape actually improve the output?”

ASL: pure event-driven rigor

ASL was a reaction to FSD’s fragility around state. Pure event-driven, artifact-tracked, durable. Every input and every output was an artifact with an ID. Consumers tracked their last known offset. You could kill the process at any point, bring it back up, and everything picked up where it left off — consumers knew exactly which artifact IDs they needed and produced new artifacts and events when they finished.

It was rigorous. It was also a lot of code. Ten lines of workflow semantics wrapped in a thousand lines of event infrastructure. And it still had FSD’s underlying problem: the workflow shape was hardcoded. The event backbone was new. The engineering decisions the agents made were the same.

The realization

One review step or three. Implementer-then-reviewer-then-integrator, or implementer-then-reviewer. Different workflow topologies produced output of roughly the same quality, as long as the underlying agents were good. The Go code skill mattered. The prompts mattered. The workflow shape, within reason, did not.

Somewhere in the middle of ASL, Claude Code caught up. Spawning subagents, tracking state across steps, deciding when to review and when to ship, chaining tools, recovering from its own mistakes — all of that was stuff it could do on its own given good inputs. I’d been maintaining YAML and event schemas and loop definitions that were compensating for limitations the underlying agent no longer had.

The workflow was the ceremony. The skill was the substance. And the skill was already a first-class feature of the thing I was wrapping.

Spec/aspect-driven development

So I threw the harness out.

Spec/aspect-driven development is a control loop, not a workflow engine. I define how the system should behave (specs) and constraints on the system (aspects). Along with a rather extensive set of prompts, I hand those to Claude Code. Claude Code figures out what agents to spawn, runs the evidence that proves the code matches the spec, and reconciles the code to the specification.

Specs are Gherkin. Given/When/Then, nothing fancy:

Feature: Buying a generator
  Scenario: Purchase with sufficient funds
    Given I have 50 gold
    When I buy a gold mine
    Then I have 0 gold
    And my gold-per-second rate increases by 1
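The evidence backing a spec like that is an ordinary test. A hedged Go sketch, where Player, BuyGoldMine, and the cost constant are hypothetical names, since the game's real API isn't shown here:

```go
package main

import "fmt"

// Hypothetical game types; the real codebase's API may differ.
type Player struct {
	Gold          int
	GoldPerSecond int
}

const goldMineCost = 50

// BuyGoldMine deducts the cost and raises the income rate,
// mirroring the Given/When/Then of the scenario above.
func (p *Player) BuyGoldMine() bool {
	if p.Gold < goldMineCost {
		return false // insufficient funds: no purchase
	}
	p.Gold -= goldMineCost
	p.GoldPerSecond++
	return true
}

func main() {
	p := &Player{Gold: 50} // Given I have 50 gold
	ok := p.BuyGoldMine()  // When I buy a gold mine
	fmt.Println(ok, p.Gold, p.GoldPerSecond) // prints: true 0 1
}
```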

Aspects are markdown with frontmatter. The exact format doesn’t matter — what matters is that each one describes a cross-cutting property of the system and points at evidence that proves it holds:

---
name: core-has-no-external-imports
evidence: scripts/check-core-imports.sh
---
The `core/` package contains pure game logic. It must not import any
external dependencies — no network, no filesystem, no third-party
state. This keeps the core deterministically testable.

The evidence is the trick. For specs, it’s a test suite: this test proves this feature works. For aspects, it’s whatever tool can verify the constraint. Latency? A Go benchmark with a threshold. Import boundaries? grep, or CodeQL if you want to be fancy. Memory budget? A profiler run.
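For the import-boundary aspect above, the evidence doesn't have to stay at grep. A sketch in Go itself using the standard go/parser; the dot-in-first-path-element heuristic for spotting externals is an assumption that fits module paths like github.com/owner/repo, not a universal rule:

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"os"
	"path/filepath"
	"strings"
)

// isExternal reports whether an import path looks like a third-party
// dependency: stdlib paths ("fmt", "net/http") have no dot in their
// first element, while module paths ("github.com/x/y") do.
func isExternal(importPath string) bool {
	first := strings.SplitN(importPath, "/", 2)[0]
	return strings.Contains(first, ".")
}

func main() {
	bad := 0
	// Walk core/ and parse only the import clauses of each .go file.
	filepath.Walk("core", func(path string, info os.FileInfo, err error) error {
		if err != nil || !strings.HasSuffix(path, ".go") {
			return err
		}
		f, perr := parser.ParseFile(token.NewFileSet(), path, nil, parser.ImportsOnly)
		if perr != nil {
			return perr
		}
		for _, imp := range f.Imports {
			p := strings.Trim(imp.Path.Value, `"`)
			if isExternal(p) {
				fmt.Printf("%s imports external package %s\n", path, p)
				bad++
			}
		}
		return nil
	})
	if bad > 0 {
		os.Exit(1) // evidence fails: the aspect does not hold
	}
}
```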

When something breaks, I don’t edit the workflow. I write a spec that buttons down the feature I want, or I add an aspect that captures the constraint I’m enforcing. Claude Code figures out the rest. The output is a well-documented, well-tested codebase that a human maintainer could pick up tomorrow.

There is no harness. There is no static prompt sequence. Claude Code is doing the orchestration, deciding what to spawn, running evidence, checking reconciliation.

What this changed about being an engineer

I’m still making tradeoffs. Still designing systems. Still maintaining them. The artifacts changed:

  • Design used to be API shapes. Now it’s aspects — what invariants must the system hold?
  • Maintenance used to be fixing the code. Now it’s updating the spec and letting the loop reconcile.
  • Reviewing used to be reading diffs. Now it’s reading specs and asking: is this the system I want?

The engineering hasn’t gotten easier. It’s gotten more abstract. I think the abstract parts — what should be true, what must not break — were always the load-bearing parts. The code was one particular implementation of that truth. Now I get to stop pretending otherwise.

I don’t know if spec/aspect is the end of this road. A month is not a long time, and I’ve been wrong three times already. But every time I tried to force structure onto the agent, the structure was the part I ended up deleting.

The thing that stuck was the part where I stopped. If you’re about to build an orchestrator, check first: what can Claude Code already do that your harness is about to re-implement? My guess is most of it.