Wally and My Work Gastown

ai, manager, how

Meet Wally. He’s my claw. Except Wally isn’t a single claw. He’s the head of an organization of claws: staff with distinct roles, plus Odallies running in different places, each with a distinct signature. The whole thing operates like a small team with a real org chart. A thing that’s emerging: what used to be an IC role is now a manager-of-managers role.

Wally is one of my claws — the work one. This post is about how he works as an org, not about the claws as a set.

This is a survey post — the thinking is fresh, names are provisional, and there’s at least one open question I haven’t answered. Posting now so I can find it later.

Yegge’s Gas Town in one paragraph

If you haven’t read it, go read Welcome to Gas Town and Gas Town: from Clown Show to v1.0. Short version: a long-lived Mayor agent runs the show, dispatching short-lived Polecat workers that do one job and disappear. The Mayor keeps state in beads (a dependency-aware issue tracker) and Dolt (a versioned database). The whole thing is the MEOW stack — Mayor, Engineers (Polecats), Oracle, Workers — and it’s a real pattern, not a thought experiment. I’m running a version of it.

From Mad Max to org-chart

The structure is right. The metaphors are gratuitous. Polecats and refineries are fun, but readers already know “manager” and “IC,” and they already have intuitions about how those layers behave. Every minute spent explaining what a polecat is is a minute not spent on the actual idea.

So here’s the swap I’d make:

| Yegge’s Gas Town | Org-chart vocabulary | Role |
| --- | --- | --- |
| (implicit operator) | M2 — the human (me) | Direction, review, strategic calls |
| Mayor | M1 — the AI manager (Wally) | Orchestrates work, distributes tasks, reviews output |
| Oracle / Deacons | Staff — cloud teammates near M1 | Persistent context-holders, work alongside Wally |
| Polecats | Odallies — specialized ephemeral ICs¹ (≈ AWS EC2, GitHub Codespaces, Meta on-demand devservers — pick whichever your shop ships) | Ephemeral workers with build-tool access, scoped tasks |
| Refinery | Review-and-land system (≈ GitHub merge queue, Gerrit, Phabricator) | Applies polished output to mainline |

The argument isn’t that Yegge is wrong about the structure. It’s that the moment you write a sentence with both “polecat” and “refinery” in it, you’ve taxed the reader for no reason. Every reader has already been on a team. Use the vocabulary they already have.

Drawn as an org chart, the layers fall out like this:

```mermaid
graph TD
    Igor[Igor — M2 / Human]
    Wally[Wally — M1 / AI Orchestrator]
    Staff[Staff — cloud teammates]
    Odallies[Odallies — specialized ephemeral workers]
    Igor --> Wally
    Wally --> Staff
    Wally --> Odallies
```

Why the human is M2, not the Mayor

This is the part I think people get wrong about themselves.

When you start running agents, it feels like you’re the Mayor. You’re the one giving orders, deciding what gets built, picking what to ship. So you call yourself the orchestrator. But that’s not actually what’s happening — or at least it’s not what should be happening.

In FAANG vocabulary, I’m the M2. Wally — the AI orchestrator — is the M1. Wally has staff: cloud teammates that run alongside him, do research, draft work, hold context. Wally has Odallies: ephemeral ICs that go off and do specialized work and come back with results. Wally is the manager. I’m the manager of managers.

What’s an M2 actually for? Direction and review. I tell Wally what we’re trying to accomplish. I review the output. I make the calls he can’t — strategic shifts, taste judgments, hiring decisions about what new tools to bring in. I do not try to dispatch every IC myself. The moment I try, I become the bottleneck — exactly the bottleneck FAANG learned to design managers around.

The trap is staying in M1 mode after you’ve outgrown it. Most people running agents today are doing M1’s job. Reviewing every diff, dispatching every task, holding every piece of state in their head. That’s fine when you have one agent. It’s strangling when you have five. The job is to climb to M2 — to delegate the orchestration itself.

The third tier Yegge’s model misses

Here’s the load-bearing claim of this post: in FAANG-style infra, Yegge’s two tiers (Mayor + Polecats) don’t quite cover it. There’s a third tier in between, and it’s not optional.

At FAANG, you have a monorepo and a central build farm. The box running my M1 doesn’t have access to the build tools. It can’t run the full test suite. It can’t push to the internal package registry. It physically lives somewhere that can’t reach the systems where the real work happens. Wally can think on my laptop. He can’t build there.

So Wally needs a team of Odallies. Each odally is an IC that has access to the special stuff — build tools, internal infra, the production-adjacent boxes. They run in different boxes. And the communication channel to them is unreliable in a way that’s structurally different from talking to Wally’s local staff:

  • They run on different infra. Network hops, auth boundaries, queue lag. None of that exists for staff that share a box with M1.
  • The channel is unreliable. You can’t stream-of-consciousness chat with an odally. Messages drop, retry, double-deliver. The protocol has to assume failure.
  • They’re more ephemeral. A staff teammate can hold context across hours. An odally might exist for the duration of one task and then be gone. You can’t build a relationship with them.
  • They’re harder to control. You give them a job, you wait, you get a result or a timeout. Mid-flight steering is mostly not a thing.
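The “protocol has to assume failure” point can be made concrete. Here’s a minimal sketch of the handoff shape — idempotency keys so double-delivery is harmless, polling with a deadline instead of trusting delivery, resends on timeout. Every name here (`OdallyChannel`, `dispatch`) is hypothetical; it models the channel, not any real transport.

```python
import time
import uuid

class OdallyChannel:
    """Toy model of an unreliable channel to an odally.

    Messages may drop or double-deliver, so every dispatch carries an
    idempotency key and the caller polls for a result until a deadline."""

    def __init__(self):
        self.completed = {}  # idempotency key -> result

    def send(self, key, task):
        # Double-delivery is safe: a key that already ran is a no-op.
        if key not in self.completed:
            self.completed[key] = f"done: {task}"

    def poll(self, key):
        return self.completed.get(key)

def dispatch(channel, task, retries=3, timeout_s=0.01):
    """Fire the task, then poll; resend on timeout instead of assuming delivery."""
    key = str(uuid.uuid4())
    for _ in range(retries):
        channel.send(key, task)  # may be a duplicate send; the key makes that safe
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            result = channel.poll(key)
            if result is not None:
                return result
            time.sleep(0.001)
    raise TimeoutError(f"odally never answered for task: {task}")
```

The key design choice is that retries are the default path, not the error path — which is exactly what makes this channel structurally different from talking to local staff.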

Yegge’s polecats are ephemeral — that part matches. What he doesn’t surface is the cross-machine, unreliable-comm, monorepo-bound flavor of ephemerality. That’s the FAANG-specific complication, and it’s the reason a two-tier model isn’t enough. Staff and Odallies behave differently. Treat them the same and you’ll either over-trust the Odallies (they fail and you didn’t catch it) or under-use them (you keep work local that should have been farmed out).

The mental model: staff are coworkers, Odallies are contractors with security badges. Both are ICs. The handoff protocol is completely different.

This isn’t speculation. Big shops productize this tier — Meta calls them on-demand devservers, AWS users wire it from EC2, GitHub users from Codespaces. Same shape: ephemeral pre-warmed cloud machines devs grab and release. That’s Odallies-as-a-service.
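The grab-and-release shape is simple enough to sketch. This is an illustrative model only — real shops back the pool with EC2, Codespaces, or devserver provisioning, and `OdallyPool` is my made-up name, not anyone’s API:

```python
import collections

class OdallyPool:
    """Sketch of Odallies-as-a-service: pre-warmed machines handed out on demand."""

    def __init__(self, warm_hosts):
        self.warm = collections.deque(warm_hosts)  # pre-provisioned, sitting idle
        self.leased = set()

    def grab(self):
        host = self.warm.popleft()  # instant, because the machine already exists
        self.leased.add(host)
        return host

    def release(self, host):
        # Recycled (or torn down and replaced) — the caller doesn't care which.
        self.leased.discard(host)
        self.warm.append(host)
```

The pre-warming is the whole product: the dev (or the M1) never waits on provisioning.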

The cross-machine plumbing required to make this work — virtualized checkout, remote build cache, custom VCS server, auth boundaries — is genuinely complex. See the Meta dev blog post for one shop’s full reference implementation.

When to use the M1, and when to skip him

The M1 isn’t always the right interface. Sometimes you go through Wally; sometimes you go straight to the IC. Knowing which to use is part of the job — the same call any FAANG manager makes between dispatching and rolling up their sleeves.

Three reasons to go through the M1:

  • Babysitting at scale. Odallies get stuck — model errors, broken comms, file races, build flakes — and they don’t recover on their own. If you’re not watching, they idle. Wally watches and unsticks them, so a 30-minute task doesn’t quietly become a four-hour one. The MEOW stack patrol formulas (e.g. mol-polecat-work.formula.toml, mol-deacon-patrol.formula.toml in Yegge’s gastown repo) are this idea formalized — babysitting as a first-class workload.
  • Phone-friendly orchestration. Wally bridges the navigation gap. I can drive the team from my phone — between meetings, on a walk, in the car — ask status, push work forward, verify results. Direct IC interaction needs a keyboard and the right context window loaded. Wally exposes a thinner interface, and that interface fits a thumb.
  • Adversarial review. Wally can orchestrate a convoy of reviewers — multiple specialized agents critiquing the same artifact for correctness, performance, security, style, smell. See Yegge’s code-review formula as the canonical example. One IC can’t do this; an M1 with staff can.
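The babysitting bullet reduces to a loop you could write down. A minimal sketch, assuming a heartbeat map and some infra-specific unstick hook (both hypothetical — Yegge’s patrol formulas are the real, richer version of this idea):

```python
STUCK_AFTER_S = 30 * 60  # half an hour of silence counts as stuck (my threshold)

def patrol(workers, now, nudge):
    """One patrol pass: find odallies that have gone quiet and unstick them.

    `workers` maps name -> last-heartbeat timestamp (seconds); `nudge` is
    whatever restart/retry hook your infra exposes."""
    stuck = [name for name, last_seen in workers.items()
             if now - last_seen > STUCK_AFTER_S]
    for name in stuck:
        nudge(name)  # restart the task, re-send the brief, or reassign it
    return stuck
```

Run on a timer, this is what keeps a 30-minute task from silently becoming a four-hour one.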

The trade-off: when you need precision, the M1 filters too aggressively. Working through Wally to get something very specific often returns a lot of garbage — generalized, diluted, missing the point. Going direct to the IC keeps your intent intact. You’re collaborating, not delegating-then-praying.

Heuristic: wide work goes through M1; deep work goes direct. If the task has a clear spec and just needs to land in many places, route through Wally. If it’s one specific thing and “perfect” matters, sit down with the IC.

Training Wally

I spent something like eight hours teaching Wally to write status reports I could actually read; another stint teaching him to write design docs I could review the way I’d review a good engineer’s. The good news: once Wally got it, Wally can train the others. The pattern is the same one any good manager runs — worked examples from previous reps, design docs shaped to the reviewer, more hands-on support than I expected. Eight hours of training a junior compounds when the junior can teach the next one.

This is also why becoming an AI native manager starts to feel less like a tooling change and more like a real management job. The hours you put into Wally aren’t lost when the next odally shows up — Wally onboards them.

Even ephemeral workers earn names

At first I just used the auto-generated hostnames — OD-1274, i-0a3f4b7c1d2e, whatever the cloud handed me. That got painful fast: I’d lose track of which one was doing what. “OD-1274 completed step 3” told me nothing.

So now I auto-assign a human name at spawn — OD-343 becomes alice, the next one is bob, then charlie. Now Wally writes “alice completed step 3” and it lands immediately. Pronounceable, distinct, sticky in a way no hex string is.

Zero cognitive cost on me — the mapping happens automatically when the VM spins up. But they’re real handles for status reports and conversation. Even ephemeral workers earn names. The cost of running through the alphabet is less than the cost of looking up which hostname was doing what.
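The mapping itself is a few lines. A sketch of the spawn-time assignment — the name pool and `Roster` are illustrative, and the suffix trick (alice2, bob2, …) is my answer to running through the alphabet:

```python
import itertools

# Pool of pronounceable handles; the list here is illustrative, not a roster.
NAMES = ["alice", "bob", "charlie", "dana", "erin", "frank"]

def name_generator():
    """Yield alice, bob, ... then alice2, bob2, ... forever."""
    for round_num in itertools.count():
        suffix = "" if round_num == 0 else str(round_num + 1)
        for name in NAMES:
            yield name + suffix

class Roster:
    """Maps ugly cloud hostnames to sticky human handles at spawn time."""

    def __init__(self):
        self._next = name_generator()
        self.by_host = {}

    def assign(self, hostname):
        # Idempotent: re-asking about a known host returns the same handle.
        if hostname not in self.by_host:
            self.by_host[hostname] = next(self._next)
        return self.by_host[hostname]
```

Called once when the VM spins up, it costs nothing and every status line after that reads like a sentence.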

Still figuring out

A bunch of things I haven’t worked out yet:

  • When does the M1 need its own M1? At some scale Wally himself is going to need to delegate orchestration. Recursion at scale is real — that’s how FAANG ended up with M3, M4, directors, VPs. I don’t know where the first level of recursion shows up for AI managers, but it’s coming.
  • When does the human (M2) need their own M2? The mirror question. At what point does the human need a meta-orchestrator above them — something coordinating across multiple Wallies, multiple domains, multiple humans? Today I run one Wally. The day I’m running three is the day I want this answered.
  • Is “odally” actually a good name? I like it because the etymology is functional — on-demand + Wally — so it tells you what they are. But “specialized ephemeral IC with build access” is a mouthful and I’m not sure my coinage survives contact with anyone who didn’t watch me invent it. Open to better.

More to come as I build this out. If you’re running this pattern at scale — especially the M1+staff+Odallies split inside a real monorepo — drop me a note. I’d rather steal your vocabulary than invent more of my own.

Future Experiments

Things I’m about to try with Wally and the Odallies, not yet operating:

Training on Amazon Leadership Principles

I did a tour of duty at Amazon. The Leadership Principles aren’t fluff — they’re decision frameworks. “Customer Obsession,” “Bias for Action,” “Earn Trust,” “Have Backbone, Disagree and Commit,” “Insist on the Highest Standards.” Each one is a tool you apply to a single decision. AI agents make many small decisions per task — that’s the use case.

I want to see if I can teach them operationally, not aspirationally. For each LP, write the BEHAVIOR it produces in Wally and what he checks for in the Odallies’ output:

| LP | What Wally would do |
| --- | --- |
| Customer Obsession | Asks “who’s the user of this change?” before any review |
| Earn Trust | Requires evidence (link, citation, repro) for any claim — flags assertions that don’t have one |
| Bias for Action | Picks one path and ships; doesn’t waterfall a 5-option analysis when 2 will do |
| Insist on the Highest Standards | Rejects Odally output with smells, with specifics |
| Have Backbone | Pushes back on my brief when it’s wrong; doesn’t just agree |
| Are Right, A Lot | Tracks its predictions — flags when it was wrong before |
| Dive Deep | Reads the whole file, not just the diff |

Subset, don’t import wholesale — “Hire and Develop the Best” doesn’t map; “Ownership at the company level” doesn’t map. Pick the 6-8 that fit the work.
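“Operationally, not aspirationally” means each LP becomes a check Wally actually runs over Odally output. A toy sketch of that encoding — the predicates are crude stand-ins I made up, not real review machinery, but the shape (LP → concrete check → rejection with specifics) is the experiment:

```python
# Each LP maps to a mechanical check over an odally's output.
# These predicates are deliberately simplistic placeholders.
LP_CHECKS = {
    "Customer Obsession": lambda out: "user" in out.lower(),            # names the user of the change
    "Earn Trust":         lambda out: "http" in out or "repro:" in out, # claims carry evidence
    "Dive Deep":          lambda out: "full file" in out.lower(),       # read beyond the diff
}

def review(output):
    """Return the LPs the output fails, so a rejection comes with specifics."""
    return [lp for lp, check in LP_CHECKS.items() if not check(output)]
```

The point of returning the failed LPs, rather than a boolean, is the “with specifics” part of Insist on the Highest Standards.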

The meta-insight: I’d be training Wally with the same techniques my Amazon managers used to train me. The pattern transfers. (For my retro on what Amazon did well and badly, see my Amazon thoughts.)

Will report back.

Appendix: Challenges

If you’re going to run this setup, here’s what you’re signing up for. None of these are dealbreakers, but if you’re not aware of them you’ll burn time.

  • Constantly-changing infra. Models, tooling, formulas, bots — everything moves fast. You don’t get to pin to a stable foundation. Pinning a model version this month means you’re a release behind next month. Build for swap, not stability.
  • Complexity you can’t hold in your head. Stack depth + emergent multi-agent behavior + multi-process coordination. The system is more complicated than any single human’s working memory. You manage it through tooling, not by understanding it directly.
  • Probabilistic debugging. Things fail probabilistically. You can’t reproduce reliably; you can’t bisect cleanly. You’re triangulating from logs, re-running with variations, and accepting “I think it was this” instead of “I know it was this.” Coming from a deterministic-software background, this is genuinely uncomfortable.
  • Every tool ships doctor + repair. This is the load-bearing recommendation. If you’re going to operate in a probabilistic stack, every tool you depend on needs a self-diagnostic command (<tool> doctor) and ideally a self-repair command (<tool> repair / <tool> fix). Examples I live by: telegram_debug.py doctor, up-to-date diagnose.py, bd doctor. These aren’t optional polish — they’re how you stay sane when the system is too complex to debug by hand.
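The doctor pattern is easy to bolt onto any CLI. A minimal sketch — `mytool` and the specific checks are placeholders; the real value is that every dependency gets a named pass/fail line and a machine-readable exit code:

```python
import argparse
import os
import sys
import tempfile

def doctor():
    """Self-diagnostic: run each named check, print pass/fail, return overall health."""
    checks = [
        ("python >= 3.8", sys.version_info >= (3, 8)),
        ("temp dir writable", os.access(tempfile.gettempdir(), os.W_OK)),
        # ...add one line per dependency your tool can't live without
    ]
    for name, ok in checks:
        print(f"{'ok  ' if ok else 'FAIL'} {name}")
    return all(ok for _, ok in checks)

def main(argv=None):
    parser = argparse.ArgumentParser(prog="mytool")  # placeholder tool name
    sub = parser.add_subparsers(dest="cmd")
    sub.add_parser("doctor", help="run self-diagnostics")
    args = parser.parse_args(argv)
    if args.cmd == "doctor":
        sys.exit(0 if doctor() else 1)

if __name__ == "__main__":
    main()
```

A `repair` subcommand follows the same shape, with each check paired to a fix-it action.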
  1. Odally (singular), Odallies (plural) — portmanteau of on-demand + Wally. They run on whatever your shop calls an on-demand cloud VM — AWS calls them EC2 instances, GitHub Codespaces is a near-cousin, Meta provisions on-demand devservers. Wally on demand. Odally. The name is functional — it tells you what they are. I think it also landed because it sounds ephemeral and slightly foreign — these workers don’t live with you, they show up, do the thing, and vanish.