Anton, chapter 9: Threads, spawn, and the cast

The chapter opens with Clara. She's a non-technical co-owner of the system now, and the memory entry I write for her is small but it changes the shape of the next ten days: respond simply, no jargon, escalate to me when needed. The system prompts pick up simplification rules and a cleaner fallback path. Real second user on real Anton, and every rough edge becomes a real complaint. That's the human reason most of what follows happens.

Threads

The biggest single change of the chapter lands on April 20: threads. Anton learns to do several things at once. Until now a run is a run: one conversation, one in-flight loop, and anything else has to wait. That works for a personal assistant talking to one person at a time. It does not work for a household where media triage is happening in the background, the syndic agent is reconciling invoices, and a school message comes in from a different group all at the same time. The plan is a thread registry primitive backed by Redis, channel and group and parent and thread IDs threaded through every agent context, a runAgent that registers itself and drains injections and child events and honors cancel, and an ingress layer that knows whether an incoming user message belongs to an active thread or starts a new run. On top of that sits a spawn_thread runtime tool so the agent itself can fan out, a live SSE event stream so the UI can show what each running thread is doing, and the hardening that any concurrency primitive needs to actually be safe: atomic inject-if-running to close the injection-loss race, finalize-drain, cascade cancel so killing a parent kills its children with no orphans, fan-out budget and TTL caps. By the end of the day one Anton can run multiple long-lived threads in parallel and a user can @mention into any of them without restarting anything.
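The registry semantics above can be sketched roughly as follows. This is a minimal illustration that assumes an in-memory Map in place of Redis; every name in it (ThreadRegistry, injectIfRunning, cancelCascade, maxChildren) is hypothetical and not taken from Anton's code. A Redis-backed version would need the check-and-append step to be atomic on the server side to give the same guarantee across processes.

```typescript
type ThreadStatus = "running" | "done" | "cancelled";

interface ThreadRecord {
  id: string;
  parentId: string | null;
  status: ThreadStatus;
  inbox: string[]; // injected messages waiting to be drained by the run loop
}

// Hypothetical in-memory stand-in for the Redis-backed registry.
class ThreadRegistry {
  private threads = new Map<string, ThreadRecord>();
  private maxChildren: number;

  constructor(maxChildren = 4) {
    this.maxChildren = maxChildren;
  }

  // Register a thread. Refuses duplicates, dead parents, and spent fan-out budgets.
  register(id: string, parentId: string | null = null): boolean {
    if (this.threads.has(id)) return false;
    if (parentId !== null) {
      const parent = this.threads.get(parentId);
      if (!parent || parent.status !== "running") return false;
      // The budget counts every child ever spawned, finished or not.
      if (this.childrenOf(parentId).length >= this.maxChildren) return false;
    }
    this.threads.set(id, { id, parentId, status: "running", inbox: [] });
    return true;
  }

  // Check and append in one step, so a message is never queued on a dead thread.
  injectIfRunning(id: string, message: string): boolean {
    const t = this.threads.get(id);
    if (!t || t.status !== "running") return false;
    t.inbox.push(message);
    return true;
  }

  // Hand the queued messages to the run loop and empty the inbox.
  drain(id: string): string[] {
    const t = this.threads.get(id);
    if (!t) return [];
    const messages = t.inbox;
    t.inbox = [];
    return messages;
  }

  // Finalize-drain: stop accepting injections, then return whatever was still queued.
  finalize(id: string): string[] {
    const t = this.threads.get(id);
    if (!t) return [];
    if (t.status === "running") t.status = "done";
    return this.drain(id);
  }

  // Cancel a thread and all of its descendants; returns the ids that were cancelled.
  cancelCascade(id: string): string[] {
    const t = this.threads.get(id);
    if (!t) return [];
    const cancelled: string[] = [];
    if (t.status === "running") {
      t.status = "cancelled";
      cancelled.push(id);
    }
    for (const child of this.childrenOf(id)) {
      cancelled.push(...this.cancelCascade(child.id));
    }
    return cancelled;
  }

  private childrenOf(parentId: string): ThreadRecord[] {
    return [...this.threads.values()].filter((t) => t.parentId === parentId);
  }
}
```

The point of folding the status check into injectIfRunning is that the caller never does a separate "is it running?" lookup followed by a write, which is where a message could otherwise be lost between the two steps.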

Skills cleanup

Then a cleanup that's been waiting since chapter 6. Five commits over an afternoon delete @anton/skills entirely. It was the original skills package from chapter 1, which became a barrel after chapter 4's defineSkill rewrite and dead weight after the move to Deno. The new layout is three rules: skills live at skills/<domain>/<name>.skill.ts as first-class hot-reloadable units, skills/<domain>/_lib/ holds small stable helpers shared inside one domain, and a thin skills-shared facade exposes the narrow Node-side surface that worker and agent and transports need. The principle underneath is simple: libraries are boring and fixed, skills evolve. If a _lib helper needs editing to support a feature, that's the signal it wants to be a skill. The same instinct produces the storage decision tree on the same day: a documented rule for choosing between facts (free-form), collections (typed-shape), files (blobs), and the family vault. Last month's organic growth produced overlap among all four, and a decision tree is cheaper than a refactor.
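The first two layout rules are mechanical enough to check in code. Here is a minimal sketch of such a check, assuming forward-slash relative paths; classifySkillPath and the example file names are hypothetical, and only the skills/<domain>/<name>.skill.ts and skills/<domain>/_lib/ shapes come from the rules above.

```typescript
type SkillPathKind =
  | { kind: "skill"; domain: string; name: string }
  | { kind: "lib"; domain: string }
  | { kind: "other" };

// skills/<domain>/<name>.skill.ts
const SKILL_RE = /^skills\/([^/]+)\/([^/]+)\.skill\.ts$/;
// skills/<domain>/_lib/<anything>
const LIB_RE = /^skills\/([^/]+)\/_lib\/.+$/;

// Classify a repo-relative path as a skill, a domain-local helper, or neither.
function classifySkillPath(path: string): SkillPathKind {
  const lib = LIB_RE.exec(path);
  if (lib) return { kind: "lib", domain: lib[1] };

  const skill = SKILL_RE.exec(path);
  // A file sitting directly under skills/_lib/ has no domain, so it is not a skill.
  if (skill && skill[1] !== "_lib") {
    return { kind: "skill", domain: skill[1], name: skill[2] };
  }
  return { kind: "other" };
}
```

A check like this could sit in a lint step or a hot-reload watcher, so that a helper drifting out of _lib or a skill landing outside its domain folder shows up early.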

Around the same window, the coder agent gets a three-tier write scope. Tier 1: prompts only. Tier 2: skills plus prompts. Tier 3: any code. The tier is set per invocation and the coder cannot escalate itself. That's what makes the self-improvement loop safe enough to leave running unattended: the loop fixing a prompt regression has no permission to rewrite agent infrastructure to do it.
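One way to picture the tier rule is as a small permission table fixed at invocation time. The sketch below is illustrative only: TargetKind, makeWriteScope, and the idea of tagging each write with a kind are assumptions, not a description of how the coder agent actually enforces its scope.

```typescript
type Tier = 1 | 2 | 3;
type TargetKind = "prompt" | "skill" | "code";

// What each tier may touch, mirroring the three tiers described above.
const ALLOWED: Record<Tier, ReadonlySet<TargetKind>> = {
  1: new Set<TargetKind>(["prompt"]),
  2: new Set<TargetKind>(["prompt", "skill"]),
  3: new Set<TargetKind>(["prompt", "skill", "code"]),
};

interface WriteScope {
  readonly tier: Tier;
  assertCanWrite(kind: TargetKind): void;
}

// The tier is captured once per invocation. The returned object is frozen and
// exposes no setter, so code holding the scope has no way to raise its own tier.
function makeWriteScope(tier: Tier): WriteScope {
  return Object.freeze({
    tier,
    assertCanWrite(kind: TargetKind): void {
      if (!ALLOWED[tier].has(kind)) {
        throw new Error(`tier ${tier} may not write ${kind}`);
      }
    },
  });
}
```

In this shape, the self-improvement loop would be handed makeWriteScope(1) by whatever invokes it, and an attempt to write a skill or agent code fails loudly instead of quietly succeeding.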

Spawn and awakening

Spawn and awakening land on the 22nd, and they finish the federation story that started with the replication engine in chapter 5 and the mesh in chapter 6. Spawn is parent-side: it provisions infrastructure for a new clone, copies prompts over, seeds an identity, and registers the clone in the mesh. Awakening is clone-side: the new instance learns what it's for from its operator through a guided onboarding conversation, runs self-diagnostics, and keeps a mentor channel open back to the parent for questions. A clone isn't a docker stack any more. It's an Anton that wakes up, finds out who it is, and joins its peers.

The cast

The same day flips Gustav (local Gemma) to primary inference for every agent. It's a one-line change because of the work in chapter 4 that made prompts a single source of truth. The savings are real. Quality regressions are caught by the self-improvement loop and resolved through the cast: intent-gated escalation goes in, so the agent only reaches for a strong model when the intent of the request needs it (research, complex reasoning), and the cast formalizes models as characters. Each model is a named specialist with a prompt-defined personality and area of strength. The agent picks who to ask the way a person picks who on their team to email. The LiteLLM codenames (sunny, gizmo, gandalf, gustav, william) have been hinting at this since chapter 4. The cast makes it explicit: ask specialists by name, and William gets smarter.
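Intent-gated escalation can be sketched as a lookup that only leaves the local model when a specialist claims the intent. Everything below is a hedged illustration: the Intent labels, the CastMember shape, and pickModel are hypothetical, and the assignment of any particular codename to any particular strength in the usage note is an assumption made for the example.

```typescript
// Hypothetical intent labels; "research" and "complex_reasoning" are the
// escalation cases named in the text, the others are placeholders.
type Intent = "chat" | "lookup" | "research" | "complex_reasoning";

interface CastMember {
  name: string;                   // codename the agent asks for
  strengths: ReadonlySet<Intent>; // intents this specialist is meant to handle
}

// Return the first specialist that claims the intent, else stay on the primary.
function pickModel(intent: Intent, primary: string, cast: CastMember[]): string {
  const specialist = cast.find((member) => member.strengths.has(intent));
  return specialist ? specialist.name : primary;
}
```

For example, with "gustav" as primary and a cast entry such as { name: "gandalf", strengths: new Set(["research", "complex_reasoning"]) } (the pairing is illustrative), a "chat" request stays local while a "research" request escalates.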

A usable heartbeat

The heartbeat from chapter 7 gets the operational pass that makes it usable continuously. Idempotent memory writes plus a topics collection so re-running the same observation doesn't duplicate facts. Thread-aware so it doesn't interrupt active conversations. A loop that remembers what it just notified about and stays quiet rather than re-touching the same topic every tick. The outbound gateway from chapter 7 makes silencing bookkeeping events a one-place fix. Hallucinated-notification retry actually sends now instead of just removing the claim, and the simplified-response layer catches LaTeX and other artifacts before they reach Clara. The heartbeat ends the period as a usable proactive layer: observing, deciding when to speak, staying silent the rest of the time.
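The three behaviors in this pass (idempotent writes, thread awareness, staying quiet on recently notified topics) can be sketched together. This is a minimal in-memory illustration; HeartbeatMemory, the per-topic keying, and the quietMs window are assumptions, not the actual schema of the topics collection.

```typescript
interface Observation {
  topic: string;
  fact: string;
}

// Hypothetical in-memory stand-in for the heartbeat's memory and topics state.
class HeartbeatMemory {
  private facts = new Map<string, string>();        // topic -> latest fact
  private lastNotified = new Map<string, number>(); // topic -> timestamp in ms
  private quietMs: number;

  constructor(quietMs: number) {
    this.quietMs = quietMs;
  }

  // Idempotent write: recording the same observation again changes nothing.
  // Returns true only when stored state actually changed.
  record(obs: Observation): boolean {
    if (this.facts.get(obs.topic) === obs.fact) return false;
    this.facts.set(obs.topic, obs.fact);
    return true;
  }

  // Decide whether to speak, and remember the decision when the answer is yes.
  // Stays silent while a conversation thread is active or the topic is still
  // inside its quiet window.
  claimNotification(topic: string, now: number, threadActive: boolean): boolean {
    if (threadActive) return false;
    const last = this.lastNotified.get(topic);
    if (last !== undefined && now - last < this.quietMs) return false;
    this.lastNotified.set(topic, now);
    return true;
  }
}
```

Passing the clock in as an argument keeps the quiet-window logic easy to test without waiting on real time.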

The SimplySyndic writes finally land too, through a Playwright script. Reading and reconciling have been working since chapter 7. Writing closes the loop on the syndic domain. The next step is replacing Playwright with a deterministic HTTP path now that the read side has shown what the API looks like, and that spec is on the list, not done. Vaultwarden gets a documentation purge in the same window, the rule being that docs describe the current state only, no historical mentions of what an earlier version did.

Where Anton is at the end of all this: one identity, ten domain agents, a cast of named specialists, thread-aware concurrency. Every prompt in the database, so flipping the default model is one row edit. Local Gemma 4 NVFP4 as primary inference, cloud as fallback. Encrypted DB secrets and scoped Deno Worker permissions per skill. Mesh, replication, spawn, awakening for clones. A heartbeat that observes operational state and mostly stays quiet. A self-improvement loop with deploy tracking and regression detection. Two transports (WhatsApp, Telegram), both routing through one worker and one outbound gateway. A hundred-plus quality tests, full execution traces in Postgres, an LCARS dashboard for everything.

The plumbing that took six chapters to build is what most people would call boring infrastructure. That's fine. The substrate is built. The interesting behavior happens on top of it now: the cast, the heartbeat, the self-improvement loop, the long-running threads talking to each other and to us.

There is open work on the list. Doctolib syncing to calendar as a scheduled job. Auto-importing WhatsApp group members as users and mapping their JIDs to roles. Permission flows for new users at scale. The SimplySyndic write path migrating from Playwright to deterministic HTTP. Making Gemma 4 the assumed default everywhere it isn't yet. Open questions for the next stretch, not promises.