Anton, chapter 6: The mesh, the sandbox, and self-reflection

March 27, 2026 · 5 min read

Four days, around eighty commits, the densest stretch of the project. By the end of it most of what I'd call "the current architecture" has been decided. I'm going to skip the small commits and write down the five things that mattered.

The mesh

The first is the mesh. I want Anton instances to find each other. Not share a database, not share skills, not share secrets, just find each other and forward calls. I call it SCUT, Symmetric Cluster Universal Transport, because every node is the same shape and the relationship between two nodes is what scopes access. Probes for discovery, heartbeat for liveness, an invocation forwarder on top. The contract is simple: the instance is the identity, and the relation between instances is what you can ask for. This means a clone running for a different household can ask my Anton to run a media query without ever seeing the wine collection, the family vault, or the Plex credentials. Federation as relationships, not as shared infrastructure. The whole thing dedupes into a single @anton/mesh package once the protocol settles.
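The idea that "the relation is what you can ask for" can be shown as the check an invocation forwarder might run before forwarding a call. This is a minimal TypeScript sketch, not the actual @anton/mesh API. The capability names, the RelationTable class, and the assumption that relations are directional are all hypothetical. Discovery probes and heartbeats are left out.

```typescript
// Hypothetical capability names; SCUT's real scopes are not spelled out here.
type Capability = "media.query" | "wine.read" | "vault.read";

interface Invocation {
  caller: string; // instance id of the asking node
  callee: string; // instance id of the node asked to run the call
  capability: Capability;
  payload: unknown;
}

// Every node holds the same table shape; only the relations differ.
class RelationTable {
  // Keyed "caller->callee"; assumption: relations are directional.
  private relations = new Map<string, Set<Capability>>();

  grant(caller: string, callee: string, caps: Capability[]): void {
    const key = `${caller}->${callee}`;
    const existing = this.relations.get(key) ?? new Set<Capability>();
    for (const c of caps) existing.add(c);
    this.relations.set(key, existing);
  }

  // The forwarder consults only the relation: no shared database, no shared secrets.
  canInvoke(inv: Invocation): boolean {
    const caps = this.relations.get(`${inv.caller}->${inv.callee}`);
    return caps !== undefined && caps.has(inv.capability);
  }
}
```

A clone granted only "media.query" can ask for a media query. The same clone asking for "wine.read" gets nothing, and so does the reverse direction unless it is granted separately.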

The sandbox

The second thing, and the one that takes the most work, is the sandbox. The Node skill-runner I built earlier has no runtime isolation. A skill can read any env var, exec anything, hit any URL. For a personal server this was tolerable. For a mesh of instances forwarding invocations to each other, it's not. So I rewrite the runner on Deno. Each skill runs in a Deno Worker with the minimum permissions it needs: --allow-env=K1,K2, --allow-net=specific.host, --allow-read=/specific/dir. Nothing more.
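Here is a minimal sketch of deriving those allow-lists from a per-skill declaration. The SkillManifest shape and the permissionFlags function are hypothetical. Inside the runner, the same lists would go into the Worker's permission options rather than onto a command line. The exact option shape depends on the Deno version, so treat this as an illustration of "minimum permissions", not of the Worker API.

```typescript
// Hypothetical per-skill declaration of what the code needs.
interface SkillManifest {
  name: string;
  env?: string[]; // env var names, e.g. ["K1", "K2"]
  net?: string[]; // hosts, e.g. ["specific.host"]
  read?: string[]; // paths, e.g. ["/specific/dir"]
}

// Build the minimum Deno permission flags for one skill. A missing or empty
// list emits no flag at all, so that capability is simply not granted.
function permissionFlags(m: SkillManifest): string[] {
  const flags: string[] = [];
  const add = (flag: string, values?: string[]): void => {
    if (values !== undefined && values.length > 0) {
      flags.push(`--${flag}=${values.join(",")}`);
    }
  };
  add("allow-env", m.env);
  add("allow-net", m.net);
  add("allow-read", m.read);
  return flags;
}
```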

The rewrite spans roughly 25 commits over two days, because Deno's strict execution model exposes every implicit assumption Node was letting me get away with: bare specifier mappings, sloppy imports, npmrc handling, deno.json paths, Dockerfile fixes, and the transitive import map entries every existing skill quietly relied on. Then the env model: Deno.env.get/has/toObject become permission-scoped, so I have to walk every skill, audit its secretKeys, and turn previously silent missing-key behavior into explicit errors. Each commit unblocks one more skill that had never had to care about runtime isolation. By the end I have a two-boundary security model written down. The agent boundary is ReBAC: who can invoke which agent. The skill boundary is Deno permissions: what this code can do. Two boundaries, two questions, and neither one swallows the other.
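The two questions can be kept apart in code as two checks that never consult each other. This TypeScript sketch simplifies things. RebacStore stores only direct tuples, while a real ReBAC system also resolves indirect relations. The skill-boundary check stands in for what the Deno runtime enforces by itself once a Worker's permissions are set. All names are hypothetical.

```typescript
// Agent boundary: a tiny ReBAC tuple store (direct tuples only, no indirect relations).
class RebacStore {
  private tuples = new Set<string>();

  write(object: string, relation: string, subject: string): void {
    this.tuples.add(`${object}#${relation}@${subject}`);
  }

  check(object: string, relation: string, subject: string): boolean {
    return this.tuples.has(`${object}#${relation}@${subject}`);
  }
}

// Skill boundary: what the skill's code may touch, whoever invoked it.
interface SkillPermissions {
  env: string[];
  net: string[];
}

interface Decision {
  ok: boolean;
  reason?: "agent boundary" | "skill boundary";
}

function authorize(
  store: RebacStore,
  user: string,
  agent: string,
  perms: SkillPermissions,
  request: { kind: "env" | "net"; value: string },
): Decision {
  // Question 1: may this user invoke this agent?
  if (!store.check(`agent:${agent}`, "invoker", `user:${user}`)) {
    return { ok: false, reason: "agent boundary" };
  }
  // Question 2: may this code do this? Passing question 1 grants nothing here.
  const allowed = request.kind === "env" ? perms.env : perms.net;
  if (!allowed.includes(request.value)) {
    return { ok: false, reason: "skill boundary" };
  }
  return { ok: true };
}
```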

While I'm there I rip Vaultwarden out and replace it with an encrypted secrets table in Postgres. One thing to back up. Survives cloning. Decrypted only at the call site, listable and editable from the LCARS UI. The other half of secret hygiene is a thing I almost get wrong: secrets have to reach the Worker via postMessage, never through the parent process env. If the parent's env is populated, a Worker that asked for --allow-env=null could still exfiltrate it. The Deno permission model is only as honest as the boundary you actually defend.
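Here is a minimal sketch of the parent side, using a hypothetical InitMessage shape. The runner copies only the keys a skill declared, fails loudly on a missing key, and hands the result to the Worker as a message rather than through any process env. The decrypted map stands in for values read from the Postgres secrets table and decrypted at this call site. The encryption itself is out of scope.

```typescript
// Hypothetical shape of the first message the parent posts to a skill Worker.
interface InitMessage {
  type: "init";
  secrets: Record<string, string>;
}

class MissingSecretError extends Error {
  readonly skill: string;
  readonly key: string;

  constructor(skill: string, key: string) {
    super(`skill "${skill}" requires secret "${key}", which is not set`);
    this.name = "MissingSecretError";
    this.skill = skill;
    this.key = key;
  }
}

// Only the skill's declared keys are copied, and a missing key is an explicit
// error instead of a silent undefined.
function buildInitMessage(
  skill: string,
  secretKeys: string[],
  decrypted: ReadonlyMap<string, string>,
): InitMessage {
  const secrets: Record<string, string> = {};
  for (const key of secretKeys) {
    const value = decrypted.get(key);
    if (value === undefined) throw new MissingSecretError(skill, key);
    secrets[key] = value;
  }
  return { type: "init", secrets };
}

// The runner would then send it with worker.postMessage(msg); this function
// never reads from or writes to the parent's process env.
```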

The browser agent

The third beat is the browser. Doctolib, the syndic site, the consulate appointment monitor: each of them is currently a hardcoded Playwright script, copy-pasted intent and brittle selectors. I replace the pattern with one generic browser agent. Navigate, click, type, screenshot, evaluate. The LLM drives. One agent, many sites. The three domains migrate in three commits, three hardcoded scripts deleted in the same afternoon. The principle that comes out: explore agentic, build deterministic. The LLM is fantastic at finding the right button on a page it's never seen. It's overkill, and expensive, for the same flow you run twice a day. Use it to scout, then write down what it found.
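"Explore agentic, build deterministic" can be sketched as two steps. First, keep the state-changing actions from a scouting run. Second, replay them with no LLM in the loop. The BrowserAction union mirrors the five primitives above, but the names, the Driver interface, and the rule that screenshot and evaluate steps are dropped are assumptions. A real Playwright-backed driver would be asynchronous, which this synchronous sketch leaves out for brevity.

```typescript
// Hypothetical action vocabulary mirroring the agent's five primitives.
type BrowserAction =
  | { kind: "navigate"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "screenshot" }
  | { kind: "evaluate"; script: string };

// Minimal synchronous driver; a real Playwright page would sit behind an async one.
interface Driver {
  goto(url: string): void;
  click(selector: string): void;
  fill(selector: string, text: string): void;
}

// Scout output -> deterministic flow. Assumption: screenshots and evaluate calls
// were the LLM looking around, so only state-changing steps are kept.
function freeze(trace: BrowserAction[]): BrowserAction[] {
  return trace.filter(
    (a) => a.kind === "navigate" || a.kind === "click" || a.kind === "type",
  );
}

// Run a frozen flow with no LLM in the loop.
function replay(flow: BrowserAction[], driver: Driver): void {
  for (const a of flow) {
    switch (a.kind) {
      case "navigate":
        driver.goto(a.url);
        break;
      case "click":
        driver.click(a.selector);
        break;
      case "type":
        driver.fill(a.selector, a.text);
        break;
      default:
        break; // screenshot / evaluate are not part of the replayed flow
    }
  }
}
```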

The browser work is also where the request_input tool finally lands cleanly. The Doctolib 2FA pattern from the first weekend has been evolving for weeks: ask the user mid-flow for a code, suspend, resume. The primitive lands in the browser agent first, but the shape generalizes: any tool can pause, ask the human something, and continue with the answer. It's a small mechanism. It feels right, the way something does when it solves a class of problem you didn't quite know how to name.
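One way to express "pause, ask, continue" is a generator. `yield` suspends the tool, and the human's answer comes back as the value of the yield expression. This is a hypothetical sketch, not the project's request_input implementation. doctolibLogin, runStep, and the six-digit check are illustrative only. A generator lives only in memory, so a suspension that has to survive a restart would need its state persisted.

```typescript
// Hypothetical payload a suspended tool hands back to the runner.
interface RequestInput {
  kind: "request_input";
  prompt: string;
}

type Tool = Generator<RequestInput, string, string>;

// A tool written as a generator: `yield` suspends it, and whatever the runner
// passes to next() becomes the value of the yield expression.
function* doctolibLogin(user: string): Tool {
  // ...navigate and submit credentials here...
  const code = yield { kind: "request_input", prompt: `2FA code for ${user}?` };
  if (!/^\d{6}$/.test(code)) {
    return "rejected: expected a 6-digit code";
  }
  return `logged in as ${user}`;
}

interface StepResult {
  done: boolean;
  prompt?: string; // set while the tool is waiting on the human
  result?: string; // set once the tool has finished
}

// Drive any tool one step: start it, or resume it with the human's answer.
function runStep(tool: Tool, answer?: string): StepResult {
  const r = answer === undefined ? tool.next() : tool.next(answer);
  return r.done
    ? { done: true, result: r.value }
    : { done: false, prompt: r.value.prompt };
}
```

The runner needs no knowledge of Doctolib. Any tool with the same generator shape can pause on a prompt and resume with the answer.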

The family vault

The fourth beat is the family vault. A permission-aware document store on object storage in Frankfurt, with family versus personal visibility and explicit visibleTo overrides per file. The architecture deliberately avoids derived roles: the answer to "who can see this" is the document's own metadata, not a graph traversal. Vision-based extraction lands the same day, followed by batch vision extraction with personal visibility as the default, then LLM-based fact generation from the extracted documents, with calendar reminders for anything that expires. A rule I write down from the cost analysis: scout with the LLM, build deterministic extractors, don't brute-force vision on every file. Same lesson as the browser. The LLM is the scout, not the worker.
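Because visibility lives on the document, the check is a pure function of the document's metadata and the user. The field names in this sketch are hypothetical. Two rules are also my assumptions rather than the vault's documented behavior. First, the owner always sees their own file. Second, a visibleTo list, when present, replaces the family/personal default rather than adding to it.

```typescript
type Visibility = "family" | "personal";

interface VaultDocument {
  id: string;
  ownerId: string;
  familyId: string;
  visibility: Visibility;
  visibleTo?: string[]; // explicit per-file override (assumed to replace the default)
}

interface User {
  id: string;
  familyId: string;
}

// "Who can see this" is answered from the document alone: no role graph.
function canSee(doc: VaultDocument, user: User): boolean {
  if (user.id === doc.ownerId) return true; // assumption: owners always see their files
  if (doc.visibleTo !== undefined) return doc.visibleTo.includes(user.id);
  return doc.visibility === "family" && user.familyId === doc.familyId;
}
```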

The Notion migration runs on the same vault. Family Notion workspace pulled into the new store. Three commits of Deno import friction before I just inline the Notion client to dodge an AWS SDK barrel that refuses to play nicely. Characteristic moment of the week: rewriting one bare import is cheaper than letting the LLM brute-force around it.

Self-reflection

The fifth beat is the one I've been waiting to build for a while. Anton starts critiquing his own performance. A nightly review reads the trace history from the last day, classifies failures by type, and files a GitHub issue per cluster. The issue carries a label that drives a state machine: needs-triage to ready-to-fix to fixed-locally to deployed. The review is only possible because of the full execution tracing that landed a few weeks ago. Without the traces, Anton would be reviewing his own outputs. With them, he's reviewing what the LLM was actually thinking at each step, what tools it called, what came back. The reviewer's job becomes possible because the substrate is honest.
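The bookkeeping around the nightly review fits in a few lines. Classified failures are grouped so that each cluster maps to one issue, and a label may only move one step along needs-triage, ready-to-fix, fixed-locally, deployed. The TraceFailure shape and the refusal to skip states are assumptions. The classifier and the GitHub calls are left out.

```typescript
type Label = "needs-triage" | "ready-to-fix" | "fixed-locally" | "deployed";

// The single legal successor of each label; deployed is terminal.
const NEXT: Record<Label, Label | null> = {
  "needs-triage": "ready-to-fix",
  "ready-to-fix": "fixed-locally",
  "fixed-locally": "deployed",
  deployed: null,
};

// Advance an issue's label by exactly one step; skipping states is refused.
function advance(current: Label, target: Label): Label {
  if (NEXT[current] !== target) {
    throw new Error(`illegal transition ${current} -> ${target}`);
  }
  return target;
}

// Hypothetical record: one failure found in the last day's execution traces.
interface TraceFailure {
  traceId: string;
  failureType: string; // assigned by the classifier, e.g. "tool_timeout"
}

// Group failures by type; each resulting cluster becomes one issue labeled needs-triage.
function clusterFailures(failures: TraceFailure[]): Map<string, string[]> {
  const clusters = new Map<string, string[]>();
  for (const f of failures) {
    const ids = clusters.get(f.failureType) ?? [];
    ids.push(f.traceId);
    clusters.set(f.failureType, ids);
  }
  return clusters;
}
```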

A handful of structural commits land in the same window and are worth a sentence each. The subAgents/delegates duality from chapter 5 finally collapses: agents become the single entry point, everything goes through the same delegate registry, the parent stops knowing about graph types or command dispatch and just routes to delegates. The Invoke tab grows a permission filter, so you only see agents the selected user can actually call. Prompt injection gets a real trust model with content markers and a risk audit trail. Directives land as standing instructions for agent behavior, with a note to prune them periodically before they bloat. Mistral Large lands in the LiteLLM router as a third reasoning option. The US consulate appointment monitor becomes a scheduled job: scan six months, observe what comes back, retune to four. Observe first, tune second, the same rule that's been quietly threading through the rest of the week.
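Here is a minimal sketch of the single-entry-point shape. The parent holds a registry of named delegates, routes by name, and asks a permission predicate which names the Invoke tab should show. DelegateRegistry, the string-in/string-out Handler type, and the canInvoke callback are hypothetical simplifications, not the real registry's interface.

```typescript
// Hypothetical handler signature; real delegates take richer inputs.
type Handler = (input: string) => string;

interface Delegate {
  name: string;
  handle: Handler;
}

// Single entry point: the parent knows delegate names only, not graph types
// or command dispatch.
class DelegateRegistry {
  private delegates = new Map<string, Delegate>();

  register(d: Delegate): void {
    if (this.delegates.has(d.name)) throw new Error(`duplicate delegate ${d.name}`);
    this.delegates.set(d.name, d);
  }

  route(name: string, input: string): string {
    const d = this.delegates.get(name);
    if (d === undefined) throw new Error(`unknown delegate ${name}`);
    return d.handle(input);
  }

  // What an Invoke tab could list: only agents the selected user can call.
  listFor(user: string, canInvoke: (user: string, agent: string) => boolean): string[] {
    return Array.from(this.delegates.keys())
      .filter((name) => canInvoke(user, name))
      .sort();
  }
}
```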

By the end of the four days Anton can call other Antons over an authenticated mesh. Skills run sandboxed with the minimum permissions they need. Secrets live in one encrypted table and never touch a subprocess env. Any website with a form is a browser-agent target. Documents have visibility metadata and the vault knows who can see what. And every night Anton reads his own day, decides what went wrong, and files the work to fix it. The two-boundary model, agent and skill, is the spine that holds the rest of it up.