Anton, chapter 7: Cost, fallbacks, syndic, heartbeat

Two weeks. The system is real enough now that the questions stop being about whether things work and start being about what they cost, what they leak, and what they do when I am not looking. Three threads run through the period. Cost discipline becomes a first-class concern. The syndic domain lands as the second real proof case. And a heartbeat starts ticking in the background, a survey loop that lets Anton observe himself between user requests.

Cost discipline

The cost work begins with one consequential commit. Every provider call now carries a per-request token budget, enforced by trimming history before the call rather than letting the provider 400 on us. Every LiteLLM call carries attribution metadata: agent, domain, user, request ID. And a shadow-call mechanism duplicates selected calls to a cheaper model and logs the deltas without ever affecting production. The principle is simple: Anton needs to know what he costs, both to budget and to detect regressions. None of this is glamorous. It is the metabolism, the thing you only think about when something goes wrong, and I want it built before something does.
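A minimal sketch of the trim-before-call idea, not the actual commit. It assumes chat-style messages with `role` and `content` keys, keeps system messages, and drops the oldest turns first. The names `trim_history` and `rough_token_count` are hypothetical, and the four-characters-per-token estimate stands in for a real tokenizer.

```python
from typing import Callable

Message = dict[str, str]


def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: about four characters per token.
    return max(1, len(text) // 4)


def trim_history(
    messages: list[Message],
    budget: int,
    count: Callable[[str], int] = rough_token_count,
) -> list[Message]:
    """Keep system messages plus the newest turns that fit in the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    remaining = budget - sum(count(m["content"]) for m in system)
    kept: list[Message] = []
    # Walk backwards so the most recent turns survive.
    for message in reversed(turns):
        cost = count(message["content"])
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    return system + list(reversed(kept))
```

Trimming on the caller's side keeps the failure mode predictable: the request shrinks instead of being rejected.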

Memory consolidation moves from "every fact write" to a nightly batch with importance scoring, archival, and a health dashboard. The previous shape was contributing real money to the per-request bill and nobody had asked for it to run that often. A few days later, per-request token usage becomes a metric I can chart. Then OpenAI billing hits a wall mid-day and the system needs to keep working, so an auto-fallback to Sonnet or Gemini lands as a last-mile patch. The lesson keeps repeating: if a provider is your single point of failure, your system's reliability is your provider's, not yours.
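The fallback can be pictured as an ordered list of providers tried in turn. This is a sketch under assumptions: `ProviderError` and `call_with_fallback` are hypothetical names, and each provider is modeled as a plain callable rather than a real client.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when a provider cannot serve a request."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, reply)."""
    failures: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember why this provider failed, then move to the next one.
            failures.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(failures))
```

Returning the provider name alongside the reply keeps the attribution metadata honest when a fallback actually fires.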

One night I clear out every hardcoded prompt fallback in the codebase. All agent prompts live in the database now. If the row is missing, the system fails loudly rather than silently using stale text. Three commits, one cleanup pass. The rule is the rule: one source of truth, fail loud when it is missing. It is the kind of cleanup that pays back every time I want to change an agent's behavior without a deploy. Validation gets a related fix the same week: delegates that report partial completion (because they ran out of their tool round budget) used to be treated as final answers by the parent. Now the parent detects budget exhaustion and re-invokes. A whole class of "Anton stopped halfway and didn't tell me" disappears.
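The re-invocation fix might look roughly like this. It is a sketch, assuming the delegate reports budget exhaustion through an explicit flag; `DelegateResult`, `budget_exhausted`, and `run_until_complete` are hypothetical names, and the retry cap is an added safeguard rather than something the fix is known to include.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DelegateResult:
    text: str
    budget_exhausted: bool  # True when the delegate ran out of tool rounds


def run_until_complete(
    delegate: Callable[[str], DelegateResult],
    task: str,
    max_attempts: int = 3,
) -> str:
    """Re-invoke a delegate that stopped on its budget, carrying progress forward."""
    progress: list[str] = []
    prompt = task
    for _ in range(max_attempts):
        result = delegate(prompt)
        progress.append(result.text)
        if not result.budget_exhausted:
            return "\n".join(progress)
        # Hand the partial work back so the next round continues, not restarts.
        prompt = f"{task}\n\nProgress so far:\n" + "\n".join(progress)
    raise RuntimeError(f"delegate still incomplete after {max_attempts} attempts")
```

The cap matters: without it, a delegate that can never finish would loop forever and spend money doing so.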

The syndic domain

Then the syndic. The work that takes the most lines of code in the period is the condo management domain, and it ends up being the proof case for nearly every architectural rule from the previous chapters. Foundations first: schema, skills, a local email client, a file registry, a document ingestion pipeline. The principle that goes into MEMORY the same week: heavy off-Anton agents do the ingest, Anton runs the lightweight queries. Then doc extraction with Gemma 4 vision OCR over PDFs, classification, cleanup. Then Gmail ingestion with attachment download and thread-based organization, scanning email attachments alongside Drive docs.

The interesting part is what happens next. The first cut of doc extraction was vision over every PDF. It is slow and it is expensive. A second pass replaces it: pandoc for .docx, pdftotext first for PDFs, vision reserved for the cases where text extraction returns garbage. Ten times faster on 90% of files. The lesson lands as a memory entry: scout with the LLM, build deterministic extractors, do not brute-force vision on every file. Same shape as the calendar saga from chapter 2 and the LangGraph excision from chapter 5, just at a different layer: figure out the cheap path, reserve the expensive path for what actually needs it.
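The cheap-path-first cascade can be sketched as follows. The garbage heuristic and its thresholds are assumptions, not the real rule, and `cheap` and `vision` are injected callables standing in for wrappers around a text extractor and a vision model.

```python
from typing import Callable


def looks_like_garbage(text: str, min_chars: int = 40, min_ratio: float = 0.6) -> bool:
    """Heuristic: too little text, or too few letters, digits, and spaces."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return True
    readable = sum(ch.isalnum() or ch.isspace() for ch in stripped)
    return readable / len(stripped) < min_ratio


def extract_text(
    path: str,
    cheap: Callable[[str], str],
    vision: Callable[[str], str],
) -> tuple[str, str]:
    """Return (method, text), using vision only when the cheap pass fails."""
    text = cheap(path)
    if not looks_like_garbage(text):
        return "text", text
    return "vision", vision(path)
```

Returning the method name makes it easy to count how often the expensive path is actually taken.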

Then the wiki builder, in the Karpathy LLM Wiki shape: documents become structured wiki sections so Anton can answer condo questions without re-reading every PDF. And then the swap I am most pleased with. SimplySyndic was being driven by the browser agent. A morning of reverse engineering reveals that every screen is just an HTTP call to a stable backend. The browser agent comes out, a direct HTTP sync goes in. No browser, no LLM in the loop. The rule that lands in MEMORY: explore agentic, build deterministic. The browser agent is the scout, not the worker. With the HTTP path in place, bank reconciliation auto-matches 99% of BRED CSV lines to SimplySyndic line items in one pass. Structured extractors for fund calls follow. The point is no longer "extract text from PDF" but "extract structured rows the rest of the system can query."
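One way a single-pass reconciliation could work, shown as a sketch only: the matching rule (exact amount, date within a few days, closest date wins) is an assumption, as are the `Entry` type and the `reconcile` function.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    id: str
    day: date
    amount: Decimal


def reconcile(
    bank: list[Entry], ledger: list[Entry], max_days: int = 3
) -> tuple[dict[str, str], list[str]]:
    """Return (bank_id -> ledger_id matches, unmatched bank ids)."""
    unused = list(ledger)
    matches: dict[str, str] = {}
    unmatched: list[str] = []
    for line in bank:
        candidates = [
            item
            for item in unused
            if item.amount == line.amount
            and abs((item.day - line.day).days) <= max_days
        ]
        if not candidates:
            unmatched.append(line.id)
            continue
        # Prefer the closest date; each ledger item is consumed at most once.
        best = min(candidates, key=lambda item: abs((item.day - line.day).days))
        matches[line.id] = best.id
        unused.remove(best)
    return matches, unmatched
```

Whatever lands in the unmatched list is the small remainder that still needs a human or an LLM to look at it.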

Scout, then build

The same rule gets stress-tested on April 7 with a four-hour spike on SNCF train departures via the browser agent. It works. Then I do the cost arithmetic and revert. A deterministic HTTP path exists for SNCF, and using the browser agent every morning is roughly 100× more expensive. The revert is itself the lesson: cost discipline beats cleverness. Scout agentic, build deterministic, applied to my own code two days after I wrote it down.

The self-improvement loop matures the same week. Smoke tests, deploy tracking, regression detection. Before this, the loop could file an issue but could not tell whether a fix had worked. Now deploys are tracked, smoke tests run on a deploy boundary, and regressions surfaced in the trace history get re-filed as issues linked to the deploy that introduced them. A loop that watches itself, with a memory long enough to notice when a fix did not stick.
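Linking a regression to a deploy can be as simple as a timestamp lookup. This sketch assumes attribution by time alone, which is a simplification; `Deploy` and `deploy_for` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Deploy:
    sha: str
    at: datetime


def deploy_for(failure_at: datetime, deploys: list[Deploy]) -> Optional[Deploy]:
    """Return the latest deploy at or before the failure, if any."""
    ordered = sorted(deploys, key=lambda d: d.at)
    # Count how many deploys happened at or before the failure time.
    index = bisect_right([d.at for d in ordered], failure_at)
    return ordered[index - 1] if index else None
```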

Scheduled tasks get tightened. Three commits in two days close the notification bypass paths a scheduled task could use to send messages outside the normal gate. One gate-everything design, no exceptions. A "scheduled mode" prompt rule lands the same week: when the agent is running on a schedule rather than answering a user, output is stricter. No follow-up suggestions, no filler, follow the spec.

The heartbeat

Then the heartbeat. A survey loop runs in the background, checks operational state, and only notifies when there is something to act on. The rule is explicit and lives in the prompt: the heartbeat is survey-only, not domain-agent-invoking. It looks; it does not do. This is the quiet substrate I want for Anton having his own awareness of the system, separate from any user-initiated request.
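The survey-only shape fits in a few lines. In this sketch the checks are read-only callables that return a finding or nothing, and the notifier fires only when at least one finding exists; `Check` and `heartbeat_tick` are hypothetical names.

```python
from typing import Callable, Optional

# A check inspects state and returns a finding, or None when all is well.
Check = Callable[[], Optional[str]]


def heartbeat_tick(
    checks: dict[str, Check], notify: Callable[[str], None]
) -> list[str]:
    """Run every read-only check once; notify only if something needs action."""
    findings: list[str] = []
    for name, check in checks.items():
        finding = check()
        if finding is not None:
            findings.append(f"{name}: {finding}")
    if findings:
        notify("\n".join(findings))
    return findings
```

Nothing in the tick can invoke a domain agent: the only side effect available to it is the notifier.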

The 14th adds a single outbound messaging gateway with an audit trail. Every outbound message, to WhatsApp, to Telegram, to a notification channel, goes through one path that logs sender, channel, recipient, content, and which agent or scheduled job emitted it. One chokepoint, one log. The same day I fix two small bugs that are themselves the signal of where the system is now: a heartbeat scratchpad serialization bug, and a year-extraction filer bug. Meta-bugs. The loop that watches for problems has its own problems. That is the kind of bug a small system never has.
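A single chokepoint with an audit trail might be sketched like this. The logged fields follow the list above; the class name, the in-memory log, and the transport callables are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class OutboundGateway:
    """Single path for outbound messages; every send is logged first."""

    transports: dict[str, Callable[[str, str], None]]
    audit_log: list[dict[str, str]] = field(default_factory=list)

    def send(self, *, sender: str, emitted_by: str, channel: str,
             recipient: str, content: str) -> None:
        if channel not in self.transports:
            raise ValueError(f"unknown channel: {channel}")
        # Record the attempt before delivery so failed sends are visible too.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "sender": sender,
            "emitted_by": emitted_by,
            "channel": channel,
            "recipient": recipient,
            "content": content,
        })
        self.transports[channel](recipient, content)
```

Because transports are reachable only through `send`, a scheduled job has no way to message anyone without leaving a row behind.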

What the two weeks teach me is that complexity has crossed a threshold. The system is now big enough that its operational concerns are first-class: cost, attribution, fallbacks, audit, self-observation. Syndic is the proof that the architectural rules from the earlier chapters hold up under the weight of a real second domain. Cost discipline is no longer a nice-to-have. And the heartbeat means Anton is, for the first time, doing something between requests, even if that something is just looking at himself.