Writing

Gemma 4 NVFP4 on the DGX Spark: 271 tok/s at 8 concurrent, native tool calling and reasoning

Native tool calling and reasoning mode on Gemma 4 NVFP4 over 128GB of unified memory.

Concurrency, awakening clones, and a cast of named model specialists working as a team.

How a billing failure turned into a local-LLM upgrade and a real fallback chain.

A longer companion to the manifesto on regenerative software.

Notes on building software that tries to last.

Two weeks of operational discipline: cost attribution, fallbacks, the syndic domain, and a survey heartbeat.

Federation, a Deno sandbox, an encrypted secrets table, and a nightly self-review loop.

Replacing a framework with a few hundred lines of runtime, and reshaping the system around two primitives.

A week of turning implicit conventions into explicit contracts: vLLM, LiteLLM, memory, traces, prompts as data.

Hardening the browser, lifting the work to the LLM, and turning domains into pluggable substrates.

Standing up a 30B-parameter LLM at 50+ tok/s on the NVIDIA DGX Spark: the technical journey.

A first full weekend of building turns Anton into something the family can actually use.

Day one of building Anton, a personal agent OS for my family, on a DGX Spark.