Gemma 4 NVFP4 on the DGX Spark: 271 tok/s at 8 concurrent, native tool calling and reasoning
Native tool calling and reasoning mode on Gemma 4 NVFP4 over 128GB of unified memory.
Concurrency, awakening clones, and a cast of named model specialists working as a team.
How a billing failure turned into a local-LLM upgrade and a real fallback chain.
A longer companion to the manifesto on regenerative software.
Notes on building software that tries to last.
Two weeks of operational discipline: cost attribution, fallbacks, the syndic domain, and a survey heartbeat.
Federation, a Deno sandbox, an encrypted secrets table, and a nightly self-review loop.
Replacing a framework with a few hundred lines of runtime, and reshaping the system around two primitives.
A week of turning implicit conventions into explicit contracts: vLLM, LiteLLM, memory, traces, prompts as data.
Hardening the browser, lifting the work to the LLM, and turning domains into pluggable substrates.
Standing up a 30B-parameter LLM at 50+ tok/s on the NVIDIA DGX Spark: the technical journey.
A first full weekend of building turns Anton into something the family can actually use.
Day one of building Anton, a personal agent OS for my family, on a DGX Spark.