Blog
Essays and breakdowns on building products, design, AI orchestration, and everything in between. Sometimes polished, sometimes works in progress.
Cloud APIs are great for deep reasoning. For everything else, a model on my own machine does the job for free.
When a model queues five edits against one file, working top-down is a bug. Here's the order that fixed it.
My agent's context window kept jumping from 22% to 60% in a single turn. The leak wasn't where I was looking.
A recent llama.cpp update pushed me from 60 tokens per second to 80-plus on the same machine. Here's what I'd run and what I'd turn on.