Blog
Essays and breakdowns on building products, design, AI orchestration, and everything in between. Sometimes polished, sometimes a work in progress.
Every modern vision model already chunks images to understand them. Local models need that chunking made explicit, because they can't paper over a missed detail the way a frontier model can.
When your local model's tool call drops a required parameter and a long file almost gets thrown away.
When a model queues five edits against one file, working top-down is a bug. Here's the order that fixed it.
My agent's context window kept jumping from 22% to 60% in a single turn. The leak wasn't where I was looking.
A recent llama.cpp update pushed me from 60 tokens per second to 80-plus on the same machine. Here's what I'd run and what I'd turn on.