Blog

Notes from the bench

Essays and breakdowns on building products, design, AI orchestration, and everything in between. Sometimes polished, sometimes a work-in-progress.

The Context Window Is a Trap

AI · Building · Process

We chased million-token context windows for years. The rot didn't get fixed. It just moved somewhere quieter.

· 4 min read
The Cloud Is a Liability

AI · Thoughts · Building

Every time a regulated firm pastes sensitive data into a cloud chatbot, it isn't using AI. It's leaking.

· 3 min read
The AI Scam Playbook

AI · Thoughts

Your phone rings, it's your daughter's voice, and she's panicking. Except she never called.

· 6 min read
How AI Actually Works, Term by Term

AI · Thoughts · Notes

Thirteen words decode how AI really works, from token to scaling laws. No math degree required.

· 1 min read
Mosaic: a pre-focus layer for local vision models

AI · Building · vision-models

Every modern vision model already chunks images to understand them. Local models need that chunking made explicit, because they can't paper over a missed detail the way a frontier model can.

· 7 min read
Catching the 7,000-character write

AI · Building · Process

When your local model's tool call drops a required parameter and a long file almost gets thrown away.

· 5 min read
The bottom-up edit rule

AI · Building · Process

When a model queues five edits against one file, working top-down is a bug. Here's the order that fixed it.

· 4 min read
The 12,000-token message I didn't know I was sending

AI · Building · Process

My agent's context window kept jumping from 22% to 60% in a single turn. The leak wasn't where I was looking.

· 3 min read
How I'd set up LM Studio today

AI · Building · Notes

A recent llama.cpp update pushed me from 60 tokens per second to 80-plus on the same machine. Here's what I'd run and what I'd turn on.

· 4 min read