Blog · AI, Building, Process

The bottom-up edit rule

When a model queues five edits against one file, working top-down is a bug. Here's the order that fixed it.

Jarvis v2 TUI at session start. The Activity panel shows the small brain classifying intent, running the fusion step, and calling request_big_brain.
The small brain routes, the big brain builds. Multi-edit cycles run inside that big-brain call, and that's where ordering bit me.

The context

Frontier models can muscle through sloppy tools. Local models can't.

A frontier model with a 200K-token window and a few trillion parameters of horsepower can absorb a lot of badly designed tooling. If line numbers shift between edits, it notices and re-reads the file. If a tool returns ambiguous state, it reasons about what probably happened. Headroom covers for a lot of sharp edges.

A 26B model running on a laptop doesn't have that headroom. 32K tokens, not 200K. Re-reading a 400-line file twice during a review cycle eats 15% of the window. Reasoning its way around "what probably happened" costs turns the model can't afford to spend before it forgets the task. Smaller models are also less forgiving of ambiguity in tool contracts. They follow the contract more literally, which is usually a feature, right up until the contract has a flaw.
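That 15% figure pencils out under a rough assumption of about six tokens per line of code (my estimate, not a measured number):

```python
# Back-of-envelope budget math. TOKENS_PER_LINE is an assumption (~6 tokens
# per line of typical code), not a measured value.
TOKENS_PER_LINE = 6
WINDOW = 32_000

reread_cost = 400 * TOKENS_PER_LINE * 2   # two full reads of a 400-line file
share = reread_cost / WINDOW              # fraction of the context window burned

# 4,800 tokens, roughly 15% of a 32K window, spent before any edit happens.
```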

So on Jarvis v2, the environment does the work the model can't. Every tool call has to produce predictable state. Every multi-step flow has to work deterministically, without the model having to remember what shifted underneath it. This post is about one flow where I got that contract wrong, and what it took to notice.

The setup

Read once, edit many.

Jarvis's big brain has a read_file tool that returns content with line numbers, and a replace_lines tool that takes start_line, end_line, and a replacement body. Normal flow: read the file once, queue several edits against that snapshot, apply them as a batch.
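A minimal sketch of that contract (hypothetical shapes; the real Jarvis tool signatures may differ):

```python
# Hypothetical sketch of the two tools' semantics, not the Jarvis source.

def read_file(lines: list[str]) -> str:
    """Return the snapshot the model sees: content with 1-indexed line numbers."""
    return "\n".join(f"{i}: {line}" for i, line in enumerate(lines, start=1))

def replace_lines(lines: list[str], start_line: int, end_line: int,
                  body: list[str]) -> list[str]:
    """Replace lines start_line..end_line (1-indexed, inclusive) with body."""
    return lines[:start_line - 1] + body + lines[end_line:]
```

Everything that follows hinges on one property of replace_lines: when body is a different length than the range it replaces, every line after end_line moves.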

One edit works fine. Two edits usually work. Five edits against a real file is where it falls apart, and for a while I couldn't figure out why.

The failure mode

Line numbers shift the second you apply the first edit.

Say the model reads component.tsx and gets a 140-line snapshot back. The reviewer flagged three things: a function signature, a call site that depends on it, and some prop defaults further down. The big brain queues all three edits against the one snapshot before applying any of them.

Tool call queue component.tsx · top-down order

What the model sends to the executor before any apply step runs. Every call references line numbers from the same read_file snapshot.

  1. read_file · one-time snapshot
     path: component.tsx
     returned: 140 lines with line numbers.
  2. replace_lines · fix the function signature
     start_line: 45, end_line: 48
     body: "export function UserCard({ id, ...rest })" (4 lines in, 6 lines out)
  3. replace_lines · update the call site that uses it
     start_line: 82, end_line: 85
     body: "<UserCard id={user.id} />"
  4. replace_lines · refresh the prop defaults
     start_line: 120, end_line: 123
     body: "const DEFAULTS = { variant: 'compact' }"

All three reference line numbers from that same read_file snapshot. None of them know about each other.

Apply call 1. The replacement is six lines instead of four, so the file is now 142 lines. Line 82 is now line 84. Line 120 is now line 122.

Call 2 fires next. It targets line 82, but that content isn't there anymore. Either you hit a content-mismatch error or, worse, you silently overwrite the wrong thing because the lines near 82 look close enough to the snapshot to pass validation. Call 3 fails the same way.

One bad edit turned the review cycle into a doom loop. Reviewer flags three issues. Big brain queues three fixes. Two land in the wrong place. Reviewer sees new issues (because the file is now half-broken), loops again. Work that should take one pass was taking four.
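The drift is easy to reproduce in a few lines. This toy version (a hypothetical helper, not the Jarvis executor) applies only the first edit and then checks where the other targets went:

```python
# Toy reproduction of the line-number drift, not the Jarvis implementation.
def apply_edit(lines, start, end, body):
    # Replace 1-indexed lines start..end (inclusive) with body.
    return lines[:start - 1] + body + lines[end:]

file = [f"line {i}" for i in range(1, 141)]   # the 140-line snapshot
assert file[81] == "line 82"                  # the call-site target, pre-edit

# Edit 1: lines 45-48 (4 lines) become 6 lines, a net shift of +2.
file = apply_edit(file, 45, 48, ["new signature"] * 6)

assert len(file) == 142
assert file[81] == "line 80"   # a blind apply at line 82 now hits the wrong content
assert file[83] == "line 82"   # the real target has moved down to line 84
```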

The order you apply changes in matters as much as what the changes are.

— Lesson I kept relearning, one edit at a time

What I tried first

The wrong fixes.

Re-read the file before every edit. Works. Also burns tokens on a redundant snapshot for every single tool call, which piles up fast on large files.

Make edits atomic, one per tool call, agent decides when to batch. Works. Also slows down any legitimate multi-fix refactor and adds round-trip latency.

Switch to diff-style patches. Works on paper. Harder to prompt for correctly, especially when the model is reasoning about what to change rather than literally writing a diff.

All three are real answers. They just all have tradeoffs I didn't love for a flow that wanted to be fast.

The actual fix

Sort edits by start_line descending, apply bottom-up.

An edit at line 120 only shifts lines below line 120. Everything above is untouched. Apply that edit first, and the snapshot stays valid for every remaining edit that targets earlier lines.

Same three calls as before. Same snapshot. Different apply order.

Tool call queue component.tsx · bottom-up order

Reordered at apply time by descending start_line. The model can queue edits in any order it likes. The executor sorts them before running.

  1. replace_lines · bottom edit first
     start_line: 120, end_line: 123
     body: "const DEFAULTS = { variant: 'compact' }"
     returned: lines 1–119 unchanged; targets above still valid.
  2. replace_lines · middle edit, snapshot still accurate
     start_line: 82, end_line: 85
     body: "<UserCard id={user.id} />"
     returned: target content matches snapshot exactly.
  3. replace_lines · top edit, nothing below ever touched it
     start_line: 45, end_line: 48
     body: "export function UserCard({ id, ...rest })"
     returned: target content matches snapshot exactly.

No re-reads. No atomic calls. No diff format. Just a sort on the edit queue before apply.
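As a sketch (hypothetical names, assuming 1-indexed inclusive ranges against one shared snapshot; the real executor differs), the apply-side fix is roughly:

```python
# Sketch of the apply-side sort, assuming non-overlapping 1-indexed ranges.
from dataclasses import dataclass

@dataclass
class Edit:
    start_line: int   # 1-indexed, inclusive
    end_line: int     # 1-indexed, inclusive
    body: list[str]

def apply_bottom_up(lines: list[str], edits: list[Edit]) -> list[str]:
    # Descending by start_line: each apply only shifts lines *below* its own
    # range, so every remaining edit (which targets earlier lines) stays valid.
    for e in sorted(edits, key=lambda e: e.start_line, reverse=True):
        lines = lines[:e.start_line - 1] + e.body + lines[e.end_line:]
    return lines
```

One assumption the sort bakes in: the queued ranges don't overlap. Two edits that touch the same lines are ambiguous in any order, and that case deserves a hard error rather than a sort.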

Diagram comparing top-down and bottom-up edit ordering. Top-down: first edit at line 45 shifts line numbers, causing edits at 82 and 120 to miss. Bottom-up: first edit at line 120 shifts no earlier lines, so edits at 82 and 45 remain valid.

Same three edits. Top-down lands 1 of 3 correctly. Bottom-up lands 3 of 3.

QA

The test that caught it.

The original symptom wasn't obvious. The TUI was logging edit-applied events that looked fine. I only caught it when I diffed the output against what I expected and saw edits landing in the wrong places.

I wrote a fixture. A 150-line file and a queue of five edits at spaced-out line ranges. Ran it top-down, diffed the output, counted hits.

  • Top-down: 2 of 5 edits landed correctly. The other three either errored or overwrote nearby lines.
  • Bottom-up: 5 of 5 landed, every run.

This is the kind of test that doesn't come out of reading the code. You have to put the system through a realistic load and look at what it actually produced, not what it logged.
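A fixture in that spirit (hypothetical, not the actual Jarvis test, and with toy edit sizes, so the exact top-down hit count differs from mine) looks like:

```python
# Hypothetical fixture sketch. An edit "lands" only if its target region
# still matches the snapshot at apply time.
def apply_edit(lines, start, end, body):
    return lines[:start - 1] + body + lines[end:]

def count_hits(snapshot, edits, ordered):
    out, hits = list(snapshot), 0
    for start, end, body in ordered:
        hits += out[start - 1:end] == snapshot[start - 1:end]
        out = apply_edit(out, start, end, body)   # apply anyway, like a naive executor
    return hits

snapshot = [f"line {i}" for i in range(1, 151)]   # 150-line fixture file
# Five edits at spaced-out ranges, each replacing 3 lines with 5 (net +2).
edits = [(s, s + 2, [f"edit@{s}"] * 5) for s in (20, 50, 80, 110, 140)]

top_down = count_hits(snapshot, edits, sorted(edits))
bottom_up = count_hits(snapshot, edits, sorted(edits, reverse=True))
# With these toy sizes every shifted target misses, so top-down lands only
# the first edit, while bottom-up lands all five.
```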

The one-line fix

A rule in the system prompt.

The fix was a single rule added to the big brain's system prompt (jarvis/llm/prompts.py:30):

When you queue multiple replace_lines edits against one read_file snapshot, sort them by start_line descending and apply bottom-up. Line-number shifts from earlier edits won't invalidate later ones this way.

I also added the sort on the apply side, so the model can queue edits in any order and the executor reorders before running. Belt and suspenders. The prompt tells the model the right way. The apply-side sort makes sure it can't get it wrong.

Single-edit cases are unaffected. Multi-edit review cycles now land every change on the first try.

The takeaway

Most LLM tool-call bugs look like prompt bugs until you look closer.

I spent an hour blaming the model for sloppy edits before I tested the assumption that any edit order would work. The model was fine. The tool contract I gave it wasn't.

When a tool call is technically correct but the outcome is wrong, look at the side effects of other calls in the same batch. Check if one call is changing state that a later call assumes. That's where the bug usually lives, and it won't show up in any single call's logs.
