AI coding agents need evidence-first review, not just cheaper routing

Author here.

This article came from looking at internal runs of an AI engineering workflow and noticing that evidence acquisition can happen surprisingly late in the process.

The two token observations in the article are not presented as a benchmark. They are just examples that led me to think the problem is not only “which model is cheaper”, but also “when does the system acquire evidence”.

I’m especially interested in criticism of the framing: whether evidence-first review is the right abstraction, or whether this should be treated purely as a routing / context-management problem.