I used to watch Claude Code like a helicopter parent. Every few minutes: Is it still on track? Did it actually write tests or just say it did? When it says “done,” is it actually done? The answer was often no. Claude Code would stop mid-task and claim completion. It would write tests that looked right but tested nothing. It would “refactor” code by replacing implementation with TODO comments. I couldn’t leave my desk. I couldn’t trust the output. I was reviewing everything anyway, which defea...