Part 5 of 5: Organizational Structures for AI-Native Development
Every CTO I've talked to in the past year asks the same question: "We know AI changes how we build software. How do we actually make the shift?"
Then they describe the plan. Reorg the teams. Kill sprints company-wide. Mandate new tooling. Roll it out in Q2.
That's how you create resistance, confusion, and a convenient scapegoat when things go wrong.
The answer that actually works: deploy in pockets. Document what breaks. Let success spread organically.
Don't start with your revenue-generating core product. Don't start with the team supporting 10,000 enterprise customers. Don't start with the platform that every other team depends on.
Start with:
New product development that hasn't launched yet
Innovation groups exploring new markets
Internal tools with small user bases
Rewrite projects that are effectively greenfield
These environments share critical characteristics: failure is contained, iteration is expected, and success creates proof points without risking the business.
One team at a payments company ran this experiment on their internal admin dashboard. Five developers, no customers, complete autonomy. They killed sprints, implemented continuous integration to main, started mob sessions for integration work.
Each developer owned different features—one on reporting, one on user management, one on payment reconciliation. They mobbed when features intersected or when architectural decisions affected multiple areas. Otherwise, solo iteration with AI.
Three months later: 40% more features shipped, zero production incidents, team reported higher satisfaction. The core payment processing team? Still running two-week sprints because the risk profile is different.
That's the pattern. Innovation at the edges. Stability at the core. Don't force convergence.

Here's the playbook:
Week 1: Team selection and opt-in
This must be voluntary. Mandated transformation creates resentment. You want teams that are frustrated with current processes and hungry to try something different.
Pick 3-5 people maximum. Mix of senior and mid-level. At least one person with strong product sense. At least one who understands system architecture deeply.
These people will typically own different features individually. They collaborate through mob sessions when variance is high, help each other when blocked, and coordinate at integration checkpoints—but each person owns their features end-to-end. Five people can stay aligned informally. Eight people need coordination overhead that defeats the purpose.
Explicit conversation: "We're experimenting. Some things will break. We'll document what we learn. You can opt out anytime."
Week 2: Process teardown
Stop sprint planning. Stop daily standups. Stop planning poker.
Start continuous work intake. Start integration checkpoints (weekly). Start documenting variance patterns.
This feels chaotic initially. The chaos reveals where the old processes were actually providing value versus where they were just ceremony.
Weeks 3-8: Build and document
Ship features using the new patterns:
Continuous iteration instead of timeboxes
Solo vs mob based on variance assessment (see the sketch after this list)
Intent articulation with AI before starting
Integration checkpoints instead of status meetings
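The solo vs mob call is the one teams ask about most. Here is a toy sketch of one way to make it, assuming the team scores a feature on a few uncertainty signals before starting; the signals and the threshold are illustrative, not rules from the teams described here.

```python
# Toy heuristic for the solo-vs-mob decision, assuming a quick pre-start
# check of a few uncertainty signals. Signal names and the threshold of 2
# are placeholders -- calibrate them against your own decision journal.
def coordination_mode(unclear_requirements: bool,
                      touches_shared_architecture: bool,
                      new_external_integration: bool) -> str:
    variance_score = sum([unclear_requirements,
                          touches_shared_architecture,
                          new_external_integration])
    # High variance: put more eyes on it up front (mob).
    # Low variance: solo iteration with AI.
    return "mob" if variance_score >= 2 else "solo"


print(coordination_mode(True, True, False))   # unclear reqs + shared arch -> "mob"
print(coordination_mode(False, False, True))  # isolated integration work -> "solo"
```

The point is not the scoring function itself but making the call explicit enough to log and revisit when it turns out to be wrong.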
Document everything that breaks:
When did solo iteration fail and need coordination?
When did mob sessions waste time on straightforward work?
Where did lack of sprints create confusion?
What metrics help vs what metrics are noise?
This documentation is the actual output. Features are secondary. You're learning what works in your context.
Week 9: Evaluation and decision
Three outcomes are possible:
It's working better — Continue for another quarter, start documenting patterns for potential expansion.
Mixed results — Refine the approach, try another 8 weeks, be specific about what you're testing.
It's not working — Revert to previous process, document specifically what failed and why.
All three outcomes are valuable. The third one is not failure—it's data.
Ignore: Story points completed. Velocity trends. Individual feature counts. Lines of code written. These optimize for the old constraint (code production), not the new one (clarity and coordination).
Measure instead:
Calendar time to ship — From "we should build this" to "it's in production." This captures coordination overhead, rework cycles, and actual delivery speed.
Variance in delivery — Standard deviation in feature delivery time. Tightening variance signals better pattern matching—solo vs mob decisions getting more accurate.
Integration rework — How often do features need changes after initial implementation because they don't compose with other work? Decreasing rework signals better architectural boundaries.
Unplanned coordination events — How many times per week do people interrupt others for "quick questions"? Decreasing interruptions signals better autonomy.
Team retention and satisfaction — Lagging indicator, but matters for sustainability.
Track these weekly. Share them with the team. Adjust based on what the data reveals.
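A minimal sketch of how a team might compute these from a simple weekly feature log; the record fields and the numbers are placeholders, not data from the teams described here.

```python
from datetime import date
from statistics import median, pstdev

# Hypothetical weekly feature log: one record per shipped feature.
# Field names (started, shipped, rework_count, interruptions) are assumptions,
# not a prescribed schema -- adapt to whatever your team already tracks.
features = [
    {"name": "reporting-export", "started": date(2024, 3, 4), "shipped": date(2024, 3, 8),
     "rework_count": 0, "interruptions": 2},
    {"name": "user-roles", "started": date(2024, 3, 5), "shipped": date(2024, 3, 14),
     "rework_count": 1, "interruptions": 5},
    {"name": "reconciliation-v2", "started": date(2024, 3, 11), "shipped": date(2024, 3, 15),
     "rework_count": 0, "interruptions": 1},
]

# Calendar time to ship: days from "we should build this" to "it's in production".
lead_times = [(f["shipped"] - f["started"]).days for f in features]
print("median calendar time to ship (days):", median(lead_times))

# Variance in delivery: spread of lead times across features.
print("delivery variance (std dev, days):", round(pstdev(lead_times), 1))

# Integration rework: share of features changed after the fact to compose with other work.
print("rework rate:", sum(1 for f in features if f["rework_count"] > 0) / len(features))

# Unplanned coordination: "quick question" interruptions logged this period.
print("unplanned coordination events:", sum(f["interruptions"] for f in features))
```

A spreadsheet works just as well; what matters is applying the same definitions every week so the trends are comparable.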
Six months in, you should see clear signals. The best one: skeptics asking "how do we try this" instead of "why would we try this." The worst one: teams complaining but not opting out—that suggests they feel pressured, not convinced.

Most transformation documentation ends up useless: vague platitudes, context-free metrics, or personal stories that don't generalize.
What works: decision journals and pattern libraries.
Decision journals capture context, decision, rationale, outcome, and learning. Example:
Context: Implementing payments integration, unclear requirements
Decision: Started with solo iteration (developer A)
Rationale: Seemed straightforward, Stripe integration is documented
Outcome: Hit architectural issues day 2, needed 3-day detour
Learning: Any feature touching payments needs mob session upfront
Pattern: Add "payment-related" as automatic mob trigger
After 20-30 of these, you start seeing: "When X conditions exist, Y coordination approach works better." That becomes your pattern library. Not a process manual—a set of heuristics that help teams make coordination decisions.
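If you want the journal to feed a pattern library without extra tooling, structured records are enough. A minimal sketch, assuming entries live alongside the repo; the DecisionEntry shape and the tags field are illustrative additions, not a prescribed format.

```python
from dataclasses import dataclass, field

# One decision-journal entry as a structured record. The fields mirror the
# example above; "tags" is an added convenience for grouping entries into
# pattern-library heuristics later.
@dataclass
class DecisionEntry:
    context: str
    decision: str
    rationale: str
    outcome: str
    learning: str
    pattern: str = ""
    tags: list[str] = field(default_factory=list)


entry = DecisionEntry(
    context="Implementing payments integration, unclear requirements",
    decision="Started with solo iteration (developer A)",
    rationale="Seemed straightforward, Stripe integration is documented",
    outcome="Hit architectural issues day 2, needed 3-day detour",
    learning="Any feature touching payments needs mob session upfront",
    pattern="Add 'payment-related' as automatic mob trigger",
    tags=["payments", "solo-vs-mob"],
)

# After 20-30 entries, filtering by tag is how heuristics surface:
# e.g. every "payments"-tagged entry where solo iteration was the decision.
journal = [entry]
payment_entries = [e for e in journal if "payments" in e.tags]
print(len(payment_entries), "payment-related decisions logged")
```

The structure matters less than the discipline of filling in the outcome and learning fields after the fact, not just the decision at the time.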
Two types of skepticism emerge:
"This won't work because [reasonable concern]" — Example: "Without sprints, how do we commit to deadlines?" Response: "Great question. Let's test it. We'll measure calendar time to delivery for 8 weeks. If it's slower or less predictable, we have data. If it's faster, we also have data." Engage the concern. Make it testable.
"This is just [dismissive label]" — Example: "This is just cowboy coding." Response: "We're running a time-boxed experiment with clear success criteria and documentation requirements. If it fails, we'll know why and revert."
Don't argue. Don't defend. Run the experiment.
The skeptics who won't engage with data aren't your audience. The ones asking genuine questions are potential advocates once you have proof points.
After a successful pocket deployment, the temptation is to roll it out company-wide immediately. Resist this.
Expand when you have 3+ successful pocket deployments with consistent patterns, other teams are actively asking to join, and leadership is asking "how do we scale this" not "should we do this."
Hold when only one team has tried it, results are context-specific, teams aren't asking to join (they're being voluntold), or you can't articulate clear patterns yet.
This feels slow. It is slow. That's intentional. Fast transformation creates backlash. Slow transformation creates sustainable change.
"This works for 50-person companies. We're 5,000 people."
You don't transform 5,000 people. You transform pockets of 3-5 and let it spread.
Phase 1: Permission structure. Executive sponsorship not to mandate change, but to protect experimental teams from process mandates. "These three teams are running AI-native experiments for Q1. They're exempt from sprint planning and story pointing. They'll present results in Q2."
Phase 2: Multiple independent pockets. Run 5-10 simultaneously across different products, risk profiles, and team compositions. Some will succeed. Some will fail. All will generate data.
Phase 3: Pattern synthesis. After 6 months, extract what worked across contexts, what was context-specific, and what failed everywhere. This becomes your internal playbook—not "the Agile transformation playbook," but your specific learnings from your specific context.
Phase 4: Opt-in expansion. Any team can adopt these patterns if they commit to the documentation requirements and measurement framework. Don't mandate. Don't set adoption targets. Let results drive adoption.
The failure mode for large organizations is treating this as a process rollout instead of a learning program. Hiring consultants, training 500 people on new ceremonies, setting adoption targets—that optimizes for coverage. You want to optimize for learning.
Running these experiments will surface problems. That's the point.
Calendar planning breaks when you stop estimating in story points. Career ladders need updates when individual feature ownership isn't the metric. Cross-team dependencies get messy when one team runs continuous flow and another runs two-week sprints. Reporting gets weird when executive dashboards built around velocity need new metrics.
These aren't reasons to stop. They're discoveries about where your current processes have hidden dependencies. Every broken process is an opportunity to design something better.
And the safety valve that makes all of this work: if a team tries it for 4 weeks and it's not working, they revert. No shame. No questions. Just document what didn't work. The teams that revert and explain why contribute as much as the teams that succeed.
Every transformation starts with a leader asking: "How do we get there from here?"
The answer that works is also the one that feels most unsatisfying: protect space for one team to experiment, document what they learn, let success spread organically. It feels slow. It feels uncontrolled.
It's also how lasting organizational change actually happens—not through mandates, but through proof points that make the old way feel obsolete.
One team. Eight weeks. Clear metrics. Honest documentation.
That's it. That's the whole strategy.
Series Recap
This is the final part of a 5-part series on building AI-native engineering organizations:
Process debt — why 95% of AI initiatives fail before the technology does
Intent articulation — the new bottleneck that replaced code production
From craftsman to toolmaker — where value concentrates now
Mob sessions as variance insurance — coordination patterns that match bandwidth to uncertainty
This piece — the practical roadmap for making the shift
The organizations that thrive in the AI-native era won't be the ones with the best processes. They'll be the ones that learned fastest from their own experiments.
If you've been following along, I'd love to hear: what's the first experiment you'd run? What's stopping you from starting it this week?