

Part 5 of 5: Organizational Structures for AI-Native Development
You've read the arguments. Process debt kills velocity. Intent articulation is the bottleneck. Value moved from code to constraints. Mob sessions beat distributed coordination for high-variance work.
Now the practical question: how do you actually transform an organization?
The answer that doesn't work: mandate it from the top. Kill all sprints Monday morning. Reorganize into new team structures. Roll out new processes company-wide.
That's how you create resistance, confusion, and a convenient scapegoat when things go wrong.
The answer that works: deploy in pockets. Document what breaks. Let success spread organically.
Don't start with your revenue-generating core product. Don't start with the team supporting 10,000 enterprise customers. Don't start with the platform that every other team depends on.
Start with:
New product development that hasn't launched yet
Innovation groups exploring new markets
Internal tools with small user bases
Greenfield rewrites of existing systems
These environments share critical characteristics: failure is contained, iteration is expected, and success creates proof points without risking the business.
One team at a payments company ran this experiment on their internal admin dashboard. Five developers, no customers, complete autonomy. They killed sprints, implemented continuous integration to main, started mob sessions for integration work.
Each developer owned different features—one on reporting, one on user management, one on payment reconciliation. They mobbed when features intersected or when architectural decisions affected multiple areas. Otherwise, solo iteration with AI.
Three months later: 40% more features shipped, zero production incidents, team reported higher satisfaction. The core payment processing team? Still running two-week sprints because the risk profile is different.
That's the pattern. Innovation at the edges. Stability at the core. Don't force convergence.

Here's the playbook for running a successful pocket deployment:
Week 1: Team selection and opt-in
Critical: this must be voluntary. Mandated transformation creates resentment. You want teams that are frustrated with current processes and hungry to try something different.
Pick 3-5 people maximum. Mix of senior and mid-level. At least one person who can articulate intent well (product sense). At least one person who understands system architecture deeply.
Important: these people will typically own different features individually. They're not all working on the same thing. They collaborate through mob sessions when variance is high, help each other when blocked, and coordinate at integration checkpoints—but each person owns their features end-to-end.
The small team size enables this pattern. Five people can stay aligned informally. Eight people need coordination overhead that defeats the purpose.
Explicit conversation: "We're experimenting. Some things will break. We'll document what we learn. You can opt out anytime if it's not working."
Week 2: Process teardown
Stop sprint planning. Stop daily standups. Stop planning poker.
Start continuous work intake. Start integration checkpoints (weekly). Start documenting variance patterns.
This feels chaotic initially. That's expected. The chaos reveals where the old processes were actually providing value versus where they were just ceremony.
Weeks 3-8: Build and document
Ship features using the new patterns:
Continuous iteration instead of timeboxes
Solo vs mob based on variance assessment
Intent articulation with AI before starting
Integration checkpoints instead of status meetings
Document everything that breaks:
When did solo iteration fail and need coordination?
When did mob sessions waste time on straightforward work?
Where did lack of sprints create confusion?
Which metrics helped, and which were just noise?
This documentation is the actual output. Features are secondary. You're learning what works in your context.
Week 9: Evaluation and decision
Three outcomes are possible:
It's working better - Team velocity increased, satisfaction high, clear patterns emerged. Recommendation: continue for another quarter, start documenting patterns for potential expansion.
Mixed results - Some things work, some don't. Recommendation: refine the approach, try another 8 weeks, be specific about what you're testing.
It's not working - Team is slower, confused, frustrated. Recommendation: revert to previous process, document specifically what failed and why.
All three outcomes are valuable. The third one is not failure—it's data.

Traditional metrics don't capture the right signals.
Ignore:
Story points completed
Velocity trends
Individual feature counts
Lines of code written
These optimize for the old constraint (code production), not the new one (clarity and coordination).
Measure instead:
Calendar time to ship - How long from "we should build this" to "it's in production"? This captures coordination overhead, rework cycles, and actual delivery speed.
Variance in delivery - What's the standard deviation in feature delivery time? Tightening variance signals better pattern matching (solo vs mob decisions getting more accurate).
Integration rework - How often do features need changes after initial implementation because they don't compose with other work? Decreasing rework signals better architectural boundaries.
Unplanned coordination events - How many times per week do people need to interrupt others for "quick questions"? Decreasing interruptions signals better autonomy and clearer boundaries.
Team retention and satisfaction - Are people staying? Do they report higher satisfaction? This is a lagging indicator but matters for sustainability.
Track these weekly during the pocket deployment. Share them with the team. Adjust based on what the data reveals.
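The first two metrics need nothing more than per-feature dates. A minimal sketch of the weekly tracking, assuming hypothetical feature records with `proposed` and `shipped` fields (the names and data are illustrative, not from any real tooling):

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical feature records: when work was proposed and when it shipped.
features = [
    {"name": "reporting-export", "proposed": date(2024, 3, 1), "shipped": date(2024, 3, 6)},
    {"name": "user-roles",       "proposed": date(2024, 3, 4), "shipped": date(2024, 3, 15)},
    {"name": "reconciliation",   "proposed": date(2024, 3, 8), "shipped": date(2024, 3, 12)},
]

# Calendar time to ship: days from "we should build this" to "it's in production".
days_to_ship = [(f["shipped"] - f["proposed"]).days for f in features]
print(f"mean days to ship: {mean(days_to_ship):.1f}")

# Variance in delivery: a tightening standard deviation suggests
# solo-vs-mob decisions are getting more accurate.
print(f"std dev of days:   {pstdev(days_to_ship):.1f}")
```

`pstdev` (population standard deviation) fits here because the records are the whole dataset for the period; with larger samples, `stdev` would be the natural choice.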
Most transformation documentation ends up useless: vague platitudes about communication, context-free metrics claiming 30% improvement, or personal stories that don't generalize.
What works: decision journals and pattern libraries.
Decision journals capture:
Context: what was the situation?
Decision: what did we choose?
Rationale: why did we think this would work?
Outcome: what actually happened?
Learning: what would we do differently?
Example entry:
Context: Implementing payments integration, unclear requirements
Decision: Started with solo iteration (developer A)
Rationale: Seemed straightforward, Stripe integration is documented
Outcome: Hit architectural issues day 2, needed 3-day detour
Learning: Any feature touching payments needs mob session upfront due to compliance requirements we keep forgetting
Pattern: Add "payment-related" as automatic mob trigger
After 20-30 of these, patterns emerge. You start seeing: "When X conditions exist, Y coordination approach works better."
That becomes your pattern library. Not a process manual. A set of heuristics that help teams make coordination decisions.
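A decision journal needs no tooling, but giving entries a fixed shape makes the later pattern extraction mechanical. A sketch in Python, where the fields follow the template above and the `tags` / `MOB_TRIGGERS` vocabulary is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class JournalEntry:
    context: str    # what was the situation?
    decision: str   # what did we choose?
    rationale: str  # why did we think this would work?
    outcome: str    # what actually happened?
    learning: str   # what would we do differently?
    tags: list[str] = field(default_factory=list)

# Entries accumulate over the deployment; after 20-30, patterns emerge.
journal = [
    JournalEntry(
        context="Implementing payments integration, unclear requirements",
        decision="Started with solo iteration",
        rationale="Seemed straightforward, Stripe integration is documented",
        outcome="Hit architectural issues day 2, needed 3-day detour",
        learning="Payment features need a mob session upfront (compliance)",
        tags=["payment-related", "solo-failed"],
    ),
]

# The pattern library is just heuristics extracted from the journal,
# e.g. tags that should automatically trigger a mob session.
# These trigger names are illustrative.
MOB_TRIGGERS = {"payment-related", "cross-team", "schema-change"}

def needs_mob(tags: list[str]) -> bool:
    """Heuristic: mob upfront when any known high-variance tag applies."""
    return bool(MOB_TRIGGERS & set(tags))

print(needs_mob(["payment-related"]))   # True
print(needs_mob(["internal-tooling"]))  # False
```

The point of the structure is queryability: once entries carry tags, adding "payment-related as automatic mob trigger" is a one-line change to a set, not a process document.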
Two types of skepticism emerge:
Type 1: "This won't work because [reasonable concern]"
Example: "Without sprints, how do we commit to deadlines?"
Response: "Great question. Let's test it. This team will try continuous flow for 8 weeks. We'll measure calendar time to delivery. If it's slower or less predictable, we have data. If it's faster, we also have data."
Engage the concern. Make it testable. Document the result.
Type 2: "This is just [dismissive label]"
Example: "This is just cowboy coding" or "Real engineers use sprints"
Response: "We're running a time-boxed experiment with clear success criteria and documentation requirements. If it fails, we'll know why and revert. If it succeeds, we'll have data on what worked."
Don't argue. Don't defend. Just run the experiment and let results speak.
The skeptics who won't engage with data aren't your audience. The skeptics asking genuine questions are potential advocates once you have proof points.

After a successful pocket deployment, the temptation is to roll it out company-wide immediately.
Resist this.
Expand when:
You have 3+ successful pocket deployments with documented patterns
The patterns are showing consistency across teams
Other teams are actively asking to join
Leadership is asking "how do we scale this" not "should we do this"
Hold when:
Only one team has tried it
Results are mixed or context-specific
Teams aren't asking to join (they're being voluntold)
You can't articulate clear patterns yet
The expansion should be opt-in, not mandated. Teams that want to try it should go through the same pocket deployment framework: 8 weeks, clear metrics, documentation requirements.
This feels slow. It is slow. That's intentional.
Fast transformation creates backlash. Slow transformation creates sustainable change.
"This works for 50-person companies. We're 5,000 people. How does this apply?"
You don't transform 5,000 people. You transform pockets of 3-5 people and let it spread.
Phase 1: Permission structure
Create explicit space for teams to experiment. This requires executive sponsorship, not to mandate change but to protect teams from process mandates while they're experimenting.
"These three teams (12 people total) are running AI-native experiments for Q1. They're exempt from sprint planning and story pointing. They'll document learnings and present results in Q2."
Phase 2: Multiple independent pockets
Run 5-10 pocket deployments simultaneously across different parts of the org. Different products, different risk profiles, different team compositions.
Some will succeed. Some will fail. All will generate data.
Phase 3: Pattern synthesis
After 6 months, you have 5-10 documented experiments. Extract patterns:
What worked across contexts?
What was context-specific?
What failed everywhere?
What needs more testing?
This becomes your internal playbook. Not "the Agile transformation playbook," but your specific learnings from your specific context.
Phase 4: Opt-in expansion
Open it up: any team can adopt these patterns if they commit to the documentation requirements and measurement framework.
Don't mandate. Don't set adoption targets. Just make it available and let results drive adoption.
If it's genuinely better, teams will opt in. If they don't, you have a data problem—either the patterns don't work as broadly as you thought, or you haven't communicated the value clearly.
The failure mode for large organizations: treating this as a process rollout instead of a learning program.
Wrong approach:
Hire consultants to define "the AI-native process"
Train 500 people on new ceremonies
Set adoption metrics (80% of teams by Q3)
Mandate compliance
Right approach:
Protect space for teams to experiment
Document what works and what doesn't
Synthesize patterns from actual experience
Make adoption opt-in with clear benefits
The wrong approach optimizes for coverage. The right approach optimizes for learning.
Large orgs that get this right don't have uniform processes. They have documented patterns and teams choosing what fits their context.

Running these experiments will surface problems. That's the point.
Expect:
Calendar planning breaks - When you stop estimating in story points, the annual roadmap process doesn't work the same way. You need new ways to think about capacity and commitments.
Career ladders need updates - When individual feature ownership isn't the metric, how do you evaluate performance? How do you decide promotions?
Cross-team dependencies get messy - When one team is running continuous flow and another is in two-week sprints, integration timing gets complicated.
Reporting gets weird - Executive dashboards built around velocity and burndown charts need new metrics.
These aren't reasons to stop. These are valuable discoveries about where your current processes have hidden dependencies.
Document them. Figure out solutions. Sometimes the solution is "this team needs different boundaries." Sometimes it's "we need to rethink career ladders for the AI era."
Every broken process is an opportunity to design something better.
If a team tries this for 4 weeks and it's not working, they revert. No shame. No questions. Just document what didn't work and move on.
This safety valve creates three critical outcomes:
People will try things they'd otherwise resist (because they can stop)
Feedback stays honest (no pretending it works to save face)
Negative data becomes as valuable as success stories
The teams that revert and document why contribute as much as the teams that succeed.
Six months in, you should have clear signals:
Good signals:
Multiple teams asking to join without prompting
Documented patterns being referenced by teams you didn't train
Calendar time to ship decreasing across participating teams
Retention increasing in experimental teams
Skeptics asking "how do we try this" instead of "why would we try this"
Bad signals:
Teams complaining but not opting out (suggests they feel pressured)
No documented patterns emerging (suggests learning isn't happening)
Success stories but no specifics (suggests cargo culting)
Teams reverting without documenting why (suggests psychological safety issues)
Neutral signals:
Some teams succeeding, some failing (expected)
Mixed results within same team (normal during learning)
Debates about what to measure (healthy exploration)
The goal isn't universal adoption. The goal is documented learnings and sustainable improvement for teams where it fits.
Every transformation starts with a leader asking: "How do we get there from here?"
The answer that works: protect space for teams to experiment, document what they learn, let success spread organically.
This feels unsatisfying to leaders used to decisive action. It feels slow. It feels uncontrolled.
It's also how actual organizational change happens—not through mandates, but through proof points that make the old way feel obsolete.
Start with one team. Eight weeks. Clear metrics. Honest documentation.
If it works, you'll know. If it doesn't, you'll know why.
Either way, you'll have data instead of beliefs.
And data is how you build conviction for the next step.
Series Conclusion
We've covered process debt that makes AI initiatives fail, intent articulation as the new bottleneck, the economic shift from craftsman to toolmaker, coordination patterns that match bandwidth to variance, and now the practical roadmap for transformation.
The organizations that thrive in the AI-native era won't be the ones with the best processes. They'll be the ones that learned fastest from their own experiments.
Start experimenting.
This is Part 5 of a 5-part series on building AI-native engineering organizations.