We Can't Have It Both Ways: AI Agents and Context Switching
The engineering industry has spent years building a near-sacred consensus: context switching is the number one productivity killer for software developers. Protect your flow state. Guard your focus time. A single Slack message costs 23 minutes of productive work. We have the research, the blog posts, the conference talks, and the Cal Newport citations to prove it. We have fought hard for no-meeting Wednesdays, async-first cultures, and 90-minute deep work blocks. We know, with genuine conviction, that fracturing an engineer's attention is one of the most expensive things a team can do.
And then, in the same breath, the same people will tell you to run multiple agents simultaneously. Spin up one for the new platform feature, another to clean up dead code in a different corner of the repo, maybe a third to draft some documentation. Ship faster. Parallelize everything. This is the future of engineering productivity.
I sit here and think: how is that not context switching?
The thing we keep saying we believe
Context switching is the main productivity killer for developers (Tech World With Milan). That is the claim. Not one of several productivity killers. The main one. And the research behind it is solid. Developers need an average of 23 minutes to rebuild their focus after an interruption (Tech World With Milan). The mental model engineers construct while solving a complex problem, the one that holds the architecture, the edge cases, the threading of a decision through a system, collapses the moment attention shifts. It does not pause. It does not wait. It shatters, and you rebuild it from scratch.
I have seen this play out in two different organizations. Perfectly capable, motivated engineers produce noticeably worse work, catch fewer bugs, and burn out faster when their days are fragmented. When we implemented protected focus time at one company, the difference was not subtle. Velocity improved. Bug rates dropped. People started saying they actually enjoyed their work again. The effect was real and measurable.
The core insight behind all of this is that engineering is not a production line. You cannot optimize it the same way you optimize a factory floor by adding parallel workers to parallel tasks. The work lives in someone's head. The thinking is the product. Fragment the thinking and you fragment the output, regardless of how much activity is happening on the surface.
We understood this. We agreed on it. We built workflows around it.
What the "run all the agents" crowd is actually describing
Now here is what I keep seeing described as the new model of AI-augmented productivity: an engineer with three or four agent sessions running at once. One agent is scaffolding a new feature. Another is refactoring a service nobody has touched in two years. A third is doing something in a completely different part of the repo. The engineer is orchestrating all of it, jumping between windows, reviewing diffs, course-correcting, feeding context back in.
This gets written up as a productivity multiplier. You are shipping four things simultaneously instead of one. The throughput is real, in a narrow sense. More code is being produced in parallel. If you measure productivity by lines of code or tickets closed, the numbers look good.
But look at what the engineer is actually doing. They are context switching. They are moving between distinct problem spaces, each with its own mental model, its own set of decisions in flight, its own risk surface. The agent does not carry that cognitive load for them. The engineer still has to hold it. They have to understand what the refactoring agent did well enough to review the PR intelligently. They have to track where the feature agent went so they can redirect it when it wanders. They have to switch from one context to the next and back again, all day, because that is what orchestrating multiple agents requires.
The only thing that changed is what generates the work. The engineer is no longer the one typing the code. But the cognitive demand of managing, reviewing, and course-correcting multiple streams of work has not gone away. In some ways it has intensified, because the output volume has grown and the review burden on the human grows with it.
We spent years arguing that meetings fracture focus and kill productivity. Now we are celebrating workflows that require an engineer to jump between three or four different problem spaces simultaneously and calling it a breakthrough. I am not sure we have actually thought this through.
Where the output actually gets created
The part of engineering that AI agents can accelerate is code generation. The part they cannot replace is understanding. Every PR produced by an agent still needs to be reviewed by someone who actually comprehends the change, understands its implications, and can evaluate whether it is correct (at least for now). That person is you. And that review requires the same kind of deep mental engagement that writing the code would have required.
The review problem is where the context switching cost lands hardest. When you have written a piece of code yourself, you have a continuous mental thread from the problem through the decision to the implementation. The review is partially a formality because you already understand the work. When an agent writes it, the review is where the understanding has to be built from scratch. You are loading a context you were not present for. That takes time and focused attention.
Now multiply that by, say, four agents. You are loading four separate contexts every time you switch. The code might be arriving faster. The understanding is not. You can shortcut the review to keep up with the pace, but then you are not actually reviewing. You are rubber-stamping. And that is where the quality starts to slip in exactly the way the research on context switching predicts.
This is not theoretical. The research shows that frequent interruptions lead to more bugs because developers struggle to regain their cognitive context, and that interrupted coding sessions correlate with lower code maintainability, longer review cycles, and rework (Tech World With Milan). The mechanism is the same whether the interruption comes from a Slack ping, a meeting, or a second agent window demanding your attention.
The honest complication
I want to be careful not to argue that multi-agent workflows have no value. They do. There are genuinely separable tasks where the agent can operate with low oversight, where the review is contained, and where parallelizing actually makes sense. Running an agent to clean up linting issues or update deprecated dependencies in a quiet corner of the codebase while you do focused work on something else is not the same as splitting your attention between four high-stakes feature branches. The cognitive load is not symmetric.
The more honest version of this position is not "multi-agent workflows are bad" but "multi-agent workflows carry a context switching cost that nobody is accounting for, and we are currently measuring the wrong thing." We are counting tickets closed and features shipped and treating those as the productivity number. We are not counting the bugs that slip through hurried reviews, the technical debt that accumulates when engineers do not have time to actually understand the code they are merging, or the burnout that builds when someone spends eight hours a day task-switching at high speed.
There is also a skill development dimension worth naming. The reason strong engineers can use AI well is precisely because they understand what they are reviewing. If your entire model of productivity involves spinning up agents and approving their output at pace, you may be skipping the part where engineers build the depth of understanding that makes the review meaningful in the first place. You are not removing the cognitive work. You are deferring it, and possibly to someone who does not have the foundation to do it well.
What we should actually be asking
If context switching is genuinely the main productivity killer, then any workflow that increases the number of context switches an engineer has to manage in a day deserves scrutiny. That scrutiny should not stop just because the workflow involves AI. It should intensify, because the pace at which AI can generate work is faster than anything we have had to manage before.
The useful question is not "how many agents can I run at once" but "how many agents can I run while still doing the review work that keeps the output trustworthy." For some people on some tasks, the answer might be two. For others it might be one. The throughput ceiling is not the agent. It is the human doing the reviewing, and that human still has a working memory that can only hold so much at once.
We built the case for protected focus time on the premise that quality engineering requires sustained attention. That case has not changed. What changed is the source of the work product. If we are serious about what we claim to believe about flow state and cognitive load, we should be applying those same standards to how we think about AI-augmented workflows, not setting them aside because the output numbers look impressive.
You cannot argue that a 30-minute meeting is a productivity catastrophe and then praise a workflow that puts four demanding context switches into your afternoon, or someone else's. At some point the framework has to apply consistently, or we have to admit it was never really about cognitive science. It was about meetings.