The Intuition That Turned Out to Be Wrong

For the past year, the AI industry has been obsessed with agents.

Instead of one chatbot answering questions, developers now build teams of AI agents: one writes code, another researches information, another plans tasks, and another reviews results. The idea resembles a digital company staffed entirely by software workers.

The assumption behind this trend seems obvious: if one AI agent is powerful, a group of them working together should be even better.

But a growing body of research suggests that assumption may be wrong.

A study by researchers at Google Research and Google DeepMind found that multi-agent systems can perform significantly worse than a single AI agent on certain tasks. In experiments spanning 180 configurations across four benchmarks, multi-agent setups reduced performance by 39% to 70% on sequential reasoning and planning tasks compared with single-agent baselines.

The results challenge one of the central ideas behind the current wave of agentic AI — and reveal a problem that builders of complex AI systems are only beginning to confront.

The Agent Boom

AI agents have become one of the most important trends in artificial intelligence.

Major companies including OpenAI, Google DeepMind, and Anthropic are building frameworks to support agent-based systems. The agentic AI market is projected to reach $98.26 billion by 2033.

The appeal is obvious: instead of writing every step of a workflow, developers can deploy a network of AI workers to handle complex operations. One agent researches, another drafts, another reviews, and a coordinator combines the results.

But building these systems turns out to be harder than expected — and in some cases, counterproductive.

When More Agents Made Things Worse

To understand how agent systems scale, Google researchers tested 180 different multi-agent configurations across four benchmarks, evaluating five canonical agent architectures — single-agent and four multi-agent variants (independent, centralized, decentralized, and hybrid) — instantiated across three LLM families.

The findings were counterintuitive.

For tasks that could be easily divided into independent subtasks — like financial analysis where each agent examines a different company — multi-agent systems performed extremely well. In financial reasoning benchmarks, centralized coordination improved results by approximately 80.9% compared to a single agent.

But when tasks required sequential reasoning, where each step depends on the previous one, performance collapsed.

Across these scenarios, every multi-agent architecture tested performed worse than the single-agent baseline — sometimes by as much as 70%.

The culprit was not the individual agents. It was the collaboration itself.

The Coordination Problem

When multiple agents collaborate, they must constantly exchange information. Each agent receives only partial context, meaning it must rely on summaries from other agents to understand the full picture.

That process creates a problem researchers describe as information fragmentation.

As information moves between agents, details are compressed or lost. Small misunderstandings then propagate across the system. One agent may misinterpret another agent’s output, generating a flawed result that the next agent treats as fact.

The effect resembles the childhood game of telephone, where a message becomes progressively distorted as it passes from person to person — except in AI systems, the distortion happens at machine speed and can compound exponentially.

The Google researchers found that the overhead of communication fragmented the reasoning process, leaving insufficient “cognitive budget” for the actual task. In some architectures studied, independent agents amplified errors significantly compared with centralized coordination approaches. The researchers attributed this to the fundamental tension between agent autonomy and system coherence: the more independently agents operate, the greater the risk that their individual errors will compound rather than cancel out.
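The compounding loss described above can be illustrated with a toy model (an illustration only, not the metric the Google study used): if each agent-to-agent summary retains only a fraction of the relevant details, fidelity decays multiplicatively with the length of the hand-off chain.

```python
# Toy model of information fragmentation in an agent chain (illustrative
# only; not the Google study's methodology). Each hand-off compresses the
# context, retaining a fixed fraction of the relevant details.

def chain_fidelity(retention_per_hop: float, hops: int) -> float:
    """Fraction of the original context surviving after `hops` hand-offs."""
    return retention_per_hop ** hops

# A single agent keeps its own context: zero hand-offs, full fidelity.
print(chain_fidelity(0.9, 0))  # -> 1.0

# A five-agent pipeline with 90% retention per hand-off keeps only ~59%
# of the original detail -- small per-hop losses compound.
print(round(chain_fidelity(0.9, 5), 2))  # -> 0.59
```

This is the "telephone game" dynamic in miniature: no single hand-off looks catastrophic, but the product of many slightly lossy hops is.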

Parallel Versus Sequential: The Key Distinction

The research highlights a critical distinction that determines whether multi-agent systems help or hurt.

Parallel tasks can be split into independent subtasks that multiple agents handle simultaneously. Examples include analyzing multiple documents, running distributed data queries, or evaluating different scenarios in parallel. Because agents work independently and combine results afterward, collaboration can genuinely improve performance.

Sequential tasks require strict step-by-step reasoning where each step depends on the previous one. Examples include software debugging, strategic planning, long reasoning chains, and multi-step decision processes. These tasks depend on maintaining consistent context across every step — something that becomes dramatically harder when multiple agents are involved.

The takeaway is not that multi-agent systems are useless. It is that the structure of the task determines whether adding agents helps or hurts. Adding agents to a parallel problem can accelerate results. Adding agents to a sequential problem can destroy them.
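The distinction can be sketched in code (a schematic, with a placeholder `analyze` function standing in for real LLM calls): parallel work fans out over independent inputs and merges the results afterward, while sequential work threads one evolving state through every step, so splitting it across agents forces lossy hand-offs of that state.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder "agent": in a real system this would be an LLM call.
def analyze(item: str) -> str:
    return f"report({item})"

# Parallel task: independent subtasks, results combined afterward.
# Adding agents here adds capacity, not coordination overhead.
def parallel_workflow(items: list[str]) -> list[str]:
    with ThreadPoolExecutor() as pool:
        return list(pool.map(analyze, items))

# Sequential task: each step consumes the previous step's full output.
# Splitting these steps across agents means each one sees only a summary
# of the state -- which is where fragmentation creeps in.
def sequential_workflow(state: str, steps: list) -> str:
    for step in steps:
        state = step(state)  # full context flows through a single agent
    return state

print(parallel_workflow(["AAPL", "MSFT"]))          # independent analyses
print(sequential_workflow("bug report", [analyze, analyze]))  # nested chain
```

In the sequential case the single-agent loop keeps the entire intermediate state in one place; a multi-agent version of the same loop would have to serialize and summarize that state at every arrow.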


Why the Industry Believed the Opposite

The assumption that more agents equals better results did not emerge randomly.

For years, the dominant pattern in AI has been that scaling improves performance. Larger models perform better. More training data improves accuracy. More compute produces stronger results. It seemed logical that more agents would follow the same rule.

Earlier research in AI collaboration showed that teams of models could outperform individual models through collective reasoning — a finding that reinforced the instinct to build larger agent networks.

But those experiments often focused on tasks that naturally benefit from parallelization. The new research suggests that task structure matters more than agent count.

The distinction is subtle but important: scaling works when you are adding independent capacity. It fails when you are adding coordination overhead to an inherently sequential process.

Real-World Confirmation

Academic studies are not the only place where multi-agent limitations appear.

In a 2025 experiment at Microsoft Research called Magentic Marketplace, researchers built a simulated online marketplace populated entirely by AI agents acting as buyers and sellers. The experiment involved 100 customer-side agents interacting with 300 business-side agents across tasks like ordering dinner according to user preferences.

The results revealed several weaknesses: agents struggled to coordinate roles when asked to collaborate toward a common goal, performance degraded sharply as customer agents were given more options to choose from, and businesses could use certain techniques to manipulate customer agents into buying their products. The researchers concluded that AI agents still require significant human guidance to operate effectively in complex, open-ended environments.

Similar patterns have appeared in enterprise deployments. According to Boston Consulting Group research on agentic AI, companies deploying multi-agent systems face coordination failures, goal drift, and emergent behavior patterns that are difficult to predict. BCG emphasizes that test environments must surface these multi-agent interaction issues, and a MIT Sloan/BCG executive survey found that 69% of executives agree that agentic AI requires entirely new management approaches.

The Single Agent Renaissance

These findings are triggering a quiet shift in AI architecture.

Modern large language models now have enormous context windows — some exceeding one million tokens — along with improved reasoning capabilities and built-in tool use. Because of these improvements, a single powerful agent may sometimes outperform complex agent teams simply by maintaining consistent context throughout a task.

Some researchers argue that future AI systems may rely more on strong single agents with sophisticated planning abilities, augmented by targeted tool use, rather than large networks of specialized agents.

Former Tesla AI director Andrej Karpathy has argued that current AI agents are cognitively lacking and that working through the issues with agents will take a decade — suggesting that simpler, more capable individual systems may outperform complex multi-agent orchestration in many practical scenarios today.

When Multi-Agent Systems Still Make Sense

Despite these challenges, multi-agent architectures remain valuable in specific contexts.

They work well when tasks can be genuinely parallelized: distributed research across multiple sources, large-scale data analysis, monitoring complex systems with many independent components, and simulations involving many actors.

The key insight is that agent architecture must match the structure of the task. Deploying a multi-agent system on a sequential problem is not just inefficient — it can actively degrade performance.

Researchers are now exploring new strategies to mitigate coordination costs, including centralized manager-based systems where one agent supervises others, improved communication protocols like Agent2Agent (A2A), and better algorithms for automatically determining whether a task should be handled by one agent or many.
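The last of those ideas — automatically choosing between one agent and many — can be sketched as a simple heuristic (hypothetical; a production router would classify task structure with a model rather than hand-labeled dependencies): inspect whether subtasks consume each other's outputs before picking an architecture.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    # Names of other subtasks whose output this one consumes.
    depends_on: list[str] = field(default_factory=list)

def choose_architecture(subtasks: list[Task]) -> str:
    """Heuristic router (illustrative): dependency chains stay with a
    single agent; independent subtasks fan out to a multi-agent pool."""
    if any(t.depends_on for t in subtasks):
        return "single-agent"        # keep one consistent context
    return "multi-agent-parallel"    # independent capacity scales

# Financial analysis of separate companies: no cross-dependencies.
print(choose_architecture([Task("AAPL"), Task("MSFT")]))
# -> multi-agent-parallel

# Debugging: reproduce -> localize -> patch, each step needs the last.
print(choose_architecture([
    Task("reproduce"),
    Task("localize", depends_on=["reproduce"]),
    Task("patch", depends_on=["localize"]),
]))
# -> single-agent
```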

The Lesson

The excitement around AI agents has led many developers to assume that building large teams of autonomous agents is the inevitable future of artificial intelligence.

But the research tells a different story. Adding more agents can increase communication overhead, fragment information, amplify errors, and degrade overall performance.

In certain tasks, a single well-designed AI agent may outperform an entire network of collaborating agents.

The lesson is not that agent systems are useless — but that scaling intelligence is more complicated than simply adding more agents. The real engineering challenge is learning when to distribute work and when to keep it centralized.

As AI continues to evolve, the systems that succeed will not be the ones with the most agents. They will be the ones that deploy the right architecture for the right problem.


Decision Radar (Algeria Lens)

Relevance for Algeria: High — Algerian enterprises and startups adopting AI agents need to understand that naive multi-agent scaling wastes resources and degrades results.

Infrastructure Ready: Partial — Algeria's compute and cloud infrastructure is limited, making efficient single-agent architectures even more valuable than in well-resourced markets.

Skills Available: No — Few Algerian AI engineers have hands-on experience designing multi-agent systems or evaluating when multi-agent vs. single-agent architectures are appropriate.

Action Timeline: Immediate — Teams currently building AI agent workflows should audit whether multi-agent setups are justified by task structure.

Key Stakeholders: AI engineering teams, CTOs at Algerian tech companies, university AI research labs, Sonatrach and Sonelgaz digital transformation units.

Decision Type: Tactical — Use task structure analysis to choose the right agent architecture before investing in complex multi-agent orchestration.

Quick Take: Algerian organizations exploring AI agents should default to single-agent architectures for sequential reasoning tasks and only deploy multi-agent systems for genuinely parallelizable workloads. Given Algeria’s limited compute resources, avoiding unnecessary multi-agent overhead can yield better results at lower cost.

Sources & Further Reading