Why Experienced Devs Get 19% Slower With AI: The J-Curve of Adoption

February 27, 2026

Introduction

Here is a result that should make every engineering leader pause. A rigorous study by the METR organization took experienced open-source developers, gave them real tasks on their own projects — codebases they knew intimately — and randomly assigned them to complete those tasks with or without AI tools. The developers using AI were 19% slower. Not faster. Slower.

The kicker: those same developers predicted they would be 24% faster with AI. They believed they were more productive while measurably being less productive. The gap between perception and reality — 43 percentage points — is not a rounding error. It is a systemic misunderstanding of how AI tools interact with established workflows.

This finding does not mean AI coding tools are useless. It means something more nuanced and more important: bolting AI onto an existing workflow makes things worse before it makes them better. And most organizations are stuck in the “worse” phase without realizing it.

What the METR Study Actually Measured

The study’s design is worth understanding because it addresses the weaknesses of most AI productivity research. Prior studies often used self-selected participants, artificial tasks, or self-reported productivity gains — all of which introduce bias toward positive results.

METR studied 16 experienced open-source developers. These were not students or novices. They were maintainers and significant contributors to established projects, working on tasks drawn from their own repositories. Each developer was randomly assigned to complete some tasks with AI tools and others without. The randomization is critical — it controls for task difficulty, developer skill, and project complexity.

The result: AI-assisted tasks took 19% longer on average. The effect was statistically significant and persisted across different task types and developer experience levels.

The researchers investigated why. Several factors compounded to create the slowdown:

Time formulating prompts. Translating a mental model of what the code should do into a prompt that produces useful output is not free. Developers spent meaningful time crafting, revising, and clarifying their requests. For developers who already knew exactly what to type, the prompt formulation was pure overhead.

Waiting for generation. AI code generation is not instantaneous. The wait introduces idle time that did not exist when the developer was simply typing. For short tasks, the generation time sometimes exceeded the time it would have taken to write the code manually.

Correcting “almost right” code. This is the most insidious cost. AI-generated code is frequently close to correct but subtly wrong — a variable name that does not match the project’s conventions, an edge case that is not handled, a library function that is called with slightly incorrect parameters. Identifying and fixing these near-misses takes concentration and time, and the cognitive load is often higher than writing the code from scratch would have been.
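To make the "almost right" failure mode concrete, here is a hypothetical sketch (the function and the bug are invented for illustration, not taken from the study): an AI draft of a percentile helper that passes a casual read but indexes past the end of the list when p is 100, alongside the one-line human fix that the developer still had to spot, diagnose, and make.

```python
# Hypothetical illustration of "almost right" generated code.

def percentile(values, p):
    """AI draft: return the p-th percentile (0-100) of values.
    Looks plausible, but the computed index equals len(values)
    when p == 100, raising IndexError."""
    ordered = sorted(values)
    return ordered[int(len(ordered) * p / 100)]

def percentile_fixed(values, p):
    """Human fix: clamp the index into range."""
    ordered = sorted(values)
    return ordered[min(int(len(ordered) * p / 100), len(ordered) - 1)]
```

The draft works for most inputs, which is exactly why the defect survives a quick review; finding it costs more attention than writing the clamped version directly would have.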

Context switching. Working with AI introduces a continuous oscillation between the developer’s mental model and the model’s output. The developer thinks about the problem, then shifts to evaluating the AI’s interpretation, then shifts back to reconsidering their own approach. This switching has a measurable cognitive cost.

Debugging subtle errors. Generated code that looks correct but contains hidden logic errors is harder to debug than code the developer wrote themselves, because the developer does not have the “mental trace” of having constructed the logic step by step. They are debugging someone else’s thinking — except the “someone else” is a probabilistic model that does not think in ways humans can easily trace.

The 2025 Stack Overflow Developer Survey provides corroborating context: only 29% of developers trust AI-generated code to be accurate, down from 40% in 2024. And 66% cite the same frustration: AI solutions that are “almost right, but not quite.” The survey also found that while 84% of developers are using or planning to use AI tools, positive attitudes have dropped from over 70% to roughly 60%. The industry is adopting the tools while becoming more skeptical of them.

The J-Curve of Adoption

The METR results fit a pattern that adoption researchers call the J-curve. When a transformative technology is introduced into an existing system, productivity does not increase linearly. It dips first. The dip occurs because the new tool changes the workflow, but the workflow has not been redesigned around the tool. The organization is running a new engine on an old transmission. The gears grind.

The J-curve has been documented across technological transitions. When spreadsheets replaced manual ledgers, early adopters spent more time learning the tool than they saved using it. When email replaced memos, early corporate adoption actually slowed communication because people wrote emails as if they were formal letters and checked them as infrequently as physical mail. The technology was transformative, but the transformation took time because the surrounding workflows had to catch up.

AI coding tools are in the early downslope of the J-curve for most organizations. The tools have been adopted. The workflows have not adapted. And the productivity dip is real, measurable, and being misinterpreted.

Many organizations see the dip and conclude that the tools are overhyped. They pull back, reduce investment, and declare that AI coding “does not work for us.” This is the wrong conclusion. The dip is not evidence that the tool fails. It is evidence that the workflow surrounding the tool has not been redesigned.

Other organizations see the dip and try to power through it by mandating more AI tool usage, adding more tools, or measuring developers on AI adoption metrics. This is also counterproductive. Forcing higher adoption of a tool that has not been integrated into a redesigned workflow just makes the dip deeper.

The “New Engine, Old Transmission” Problem

The core issue is architectural. Most software organizations are structured around a specific model of work: humans write code, other humans review it, teams coordinate through standups and sprint ceremonies, quality is ensured through code review and manual testing. Every process, role, and tool assumes that a human being is writing the code.

When AI coding tools are introduced into this structure, they are treated as a faster way for humans to write code. The developer uses Copilot or Cursor or Claude Code to generate a first draft, then reviews and refines it exactly as they would review another human’s pull request. The code review process does not change. The sprint ceremonies do not change. The team structure does not change. The evaluation criteria do not change.

In this configuration, AI is not a transformative capability. It is a fancy autocomplete. It saves time on the mechanical act of typing but adds time in prompt formulation, output evaluation, and error correction. For experienced developers who type fast and think clearly, the net effect is neutral or negative — exactly what the METR study found.

The organizations that have achieved real productivity gains from AI coding tools are the ones that redesigned their workflows around the tool’s capabilities. They moved from human-writes-code-AI-assists to AI-writes-code-human-evaluates. They changed their review processes from line-by-line code review to outcome-based evaluation. They restructured their teams from large implementation-focused groups to small specification-focused groups.

Ben Shapiro’s five levels of AI coding maturity provide a useful framework here. Most organizations are at Level 2 (iteration — AI generates drafts, humans refine them) with Level 1 workflows (designed for human-written code). The mismatch between tool capability and workflow design is where the J-curve lives. Organizations that redesign their workflows to match Level 2 or Level 3 tool usage see productivity gains. Organizations that keep Level 1 workflows while using Level 2 tools see the 19% slowdown.

Why Perception Diverges from Reality

The METR study’s most troubling finding may be the perception gap. Developers predicted they would be 24% faster, and even after completing their tasks still believed AI had sped them up, while actually being 19% slower. That is not mild overconfidence. It is a systematic misperception.

Several factors explain this. AI tools create a feeling of productivity that does not correspond to actual output. Seeing code appear on screen faster feels productive, even if the total time to working code increases. The dopamine hit of watching an AI generate 50 lines of code in seconds is real, even if the next 15 minutes are spent finding and fixing the three subtle bugs in those 50 lines.

There is also a selection bias in how developers evaluate their AI experience. They remember the tasks where AI saved them significant time — generating boilerplate, scaffolding unfamiliar code, writing documentation. They underweight the tasks where AI cost them time — debugging subtle generated errors, reformulating failing prompts, reverting generated code that broke something upstream. The positive experiences are vivid and memorable. The negative experiences feel like normal debugging friction and are not attributed to the AI tool.

This perception gap is dangerous at the organizational level. If developers believe they are more productive with AI tools, they will advocate for more adoption. Managers who rely on developer self-reports will believe adoption is working. The organization continues investing in a tool that is actually reducing productivity, because nobody is measuring the right thing. The 46% of developers who say they do not fully trust AI-generated code are actually the clear-eyed ones — they have correctly identified the quality gap even if they cannot precisely quantify the productivity impact.

How to Climb the J-Curve

Acknowledging the J-curve is the first step. The second step is deliberately redesigning workflows to move past the dip. Several concrete approaches accelerate the transition:

Match the workflow to the tool level. If your team is using AI at Level 2 (iteration), redesign your code review process accordingly. Instead of reviewing every AI-generated line as if a human wrote it, evaluate the output against the requirements. Test behavior, not implementation. This alone can eliminate the overhead of treating AI output like a junior developer’s pull request.
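As a minimal sketch of what "test behavior, not implementation" can look like (the `slugify` function and its requirements are hypothetical, invented for this example): the checks below pin the required outcome, and whether the implementation was AI-generated or hand-written, and how it is structured internally, is deliberately not part of the review.

```python
import re

def slugify(title: str) -> str:
    """Candidate implementation -- could be AI-generated or hand-written;
    the evaluation below cares only about its observable behavior."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def test_slugify_behavior():
    # Outcome-based checks derived from the requirement, not from the diff:
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  The J-Curve of Adoption  ") == "the-j-curve-of-adoption"
    assert slugify("100% Coverage") == "100-coverage"

test_slugify_behavior()
```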

Invest in specification quality. The biggest productivity gains come when the specification given to the AI is precise enough to produce correct output on the first iteration. Vague prompts produce vague code that requires extensive correction. Precise specifications produce correct code that requires minimal review. Training developers to write better specifications has a higher ROI than training them to use AI tools more frequently.
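As a hedged illustration of the difference specification quality makes (the retry helper, its name, and its behavior are all invented for this example): the vague prompt below leaves the model to guess at retry counts, backoff, and error handling, while the precise spec is concrete enough that any generated implementation can be checked clause by clause.

```python
import time

# Hypothetical contrast. A vague prompt forces the model to guess:
VAGUE_PROMPT = "Write a function that retries a request."

# A precise specification makes the output mechanically checkable:
PRECISE_SPEC = """
retry(fn, attempts=3, base_delay=0.5):
- call fn() and return its result on success
- on exception, sleep base_delay * 2**attempt seconds, then retry
- after `attempts` failures, re-raise the last exception
- never swallow KeyboardInterrupt
"""

def retry(fn, attempts=3, base_delay=0.5):
    """Implementation written against PRECISE_SPEC: each clause above
    maps to a line below, so review reduces to checking the spec."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except KeyboardInterrupt:
            raise
        except Exception as exc:
            last_exc = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise last_exc
```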

Measure outcomes, not activity. The organizations stuck in the J-curve are measuring AI adoption (how much are developers using the tool?) rather than AI impact (is the team shipping more working software?). The first metric encourages performative tool usage. The second reveals whether the tool is actually improving output.

Accept the dip. For organizations early in their AI adoption journey, the productivity dip is normal and expected. Rushing to eliminate it by forcing more tool usage makes it worse. The dip resolves when workflows are redesigned, not when tool usage increases. Give teams time and permission to experiment with workflow changes rather than mandating adoption targets.

Start with the right tasks. AI tools deliver the most value on well-specified, boilerplate-heavy tasks with clear acceptance criteria. They deliver the least value on complex, context-heavy tasks in familiar codebases — exactly the scenario the METR study measured. Starting AI adoption with the tasks where the tool excels builds confidence and habit before applying it to tasks where the J-curve is steepest.

🧭 Decision Radar

Relevance for Algeria: High — Algerian developers adopting AI coding tools will hit the same J-curve; awareness of this phenomenon is critical to avoiding wasted investment
Infrastructure ready? Partial — AI coding tools are accessible, but workflow redesign support and organizational change management are not
Skills available? Partial — developers have access to AI tools but lack training in specification-driven workflows and outcome-based evaluation
Action timeline: Immediate
Key stakeholders: Engineering team leads, CTOs, development managers, individual developers, tech training providers
Decision type: Tactical

Quick Take: Algerian development teams adopting AI tools should expect a productivity dip before gains materialize. The fix is not more tool usage — it is workflow redesign. Teams that invest in specification quality and outcome-based evaluation will climb the J-curve faster than teams that simply mandate AI adoption.
