Last April, Shopify CEO Tobi Lutke published a company-wide memo that reverberated through the tech industry. The message was blunt: AI usage is no longer optional. Every team requesting additional headcount must first demonstrate that AI cannot accomplish the task. Performance reviews would now evaluate AI proficiency. Lutke shared the memo publicly himself, getting ahead of the inevitable leak.

In brief: Engineering teams at companies like Shopify, Stripe, and Vercel have rebuilt their development workflows around AI in 2026. The new cycle — spec, AI draft, human review, AI test — is replacing traditional pair programming. The 2025 DORA report confirms what practitioners observe on the ground: AI now correlates positively with delivery throughput, but continues to increase delivery instability. The tooling amplifies whatever is already there — discipline or dysfunction alike.

What made Lutke’s mandate remarkable was not its ambition but its timing. By the time that memo circulated, the shift he was demanding had already happened at dozens of engineering organizations. The question was no longer whether to use AI in software development. It was how to use it without losing the things that make software reliable: code review discipline, architectural coherence, and the institutional knowledge that lives in a team’s collective muscle memory.

The New Development Cycle

The traditional software development loop — write code, submit pull request, wait for review, iterate — has been fundamentally restructured at companies shipping production software with AI. What has emerged is something engineers are calling the “spec-draft-review-test” loop, and it looks nothing like the autocomplete-on-steroids model that defined the first wave of AI coding tools.

Here is how it works in practice. A product manager or tech lead writes a specification, often a natural language description of what needs to happen. An AI agent takes that spec and produces a first draft — not a code snippet, but a complete implementation spanning multiple files, with tests, documentation stubs, and error handling. A human engineer reviews the draft, not line by line in the traditional sense, but architecturally: Does this approach fit the system? Does it respect our conventions? Does it introduce technical debt we will regret?

Then the AI runs the test suite. If tests fail, it fixes the code and runs them again. The loop continues until the implementation passes, at which point a human gives final approval.
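The test-and-fix loop described above can be sketched in a few lines. Nothing here is any company's actual code: `generate_draft`, `propose_fix`, and the toy test runner are hypothetical stand-ins for calls to an AI agent and a CI system.

```python
# Hypothetical sketch of the spec -> draft -> review -> test loop.
# generate_draft and propose_fix stand in for AI agent calls;
# run_tests stands in for a real test suite. All names are invented.

def run_tests(code):
    """Toy test runner: returns a list of failing test names."""
    failures = []
    if "add(" not in code:
        failures.append("test_add_exists")
    return failures

def generate_draft(spec):
    # A real agent would turn the natural-language spec into a full
    # implementation; this returns a deliberately incomplete draft.
    return "# TODO: implement " + spec

def propose_fix(code, failures):
    # A real agent would patch the code based on the failing tests.
    return code + "\ndef add(a, b):\n    return a + b"

def spec_draft_review_test(spec, max_iterations=5):
    """Draft from a spec, then loop fixing until the tests pass."""
    code = generate_draft(spec)
    for _ in range(max_iterations):
        failures = run_tests(code)
        if not failures:
            return code  # hand off to a human for final approval
        code = propose_fix(code, failures)
    raise RuntimeError("agent could not make tests pass")
```

The human review step deliberately sits outside this loop: the function returns passing code, and a person still decides whether it ships.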

At Stripe, this cycle has reached an almost industrial scale. The company’s homegrown AI agents, internally called “Minions,” now merge more than a thousand pull requests every week — a figure that recently climbed past 1,300 according to Stripe’s own communications. These are not trivial formatting changes. The PRs contain no human-written code — engineers define the task via a Slack command, the CLI, or a “Fix with Minion” button in the bug tracker, and the agent handles everything from implementation through CI checks.

The critical nuance: humans still review every pull request. The code is AI-written but human-approved. Stripe built a six-layer infrastructure around their agent — a fork of Block’s open-source tool Goose — with a central MCP server called Toolshed housing more than 400 internal tools and integrations. The agent environments have no internet access and no production access. Isolation is the permission system.
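Toolshed itself is proprietary, but the core idea, a central registry that agents list and invoke tools through, can be sketched briefly. Every name below (`ToolRegistry`, `lookup_ticket`) is invented for illustration and is not Stripe's API.

```python
# Minimal sketch of a central tool registry in the spirit of an MCP
# server like Stripe's Toolshed. All names here are hypothetical.

class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, description):
        """Decorator that records a callable under a stable tool name."""
        def wrap(fn):
            self._tools[name] = {"fn": fn, "description": description}
            return fn
        return wrap

    def list_tools(self):
        """What an agent sees when it asks which tools exist."""
        return {n: meta["description"] for n, meta in self._tools.items()}

    def call(self, name, **kwargs):
        """Dispatch an agent's tool invocation by name."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name]["fn"](**kwargs)

registry = ToolRegistry()

@registry.register("lookup_ticket", "Fetch a bug-tracker ticket by id")
def lookup_ticket(ticket_id):
    # A real tool would hit an internal service; this one is canned.
    return {"id": ticket_id, "status": "open"}
```

The design point the sketch illustrates is indirection: agents never hold credentials or call services directly, they ask the registry, which is where access control and isolation can be enforced.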

Shopify’s Augmented Engineering

Shopify approaches AI integration differently than Stripe, but the outcome is similar. The company frames its strategy as “Augmented Engineering” — making engineers dramatically more productive by embedding AI into their daily workflows rather than replacing those workflows entirely.

All Shopify employees now have access to a suite of AI tools including GitHub Copilot, Anthropic’s Claude, and Cursor. The company places no limit on AI token spending. Product designers are expected to produce feature prototypes using AI tools. The engineering platform team has built a Dev MCP Server that enables AI agents to scaffold complete Shopify apps, execute GraphQL operations, and generate validated code across the Admin, UI extensions, Liquid templates, and the Hydrogen storefront framework.

The philosophy behind Augmented Engineering is pragmatic rather than utopian. As engineering complexity increased — more integrations, more platform surface area, more merchant customization options — Shopify found that AI could absorb the cognitive overhead. Rather than hiring proportionally more engineers, they used AI to keep production velocity constant even as the system grew more complex. Notably, Shopify is still hiring — including a thousand interns — signaling that AI-first does not mean headcount reduction.

Vercel and the v0 Paradigm

Vercel’s approach represents a third model: AI as the starting point of development rather than an assistant during it. The company’s v0 has evolved from a simple prototyping tool into what CEO Guillermo Rauch describes as a production-grade development environment supporting full Git workflows — what the industry now calls “vibe coding.”

What makes v0 different from a chatbot that generates React components is the infrastructure beneath it. A sandbox-based runtime can import any GitHub repository and automatically pull environment variables and configurations from Vercel. Every prompt generates code that runs in a real environment, not a playground. A Git panel lets developers create a branch for each conversation, open pull requests against the main branch, and deploy on merge.

Vercel launched AI SDK 6 with a native agent abstraction and a tool approval system that lets developers gate any action requiring human review. Define a tool with needsApproval: true, and the agent pauses until someone confirms. It is the engineering equivalent of a two-person rule for launching missiles, applied to database migrations and API deployments.
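The needsApproval flag lives in Vercel's TypeScript SDK, but the underlying pattern is language-agnostic. Below is a minimal Python sketch of the same gate, with all tool names and the approver callback invented for illustration, not drawn from any real SDK.

```python
# Sketch of a tool-approval gate in the spirit of AI SDK 6's
# needsApproval flag. Tool names and the approver are hypothetical.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[..., object]
    needs_approval: bool = False  # gate dangerous actions behind a human

def execute(tool, approver, **kwargs):
    """Run a tool, pausing for human sign-off when the tool demands it."""
    if tool.needs_approval and not approver(tool.name, kwargs):
        return {"status": "rejected", "tool": tool.name}
    return {"status": "ok", "result": tool.run(**kwargs)}

# A destructive action requires approval; a read-only one does not.
drop_table = Tool("drop_table", lambda table: f"dropped {table}",
                  needs_approval=True)
read_docs = Tool("read_docs", lambda page: f"contents of {page}")

# An approver that refuses everything, standing in for a human reviewer.
deny_all = lambda name, args: False
```

With `deny_all` in place the destructive tool is blocked while the read-only tool runs unimpeded, which is exactly the two-person-rule behavior described above.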


What the Data Actually Shows

The 2025 DORA report, Google’s annual assessment of software delivery performance, paints a nuanced picture of AI’s impact on engineering teams. The headline number: 90 percent of software development professionals now use AI tools at work, up 14 percentage points from the prior year. The median developer spends about two hours per day with AI tools — roughly a quarter of their workday.

Individual productivity gains are real. Analysis of the DORA data shows developers complete 21 percent more tasks and merge 98 percent more pull requests when using AI assistants. But here is the paradox that keeps engineering leaders up at night: organizational delivery metrics have not improved proportionally. While the 2025 report found that AI adoption now correlates positively with delivery throughput — a reversal from the prior year’s negative finding — it continues to correlate negatively with delivery stability. More code ships faster, but instability increases alongside it. Code review time grows 91 percent as PR volume overwhelms reviewers, pull request size swells 154 percent, and bug rates climb 9 percent as quality gates struggle to keep pace.

The DORA researchers describe this as the amplification effect. AI does not fix a broken team. It amplifies whatever is already there. Strong teams with loosely coupled architectures, fast feedback loops, and robust automated testing use AI to become faster and more reliable. Struggling teams — those with tightly coupled systems, slow CI pipelines, and poor review practices — find that AI only magnifies their dysfunction.

The teams seeing genuine productivity gains share three characteristics. First, they have strong automated test suites that catch regressions before AI-generated code reaches production. Second, they work in loosely coupled architectures where changes in one service do not cascade into others. Third, they have invested in internal developer platforms that provide standardized tooling, reducing the surface area where AI can introduce inconsistency.

Measuring What Matters

Traditional DORA metrics — deployment frequency, lead time for changes, change failure rate, and time to restore service — remain important but increasingly insufficient. The 2025 report expands the framework to include team performance, product performance, valuable work (are we building the right things?), friction (where are developers stuck?), burnout, and individual effectiveness.

The most sophisticated teams have stopped measuring “lines of code generated by AI” and started measuring “time from idea to validated feature.” That metric captures the full cycle: specification, implementation, review, testing, and deployment. It turns out that AI compresses the implementation phase dramatically but does almost nothing for the specification and review phases, which remain stubbornly human-speed.
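A metric like "time from idea to validated feature" can be computed from timestamped pipeline events. The sketch below is a minimal illustration, not any team's real instrumentation; the event names and timestamps are assumptions.

```python
# Sketch of computing "time from idea to validated feature" from
# timestamped pipeline events. Event names are hypothetical.

from datetime import datetime

def cycle_time_hours(events):
    """Hours from the spec being written to the feature being validated.

    `events` maps phase names to ISO-8601 timestamps.
    """
    start = datetime.fromisoformat(events["spec_written"])
    end = datetime.fromisoformat(events["feature_validated"])
    return (end - start).total_seconds() / 3600

def phase_durations(events, order):
    """Per-phase durations in hours, exposing where the bottleneck sits."""
    stamps = [datetime.fromisoformat(events[name]) for name in order]
    return {f"{a}->{b}": (t2 - t1).total_seconds() / 3600
            for (a, t1), (b, t2) in zip(zip(order, stamps),
                                        zip(order[1:], stamps[1:]))}

# Illustrative numbers: the AI draft takes an hour, human review a day.
events = {
    "spec_written":      "2026-03-02T09:00:00",
    "draft_generated":   "2026-03-02T10:00:00",
    "review_approved":   "2026-03-03T15:00:00",
    "feature_validated": "2026-03-03T17:00:00",
}
```

Breaking the total down per phase is what makes the upstream shift visible: in the illustrative numbers, implementation is one hour while review is twenty-nine.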

Some teams report that AI has shifted the bottleneck upstream. When implementation takes hours instead of days, the constraint becomes how fast product managers can write clear specifications and how fast senior engineers can review architectural decisions. The slowest part of the pipeline is now the part that requires human judgment about what to build and whether it was built correctly.

The Human-in-the-Loop Reality

Despite the marketing language around “autonomous” and “agentic” AI coding, every production engineering team maintaining critical systems enforces a strict human-in-the-loop policy. The loop has moved, though. Engineers are no longer writing code and having AI check it. AI is writing code and humans are checking it. The skill set required has shifted from code production to code evaluation.

This distinction matters enormously. An engineer reviewing AI-generated code needs a different kind of expertise than one writing it from scratch. They need to understand system architecture deeply enough to spot when an AI has produced a technically correct but architecturally inappropriate solution. They need to recognize when a test suite passes but does not actually test the things that matter. They need to catch the subtle bugs that arise when an AI stitches together patterns from its training data that do not quite fit the specific context of this codebase.

Several engineering leaders have noted that junior engineers face a particular challenge in this new workflow. Without the experience of writing code from scratch, how do they develop the judgment needed to review it? Stripe addresses this by requiring junior engineers to build traditional coding foundations before relying on AI agents. Others have implemented “AI-free days” where teams code without assistance to maintain their foundational skills.

Where This Goes Next

The convergence is toward what might be called “software manufacturing” — a process where human engineers function more like product designers and quality inspectors than assembly line workers. The AI handles fabrication. The humans handle design, inspection, and the judgment calls that determine whether something should be built at all.

This is not a distant future scenario. It is happening now at companies processing millions of transactions daily. The teams succeeding with this model share one trait: they invested in engineering fundamentals — testing, architecture, review culture — long before they adopted AI. The tools amplify discipline. They also amplify its absence.


Decision Radar (Algeria Lens)

Relevance for Algeria: High. Algerian software teams can adopt the same AI-augmented workflows regardless of location; the tooling is globally accessible.
Infrastructure ready: Partial. Reliable internet and cloud access exist in major cities, but latency to AI API providers and limited high-bandwidth connectivity outside urban centers can be friction points.
Skills available: Partial. Algeria has a growing developer community, but AI-first workflow adoption requires senior engineers experienced in code review, automated testing, and architecture design.
Action timeline: Immediate. Teams can begin integrating AI coding tools today; no infrastructure buildout required.
Key stakeholders: Engineering managers, CTOs, developer training programs, university CS departments.
Decision type: Strategic.

Quick Take: Algerian engineering teams should start integrating AI into development workflows now, but must invest equally in automated testing, code review discipline, and architecture quality. Without these foundations, AI tools will amplify existing weaknesses rather than create productivity gains. The 2025 DORA data is clear: the tooling accelerates throughput but destabilizes delivery unless engineering practices are already strong.

Sources & Further Reading