
The Dark Factory: Software Where No Human Writes or Reviews Code

February 27, 2026


Introduction

Imagine walking into a software company and being told two things: code must not be written by humans, and code must not be reviewed by humans. Not “code can be assisted by AI” or “developers should leverage automation.” The rules are absolute. No human hands touch the code. No human eyes scan it. This is not a thought experiment. It is the operating reality at StrongDM, where a team of three engineers runs what they call a “software factory” — a system that transforms markdown specification files into production software with no human implementation or review anywhere in the pipeline.

The concept borrows from manufacturing. A “dark factory” is a production facility that operates with the lights off because no humans are present on the floor. Robots build the products. Sensors handle quality control. Humans design the products and set the standards, but the factory floor itself is fully autonomous. StrongDM has applied this model to software. And the results are not theoretical — their AI context store, CXDB, has been running in production for months, built entirely by the factory.

What makes this interesting is not just that it works. It is that it reveals a maturity framework for AI-assisted development that exposes where almost every other software organization currently stands — and how far they have to go.

The Five Levels of AI Coding

Ben Shapiro, StrongDM’s engineering lead, describes five distinct levels of AI coding maturity. The framework is useful because it gives organizations honest language for assessing where they actually are, rather than where their marketing materials claim they are.

Level 1: Autocomplete. This is GitHub Copilot in its original form. Tab, tab, tab. The AI suggests the next line. The developer accepts or rejects. The workflow is fundamentally unchanged — the developer is still writing code, just writing it marginally faster. Most of the industry has adopted this level. It is also where most of the industry has stalled.

Level 2: Iteration. The developer uses AI as a conversational partner. They describe what they want. The AI generates a first draft. They refine it through multiple rounds. Tools like Cursor, Windsurf, and Claude Code operate at this level. The AI writes meaningful chunks of functionality, but the human remains the primary architect. Every line of code is still reviewed by a person. The majority of developers actively using AI tools today operate here.

Level 3: Delegation. This is where the dynamic shifts. The developer gives the AI an entire feature or module to build, then evaluates the output as a whole. They are not reading every line. They are running the code, testing it, checking whether it accomplishes the objective. The human’s role transitions from writer to evaluator. Fewer organizations operate here than claim to.

Level 4: Orchestration. The developer manages multiple AI agents working in parallel, each handling different components of the system. The developer writes specifications, not code. They design evaluation criteria, not test suites. The code itself is a black box. If it passes the evaluation, its internal implementation is irrelevant. This level requires deep trust — both in the AI system and in the developer’s own ability to write specifications precise enough to produce reliable output. That specification skill is something almost nobody has developed well yet.

Level 5: The Dark Factory. No human writes code. No human reviews code. Specifications go in. Working software comes out. The factory runs autonomously. This is StrongDM’s operating model, and it represents the far end of a spectrum that most organizations will not reach for years.

The framework matters because it provides a realistic map of the terrain. Most organizations are between Level 1 and Level 3. Most believe they are further along than they actually are. And the gaps between levels are not incremental — they are architectural.

Scenarios vs. Tests: The Most Important Distinction Nobody Talks About

The dark factory’s quality assurance system does not use traditional tests. It uses what StrongDM calls “scenarios.” The distinction sounds semantic. It is fundamental.

A scenario describes expected behavior from an external perspective. It reads like an acceptance criterion written as a story: “If a user creates a resource and an admin approves it, then the user should be able to access that resource.” A scenario captures what the software should do from the standpoint of someone who uses it.

A test is code that directly exercises internal functions. Assert that function X returns Y when given input Z. A test validates the implementation’s internal mechanics.
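The distinction is easier to see side by side. The sketch below is purely illustrative (the tiny in-memory `System` class and every name in it are invented for this example, not StrongDM's code): the test reaches into internal state to assert on mechanics, while the scenario drives only the external surface, create, approve, access, and checks an outcome a user would observe.

```python
class System:
    """Minimal in-memory stand-in for an application under test (invented for illustration)."""
    def __init__(self):
        self._approved = set()

    def create_user(self, name, role="user"):
        return {"name": name, "role": role}

    def create_resource(self, user, name):
        return {"owner": user["name"], "name": name}

    def approve(self, admin, resource):
        if admin["role"] == "admin":
            self._approved.add((resource["owner"], resource["name"]))

    def can_access(self, user, resource):
        return (resource["owner"], resource["name"]) in self._approved


# A *test* asserts on internal mechanics: function X produces state Y for input Z.
def test_approve_adds_to_internal_set():
    s = System()
    r = s.create_resource(s.create_user("alice"), "report")
    s.approve(s.create_user("root", role="admin"), r)
    assert (r["owner"], r["name"]) in s._approved  # reaches into private internals


# A *scenario* drives only the external surface and checks the externally
# observable outcome: created + approved implies accessible.
def scenario_user_accesses_approved_resource():
    s = System()
    alice = s.create_user("alice")
    report = s.create_resource(alice, "report")
    s.approve(s.create_user("root", role="admin"), report)
    assert s.can_access(alice, report)


test_approve_adds_to_internal_set()
scenario_user_accesses_approved_resource()
```

If the implementation of `approve` were rewritten from scratch, the test would likely break even when behavior is preserved; the scenario would break only if the behavior itself changed.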

The difference becomes critical when AI writes both the code and the tests — which it will, if allowed. When an AI system generates the code and then generates tests for that code, a dangerous feedback loop emerges. The AI optimizes the code to pass its own tests. The tests are written to validate the AI’s own implementation assumptions. The result is code that passes 100% of its test suite while failing to do what anyone actually needed it to do.

This is the AI equivalent of teaching to the test. A human developer writing a test suite brings independent understanding of what the software should accomplish. Their tests reflect human comprehension of the problem, not just the code’s internal logic. When AI handles both sides — implementation and verification — that independent check vanishes unless the architecture deliberately prevents it.

StrongDM’s scenario architecture solves this by placing the validation criteria outside the implementation system entirely. The scenarios are written by humans in plain language. They describe externally observable behavior. The AI cannot optimize for them in the way it can optimize for its own programmatic tests, because the scenarios test outcomes, not internals. This is a subtle but critical architectural decision, and it is one of the primary reasons the dark factory produces reliable software rather than software that merely appears reliable.

The Digital Twin Universe

The second pillar of the dark factory is what StrongDM calls their “digital twin universe” — behavioral clones of every external service the software interacts with. A simulated Okta. A simulated Jira. A simulated Slack, Google Docs, Google Drive, Google Sheets. The AI agents develop against these digital twins, running full integration testing scenarios without ever touching real production systems, real APIs, or real data.

This is not a standard staging environment. It is a purpose-built simulated world designed specifically for autonomous software development. The digital twins do not just mock API responses — they simulate realistic behavior patterns, error conditions, rate limits, and edge cases that the production services exhibit. The AI agents work within this simulated universe as if they were working against real systems, and the scenarios validate behavior across those simulated integrations.

Building this infrastructure is expensive and time-consuming. It requires deep understanding of how every external service behaves, not just its happy path but its failure modes and quirks. For StrongDM, the investment was justified because the digital twin universe enables something that would otherwise be impossible: autonomous development against complex integrations with zero risk to production systems. But for most organizations, replicating this approach would require significant upfront engineering investment in simulation infrastructure that does not currently exist.
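As a rough illustration of how a behavioral twin differs from a plain mock, here is a toy twin for a single messaging endpoint. Everything in it is invented and drastically simplified (a real twin would model far more state and fidelity), but it captures the key idea from the article: the twin simulates failure modes such as rate limiting, not just canned happy-path responses.

```python
import itertools


class SlackTwin:
    """Toy behavioral twin of a messaging API (names and responses invented
    for illustration; not StrongDM's infrastructure)."""

    RATE_LIMIT = 3  # allow 3 calls per simulated window

    def __init__(self):
        self._calls = 0
        self._messages = []
        self._ts = itertools.count(1)

    def post_message(self, channel, text):
        self._calls += 1
        if self._calls > self.RATE_LIMIT:
            # Simulate the service's rate-limit failure mode, so agents
            # developing against the twin must handle it.
            return {"ok": False, "error": "rate_limited", "retry_after": 30}
        if not channel.startswith("#"):
            return {"ok": False, "error": "channel_not_found"}
        ts = str(next(self._ts))
        self._messages.append((channel, text, ts))
        return {"ok": True, "ts": ts}


twin = SlackTwin()
assert twin.post_message("#deploys", "factory build passed")["ok"]
assert twin.post_message("bad", "oops")["error"] == "channel_not_found"
```

An agent exercised against twins like this hits rate limits, bad inputs, and transient errors during development, so the scenarios can validate error handling without any call ever reaching a real production API.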


The Self-Referential Loop

The dark factory is not unique to StrongDM. The most striking examples of autonomous software development are happening at the AI companies themselves.

Anthropic has disclosed that approximately 90% of Claude Code’s codebase was written by Claude Code itself. The agent wrote the agent. OpenAI’s Codex product is shipping features that were entirely built by Codex agents — the system is building itself. These are not marketing claims. They are operational realities that have been confirmed by multiple engineers at both companies.

This creates a compounding intelligence loop. As the models improve, the code they write improves. Better code produces better tools. Better tools produce even better code. Each generation of the system creates a superior version of itself. And because these companies sell the very products they build with those products, every improvement simultaneously enhances their internal development capacity and their commercial offering.

This is one of the primary reasons AI companies are so far ahead of everyone else in adopting autonomous development workflows. They are using their product to build their product. The feedback loop is direct, measurable, and accelerating. For everyone else, the adoption curve is slower because the feedback loop is indirect — the AI tools they use are built by someone else, and the improvements they generate benefit their own products rather than the tools themselves.

The Brownfield Problem

For most companies, the dark factory is not a starting point. It cannot be, because most software does not live in a greenfield. It lives in a brownfield of legacy code, technical debt, undocumented assumptions, and institutional knowledge that exists only in people's heads.

StrongDM had an unusual advantage: their AI-native product, CXDB, was purpose-built for the dark factory workflow from the ground up. There was no legacy codebase to contend with, no decade of accumulated technical debt, no implicit assumptions baked into code that predates the AI era.

Most companies face a different reality. Millions of lines of existing code with no specifications, no scenarios, and no digital twin infrastructure. Reverse-engineering the existing behavior — understanding what the system actually does versus what outdated documentation claims it does — is the first and most difficult step. Building scenario suites that capture real behavior, constructing simulation environments for external integrations, and gradually transitioning from human-written to specification-driven development is a multi-year project.

This is why most of the industry will remain between Level 2 and Level 3 for the foreseeable future. Not because the AI models lack capability. Not because the tools are unavailable. But because the organizational and architectural transformation required to move from human-writes-code to machine-writes-code is massive. It requires new quality assurance approaches (scenario-based rather than test-based), new development infrastructure (digital twins and simulation environments), new team structures (smaller, specification-focused rather than implementation-focused), and new skills (specification writing, evaluation design, systems thinking).

Why Spec Writing Is Harder Than Code Writing

Perhaps the most counterintuitive insight from the dark factory model is that writing specifications is harder than writing code. This runs against the instinct of most developers, who view code as the hard part and requirements as the easy part.

When a developer writes code, they receive immediate feedback. They run it. It works or it does not. They see the error. They fix it. The feedback loop is tight. Ambiguity in their thinking is exposed instantly because the compiler or runtime does not tolerate it.

When a developer writes a specification for an autonomous system, every ambiguity becomes a potential bug. Every omission becomes a missing feature. Every unstated assumption becomes an incorrect implementation. The specification must be precise enough that an AI system — which has no human judgment about “what the developer probably meant” — can implement it reliably. There is no room for implied context, unwritten conventions, or “you know what I mean.”

StrongDM’s specification files are detailed, structured markdown documents that describe exactly what the software should do, how it should behave in edge cases, and what constraints it should obey. Writing these documents requires a depth of systems thinking that most software organizations have not cultivated, because they have never needed to. When humans write code, they carry context in their heads. When AI writes code from a spec, that context must be externalized completely.
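The article does not reproduce one of these files, but a specification in this style might look roughly like the following hypothetical example (invented for illustration, not an actual StrongDM spec): behavior, edge cases, and constraints all stated explicitly, with nothing left to implied context.

```markdown
# Spec: resource approval (hypothetical example)

## Behavior
- A user may create a resource; it starts in state `pending`.
- Only a user with role `admin` may approve a `pending` resource.
- An approved resource is accessible to its creator; a pending one is not.

## Edge cases
- Approving an already-approved resource is a no-op, not an error.
- A non-admin approval attempt is rejected and leaves state unchanged.

## Constraints
- Every state transition is audit-logged with actor and timestamp.
- No approval may be inferred; it must be an explicit admin action.
```

Note how much of this a human implementer would normally carry as unstated convention; here it must all be written down, because the implementing agent has no access to anything that is not on the page.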

This is the real bottleneck of the dark factory. Not AI capability. Not infrastructure. Specification quality. And it is a skill that the industry has not yet developed at scale.

The Organizational Implications

StrongDM’s dark factory team is three people. Three. No engineering manager — there is nothing to manage. No scrum master — there are no sprints to coordinate. No tech lead doing code reviews — no human reviews code. No QA team — the scenario architecture handles quality assurance autonomously.

The entire coordination layer that constitutes the operating system of a modern software organization — standups, sprint planning, retrospectives, cross-team dependencies, merge conflicts, design reviews — does not exist. Not because it was eliminated as a cost-saving measure, but because it no longer serves a purpose.

This is a preview of a structural shift that extends beyond any single company. The 500-person engineering organization of 2023 may become the 50-person engineering organization of 2027 — not because the software is simpler, but because the coordination overhead and the implementation labor have both been automated. The people who remain will be more valuable, better compensated, and working at a higher level of abstraction. But there will be fewer of them.

The engineering manager’s value shifts from “coordinate the team building the feature” to “define the specification clearly enough that agents build the feature.” The release manager’s value shifts from “coordinate deployment across teams” to “design the evaluation criteria that determine whether agent output ships.” The scrum master’s role becomes difficult to justify when there are no sprints. The layers of coordination that exist to manage hundreds of engineers building a product can be deleted when agents do the engineering.


🧭 Decision Radar

| Dimension | Assessment |
| --- | --- |
| Relevance for Algeria | Medium — most Algerian software teams are at Level 1-2; understanding the trajectory matters for strategic planning |
| Infrastructure Ready? | No — the digital twin and scenario infrastructure required does not exist in Algeria |
| Skills Available? | No — specification-driven development is not practiced or taught |
| Action Timeline | 12-24 months |
| Key Stakeholders | Software team leads, CTOs, university CS departments, startup founders |
| Decision Type | Educational |

Quick Take: The dark factory is where software development is heading. Algerian teams should start climbing the levels now — even reaching Level 3 (delegation) would represent a major productivity leap over current practices.
