What Microsoft Actually Built: Fairwater in Plain Terms
In November 2025, Microsoft announced the operational launch of Fairwater Atlanta, the second campus in what the company describes as its first “AI superfactory.” The name Fairwater applies to both the Wisconsin original and the Atlanta addition — two physically distinct data center campuses that Microsoft has designed to function as a single unified AI training supercomputer.
The fundamental innovation is not in the individual campuses. Both house leading-edge hardware: NVIDIA Blackwell GPUs in GB200 NVL72 rack-scale systems, each rack holding up to 72 Blackwell GPUs linked by NVLink at roughly 1.8 terabytes per second of GPU-to-GPU bandwidth per GPU. What makes Fairwater architecturally distinct is how Microsoft connects these campuses and what that connection enables.
The separation is approximately 700 miles. The link is a dedicated optical fiber network, part of a 120,000-mile fiber infrastructure build that Microsoft expanded by 25% in a single year. Light in optical fiber travels at roughly two-thirds of its vacuum speed, so the distance still imposes a latency floor of several milliseconds each way; what the dedicated network removes is congestion, because it is not shared with general internet traffic. The AI WAN completes a three-layer network architecture: NVLink for intra-rack GPU communication, Ethernet fabric at 800 Gbps for intra-site communication, and the optical backbone for inter-site communication across the 700-mile gap.
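To put a number on the latency floor that 700 miles of fiber imposes, here is a minimal sketch of the propagation-delay arithmetic. The refractive index (1.468, a typical figure for silica fiber) and the `route_factor` parameter are assumptions for illustration, not values from Microsoft; real fiber routes are longer than the straight-line distance, so actual delay would be higher.

```python
# Rough propagation delay over a long-haul fiber link.
C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum, km/s
KM_PER_MILE = 1.609344       # exact mile-to-kilometre conversion


def fiber_delay_ms(distance_miles, refractive_index=1.468, route_factor=1.0):
    """Return one-way propagation delay in milliseconds.

    refractive_index: assumed typical value for silica fiber.
    route_factor: assumed multiplier for how much longer the real fiber
                  path is than the quoted distance (1.0 = no detour).
    """
    distance_km = distance_miles * KM_PER_MILE * route_factor
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return distance_km / speed_km_s * 1000.0


one_way = fiber_delay_ms(700)
print(f"one-way: {one_way:.2f} ms, round trip: {2 * one_way:.2f} ms")
```

Under these assumptions the one-way delay comes out at roughly 5.5 ms, or about 11 ms round trip, before any switching or queuing delay is added.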
The result is that AI training workloads — including those for OpenAI models, the Microsoft AI Superintelligence Team, and Copilot capabilities — can be distributed across both campuses simultaneously, treating the aggregate GPU pool as a single resource. According to Microsoft’s own announcement, training timelines for large models have dropped from “several months” to “weeks” as a result of this distributed architecture.
The Architecture Decisions That Make This Possible
The Fairwater design incorporates several engineering choices that diverge significantly from conventional data center practice. Understanding these choices matters because they represent a template — Microsoft describes Fairwater as a “repeatable architectural pattern” intended for global deployment, not a one-off facility.
Liquid cooling as the baseline, not the exception. The Atlanta facility uses closed-loop liquid cooling systems that consume almost zero water in steady-state operation; according to Microsoft, the initial fill is equivalent to what 20 homes use in a year. This is a significant departure from air-cooled data center design and reflects the thermal reality of high-density GPU compute: the thermal design power of NVIDIA’s Blackwell GPUs exceeds 700W per chip, and a GB200 NVL72 rack with 72 GPUs generates heat loads that air cooling cannot manage at scale.
Two-story architecture for density. The Atlanta facility is a two-story building with racks arranged across three dimensions — not the traditional single-floor warehouse model. This increases rack density per building footprint, which matters because hyperscale AI campuses are increasingly constrained by land availability and power supply capacity rather than by building size.
SONiC-based switching with commodity Ethernet. Microsoft uses the open-source SONiC network operating system on commodity Ethernet hardware for the fabric switching, explicitly avoiding proprietary vendor lock-in. The 800 Gbps GPU-to-GPU connectivity across the Ethernet fabric uses minimal hop counts to reduce latency — a design choice driven by the sensitivity of distributed AI training to network latency variability.
Dedicated AI WAN, separate from general cloud traffic. The AI WAN is a purpose-built network that carries only AI training and inference traffic between data centers. It is not shared with Microsoft Azure’s general-purpose cloud traffic, eliminating the congestion and prioritization conflicts that would make multi-site AI training impractical over shared infrastructure.
What Enterprise Infrastructure Leaders Should Take From This
Microsoft’s Fairwater is not a consumer product and it is not a reference design that most organizations can directly replicate. The 120,000-mile fiber build and hundreds of thousands of Blackwell GPUs represent investments at a scale accessible only to hyperscalers. But the architectural principles embedded in Fairwater are directly relevant to enterprise infrastructure planning decisions being made today.
1. Model Future AI Compute Needs as Distributed, Not Monolithic
The monolithic AI training cluster, with all GPUs in one location and all connected via high-speed local fabric, is reaching its practical limits at the frontier. Microsoft’s distributed design is a response to real constraints: a single power grid connection may not be able to supply the electricity a frontier-scale AI cluster needs; a single building may not be able to house the racks; and GPU supply limits how much hardware any one facility can receive and install at once. Enterprise AI infrastructure teams should plan future compute expansion with the assumption that distribution across sites will be necessary, and that the network linking those sites is as critical as the compute hardware itself.
2. Treat Your Network as AI Infrastructure, Not Just Connectivity
The AI WAN is the most underappreciated element of the Fairwater architecture. In traditional data center thinking, the network is infrastructure that connects compute to storage: important but subordinate. In distributed AI training, the network is the critical path. Synchronous training steps wait on the slowest exchange, so an increase of even 10 ms in inter-GPU communication latency can leave expensive accelerators idle, and the throughput lost that way can outweigh the price difference between AI-optimized and standard switching. Infrastructure teams planning AI deployments should audit their inter-site network capacity, latency, and jitter before committing to distributed training architectures. The right benchmark is not “is the link fast enough?” but “is the link fast and consistent enough to sustain gradient synchronization across thousands of GPUs?”
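One way to make the "fast and consistent enough" audit concrete is to summarize round-trip-time probes between sites. The sketch below is illustrative only: the sample values, the thresholds, and the choice of jitter definition (mean absolute difference between consecutive samples) are all assumptions, and a real audit would use far more samples collected under load.

```python
import math
import statistics


def summarize_rtt(samples_ms):
    """Summarize round-trip-time samples (milliseconds) from an inter-site probe."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two RTT samples")
    ordered = sorted(samples_ms)
    # Nearest-rank 99th percentile.
    rank = math.ceil(0.99 * len(ordered))
    p99 = ordered[rank - 1]
    # Jitter here means the mean absolute change between consecutive samples.
    deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "p99_ms": p99,
        "jitter_ms": statistics.fmean(deltas),
    }


def link_is_consistent(summary, max_p99_ms, max_jitter_ms):
    """Apply hypothetical acceptance thresholds to a summary."""
    return summary["p99_ms"] <= max_p99_ms and summary["jitter_ms"] <= max_jitter_ms


# Hypothetical probe results: mostly steady, with one slow outlier.
samples = [11.0, 11.2, 11.1, 14.0, 11.0]
summary = summarize_rtt(samples)
print(summary)
print(link_is_consistent(summary, max_p99_ms=12.0, max_jitter_ms=0.5))
```

The single 14 ms outlier barely moves the mean but dominates both the tail percentile and the jitter figure, which is why a mean-latency check alone can pass a link that would stall synchronous training.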
3. Prioritize Liquid Cooling in Any New AI Server Procurement Cycle
Liquid cooling is effectively mandatory at GPU rack densities above about 50 kW per rack, and NVIDIA’s Blackwell systems push beyond that threshold. Organizations procuring AI servers for on-premises deployment or colocation should require liquid cooling readiness from the facility before finalizing any GPU hardware order. A data center facility that cannot support rear-door heat exchangers or direct liquid cooling loops is likely to find itself, within 24 months, unable to house the next generation of AI accelerators. This is a facility specification issue, not a hardware issue, and it needs to be resolved before the GPUs arrive, not after.
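The threshold claim can be checked with simple arithmetic using only figures quoted in this article: 72 GPUs per rack, a per-GPU thermal design power of at least 700 W, and a 50 kW liquid-cooling threshold. The sketch below counts GPU heat only; CPUs, switches, and power conversion losses would add to it, so this is a lower bound rather than a rack specification.

```python
GPUS_PER_RACK = 72                 # GB200 NVL72, as quoted in the article
GPU_TDP_W = 700                    # lower bound ("exceeds 700W per chip")
LIQUID_COOLING_THRESHOLD_KW = 50   # threshold quoted in the article


def rack_gpu_heat_kw(gpus=GPUS_PER_RACK, tdp_w=GPU_TDP_W):
    """Lower-bound heat load from the GPUs alone, in kilowatts."""
    return gpus * tdp_w / 1000.0


def needs_liquid_cooling(rack_kw, threshold_kw=LIQUID_COOLING_THRESHOLD_KW):
    """True when the rack exceeds the air-cooling threshold."""
    return rack_kw > threshold_kw


heat_kw = rack_gpu_heat_kw()
print(f"GPU heat per rack: at least {heat_kw:.1f} kW")
print(f"liquid cooling required: {needs_liquid_cooling(heat_kw)}")
```

Even at the 700 W floor, the GPUs alone put the rack at 50.4 kW, already past the threshold before any other component is counted.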
4. Evaluate SONiC and Open Networking for AI Fabrics
Microsoft’s choice of SONiC-based switching with commodity Ethernet for the Fairwater fabric is a deliberate cost and flexibility decision: open-source network operating systems on commodity hardware are typically cheaper than proprietary alternatives and allow faster adoption of new switching speeds as standards evolve. For enterprise AI infrastructure teams evaluating their switching layer, SONiC is now production-ready, validated at hyperscaler scale, and supported by the major switching hardware vendors. An 800 Gbps Ethernet fabric carrying RoCEv2 traffic, of the kind used in Fairwater, is likely to become the enterprise baseline within 2-3 years, and planning for it now avoids a forklift upgrade later.
The Bigger Picture: Geo-Distribution as the New Infrastructure Paradigm
Fairwater is not an isolated experiment. It is the visible leading edge of a structural shift in how the world’s largest AI infrastructure operators think about the relationship between geography, power, and compute. The era of the monolithic data center campus — one location, maximum density, single power grid connection — is ending for frontier AI workloads, and Fairwater is the architectural proof of concept for what replaces it.
The geo-distributed AI superfactory model solves three constraints simultaneously. It decouples compute scale from single-site power availability — each campus draws from its local grid, and the aggregate is larger than either could support alone. It provides geographic resilience — a natural disaster, grid outage, or regulatory disruption at one site does not halt training. And it allows hardware deployment to proceed in parallel across multiple sites, reducing the time from GPU shipment to productive training capacity.
For the broader cloud and enterprise infrastructure market, the Fairwater model signals that investment in high-capacity, low-latency inter-datacenter networking is no longer optional infrastructure overhead — it is the core enabling technology for AI at scale. Companies and operators that treat their wide-area network as a shared-cost commodity rather than a purpose-built AI training backbone will find themselves architecturally unable to compete with the distributed training capabilities that define frontier AI development. Fairwater is Microsoft’s answer to the question of what AI infrastructure looks like at scale. The answer is: distributed, liquid-cooled, open-networked, and fiber-connected across hundreds of miles.
Frequently Asked Questions
What makes Fairwater’s distributed design different from a standard multi-site data center?
Conventional multi-site data centers replicate workloads for disaster recovery or serve different geographic user bases — they do not function as a single unified compute system. Fairwater’s innovation is that the 700-mile gap between Atlanta and Wisconsin is transparent to AI training workloads: GPUs across both campuses can participate in a single distributed training job simultaneously, synchronized via the AI WAN. This requires a dedicated, low-latency, high-throughput optical connection that Microsoft built specifically for AI traffic — not shared internet or standard cloud WAN infrastructure.
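To see why the inter-site link matters for a single synchronized job, consider the textbook latency-bandwidth (alpha-beta) cost model for a ring all-reduce. This is a generic simplification, not a description of how Microsoft schedules Fairwater traffic, and the payload size, bandwidth, and latency values below are hypothetical.

```python
def ring_allreduce_seconds(participants, payload_bytes,
                           bandwidth_bytes_per_s, latency_s):
    """Estimate ring all-reduce time with the alpha-beta cost model.

    A ring all-reduce over p participants takes 2 * (p - 1) steps; each step
    pays one link latency and moves payload / p bytes.
    """
    if participants < 2:
        return 0.0
    steps = 2 * (participants - 1)
    latency_term = steps * latency_s
    bandwidth_term = (steps / participants) * payload_bytes / bandwidth_bytes_per_s
    return latency_term + bandwidth_term


# Hypothetical comparison: the same 10 GB exchange between two endpoints,
# once over a local fabric and once over a long-haul link.
PAYLOAD = 10e9        # bytes (assumed)
BANDWIDTH = 100e9     # bytes per second (assumed)

local = ring_allreduce_seconds(2, PAYLOAD, BANDWIDTH, latency_s=5e-6)
cross_site = ring_allreduce_seconds(2, PAYLOAD, BANDWIDTH, latency_s=5.5e-3)
print(f"local: {local * 1000:.2f} ms, cross-site: {cross_site * 1000:.2f} ms")
```

In this model the latency term grows with the number of steps, so designs that span distant sites generally try to keep the number of cross-site exchanges small and the payload per exchange large.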
Why is liquid cooling mandatory for Blackwell GPU deployments?
NVIDIA’s Blackwell GPU architecture has a thermal design power exceeding 700W per chip. A GB200 NVL72 rack housing 72 Blackwell GPUs generates a heat load that conventional air cooling cannot remove fast enough to prevent thermal throttling — the condition where GPUs reduce their clock speed to avoid overheating, directly reducing AI training throughput. Liquid cooling systems — whether direct-to-chip, rear-door heat exchangers, or full immersion — can handle 10-50 times the heat flux of air cooling per square meter of rack space, making them the only viable thermal management approach at these density levels.
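The gap between air and liquid follows from the heat-transport relation P = (density) x (specific heat) x (volumetric flow) x (temperature rise). The sketch below applies it to a 50.4 kW rack (72 GPUs at the 700 W floor quoted above). The fluid properties are standard room-temperature approximations and the 15 K and 10 K temperature rises are assumed for illustration; the resulting ratio compares volumetric heat capacity and is a different measure from the heat-flux range quoted above.

```python
def volumetric_flow_m3_s(heat_w, density_kg_m3, cp_j_per_kg_k, delta_t_k):
    """Coolant flow needed to carry heat_w away at a given temperature rise."""
    return heat_w / (density_kg_m3 * cp_j_per_kg_k * delta_t_k)


RACK_HEAT_W = 72 * 700  # 50.4 kW, GPU heat only

# Approximate fluid properties near room temperature.
AIR = {"density_kg_m3": 1.2, "cp_j_per_kg_k": 1005.0}
WATER = {"density_kg_m3": 1000.0, "cp_j_per_kg_k": 4186.0}

air_flow = volumetric_flow_m3_s(RACK_HEAT_W, delta_t_k=15.0, **AIR)      # assumed 15 K rise
water_flow = volumetric_flow_m3_s(RACK_HEAT_W, delta_t_k=10.0, **WATER)  # assumed 10 K rise

print(f"air:   {air_flow:.2f} m^3/s")
print(f"water: {water_flow * 1000:.2f} L/s")
```

Under these assumptions a single rack needs close to 3 cubic metres of air per second but only a little over one litre of water per second, which illustrates why air handling becomes impractical at this density.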
Can enterprises replicate any part of the Fairwater architecture at a smaller scale?
Yes — particularly the networking and cooling principles. SONiC-based open networking on commodity Ethernet hardware is available from multiple vendors at enterprise scale and provides the same architectural benefits (vendor independence, lower cost, faster upgrade cycles) that Microsoft achieves at hyperscaler scale. Liquid cooling solutions from vendors like Vertiv, Schneider Electric, and CoolIT Systems are available for enterprise-scale AI server deployments. The AI WAN principle — a dedicated low-latency network for AI traffic, separate from general-purpose enterprise WAN — is implementable at much smaller scales using SD-WAN with traffic prioritization and dedicated links between AI compute sites.
Sources & Further Reading
- From Wisconsin to Atlanta: Microsoft Connects Datacenters to Build Its First AI Superfactory — Microsoft News
- Microsoft’s Fairwater Atlanta and the Rise of the Distributed AI Supercomputer — Data Center Frontier
- Microsoft Launches Atlanta Fairwater Data Center: Two Stories, No UPS or Gen-Sets — Data Center Dynamics
- New Data Center Developments May 2026 — Data Center Knowledge



