The Chip That Breaks the 4,600 Teraflop Barrier
The AI infrastructure race just entered a new phase. Google’s seventh-generation Tensor Processing Unit, codenamed Ironwood, represents the company’s most aggressive move yet in custom AI silicon — a chip designed from the ground up for the inference era. When Anthropic committed to deploying up to one million of these chips for its Claude models, it signaled that the balance of power in AI compute is shifting away from a single-vendor GPU world.
Each Ironwood chip delivers 4,614 FP8 teraflops — a 10x increase in peak performance over TPU v5p and more than 4x the per-chip efficiency of its immediate predecessor, TPU v6e (Trillium). The memory story is equally significant: 192 GB of HBM3E per chip with 7.37 TB/s bandwidth, a 6x capacity increase over Trillium. For models that are growing larger with every generation, this memory headroom eliminates bottlenecks that previously forced engineers to shard models across many more chips.
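To see what that headroom means in practice, here is a back-of-the-envelope sketch (not from Google's documentation): it counts only raw FP8 weights at 1 byte per parameter against the 192 GB per-chip figure, ignoring KV caches, activations, and runtime overhead that real deployments must also fit. The model sizes are hypothetical, chosen only for illustration.

```python
import math

HBM_PER_CHIP_GB = 192  # Ironwood's stated HBM3E capacity per chip

def min_chips_for_weights(params_billions: float, bytes_per_param: float = 1.0) -> int:
    """Smallest chip count whose combined HBM can hold the raw weights."""
    weights_gb = params_billions * bytes_per_param  # 1e9 params * 1 B/param = 1 GB
    return math.ceil(weights_gb / HBM_PER_CHIP_GB)

# Hypothetical model sizes, in billions of parameters:
for params_b in (70, 405, 1800):
    print(f"{params_b:>5}B params at FP8 -> at least {min_chips_for_weights(params_b)} chip(s)")
```

Even a hypothetical 1.8-trillion-parameter model needs only about ten chips' worth of HBM for its weights alone, which is why the capacity jump matters more than the raw teraflop figure for serving large models.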
Each Ironwood chip uses a dual-chiplet architecture, with each chiplet containing one TensorCore, two SparseCores, and 96 GB of HBM — connected by a die-to-die interface six times faster than a single ICI link. Google claims a 2x performance-per-watt improvement over Trillium and nearly 30x better energy efficiency than its first Cloud TPU. In a world where data center power consumption is becoming a hard constraint, that efficiency metric matters as much as raw performance.
The Superpod: 9,216 Chips, 42.5 Exaflops
Where Ironwood truly differentiates is at scale. A single Ironwood superpod connects 9,216 chips through a 9.6 Tb/s inter-chip interconnect (ICI) network, delivering a combined 42.5 exaflops of FP8 compute. To put that in perspective, the entire TOP500 supercomputer list aggregates roughly 15 exaflops of LINPACK (FP64) performance — a single Ironwood superpod exceeds that in lower-precision AI workloads. The superpod also aggregates approximately 1.77 petabytes of HBM3E memory — enough to hold even the largest frontier models entirely in high-bandwidth memory.
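Both headline figures follow directly from the per-chip specs, as a quick check confirms:

```python
# Sanity check: the superpod totals follow from the per-chip specs above.
CHIPS_PER_POD = 9_216
TFLOPS_FP8_PER_CHIP = 4_614   # peak FP8 teraflops per chip
HBM_GB_PER_CHIP = 192         # HBM3E per chip

pod_exaflops = CHIPS_PER_POD * TFLOPS_FP8_PER_CHIP / 1e6  # 1 exaflop = 1e6 teraflops
pod_hbm_pb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1e6        # 1 PB = 1e6 GB (decimal)

print(f"{pod_exaflops:.1f} exaflops FP8")  # ~42.5
print(f"{pod_hbm_pb:.2f} PB of HBM3E")     # ~1.77
```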
The full superpod configuration consumes roughly 10 megawatts under full load. That sounds enormous in absolute terms, but at 42.5 exaflops the performance-per-watt is remarkably strong compared with assembling equivalent compute from off-the-shelf GPUs. According to SemiAnalysis, the all-in total cost of ownership per Ironwood chip is approximately 44% lower than the TCO of an NVIDIA GB200 server in comparable configurations. For external Google Cloud customers, hourly costs run approximately 30% lower than GB200 pricing.
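Dividing the article's own figures gives a rough pod-level efficiency number. Note that this uses peak FP8 throughput, so sustained real-world efficiency will be lower:

```python
# Derived from the figures quoted above, not an official Google metric.
POD_EXAFLOPS_FP8 = 42.5
POD_POWER_MW = 10  # approximate full-load draw

tflops_per_watt = (POD_EXAFLOPS_FP8 * 1e6) / (POD_POWER_MW * 1e6)
print(f"~{tflops_per_watt:.2f} peak FP8 teraflops per watt")  # ~4.25
```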
Built for the Reasoning Era
Previous TPU generations were primarily marketed as training accelerators. Ironwood marks a deliberate pivot. Google describes it as “the first TPU for the age of inference,” custom-built for the low-latency, high-throughput demands of serving AI models at scale.
This design philosophy reflects a fundamental shift in the AI industry. As frontier models mature, the compute bottleneck is moving from training (run once) to inference (run billions of times). Reasoning models like chain-of-thought systems, mixture-of-experts architectures, and agentic AI frameworks all require sustained inference compute that dwarfs their training budgets. Ironwood’s architecture optimizes for fast context switching, low-latency execution paths, and the bursty communication patterns characteristic of real-time model serving.
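TPUs are programmed chiefly through JAX and the XLA compiler. The sketch below is illustrative rather than anything from Google's serving stack: it shards a toy weight matrix across whatever devices are attached, the basic move that lets a model outgrow a single chip's HBM and still serve from one logical system.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Lay out whatever accelerators are attached as a 1-D mesh.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))

# Shard a toy weight matrix column-wise across the "model" axis, so each
# chip holds only its slice of the parameters.
w = jax.random.normal(jax.random.PRNGKey(0), (1024, 4096))
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

@jax.jit
def forward(x, w):
    # Under jit, XLA partitions the matmul across the mesh and inserts any
    # inter-chip communication (carried over ICI on a real TPU pod) itself.
    return x @ w

x = jnp.ones((8, 1024))     # a small batch, replicated across chips
print(forward(x, w).shape)  # (8, 4096)
```

On a real pod the same program simply sees more devices in the mesh; the sharding annotations do not change.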
Google uses these chips internally across Search, YouTube, Gmail, and Gemini — services that collectively handle billions of inference queries daily. Making Ironwood available through Google Cloud means external customers can access the same infrastructure.
Anthropic’s Billion-Dollar Bet on Custom Silicon
On October 23, 2025, Anthropic announced the largest expansion of its TPU usage to date: access to up to one million Ironwood chips through Google Cloud, with well over a gigawatt of data center capacity coming online through 2026. Industry estimates place the deal’s value in the tens of billions of dollars; as a rough rule of thumb, chips account for around $35 billion of the cost of a 1-gigawatt data center buildout.
For Anthropic, this is both a capacity play and an architectural one. The company has trained and served Claude models on TPUs since its founding, and its engineering teams have deep expertise in optimizing for Google’s silicon. But Anthropic is not going all-in on a single vendor. The company maintains a deliberate multi-platform strategy, using Google’s TPUs, Amazon’s Trainium chips, and NVIDIA GPUs in parallel. This diversified approach hedges against supply chain risks and allows Anthropic to match workloads to the most cost-effective hardware.
The scale of the commitment — up to one million chips — suggests Anthropic is planning for a future where inference demand for Claude grows by orders of magnitude. Running reasoning models, agentic systems, and multimodal applications at global scale requires exactly the kind of dense, efficient compute that Ironwood superpods provide.
Google vs. NVIDIA: The Scale Divergence
At the individual chip level, Ironwood and NVIDIA’s B200 are roughly comparable — 4.6 petaflops versus 4.5 petaflops in FP8 performance. The divergence emerges at scale. NVIDIA’s GB200 NVL72 system connects 72 GPUs in a single NVLink domain, delivering approximately 0.72 exaflops of FP8 compute. Google’s Ironwood superpod connects 128 times as many chips, achieving 42.5 exaflops in a single logical system.
This difference matters for the largest frontier models. When a model requires thousands of chips for a single inference pass, the fabric that connects those chips becomes as important as the chips themselves. Google’s ICI network, designed in-house and tightly co-optimized with the TPU silicon, avoids the multi-hop latency penalties that emerge when scaling GPU clusters beyond a single NVLink domain.
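A generic JAX toy (not Ironwood-specific code) makes the point concrete: the all-reduce below is the kind of collective a tensor-parallel inference pass issues at every layer, and its latency is governed by the interconnect rather than by arithmetic throughput.

```python
import jax
import jax.numpy as jnp
from functools import partial

# One all-reduce of the kind tensor-parallel inference issues constantly:
# every chip contributes a partial result, so the step's latency is set by
# the fabric connecting the chips, not by FLOPs.
@partial(jax.pmap, axis_name="chips")
def all_reduce_sum(partial_activations):
    return jax.lax.psum(partial_activations, axis_name="chips")

n = jax.local_device_count()
shards = jnp.arange(n, dtype=jnp.float32).reshape(n, 1)
print(all_reduce_sum(shards))  # every device now holds the same sum
```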
That said, NVIDIA retains decisive advantages in ecosystem breadth, software maturity (CUDA), and third-party hardware availability. Ironwood is available exclusively through Google Cloud — you cannot buy these chips. For organizations that need on-premises AI infrastructure or multi-cloud portability, NVIDIA remains the default choice. The TPU path only makes sense for workloads that can commit to Google’s stack.
What This Means for AI Infrastructure
The Ironwood launch and the Anthropic mega-deal crystallize several trends that will define AI infrastructure through the rest of this decade:
Custom silicon is no longer a niche. Google, Amazon (Trainium/Inferentia), and Microsoft (Maia 200) are all investing billions in proprietary AI chips. The era of NVIDIA as the sole supplier of frontier AI compute is ending — not because NVIDIA is declining, but because demand is so large that multiple silicon architectures will coexist.
Inference is the new battleground. Training a frontier model is a one-time capital expense. Serving it to millions of users is an ongoing operational cost that can easily exceed training budgets within months; the toy calculation after this list makes that break-even concrete. Ironwood’s inference-first design reflects where the money is going.
Hyperscaler lock-in is the trade-off. Custom TPUs offer superior price-performance within Google Cloud, but they create deep vendor dependency. Anthropic hedges this by maintaining parallel capacity on AWS and with NVIDIA. Smaller organizations may not have that luxury.
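To make the break-even logic from the second point concrete, here is a toy calculation with entirely invented numbers; none of these figures come from Google, Anthropic, or any published pricing.

```python
# All numbers below are hypothetical, purely to make the trend concrete.
TRAINING_RUN_USD = 1e9       # assumed one-time training cost
COST_PER_QUERY_USD = 0.005   # assumed blended inference cost per query
QUERIES_PER_DAY = 2e9        # assumed global query volume

days_to_parity = TRAINING_RUN_USD / (COST_PER_QUERY_USD * QUERIES_PER_DAY)
print(f"Inference spend matches the training bill in ~{days_to_parity:.0f} days")
# -> ~100 days at these rates; double the traffic and it halves.
```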
Frequently Asked Questions
What makes Google’s Ironwood TPU different from previous generations?
Ironwood is the first Google TPU explicitly designed for the inference era rather than primarily for training. Each chip delivers 4,614 FP8 teraflops with 192 GB of HBM3E memory — a 10x peak performance increase over TPU v5p. Its dual-chiplet architecture and 2x performance-per-watt improvement over Trillium make it well suited to the low-latency, high-throughput demands of serving AI models at scale.
Why did Anthropic commit to up to one million Ironwood chips?
Anthropic has trained and served Claude on Google’s TPU architecture since its founding, giving its engineers deep optimization expertise. The deal, announced in October 2025 and estimated at tens of billions of dollars with over one gigawatt of data center capacity, reflects Anthropic’s expectation that inference demand for Claude will grow by orders of magnitude as reasoning models and agentic AI systems scale globally.
How does Ironwood compare to NVIDIA’s Blackwell GPUs for AI workloads?
At the single-chip level, Ironwood (4.6 PFLOPS FP8) and NVIDIA’s B200 (4.5 PFLOPS FP8) are nearly identical. The critical difference is scale: an Ironwood superpod connects 9,216 chips delivering 42.5 exaflops, while NVIDIA’s GB200 NVL72 connects 72 GPUs for approximately 0.72 exaflops FP8. However, NVIDIA retains advantages in CUDA software ecosystem maturity and hardware availability beyond Google Cloud.
Sources & Further Reading
- Ironwood: The First Google TPU for the Age of Inference — Google Blog
- Anthropic to Expand Use of Google Cloud TPUs and Services — Google Cloud Press Corner
- Google and Anthropic Confirm Massive 1GW+ Cloud Deal — Data Centre Dynamics
- Inside the Ironwood TPU Codesigned AI Stack — Google Cloud Blog
- Google and Anthropic Announce Cloud Deal Worth Tens of Billions — CNBC
- Expanding Our Use of Google Cloud TPUs and Services — Anthropic
- Google Deploys Axion CPUs and Seventh-Gen Ironwood TPU — Tom’s Hardware
- TPU7x (Ironwood) Documentation — Google Cloud