In brief: NVIDIA posted $130.5 billion in revenue for fiscal year 2025, with AI chips accounting for 88% of that total. But the company’s real power is not its hardware — it is the CUDA software ecosystem that creates prohibitive switching costs for the entire AI industry. This article explains how NVIDIA built the GPU economy, why challengers like AMD, Google, and Amazon are gaining ground, and what the shift from training to inference means for NVIDIA’s dominance.
The $130 Billion Machine
NVIDIA posted $130.5 billion in revenue for fiscal year 2025 — a figure that would have seemed absurd three years earlier when the company brought in $27 billion. Data center revenue, almost entirely AI chip sales, accounted for $115.2 billion of that total. NVIDIA’s market capitalization peaked above $5 trillion in late 2025 and as of early 2026 stands around $4.4 trillion, placing it among the most valuable companies on Earth alongside Apple and Microsoft.
These are not the numbers of a chipmaker. They are the numbers of a company that has positioned itself as the tollbooth operator for the entire AI infrastructure race. Every major AI lab, every hyperscaler, every enterprise deploying machine learning at scale pays NVIDIA for the privilege. Understanding how this monopoly was built — and what might break it — is essential for anyone making decisions about AI infrastructure.
The CUDA Moat
NVIDIA’s dominance is often attributed to its GPU hardware. That explanation is incomplete. The deeper advantage is CUDA, the proprietary parallel computing platform NVIDIA released in 2006.
CUDA is not a product NVIDIA sells directly. It is the invisible substrate on which the entire AI software ecosystem was built. PyTorch, TensorFlow, JAX — every major framework is optimized for CUDA first and everything else second. The training pipelines at OpenAI, Anthropic, Google DeepMind, and Meta all assume CUDA. The collective investment in CUDA-optimized code — billions of engineering hours over nearly two decades — represents switching costs that no hardware specification sheet can overcome.
AMD’s ROCm platform is the most serious alternative. It is open-source, technically capable, and backed by substantial engineering resources. But switching from CUDA to ROCm requires rewriting kernel code, revalidating numerical accuracy, debugging performance regressions, and retraining operations teams. For an AI lab that has spent months tuning a training run on CUDA, the prospect of repeating that effort on ROCm — with less community support and fewer pre-optimized libraries — is a hard sell even when AMD’s hardware is price-competitive.
Intel’s oneAPI framework attempted a hardware-agnostic alternative but gained minimal AI traction. The CUDA moat is not about technical superiority — it is about ecosystem gravity. Developers build on CUDA because other developers built on CUDA, and each year of accumulated compatibility makes migration harder.
From Chips to Platform
NVIDIA’s strategic evolution over the past three years is clear: the company is transforming from a chip supplier into a full-stack AI platform. The GPU economy is no longer just about selling silicon.
DGX Cloud provides turnkey access to GPU clusters through partnerships with Oracle, Microsoft Azure, Google Cloud, and Lambda. Rather than competing with hyperscalers, NVIDIA embeds itself inside their clouds.
NVIDIA Inference Microservices (NIM) package pre-optimized AI models as containerized microservices that run on NVIDIA GPUs with minimal configuration. For enterprises, NIM shortens the path from model selection to production — but deepens lock-in to NVIDIA’s stack.
AI Enterprise, priced at $1,000 per GPU per year, bundles NIM, development tools, and enterprise support — converting hardware sales into recurring software revenue, mirroring what Microsoft achieved with Azure and Office 365.
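At the stated $1,000 per GPU per year, the recurring-revenue effect is easy to sketch for a hypothetical fleet. Only the license price comes from the article; the fleet size and per-GPU hardware price below are illustrative assumptions, not NVIDIA figures:

```python
# Illustrative comparison of one-time hardware revenue vs. recurring
# software revenue for a hypothetical 10,000-GPU enterprise fleet.
# Only the $1,000/GPU/year license price is from the article; the
# fleet size and per-GPU hardware price are assumptions.
FLEET_SIZE = 10_000
LICENSE_PER_GPU_PER_YEAR = 1_000   # AI Enterprise, USD (from the article)
ASSUMED_GPU_PRICE = 30_000         # hypothetical per-GPU hardware price, USD
YEARS = 5

hardware_revenue = FLEET_SIZE * ASSUMED_GPU_PRICE                  # one-time
software_revenue = FLEET_SIZE * LICENSE_PER_GPU_PER_YEAR * YEARS   # recurring

print(f"One-time hardware revenue: ${hardware_revenue:,}")
print(f"Software revenue over {YEARS} years: ${software_revenue:,}")
```

Under these assumptions the software stream is smaller than the hardware sale, but unlike the hardware sale it renews every year and compounds as fleets grow — the Office 365 dynamic the article describes.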
The platform strategy means NVIDIA does not need to win every hardware generation decisively. Even if a competitor produces a superior chip, NVIDIA’s software ecosystem creates enough friction that most customers stay. This is the playbook that kept Intel dominant in x86 for decades. NVIDIA has executed it with greater discipline.
The Blackwell Architecture
NVIDIA’s hardware remains formidable. The GB200 NVL72 — a liquid-cooled rack containing 72 Blackwell GPUs connected by NVLink — delivers 30x the inference performance and 4x the training performance of the previous-generation H100 system for large language model workloads. Each B200 GPU provides 20 petaflops of FP4 compute with 192 GB of HBM3e memory.
The Blackwell Ultra (B300), which began shipping in early 2026, pushed memory to 288 GB of HBM3e. NVIDIA’s roadmap maintains annual refreshes: Vera Rubin in H2 2026 with HBM4 memory, Rubin Ultra in 2027, and Feynman beyond that. This cadence forces competitors to aim at a moving target while pushing customers toward continual upgrades.
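A back-of-envelope check of the rack-level figures, using only the per-GPU numbers quoted above (20 petaflops of FP4 and 192 GB of HBM3e per B200, 72 GPUs per NVL72 rack):

```python
# Rough aggregate capacity of a GB200 NVL72 rack, derived from the
# per-GPU figures quoted above. Interconnect overhead and real-world
# utilization are ignored; these are peak theoretical numbers.
GPUS_PER_RACK = 72
FP4_PFLOPS_PER_GPU = 20   # petaflops of FP4 compute per B200
HBM_GB_PER_GPU = 192      # GB of HBM3e per B200

rack_exaflops = GPUS_PER_RACK * FP4_PFLOPS_PER_GPU / 1000  # PFLOPS -> EFLOPS
rack_hbm_tb = GPUS_PER_RACK * HBM_GB_PER_GPU / 1000        # GB -> TB (decimal)

print(f"Peak FP4 compute per rack: {rack_exaflops:.2f} exaflops")
print(f"Total HBM3e per rack: {rack_hbm_tb:.1f} TB")
```

Roughly 1.4 exaflops of peak FP4 compute and nearly 14 TB of HBM3e in a single liquid-cooled rack — the scale at which a "system" rather than a "chip" becomes the unit of sale.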
NVIDIA’s supply chain relationship with TSMC underpins this pace. NVIDIA is TSMC’s largest customer for advanced AI chip fabrication, reportedly securing over 60% of TSMC’s 2026 CoWoS advanced packaging allocation — a manufacturing advantage no competitor can easily replicate.
The Challengers
NVIDIA’s dominance is real, but the threat landscape is broader than it was two years ago.
AMD’s MI300X has won meaningful adoption, with 192 GB of HBM3 and 5.3 TB/s of bandwidth excelling at memory-bound inference workloads. Microsoft Azure, Oracle Cloud, and several GPU cloud providers deploy it at scale. AMD’s MI350 series claims up to 35x faster inference than the MI300X, and the MI400 series, paired with the Helios rack-scale system, targets 2026 as a direct NVL72 competitor.
Google’s TPU v7 (Ironwood) delivers 4,614 FP8 TFLOPS per chip, scaling to 42.5 ExaFLOPS in pods of 9,216 chips. Anthropic’s deal for hundreds of thousands of Trillium (TPU v6e) chips — scaling toward one million by 2027 and worth tens of billions of dollars — signals that NVIDIA is not the only viable path to frontier AI.
Amazon’s Trainium chips eliminate the NVIDIA markup entirely. Trainium2 offers 30-40% better price performance than GPU-based EC2 instances. Trainium3, on a 3nm process, delivers 4.4x more compute and 4x better energy efficiency. Amazon controls chip, cloud, and customer relationship end to end.
Cerebras builds wafer-scale chips with 4 trillion transistors, signed a deal worth over $10 billion to supply OpenAI, and has demonstrated inference 18x faster than GPU-based solutions while powering Meta’s Llama API. The company targets a Q2 2026 IPO at a $23 billion valuation.
Each challenger attacks a different facet: AMD on specifications, Google and Amazon on vertical integration, Cerebras on architectural novelty. The cloud wars are accelerating this fragmentation as each hyperscaler builds proprietary silicon to differentiate its AI platform. None has displaced NVIDIA. But collectively, they are eroding the assumption that NVIDIA GPUs are the only option.
The GPU Economy Ahead
Three forces will shape the NVIDIA GPU economy’s next chapter.
First, inference is overtaking training as the dominant compute workload. Training is a one-time cost; serving models to millions of users is ongoing and scales with adoption. This shift favors specialized inference hardware and algorithmic efficiency over brute-force GPU compute, potentially opening space for architectures that compete on cost per token rather than peak training throughput.
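The economics behind this shift can be sketched with a toy model. Every number below is a hypothetical placeholder chosen only to show the shape of the curve: training is a fixed cost paid once, while inference cost grows linearly with tokens served, so past a break-even point inference dominates total spend.

```python
# Toy model: one-time training cost vs. cumulative inference cost.
# All figures are hypothetical placeholders, not real prices.
TRAINING_COST = 50_000_000    # one-time cost to train the model, USD (assumed)
COST_PER_M_TOKENS = 0.50      # serving cost per million tokens, USD (assumed)
TOKENS_PER_DAY = 500e9        # tokens served per day at scale (assumed)

daily_inference_cost = TOKENS_PER_DAY / 1e6 * COST_PER_M_TOKENS
breakeven_days = TRAINING_COST / daily_inference_cost

print(f"Daily inference spend: ${daily_inference_cost:,.0f}")
print(f"Inference overtakes training cost after {breakeven_days:.0f} days")
```

Under these placeholder numbers, cumulative serving cost passes the entire training budget in well under a year — which is why cost per token, not peak training throughput, becomes the metric challengers compete on.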
Second, platform lock-in is deepening. Every enterprise adopting NIM, AI Enterprise, or DGX Cloud becomes more embedded in NVIDIA’s ecosystem. The long-term strategy is to make GPU hardware a commodity component of a software-defined platform where switching costs are measured in organizational dependencies, not chip specifications.
Third, geopolitical risk is growing. The AI infrastructure war has made NVIDIA a geopolitical actor. Export controls have cost billions in Chinese revenue. Antitrust scrutiny is intensifying. The Groq deal — a $20 billion licensing and talent arrangement bringing inference-optimized LPU technology into the Vera Rubin architecture — may face regulatory challenges.
NVIDIA’s position today resembles Intel’s in the early 2000s: an overwhelming market leader with a deep software moat, annual architecture refreshes, and prohibitive switching costs. Intel’s dominance lasted another 15 years before mobile and cloud eroded it. NVIDIA’s moat may prove more durable — or face disruption from a direction no one anticipates.
What is clear is that the GPU economy is no longer just about GPUs. It is about who controls the stack — silicon to software to cloud — that makes AI work.
Frequently Asked Questions
What is the GPU economy?
The GPU economy is the market for AI compute built around NVIDIA’s hardware and software stack. NVIDIA posted $130.5 billion in revenue for fiscal year 2025, with data center sales — almost entirely AI chips — accounting for $115.2 billion, making the company the dominant supplier of the infrastructure on which modern AI is trained and served.
Why does NVIDIA’s dominance matter?
Because access to GPU compute determines the cost and feasibility of deploying AI at scale. Every major AI lab and hyperscaler currently pays NVIDIA for hardware — and increasingly for software — so NVIDIA’s pricing, supply allocation, and platform decisions shape the technology strategy of the entire industry.
How does the CUDA moat work?
CUDA, released in 2006, is the proprietary parallel computing platform for which every major AI framework — PyTorch, TensorFlow, JAX — is optimized first. Nearly two decades of CUDA-specific code mean that switching to an alternative such as AMD’s ROCm requires rewriting kernels, revalidating numerical accuracy, and retraining teams — a cost that usually outweighs any hardware price advantage.
Sources & Further Reading
- NVIDIA FY2025 Financial Results — NVIDIA Newsroom
- NVIDIA Blackwell Architecture Technical Overview — NVIDIA Developer Blog
- AMD Instinct MI350 Series GPUs — AMD
- Google Ironwood TPU v7 — Google Cloud Blog
- Amazon Trainium AI Accelerators — AWS
- NVIDIA Roadmap: Rubin GPUs 2026, Feynman Beyond — Tom’s Hardware
- Anthropic to Expand Use of Google Cloud TPUs — Google Cloud
- NVIDIA Acquires Groq for $20 Billion — CNBC