The Scarcest Resource in Technology
In 2024, the most valuable commodity in the technology industry was not software, data, or talent. It was GPUs — specifically, NVIDIA’s H100 and H200 accelerators — the specialized chips that train and run the large AI models powering the current technology revolution.
Demand for AI compute has outstripped supply so dramatically that GPU access has become a strategic bottleneck for AI companies, hyperscalers, and sovereign governments alike. NVIDIA’s data center revenue grew from $15 billion in fiscal year 2023 to over $115 billion in fiscal year 2025 — a 7.7x increase in two years — and every chip was spoken for months before it shipped.
This GPU scarcity created an opening for a new category of cloud provider: companies that do not try to compete with AWS, Azure, or GCP across the full cloud stack but instead specialize exclusively in GPU-dense compute for AI workloads. CoreWeave, Lambda Labs, Crusoe Energy, Together AI, and others have raised billions of dollars and are building a parallel cloud infrastructure optimized for a single purpose: making AI run.
CoreWeave: From Crypto to Cloud to a $23 Billion IPO
CoreWeave’s origin story is one of the most remarkable pivots in recent tech history. Founded in 2017 as a cryptocurrency mining operation called Atlantic Crypto, the company began pivoting to GPU cloud computing after the 2018 crypto crash made mining unviable. By 2019-2020, CoreWeave was providing GPU cloud services for VFX and 3D rendering workloads. Then in 2022, CoreWeave recognized that AI was its primary opportunity and invested heavily in NVIDIA’s latest H100 GPUs, positioning itself as a key infrastructure partner for companies like OpenAI.
The timing was extraordinarily fortunate. By 2023, CoreWeave was one of the few companies outside the hyperscalers with significant GPU inventory, and demand from AI companies was insatiable. The company raised $7.5 billion in debt financing in 2024, led by Blackstone and Magnetar, secured primarily by customer contracts and GPU infrastructure. CoreWeave completed a March 2025 IPO at $40 per share — an initial valuation of approximately $23 billion — with the market capitalization subsequently rising above $35 billion as AI infrastructure demand intensified.
CoreWeave’s competitive advantage is specialization. While AWS offers hundreds of services across compute, storage, networking, databases, analytics, and machine learning, CoreWeave focuses on one thing: providing the fastest, most reliable GPU compute for AI training and inference. This focus enables:
Higher GPU density. CoreWeave’s data centers are designed from the ground up for GPU workloads, with liquid cooling, high-bandwidth InfiniBand networking between GPUs, and storage architectures optimized for the large-file, high-throughput access patterns of AI training.
Lower latency between GPUs. Training a large AI model requires thousands of GPUs communicating intensively. The network topology and bandwidth between GPUs matter as much as the GPUs themselves. CoreWeave's purpose-built clusters achieve lower inter-GPU latency than general-purpose cloud environments where AI workloads share infrastructure with web servers and databases (a sketch of the underlying communication pattern follows this list).
Simpler pricing. Instead of AWS’s notoriously complex pricing with hundreds of instance types and billing dimensions, CoreWeave offers straightforward per-GPU-hour pricing with predictable costs.
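To see why inter-GPU bandwidth matters so much, consider gradient synchronization: every step of data-parallel training ends with an all-reduce across every GPU's gradients, so the slowest link stalls the entire cluster. The sketch below times that collective in plain PyTorch. It is a minimal illustration assuming a single multi-GPU CUDA node with NCCL, not a depiction of CoreWeave's actual stack; the script name and payload size are arbitrary.

```python
# Times the gradient all-reduce that ends every data-parallel training step.
# Minimal illustration only; assumes one CUDA node with NCCL available.
import os
import time

import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")      # standard backend for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    # 1 GiB of float32 stand-in "gradients" (a fraction of a large model's state).
    grads = torch.ones(256 * 1024 * 1024, device="cuda")

    dist.all_reduce(grads)            # warm-up: NCCL initializes lazily
    torch.cuda.synchronize()
    start = time.perf_counter()
    dist.all_reduce(grads)            # the timed collective
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    if dist.get_rank() == 0:
        gib = grads.numel() * grads.element_size() / 2**30
        print(f"all-reduce of {gib:.1f} GiB took {elapsed * 1000:.1f} ms")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=<gpus> allreduce_bench.py`, the measured time is roughly payload size divided by effective bus bandwidth, which is precisely the quantity InfiniBand fabrics are engineered to maximize.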
CoreWeave’s customer base includes Microsoft (which signed a multi-billion dollar agreement — reported at approximately $10 billion — to use CoreWeave infrastructure for Azure AI workloads), Meta, and dozens of AI startups. The Microsoft relationship is particularly notable: the world’s second-largest cloud provider is itself renting GPU capacity from a specialized provider, underscoring the severity of the compute shortage.
Lambda Labs: The Developer-First GPU Cloud
Lambda Labs approached the GPU cloud market from a different angle: developer experience. Founded by Stephen Balaban, Lambda started by selling GPU workstations to AI researchers and expanded into cloud GPU rental with a focus on making AI compute as easy to use as possible.
Lambda’s differentiation is its developer-centric approach:
Pre-configured environments. Lambda cloud instances come with the latest NVIDIA drivers, CUDA toolkit, PyTorch, TensorFlow, and other ML frameworks pre-installed and tested. A researcher can go from zero to training a model in minutes rather than spending hours debugging driver compatibility issues — a genuine pain point on general-purpose cloud platforms.
Simple cluster management. Lambda’s managed clusters handle the infrastructure complexity of multi-GPU and multi-node training — job scheduling, distributed training frameworks, checkpoint management, and fault tolerance — so researchers can focus on model development (a minimal checkpoint-and-resume sketch follows this list).
Competitive pricing. Lambda’s H100 pricing has consistently run 30-50% below equivalent AWS and Azure instances, made possible by lower overhead (no need to maintain hundreds of non-GPU services) and efficient data center operations.
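To make "checkpoint management and fault tolerance" concrete, here is the hand-rolled pattern that managed clusters automate, written in plain PyTorch. The model, path, and checkpoint interval are placeholder assumptions, not Lambda's implementation.

```python
# Hand-rolled checkpoint/resume loop of the kind managed clusters automate.
# Model, optimizer, path, and interval are illustrative placeholders.
import os

import torch
import torch.nn as nn

CKPT_PATH = "checkpoint.pt"  # real jobs write to shared, durable storage

model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
start_step = 0

# Resume if a previous run was interrupted (preemption, node failure, ...).
if os.path.exists(CKPT_PATH):
    state = torch.load(CKPT_PATH, weights_only=False)  # trusted local file
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

for step in range(start_step, 1_000):
    loss = model(torch.randn(32, 512)).pow(2).mean()  # stand-in training loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 100 == 0:  # checkpoint often enough that a crash loses little work
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "step": step},
            CKPT_PATH,
        )
```

Multiply this by job scheduling, multi-node rendezvous, and automatic restarts, and the appeal of having the platform own it becomes obvious.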
Lambda has become the default cloud provider for many AI research labs and startups, including significant contracts with academic institutions that need GPU access but cannot justify hyperscaler pricing.
The Broader Challenger Ecosystem
CoreWeave and Lambda are the most prominent GPU cloud challengers, but the ecosystem is expanding:
Crusoe Energy pairs GPU compute with stranded natural gas — gas that would otherwise be flared (burned wastefully) at oil extraction sites. By building modular data centers at gas well sites and using the stranded gas to generate electricity, Crusoe provides AI compute at lower cost while reducing methane emissions. The company has raised over $2.5 billion in total funding, including a $1.375 billion Series E round in October 2025 that valued it at over $10 billion, and operates GPU clusters powered entirely by energy that would otherwise be wasted.
Together AI combines a GPU cloud platform with an open-source AI research lab, offering cloud inference for popular open-source models (Llama, Mistral, etc.) alongside custom training services. Together’s “serverless inference” product lets developers call open-source models via API without managing infrastructure — competing directly with OpenAI and Anthropic on price for applications that do not require frontier capabilities.
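As a sketch of what serverless inference looks like from the developer's side, the snippet below posts a chat completion to an OpenAI-compatible HTTP endpoint of the kind Together exposes. The URL, model id, and environment variable are illustrative assumptions rather than a reference to any provider's documented API.

```python
# Call a hosted open-source model via an OpenAI-compatible chat endpoint.
# URL, model id, and env var below are illustrative assumptions.
import os

import requests

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "meta-llama/Llama-3-8b-chat-hf",  # example open-source model
        "messages": [
            {"role": "user", "content": "Summarize CUDA in one sentence."}
        ],
        "max_tokens": 64,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

No GPUs are provisioned and no drivers installed; the provider amortizes the model across many callers, which is how per-token pricing can undercut dedicated instances for bursty workloads.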
Voltage Park, a non-profit funded by billionaire Jed McCaleb (co-founder of Ripple and Stellar), entered the market in late 2023 with an order of 24,000 H100 GPUs worth approximately $500 million, deployed across six data centers by early 2024. The organization targets AI companies and researchers that need guaranteed long-term compute access at competitive rates.
Nebius (spun off from Yandex) is building GPU cloud infrastructure in Europe, targeting the growing demand for sovereign AI compute — European companies and governments that want AI infrastructure located in European data centers under European data protection laws.
The Hyperscaler Response
AWS, Azure, and GCP are not standing still. Each has invested tens of billions of dollars in AI infrastructure:
AWS has expanded its GPU offerings with NVIDIA H100 and H200 instances (the P5 and P5e families), its own custom Trainium AI chips (detailed below), and Bedrock, a managed service for running foundation models. AWS’s advantage is integration: GPU instances connect seamlessly to S3 storage, SageMaker ML pipelines, and the rest of the AWS ecosystem.
Microsoft Azure has the deepest AI partnership through its exclusive relationship with OpenAI and its investment in CoreWeave for overflow capacity. Azure’s ND H200 instances and custom Maia 100 AI accelerator position it for both training and inference workloads, though the Maia program has run into production delays (detailed below).
Google Cloud benefits from TPU (Tensor Processing Unit) infrastructure — Google’s custom-designed AI chips that offer an alternative to NVIDIA GPUs. TPU v5p pods provide massive scale for training (up to 8,960 chips per pod) at competitive pricing, and Google’s Gemini models are trained entirely on TPUs.
The hyperscalers’ structural advantage is ecosystem and enterprise relationships. An enterprise already running on AWS is unlikely to move its AI workloads to CoreWeave just for GPU pricing — the integration costs and data gravity (moving terabytes of training data to a new provider) create strong lock-in. Challengers win when: (a) the workload is GPU-dominant with minimal dependency on other cloud services, (b) the customer is a GPU-native AI company (not a traditional enterprise), or (c) the price/performance gap is large enough to justify the integration overhead.
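A back-of-the-envelope calculation shows the scale of that data-gravity cost; every figure below is an assumption chosen for illustration.

```python
# Rough arithmetic behind "data gravity". All inputs are illustrative assumptions.
dataset_tib = 500          # training corpus size, TiB (assumed)
link_gbps = 10             # sustained transfer bandwidth, Gbit/s (assumed)
egress_per_gib_usd = 0.05  # hypothetical per-GiB egress fee

transfer_seconds = dataset_tib * 2**40 * 8 / (link_gbps * 1e9)
egress_cost_usd = dataset_tib * 1024 * egress_per_gib_usd

print(f"transfer time: {transfer_seconds / 86400:.1f} days")  # ~5.1 days
print(f"egress cost:   ${egress_cost_usd:,.0f}")              # ~$25,600
```

Roughly five days of saturated transfer and a five-figure egress bill before the first training step runs is a meaningful tax, and it compounds with every dataset refresh.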
Custom Silicon: The NVIDIA Alternative Path
The GPU cloud war is intertwined with the custom silicon war. Dependence on a single vendor (NVIDIA) for 80%+ of AI compute chips is a strategic risk that hyperscalers are actively addressing:
Google TPUs are the most mature alternative, with TPU v5 offering competitive performance for training and inference. Google uses TPUs internally for all Gemini model training and offers them to external customers through Google Cloud.
Amazon Trainium2 is AWS’s custom AI training chip, delivering 4x the performance of Trainium1 at lower per-unit costs. Generally available since December 2024, Trainium2 clusters are positioned as cost-competitive with NVIDIA for large-scale training. AWS has announced Trainium3 for late 2025, promising another generational performance leap.
Microsoft Maia 100 is Azure’s custom AI accelerator, designed with input from OpenAI to optimize for the specific model architectures and inference patterns that OpenAI deploys. However, large-scale production deployment has been limited, and the next-generation Maia 200 has been delayed to 2026 due to design changes and staffing issues.
AMD’s MI300X is the leading third-party alternative to NVIDIA’s data center GPUs, offering competitive raw compute performance at lower price points. AMD has gained significant traction with Microsoft, Meta, and Oracle, though NVIDIA’s CUDA software ecosystem remains a major barrier to switching.
Intel Gaudi 3 targets the price-sensitive segment of the AI compute market, offering lower raw performance than NVIDIA H100 but at substantially lower cost per chip.
If the diversification trend holds, the GPU cloud market of 2027-2028 should be significantly less NVIDIA-dominated than it is today, bringing more pricing competition and reducing the supply bottleneck risk.
The Financial Reality: Profitability and Risk
The GPU cloud challengers face a fundamental financial tension: they must commit to massive capital expenditure (purchasing GPUs, building data centers) based on demand projections that could shift rapidly if AI investment cycles cool.
CoreWeave’s IPO prospectus revealed the scale of this bet: $8.7 billion in property and equipment, much of it financed with debt. Industry-wide GPU utilization rates have been estimated above 90% during the current AI infrastructure boom, though specific utilization figures were not disclosed in CoreWeave’s filing. If those utilization rates hold, the economics are excellent. If demand drops — due to an AI investment slowdown, model efficiency improvements that reduce compute needs, or customer shifts to custom silicon — the debt burden could become unsustainable.
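The utilization sensitivity can be made concrete with toy per-GPU unit economics; every input below is an illustrative assumption, not a figure from CoreWeave's filings.

```python
# Toy per-GPU-hour unit economics. All inputs are illustrative assumptions.
gpu_capex_usd = 30_000    # assumed all-in cost of one H100-class GPU, installed
useful_life_years = 4     # assumed depreciation horizon
annual_interest = 0.10    # assumed cost of the debt financing the purchase
opex_per_hour = 0.60      # assumed power, cooling, and staff per utilized hour
price_per_hour = 2.50     # assumed rental rate per GPU-hour

hours_per_year = 24 * 365
fixed_per_hour = (gpu_capex_usd / useful_life_years
                  + gpu_capex_usd * annual_interest) / hours_per_year

for utilization in (0.95, 0.70, 0.50):
    margin = (price_per_hour - opex_per_hour) * utilization - fixed_per_hour
    print(f"utilization {utilization:.0%}: margin ${margin:+.2f} per GPU-hour")
```

Under these assumptions the margin is healthy near full utilization but turns negative below roughly 63%, which is why a demand slowdown translates quickly into distress for a debt-financed fleet.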
The parallel to the telecom infrastructure buildout of the late 1990s is uncomfortable: massive capital investment based on demand projections that ultimately proved optimistic, followed by a wave of bankruptcies. The GPU cloud market may avoid this fate if AI demand continues its current growth trajectory — but the risk is real and should be understood by investors and customers evaluating long-term commitments to challenger providers.
Decision Radar (Algeria Lens)
| Dimension | Assessment |
|---|---|
| Relevance for Algeria | Moderate — Algeria does not currently have GPU cloud infrastructure; access to international GPU cloud providers is possible but limited by bandwidth, latency, and payment constraints |
| Infrastructure Ready? | No — Algeria has no domestic GPU cloud infrastructure; Oran AI Data Center (under development) could change this, but current options require using international providers |
| Skills Available? | Moderate — Algerian AI researchers and developers can use GPU cloud platforms; the constraint is cost and access, not skills |
| Action Timeline | 18-36 months — Until domestic GPU infrastructure exists, Algerian AI teams should use international GPU clouds (Lambda’s pricing is accessible); the Oran Data Center project could provide sovereign GPU compute by 2028 |
| Key Stakeholders | Ministry of Digital Economy, Algerian Space Agency, university AI research labs, Sonatrach (AI for energy optimization), AI startups, Oran Data Center project team |
| Decision Type | Strategic + Infrastructure — Whether Algeria builds sovereign GPU compute capability is a national strategic decision with implications for AI sovereignty and research capacity |
Quick Take: Algeria’s ability to participate in the AI revolution depends partly on access to GPU compute. Today, Algerian AI researchers and startups must rent compute from international providers — Lambda Labs and Together AI offer the most accessible pricing. The planned Oran AI Data Center represents Algeria’s opportunity to build sovereign GPU compute infrastructure, which would enable local AI training, reduce dependence on foreign cloud providers, and attract AI investment. Algeria should study the Nebius (European sovereign AI cloud) model as a template for building national AI compute infrastructure that serves both domestic and regional demand.
Sources
- CoreWeave — S-1 IPO Filing
- CoreWeave IPO Pricing Announcement
- CNBC — CoreWeave’s 7-Year Journey to IPO
- Blackstone — CoreWeave $7.5B Debt Financing
- NVIDIA — Fiscal Year 2025 Earnings
- Lambda Labs — Cloud GPU Platform
- Crusoe Energy — Series E Funding Announcement
- Together AI — Open Source AI Cloud
- DCD — Voltage Park Acquires 24,000 H100 GPUs
- Nebius — European AI Cloud
- AWS — Trainium2 General Availability
- Google Cloud — TPU v5p
- AMD — Instinct MI300X
- Microsoft — Inside Maia 100