The Largest Publicly Disclosed Kubernetes Cluster
In a Google Cloud engineering blog post, Google described a 130,000-node GKE cluster built and operated in experimental mode: twice the previously supported 65,000-node ceiling, and the largest publicly disclosed Kubernetes cluster to date. The demonstration, detailed around KubeCon 2025, shows that mainstream Kubernetes can now operate at the scale frontier AI training runs require, rather than forcing operators onto custom schedulers.
The cluster orchestrated approximately 1.3 million virtual TPUs while sustaining 90% utilization in AllReduce collectives — the pattern that matters for large-model training. It also hit benchmark numbers that redefine what “hyperscale orchestration” means:
- API server QPS peaked at 500k, etcd writes at 100k/sec.
- Sustained throughput of 1,000 pod starts per second with pod startup latency under 5 seconds cluster-wide.
- Kueue preempted 39,000 pods in 93 seconds to make room for higher-priority workloads.
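As a back-of-envelope check, the headline figures above are internally consistent. The short sketch below only re-derives rates from the published numbers; the derived values are illustrative arithmetic, not additional disclosures.

```python
# Sanity-check arithmetic on the published benchmark figures.
NODES = 130_000
VIRTUAL_TPUS = 1_300_000
PREEMPTED_PODS = 39_000
PREEMPTION_WINDOW_S = 93
POD_STARTS_PER_S = 1_000

tpus_per_node = VIRTUAL_TPUS / NODES                       # accelerators per node
preemptions_per_s = PREEMPTED_PODS / PREEMPTION_WINDOW_S   # Kueue preemption rate
full_churn_minutes = NODES / POD_STARTS_PER_S / 60         # time to start one pod per node

print(f"{tpus_per_node:.0f} virtual TPUs per node")
print(f"{preemptions_per_s:.0f} preemptions per second")
print(f"{full_churn_minutes:.1f} minutes to cycle one pod on every node")
```

In other words, 1.3M virtual TPUs over 130K nodes is 10 accelerators per node, and the 93-second preemption window implies roughly 420 evictions per second, well within the cluster's 1,000-pod-per-second start rate.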
How Kubernetes Broke Its Own Ceiling
Three architectural changes made 130K nodes feasible:
- Replacing etcd with a Spanner-based store. Object counts exceeding 1.3 billion saturate etcd’s memory and write paths. Google swapped the default key-value store for a custom Spanner-backed system that scales horizontally without the historic etcd limits. This is the most significant control-plane change in Kubernetes’ 10-year history.
- API server watch cache and sharding. A strongly consistent watch cache combined with a more fine-grained sharding model kept the API server responsive at 500k QPS, instead of degrading at the tens of thousands of QPS where production clusters traditionally top out.
- Job-level scheduling with Kueue and JobSet. The default Pod-level scheduler is the wrong primitive for AI. Kueue adds gang-scheduling, all-or-nothing admission, fair-share, priorities, and quotas — the batch-system vocabulary that ML training has been missing. JobSet orchestrates the multi-job training runs on top.
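The all-or-nothing admission that Kueue brings can be illustrated in a few lines. The sketch below is a toy model of the idea only, not Kueue's actual implementation; the class and field names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class TrainingJob:
    name: str
    replicas: int   # pods the job needs simultaneously (the "gang")
    priority: int

def admit(jobs: list[TrainingJob], free_slots: int) -> list[str]:
    """All-or-nothing admission: a job is admitted only if every one of
    its replicas fits at once; partial gangs are never started."""
    admitted = []
    for job in sorted(jobs, key=lambda j: -j.priority):
        if job.replicas <= free_slots:
            free_slots -= job.replicas
            admitted.append(job.name)
        # else: the whole gang waits; no partial placement
    return admitted
```

The key property is the `else` branch: a Pod-level scheduler would happily place 7 of 8 replicas and deadlock the training run, whereas gang admission keeps the whole job queued until the full gang fits.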
Why This Matters for AI Workloads
The competitive context is worth naming. AWS EKS tops out at 10,000 nodes per cluster and Azure AKS at 5,000, which forces multi-cluster architectures and the operational debt that comes with them. Google’s 13x-to-26x headroom over managed competitors means frontier AI training can be expressed as a single Kubernetes job instead of a federation of clusters stitched together with custom glue.
For enterprise AI teams, three practical shifts follow:
- Job scheduling is now a first-class Kubernetes primitive. If your training stack has custom schedulers bolted on outside Kubernetes (Ray, Slurm, custom operators), Kueue and JobSet are now the reference path. Teams should evaluate a migration rather than accumulating more custom code.
- The multi-cluster / federation premium shrinks. Architectures built in the last 12 months on the assumption that a single cluster would not scale need to be revisited. Simpler single-cluster topologies may now be feasible for many enterprise training workloads.
- Observability tooling has to keep up. Running even a few thousand nodes puts pressure on Prometheus, on logging pipelines, and on dashboards. A 130K-node world implies re-architecting the observability stack with streaming and sampling built in.
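"Sampling built in" means load-shedding primitives like reservoir sampling: keeping a fixed-size uniform sample of an unbounded event stream in constant memory. A minimal sketch of the classic algorithm, not tied to any particular observability agent:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown
    length in O(k) memory -- the kind of primitive a high-node-count
    logging pipeline leans on instead of shipping every event."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)            # fill the reservoir first
        else:
            j = rng.randrange(i + 1)       # replace with decaying probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample
```

At 130K nodes, even a modest per-node event rate multiplies into millions of events per second, which is why fixed-memory techniques like this, rather than "store everything" pipelines, become mandatory.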
The Bottleneck Shifts from Chips to Power
The most candid acknowledgement in the disclosures is about the real constraint. The industry is transitioning from a world constrained by chip supply to a world constrained by electrical power. A single NVIDIA GB200 draws 2,700W, and a 100K-GPU cluster’s power footprint can scale into the hundreds of megawatts — a load profile that most data centers and most utility interconnections cannot deliver quickly.
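The arithmetic behind that claim is straightforward, using the figures from the paragraph above. The PUE value here is an assumed, illustrative facility-overhead factor, not a number from the disclosures.

```python
# Power footprint of a 100K-GPU cluster at 2,700 W per NVIDIA GB200.
GPU_WATTS = 2_700
GPU_COUNT = 100_000
PUE = 1.2  # assumed power usage effectiveness (cooling, delivery); illustrative

it_load_mw = GPU_WATTS * GPU_COUNT / 1e6
facility_mw = it_load_mw * PUE
print(f"IT load: {it_load_mw:.0f} MW; facility draw at PUE {PUE}: {facility_mw:.0f} MW")
```

That is 270 MW of IT load before any facility overhead, comfortably in the "hundreds of megawatts" range, and larger than many utility interconnections can deliver on any near-term timeline.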
That is why the GKE scalability story dovetails with the fuel-cell-powered data center story — both are responses to the same underlying reality. Kubernetes now scales to the job; the data center power stack has to scale to the Kubernetes cluster. The companies that solve both ends of the stack will own the AI infrastructure decade.
What Enterprise Architects Should Do
Three practical moves for 2026 planning:
- Audit the custom scheduler surface. Every custom scheduler, custom operator, or non-Kubernetes batch system is potential technical debt now that Kueue exists. Not everything should migrate, but everything should be reviewed.
- Pilot Kueue on existing GKE footprints. The primitives that made 130K nodes work — job queueing, gang scheduling, fair-share quotas — solve real problems even on 500-node clusters. The technology is available today.
- Rebuild capacity plans around power, not just cores. The scarce resource is no longer chip availability for most enterprise use cases — it is the kilowatts it takes to feed them. Capacity planning should explicitly model power-constrained regions and on-site generation options.
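Modeling capacity from the power side inverts the usual planning question: instead of "how many GPUs can we buy," ask "how many GPUs can this site feed." A hypothetical sketch under assumed numbers (the site names, megawatt figures, and default PUE are all invented for illustration):

```python
def max_gpus(site_power_mw: float, gpu_watts: float = 2_700, pue: float = 1.2) -> int:
    """GPUs a site can feed from its grid interconnection, after
    facility overhead (PUE) is taken off the top."""
    usable_watts = site_power_mw * 1e6 / pue
    return int(usable_watts // gpu_watts)

# Hypothetical sites: two grid-constrained regions and an on-site option.
for site, mw in {"region-a": 50, "region-b": 120, "onsite-fuel-cell": 20}.items():
    print(f"{site}: up to {max_gpus(mw):,} GPUs")
```

Even a generous 120 MW interconnection feeds roughly 37,000 GB200-class GPUs, which is why power availability, not chip allocation, caps most enterprise cluster sizes.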
Google did not just break a Kubernetes limit. It changed the shape of the conversation about how AI training should be orchestrated — and by extension, how the next generation of AI data centers will be designed.
Frequently Asked Questions
How large is Google’s 130,000-node cluster compared to other managed Kubernetes offerings?
The 130,000-node GKE cluster is roughly 13 times larger than the current AWS EKS ceiling of 10,000 nodes per cluster and 26 times larger than Azure AKS’s 5,000-node limit. It is the largest publicly disclosed Kubernetes cluster to date and orchestrated about 1.3 million virtual TPUs with 90% utilization in AllReduce collectives.
What are Kueue and JobSet, and why do they matter?
Kueue is a job queueing controller that brings batch system capabilities — gang scheduling, all-or-nothing admission, priorities, quotas, fair-share — to Kubernetes. JobSet is a companion that orchestrates multi-job training runs. Together they turn Kubernetes from a pod scheduler into an AI-training-aware orchestrator, removing the need for external systems like Slurm or custom operators for many workloads.
What does this announcement mean for power and data center planning?
Google explicitly flagged the shift from a chip-constrained world to a power-constrained world. A 100K-GPU cluster can consume hundreds of megawatts. Enterprises planning AI capacity should model power and interconnection availability alongside GPU availability, and evaluate on-site generation options (fuel cells, co-located renewables) in regions where utility queues are long.
Sources & Further Reading
- How We Built a 130,000-Node GKE Cluster — Google Cloud Blog
- Google Cloud Demonstrates Massive Kubernetes Scale with 130,000-Node GKE Cluster — InfoQ
- Benchmarking a 65,000-Node GKE Cluster with AI Workloads — Google Cloud Blog
- Google’s 130,000-Node GKE Cluster: Scaling AI, Confronting Power Limits — Austin Osuide
- GCP: Building the Largest Known Kubernetes Cluster — CloudSteak
- Google Breaks Kubernetes Limits Again — FAUN Kaptain
- Google’s 130,000-Node Kubernetes Colossus — WebProNews