What AWS Actually Announced on June 10
AWS brought Graviton5 into general availability on June 10, 2026, attaching the new processor to two instance families: M9g for general-purpose workloads and M9gd for storage-intensive deployments. The chip had been telegraphed since AWS re:Invent in December 2025, but general availability turns a roadmap slide into a procurement decision.
The headline number is 192 Arm cores per chip — the largest core count yet in the Graviton lineage. But core count alone does not tell the story. According to SiliconAngle’s coverage of the launch, Graviton5 ships with DDR5 memory support and PCIe integration, both firsts for the Graviton family. L3 cache is 5x larger than Graviton4, and each core accesses 2.6x more L3 cache — a structural improvement that reduces memory-access latency for the parallel workloads that define modern cloud architectures.
Performance claims from AWS put the aggregate compute improvement at 25% over Graviton4. Drill into workload types and the numbers widen: 35% faster on web applications and ML inference, 30% faster on databases. The M9gd variant adds local NVMe — up to 11.4TB of SSD storage per instance — with 30% faster I/O operations per second versus the prior generation. Network bandwidth improves 15% overall; larger M9g instances reach double the Amazon EBS bandwidth of their Graviton4 counterparts.
Over 120,000 AWS customers already run earlier Graviton generations, according to the AWS Graviton product page. That installed base does not automatically migrate to Graviton5, but it signals how deeply Arm-based instances have penetrated enterprise workloads that would historically have defaulted to x86.
The Custom-Silicon War and Why It Escalates in 2026
Graviton5 did not emerge in isolation. It is the latest move in a multi-year campaign by hyperscalers to displace merchant silicon from their data centers. The strategic logic is straightforward: owning the processor roadmap lets a cloud provider optimize for its own workloads, decouple from Intel and AMD release cycles, and capture margin that previously went to chip vendors.
AWS started this playbook in 2018 with Graviton1. Each generation has expanded the performance advantage and the use-case coverage. Graviton4 cracked the database tier. Graviton5 is explicitly positioned for agentic AI — workloads requiring continuous, high-throughput CPU compute at scale, real-time reasoning, code generation, and multi-step task orchestration. The 192-core count is not incidental; it is calibrated for the concurrency patterns that AI agents generate when multiple reasoning threads run simultaneously.
Google has followed a parallel path with Axion, its Arm-based data center processor. Microsoft has done the same with Cobalt, its Arm-based CPU for Azure. The pattern is now structural: every major hyperscaler is building or acquiring custom silicon. The implication for enterprises is that by 2027, the x86 instance will no longer be the default choice on any major cloud. It will be a deliberate selection for workloads with specific ISA dependencies or software licensing restrictions.
The efficiency angle amplifies the economic pressure. AWS states that Graviton-based instances use up to 60% less energy than comparable EC2 instances for the same performance. At the scale of a large enterprise cloud bill, that efficiency gain shows up both as lower cost and as better numbers in sustainability reporting. For organizations with net-zero commitments, running Graviton5 becomes a governance choice as much as a technical one.
What Three Early Adopters Signal About Deployment Patterns
Meta, Snowflake, and Uber have publicly committed to deploying Graviton5 instances, as reported by Guru3D in its Graviton5 launch coverage. Reading their use-case profiles reveals the three deployment patterns that will define early adoption.
Meta’s interest aligns with inference at scale. Large language model inference is a CPU-plus-accelerator workload where the CPU handles tokenization, scheduling, and the pre- and post-processing steps that sit around the GPU computation. A 192-core chip with 2.6x more L3 cache per core should let each CPU host handle more concurrent inference sessions before saturating, which keeps the attached accelerators busier and can reduce the number of instances needed for a given load.
Snowflake’s deployment signals database and analytics workloads. The 30% database performance improvement and the doubled EBS bandwidth on larger instances address Snowflake’s core architectural constraint: moving large datasets between storage and compute at query time. Faster I/O means lower query latency at the same cost, or the same latency at lower cost.
Uber’s adoption points toward distributed microservices and real-time decisioning — ride matching, surge pricing, fraud detection. These are exactly the workloads the 15% network bandwidth improvement and lower inter-core latency target. For a company running thousands of microservices across millions of concurrent requests, a 25% compute improvement compounds across the entire fleet.
What Engineering Leaders Should Do About Graviton5
1. Audit Your x86 Instance Inventory for Graviton Migration Candidates
The first step is not a pilot — it is a classification exercise. Segment your EC2 inventory into three buckets: (a) workloads with no ISA dependencies and containerized runtimes — these migrate with a recompile and a flag change; (b) workloads with commercial software licenses that restrict Arm deployment — these require vendor negotiation before migration; (c) workloads bound to x86 by legacy native code or embedded assembly — these need a longer remediation path.
AWS says more than 120,000 customers already run Graviton-based instances and that many migrations finish in hours. The tooling has matured significantly since Graviton1. For bucket (a) workloads running on containers or on interpreted, JIT-compiled, or easily cross-compiled runtimes (Python, Node.js, JVM, Go), the migration path is documented and the risk is low. Prioritize these first: they are the fastest path to the roughly 20% cost reduction and up to 60% energy saving that AWS claims for Graviton.
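The three-bucket classification described above can be sketched in a few lines of Python. This is a minimal illustration, not an AWS tool: the `Workload` fields, the bucket names, and the sample inventory are all hypothetical, and a real audit would populate the flags from your own CMDB, build metadata, and license records.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    MIGRATE_NOW = "a: no ISA dependency, recompile and switch"
    VENDOR_BLOCKED = "b: commercial license restricts Arm"
    REMEDIATE = "c: legacy native x86 code"


@dataclass
class Workload:
    name: str
    has_x86_native_code: bool    # embedded assembly or x86-only binaries
    license_restricts_arm: bool  # vendor terms forbid or do not cover Arm


def classify(w: Workload) -> Bucket:
    # Native x86 code is the hardest blocker, so it is checked first.
    if w.has_x86_native_code:
        return Bucket.REMEDIATE
    if w.license_restricts_arm:
        return Bucket.VENDOR_BLOCKED
    return Bucket.MIGRATE_NOW


# Hypothetical inventory, for illustration only.
inventory = [
    Workload("checkout-api", False, False),
    Workload("reporting-db", False, True),
    Workload("legacy-pricing", True, False),
]

by_bucket: dict[Bucket, list[str]] = {}
for w in inventory:
    by_bucket.setdefault(classify(w), []).append(w.name)

for bucket, names in by_bucket.items():
    print(f"{bucket.value}: {', '.join(names)}")
```

Checking native code before licensing is a design choice: a workload that hits both blockers lands in bucket (c), since vendor negotiation alone would not make it portable.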
2. Benchmark Against Your Actual Workload Profile, Not AWS Aggregate Claims
The 25% compute improvement is an aggregate across workload classes. Your mileage will vary significantly. Web application and ML inference workloads see 35% improvement; databases see 30%. Batch compute, video transcoding, and scientific simulation will have their own profiles. Before committing instance families at scale, run your actual production workload — or a representative replay — on M9g instances in parallel with your current instance type.
AWS offers a free trial of t4g.small (750 hours/month through December 2026) for initial evaluation. Those instances run on Graviton2, so treat the trial as an Arm compatibility check, not as a predictor of Graviton5 performance. For production-scale benchmarking, run a two-week shadow test: route a percentage of traffic to M9g instances and compare latency percentiles, throughput, and error rates against your baseline. Make cost per transaction the primary metric; raw throughput alone ignores what each instance costs.
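One way to reduce a shadow test to comparable numbers is sketched below. It assumes you can export per-request latencies and an error count for each fleet; the `RunStats` structure and the hourly prices are placeholders, not real AWS rates.

```python
import math
from dataclasses import dataclass


@dataclass
class RunStats:
    latencies_ms: list[float]  # one entry per successful request
    errors: int                # failed requests in the same window
    hourly_price_usd: float    # placeholder price, not a real AWS rate
    hours: float               # length of the measurement window


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def cost_per_transaction(s: RunStats) -> float:
    # Total spend for the window divided by successful requests.
    return s.hourly_price_usd * s.hours / len(s.latencies_ms)


def summarize(s: RunStats) -> dict[str, float]:
    total = len(s.latencies_ms) + s.errors
    return {
        "p50_ms": percentile(s.latencies_ms, 50),
        "p99_ms": percentile(s.latencies_ms, 99),
        "error_rate": s.errors / total,
        "usd_per_txn": cost_per_transaction(s),
    }


# Made-up sample data for a baseline fleet and a candidate fleet.
baseline = RunStats([12.0, 15.0, 18.0, 40.0], errors=1, hourly_price_usd=2.0, hours=1.0)
candidate = RunStats([10.0, 11.0, 14.0, 30.0], errors=0, hourly_price_usd=1.8, hours=1.0)

for label, stats in (("baseline", baseline), ("candidate", candidate)):
    print(label, summarize(stats))
```

Dividing spend by successful requests only means a fleet that sheds load through errors does not look artificially cheap.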
3. Factor M9gd Local NVMe Into Your Storage Architecture Review
The M9gd variant with up to 11.4TB of local NVMe SSD is more than a storage upgrade: it changes the architectural calculus for latency-sensitive data access. Applications that currently keep hot data in ElastiCache or another caching tier because EBS latency is too high may be able to consolidate onto M9gd instances with local NVMe, eliminating a tier of infrastructure complexity.
The tradeoff is instance-store volatility: data on local NVMe survives a reboot but is lost when the instance is stopped or terminated, or when the underlying host fails. The pattern that works is instance-store for ephemeral hot data (caches, write buffers, working sets) with synchronization to durable EBS or S3 on checkpoint. If your application already handles node failure gracefully (Kafka partitions, Redis replicas, Cassandra replication), M9gd’s local NVMe fits naturally. If your application assumes storage persists across stop/start cycles, M9gd adds architectural work that may not be worth the latency gain for your use case.
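The checkpoint pattern can be illustrated with two directories standing in for the two storage tiers. This is a sketch under stated assumptions: `CheckpointedBuffer` is a hypothetical class, the "nvme" and "ebs" directories are temporary folders rather than real mounts, and a production version would need atomic writes, incremental sync, and an S3 or EBS backend.

```python
import shutil
import tempfile
from pathlib import Path


class CheckpointedBuffer:
    """Write to fast ephemeral storage; copy to durable storage on checkpoint."""

    def __init__(self, ephemeral_dir: Path, durable_dir: Path) -> None:
        self.ephemeral_dir = ephemeral_dir
        self.durable_dir = durable_dir
        self.ephemeral_dir.mkdir(parents=True, exist_ok=True)
        self.durable_dir.mkdir(parents=True, exist_ok=True)

    def write(self, key: str, data: bytes) -> None:
        # Hot-path write goes only to the ephemeral tier.
        (self.ephemeral_dir / key).write_bytes(data)

    def checkpoint(self) -> int:
        """Copy every ephemeral file to durable storage; return the count."""
        copied = 0
        for src in self.ephemeral_dir.iterdir():
            if src.is_file():
                shutil.copy2(src, self.durable_dir / src.name)
                copied += 1
        return copied

    def recover(self) -> int:
        """Refill an empty ephemeral tier from the last checkpoint."""
        self.ephemeral_dir.mkdir(parents=True, exist_ok=True)
        restored = 0
        for src in self.durable_dir.iterdir():
            if src.is_file():
                shutil.copy2(src, self.ephemeral_dir / src.name)
                restored += 1
        return restored


with tempfile.TemporaryDirectory() as root:
    buf = CheckpointedBuffer(Path(root) / "nvme", Path(root) / "ebs")
    buf.write("segment-0001", b"hot data")
    saved = buf.checkpoint()
    shutil.rmtree(buf.ephemeral_dir)  # simulate losing the instance store
    restored = buf.recover()
    recovered = (buf.ephemeral_dir / "segment-0001").read_bytes()
    print(saved, restored, recovered)
```

Anything written after the last checkpoint is lost in this model, so the checkpoint interval is effectively the recovery point objective for data on the local volume.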
4. Reassess Your Agentic AI Infrastructure Budget Assumptions
Graviton5 was explicitly designed for the concurrency patterns of agentic AI. If your 2026 AI infrastructure plan includes multi-agent orchestration frameworks — whether LangChain, LlamaIndex, Anthropic’s tool-use API, or proprietary orchestration — your CPU sizing assumptions were probably made before Graviton5 was available. A 192-core instance with 35% better ML inference performance changes the ratio of CPU-to-GPU instances you need in your agent serving stack.
Specifically: the pre- and post-processing steps around GPU inference (prompt formatting, token counting, context management, output parsing) are CPU-bound. On a Graviton4-era instance, these steps could bottleneck a fast GPU. On Graviton5, the same steps complete faster and with more concurrency, which means your GPU utilization improves — you get more inference throughput from the same accelerator investment. Model this in your capacity plan before your next GPU instance purchase.
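A back-of-the-envelope model makes the CPU-to-GPU relationship concrete. The sketch below assumes the CPU work per request is a fixed number of core-seconds and that it scales linearly across cores; the function names and every number in the example are illustrative assumptions, not measurements of any Graviton generation.

```python
def cpu_throughput(cores: int, cpu_seconds_per_request: float) -> float:
    """Requests per second the CPU side can prepare and post-process."""
    return cores / cpu_seconds_per_request


def gpu_utilization(
    cores: int,
    cpu_seconds_per_request: float,
    gpus: int,
    gpu_requests_per_second: float,
) -> float:
    """Fraction of GPU capacity the CPU side can keep fed, capped at 1.0."""
    gpu_capacity = gpus * gpu_requests_per_second
    return min(1.0, cpu_throughput(cores, cpu_seconds_per_request) / gpu_capacity)


# Hypothetical serving host: 8 accelerators, each able to serve 300 requests/s,
# with 0.05 core-seconds of CPU work per request.
for cores in (96, 192):
    util = gpu_utilization(cores, 0.05, gpus=8, gpu_requests_per_second=300.0)
    print(f"{cores} cores -> GPU utilization {util:.0%}")
```

With these made-up inputs the smaller host leaves part of the accelerator capacity idle, while the larger one is no longer the bottleneck. Substituting your own measured core-seconds per request is what turns this into a usable capacity-planning input.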
The Bigger Picture: What Custom Silicon Means for Cloud Economics Through 2028
The Graviton5 launch is not a single product announcement. It is a data point in a structural shift that will play out through the rest of the decade.
The hyperscaler custom-silicon trajectory is now established: AWS (Graviton, Trainium, Inferentia), Google (Axion, TPU), Microsoft (Cobalt, Maia). Each cycle, these chips gain ground on merchant silicon for a wider set of workloads. The point at which a workload running on custom silicon is always cheaper than the same workload on Intel or AMD, absent ISA lock-in, is approaching. For general-purpose compute, AWS’s 20% cost advantage claim amounts to an argument that the point has already been crossed.
For engineering organizations, this creates a governance question that goes beyond chip selection. The software supply chain increasingly needs to be Arm-aware. Container base images, runtime dependencies, compiled binaries, and third-party agents all need to target Arm/aarch64 as a first-class platform. Organizations that treat this as a one-time migration task will find themselves redoing the work every time a new workload gets added. The sustainable posture is to establish Arm as the default build target and x86 as the exception, requiring explicit justification.
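An Arm-first policy is easier to enforce when it is checked mechanically. The sketch below inspects multi-architecture image indexes (the JSON shape defined by the OCI image index, with a `manifests` array whose entries carry a `platform` object) and reports images that lack a linux/arm64 variant. The function names and sample data are hypothetical, and fetching the index from a registry (for example with `docker manifest inspect`) is left out.

```python
def supports_platform(image_index: dict, os_name: str = "linux", arch: str = "arm64") -> bool:
    """Return True if the index lists a manifest for the given OS and architecture."""
    for manifest in image_index.get("manifests", []):
        platform = manifest.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return True
    return False


def missing_arm64(indexes: dict[str, dict]) -> list[str]:
    """Names of images that have no linux/arm64 variant, sorted for stable output."""
    return sorted(name for name, index in indexes.items() if not supports_platform(index))


# Hypothetical image indexes, trimmed to the fields the check reads.
images = {
    "payments-api": {
        "manifests": [
            {"platform": {"os": "linux", "architecture": "amd64"}},
            {"platform": {"os": "linux", "architecture": "arm64"}},
        ]
    },
    "vendor-agent": {
        "manifests": [
            {"platform": {"os": "linux", "architecture": "amd64"}},
        ]
    },
}

print(missing_arm64(images))
```

Run as a CI gate, a check like this turns "x86 as the exception" into something a reviewer has to approve explicitly, which is the posture the paragraph above argues for.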
The winners in this transition are enterprises that run the classification audit now, establish the Arm-first build policy early, and capture the compounding savings (the claimed 20% cost advantage plus up to 60% lower energy use) across an expanding instance inventory. The losers are those who wait until x86 instance pricing reflects the competitive pressure from custom silicon, at which point the cost advantage will have partially eroded.
Frequently Asked Questions
What is the difference between AWS Graviton5 M9g and M9gd instances?
M9g instances are general-purpose compute optimized for web applications, databases, ML inference, and distributed workloads. M9gd instances add local NVMe SSD storage — up to 11.4TB per instance — making them suited for applications requiring low-latency access to large ephemeral datasets such as caches, write buffers, and analytics working sets. Both use the same 192-core Graviton5 chip and deliver the same 25% compute improvement over Graviton4.
How does Graviton5 compare to previous AWS Graviton generations?
Graviton5 delivers 25% higher overall compute performance versus Graviton4, with specific improvements of 35% for web applications and ML inference and 30% for database workloads. The L3 cache is 5x larger than Graviton4, and each core accesses 2.6x more cache — reducing memory latency for concurrent workloads. Graviton5 also introduces DDR5 memory and PCIe integration, both firsts for the Graviton family, along with 15% higher network bandwidth and up to 100% greater EBS bandwidth on larger instances.
Should enterprises migrate x86 workloads to Graviton5 immediately?
Not all at once. The right approach is to classify workloads into three buckets: containerized or modern-runtime workloads with no ISA dependencies (migrate first; AWS reports that many customers complete these migrations in hours with current tooling, though testing and rollout can stretch that to days or weeks), commercial software with Arm license restrictions (requires vendor negotiation first), and legacy native x86 code (longer remediation path). For most organizations, starting with the first bucket captures the majority of cost savings with minimal risk. The second bucket waits on vendors to ship or license Arm-native builds, and the third is deferred until the native code can be ported.
Sources & Further Reading
- AWS Graviton5 Debuts with New M9g and M9gd Instances — SiliconAngle
- AWS Graviton — Official Product Page — Amazon Web Services
- AWS Graviton5 Debuts with 192 Arm Cores and PCIe 6.0 — Guru3D
- AWS Rebuilds Server CPU Around Agentic AI: 192-Core Graviton5 Launch — TechTimes
- AWS Launches Graviton5 CPU with 192 Cores for Agentic AI — ConvergeDigest




