When AI Outperforms the Gold Standard
In November 2023, Google DeepMind published a paper in Science demonstrating that GraphCast, its AI weather prediction model, could produce 10-day global weather forecasts more accurately than the European Centre for Medium-Range Weather Forecasts’ (ECMWF) Integrated Forecasting System — the gold standard that meteorological agencies worldwide had relied on for decades. GraphCast produced these forecasts in under a minute on a single Google TPU v4, compared to the hours of supercomputer time required by traditional numerical weather prediction models.
This was not a marginal improvement. GraphCast outperformed ECMWF’s HRES model on more than 90% of 1,380 test variables and forecast lead times. When limited to the troposphere, the lowest 6 to 20 kilometers of the atmosphere, where accurate forecasting matters most, GraphCast outperformed HRES on 99.7% of test variables. For extreme weather, GraphCast showed particular strength in tropical cyclone tracking, predicting storm paths more accurately than HRES. Huawei’s Pangu-Weather, published in Nature the same year, achieved comparable accuracy, outperforming operational ECMWF forecasts on many variables at roughly 10,000 times the speed. Nvidia’s earlier FourCastNet (2022) had pioneered high-resolution AI weather prediction but fell short of operational accuracy; it was GraphCast and Pangu-Weather that proved AI could genuinely match and exceed physics-based models.
The implications extend far beyond weather. What GraphCast demonstrated is that AI can learn the physics of complex systems directly from data — in this case, 40 years of ERA5 reanalysis data — without being explicitly programmed with the governing equations. The same principle is now being applied to materials science, mathematics, biology, and fundamental physics, raising the question: is AI becoming not just a tool for scientists, but a scientist itself?
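The core idea, learning a system’s transition dynamics from trajectory data alone and then rolling the learned model forward, can be sketched in a few lines. The sketch below uses a linear least-squares fit on a synthetic two-dimensional system as a stand-in for the graph neural network and ERA5 data GraphCast actually uses; every name and number here is illustrative, not taken from the paper.

```python
import numpy as np

# Toy illustration of GraphCast-style forecasting: learn a one-step
# transition map x_{t+1} = f(x_t) purely from trajectory data, without
# ever writing down the governing equations. The "true" system here is
# a damped rotation (a synthetic stand-in for atmospheric dynamics).
rng = np.random.default_rng(0)
theta, damp = 0.1, 0.99
A_true = damp * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])

# Generate training pairs (state, next state) from the true dynamics.
X, Y = [], []
for _ in range(200):
    x = rng.normal(size=2)
    for _ in range(20):
        x_next = A_true @ x
        X.append(x)
        Y.append(x_next)
        x = x_next
X, Y = np.array(X), np.array(Y)

# "Learn" the transition map by least squares (a linear stand-in for
# the neural network a real weather model would use).
B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ B ~ Y
A_learned = B.T

# Autoregressive rollout: feed the model its own output, the same trick
# GraphCast uses to extend a 6-hour step into a 10-day forecast.
def rollout(A, x0, steps):
    states = [x0]
    for _ in range(steps):
        states.append(A @ states[-1])
    return np.array(states)

x0 = np.array([1.0, 0.0])
pred = rollout(A_learned, x0, 40)
true = rollout(A_true, x0, 40)
err = float(np.max(np.abs(pred - true)))
print(f"max rollout error over 40 steps: {err:.2e}")
```

Because the toy system really is linear, the fit recovers the dynamics almost exactly; the point is the workflow (fit a step map, then roll it out), not the model class.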
From GraphCast to GenCast: AI Weather Forecasting Goes Operational
The GraphCast breakthrough was just the beginning. In December 2024, DeepMind published GenCast in Nature, a next-generation probabilistic weather model that outperformed ECMWF’s ensemble forecast (ENS) on 97.2% of targets — and on 99.8% at lead times greater than 36 hours. Unlike GraphCast’s single-point deterministic forecasts, GenCast produces ensembles of 50 or more predictions representing the full range of possible weather scenarios, giving forecasters probability distributions rather than single answers. A 15-day ensemble forecast takes just 8 minutes on a single Google Cloud TPU v5.
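What an ensemble buys forecasters over a single deterministic run can be shown with a few lines of arithmetic: given N sampled scenarios, you can report exceedance probabilities and forecast intervals instead of one number. The ensemble below is synthetic noise standing in for real GenCast output; the threshold and member count are illustrative.

```python
import numpy as np

# Hypothetical 10-day-ahead temperature forecasts (deg C) from a
# 50-member ensemble (synthetic, not real model output).
rng = np.random.default_rng(42)
n_members = 50
ensemble = rng.normal(loc=22.0, scale=3.0, size=n_members)

# Probability of exceeding a heat threshold, estimated by counting
# how many ensemble members cross it.
threshold = 25.0
p_exceed = float(np.mean(ensemble > threshold))

# Central 90% forecast interval taken from the ensemble quantiles.
lo, hi = np.quantile(ensemble, [0.05, 0.95])

print(f"P(T > {threshold} C) ~ {p_exceed:.2f}")
print(f"90% interval: [{lo:.1f}, {hi:.1f}] C")
```

This is why probabilistic models matter operationally: a 30% chance of exceeding a flood or heat threshold triggers different decisions than a single best-guess number.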
The strongest validation came in 2025, when ECMWF itself — the institution whose gold-standard models AI had surpassed — adopted AI operationally. In February 2025, ECMWF launched its Artificial Intelligence Forecasting System (AIFS) as an operational product running alongside its traditional physics-based IFS. In July 2025, the ensemble version (AIFS ENS, 51 members) followed. AIFS ENS outperforms the physics-based ensemble for many measures, including surface temperature forecasts, with gains of up to 20% — while using up to 1,000 times less energy. The system is built on Anemoi, an open-source framework co-developed with ECMWF’s member states.
When the world’s premier weather forecasting agency runs AI models operationally, the paradigm shift is no longer theoretical. It is infrastructure.
Materials Science: 2.2 Million New Crystals — and Hard Questions
In November 2023 — the same month GraphCast was published — DeepMind announced GNoME (Graph Networks for Materials Exploration), an AI system that predicted the stability of 2.2 million new crystal structures, of which 381,000 were deemed stable and potentially synthesizable. To put this in perspective, the total number of known stable inorganic crystals discovered by human scientists across all of history was approximately 48,000. GNoME expanded the known landscape of stable materials by a factor of nearly eight, in a single research project. The results were published in Nature, and the stable predictions were contributed to Berkeley Lab’s Materials Project database.
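The stability criterion behind this kind of screening is simple to state: a candidate crystal is considered stable if its energy sits on or near the convex hull of known competing phases. A minimal sketch of the filtering step follows; real workflows use DFT-computed energies and full composition-space hulls (e.g., via tools like pymatgen), and the material names, energies, and tolerance below are all made up for illustration.

```python
# Sketch of GNoME-style stability screening: keep candidates whose
# predicted "energy above hull" is within a small tolerance of zero.
candidates = {
    # hypothetical material -> predicted energy above hull (eV/atom)
    "A2B": 0.000,
    "AB3": 0.012,
    "A3B2": 0.180,
    "AB": 0.047,
}

# Screening tolerance: keep anything within ~25 meV/atom of the hull,
# since prediction error blurs the exact zero (threshold is an
# assumption, not GNoME's published cutoff).
TOL_EV_PER_ATOM = 0.025

stable = sorted(name for name, e_hull in candidates.items()
                if e_hull <= TOL_EV_PER_ATOM)
print(stable)  # -> ['A2B', 'AB3']: candidates worth sending to synthesis
```

The filtering itself is trivial; the hard parts GNoME automated are generating plausible candidate structures and predicting their energies accurately enough for the cutoff to be meaningful.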
Materials science is foundational to technological progress. New battery chemistries require new electrode and electrolyte materials. More efficient solar cells depend on semiconductor materials with specific band gaps. Superconductors, catalysts, thermoelectrics — every frontier technology is ultimately constrained by the materials available to build it. Traditional materials discovery is painstakingly slow: a researcher hypothesizes a compound, synthesizes it in a lab, tests its properties, and publishes results. This process takes months to years per candidate material.
GNoME compressed the hypothesis-generation step from years to hours. Lawrence Berkeley National Laboratory’s A-Lab, an autonomous laboratory combining AI prediction with robotic synthesis, reported producing 41 of 58 GNoME-predicted materials it attempted over 17 days of continuous operation. However, a subsequent analysis by researchers at Princeton and University College London challenged these results, arguing that most of the supposedly novel materials were actually mixtures of already-known compounds and that the validation methodology was unreliable. The Berkeley team disputed the critique but acknowledged that at least two materials were not genuinely new. The episode highlights a recurring tension in AI-driven science: computational prediction is racing far ahead of experimental validation. As MIT Technology Review noted in late 2025, despite the hype around AI discovering millions of new materials, no breakthrough “miracle material” has yet emerged from these predictions — the most time-consuming and expensive step remains making them in the real world.
Mathematics, Biology, and the Expanding Frontier
AI’s scientific contributions extend well beyond weather and materials. In July 2024, DeepMind’s AlphaProof and AlphaGeometry 2 solved four of six problems at the International Mathematical Olympiad (IMO), earning 28 of 42 possible points — equivalent to a silver medal. AlphaProof solved two algebra problems and one number theory problem, including the competition’s hardest problem (solved by only five human contestants). AlphaProof combined a language model with the Lean formal proof system, learning to construct valid mathematical arguments through reinforcement learning. AlphaGeometry 2 proved the geometry problem.
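What "a valid mathematical argument in Lean" means is worth a concrete glimpse. The toy theorem below is an illustration of the kind of machine-checkable statement the Lean proof assistant verifies, not an actual AlphaProof output; it proves commutativity of natural-number addition by appealing to a lemma from Lean’s standard library.

```lean
-- A trivial machine-checked theorem in Lean 4 (illustrative only):
-- addition of natural numbers is commutative. The Lean kernel will
-- reject any proof term that does not actually establish the claim.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

AlphaProof’s contribution was generating proofs like this, at competition difficulty, where the checker’s verdict replaces human grading: if Lean accepts the proof, it is correct.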
Then, in July 2025, DeepMind escalated further. An advanced version of Gemini with Deep Think solved five of six IMO 2025 problems, earning 35 of 42 points, gold medal standard. Unlike AlphaProof, Gemini Deep Think operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions within the 4.5-hour competition time limit. The jump from silver to gold in a single year underscored how quickly AI mathematical reasoning is advancing.
In structural biology, AlphaFold’s impact continues to compound. Since predicting the 3D structures of virtually all known proteins (over 200 million) in 2022, AlphaFold has been cited over 20,000 times and used by more than 3 million researchers across 190 countries. In October 2024, Demis Hassabis and John Jumper received the Nobel Prize in Chemistry for AlphaFold — alongside David Baker for computational protein design — while Geoffrey Hinton and John Hopfield received the Nobel Prize in Physics for foundational neural network work. It was the first time AI research claimed both science Nobels in the same year. AlphaFold 3, released in May 2024, extended predictions to protein-DNA, protein-RNA, and protein-ligand interactions, modeling the full molecular machinery of cells. Code and weights were released for academic use in November 2024. In February 2026, DeepMind’s drug-discovery spin-off Isomorphic Labs announced a next-generation system that more than doubled AlphaFold 3’s performance on the most challenging drug design cases.
In particle physics, CERN is integrating AI-driven surrogate models and fast simulation methods to reduce the computational cost of modeling particle collisions, with its new AI Steering Committee (CAISC, established April 2025) formalizing strategy across the organization. In climate science, AI emulators are maturing rapidly: Allen AI’s ACE2 became the first climate emulator accurate for both climate variability and climate change, simulating 1,200 years of climate per day on a single H100 GPU — a 1,000-fold speedup over traditional models. Google’s NeuralGCM, a hybrid physics-AI model, has produced reasonable 30-year climate simulations, though stability issues remain. In genomics, AI systems continue to predict gene expression patterns, identify disease-associated variants, and design synthetic DNA sequences. The common thread is the same: AI excels at learning complex mappings from data, and science is fundamentally about mapping inputs to outputs in complex systems.
The Methodology Shift: From Hypothesis to Data
The deeper significance of these achievements is methodological. Traditional science follows the hypothetico-deductive method: formulate a hypothesis, design an experiment to test it, analyze the results, and refine the theory. This approach has driven scientific progress for centuries, but it has inherent limitations — a scientist can only test hypotheses they can conceive, and the hypothesis space is constrained by human cognitive capacity and prior knowledge.
AI-driven science inverts this process. Instead of starting with a theory and testing it against data, AI starts with data and discovers patterns that may suggest new theories. GNoME did not start with a hypothesis about which crystal structures should be stable — it learned the relationship between atomic composition, crystal geometry, and thermodynamic stability from existing data, then generalized to predict millions of new compounds. GraphCast did not encode the Navier-Stokes equations that govern atmospheric dynamics — it learned the temporal evolution of weather states directly from historical observations.
This inversion raises epistemological questions that scientists are actively debating. Can a pattern discovered by an AI be called a “discovery” if no human understands why the pattern exists? When GNoME predicts a stable crystal structure, it does not explain the physical mechanism of stability — it identifies a statistical regularity. Some philosophers of science argue that understanding requires explanation, not just prediction. Others counter that prediction is the ultimate test of scientific knowledge, and that explanation is a human cognitive preference, not a scientific requirement.
The practical resolution may be pragmatic. AI systems are tools that generate hypotheses at superhuman speed. Human scientists then test, validate, and explain the most promising AI-generated predictions. This division of labor — AI for hypothesis generation, humans for interpretation and experimental validation — may represent the future scientific method. The A-Lab controversy illustrates the stakes: when AI predictions outpace experimental capacity by orders of magnitude, the bottleneck shifts from ideas to verification. The risk is that science becomes dependent on opaque AI systems whose predictions cannot be audited, creating a “black box” at the foundation of knowledge. But the 2024 Nobel Prizes suggest the scientific establishment is embracing AI not as a threat to the method, but as its most powerful accelerant.
🧭 Decision Radar (Algeria Lens)
| Dimension | Assessment |
|---|---|
| Relevance for Algeria | High — weather prediction directly impacts agriculture and disaster preparedness; materials science underlies industrial development |
| Infrastructure Ready? | No — AI-driven science requires substantial compute and data infrastructure; Algeria’s research institutions are under-resourced |
| Skills Available? | Partial — Algerian universities have physicists, chemists, and mathematicians; AI/ML skills for scientific computing are developing |
| Action Timeline | 6-12 months — Algerian researchers can begin using existing AI tools (AlphaFold, GNoME databases) now; building original capability takes 2-4 years |
| Key Stakeholders | Algerian universities, DGRSDT (research directorate), meteorological services (ONM), agricultural research (INRAA), Sonatrach R&D |
| Decision Type | Strategic |
Quick Take: AI is producing scientific results that match or exceed human expert performance in weather prediction, materials discovery, and mathematical reasoning. Algerian researchers can already access many of these tools (AlphaFold database, GNoME materials data) to accelerate their own work. The critical question is whether Algeria’s research institutions will invest in the compute infrastructure and AI skills needed to participate in this paradigm shift.
Sources & Further Reading
- GraphCast: Learning Skillful Medium-Range Global Weather Forecasting — Science
- GenCast: Probabilistic Weather Forecasting with Machine Learning — Nature
- ECMWF’s AI Forecasts Become Operational — ECMWF
- GNoME: Scaling Deep Learning for Materials Discovery — Nature
- A-Lab: Autonomous Laboratory for Accelerated Synthesis — Nature
- AlphaProof and AlphaGeometry 2: AI Solves IMO Problems at Silver Medal Level — Google DeepMind
- Gemini Deep Think Achieves Gold Medal Standard at IMO — Google DeepMind
- AlphaFold 3: Accurate Structure Prediction of Biomolecular Interactions — Nature
- Nobel Prize in Chemistry 2024 — NobelPrize.org
- Pangu-Weather: Accurate Medium-Range Global Weather Forecasting — Nature
- AI Materials Discovery Now Needs to Move into the Real World — MIT Technology Review
- AlphaFold Protein Structure Database — DeepMind / EMBL-EBI