The Architecture That Ate Both Worlds
For two decades, enterprise data lived in one of two places. Structured data — transactions, customer records, financial reports — went into data warehouses: Teradata, Oracle, and later Snowflake and BigQuery. Unstructured and semi-structured data — logs, sensor readings, JSON blobs, images — went into data lakes: Hadoop clusters and later cloud object storage like S3. The two systems served different purposes, used different tools, and rarely talked to each other efficiently. Teams that needed both maintained two copies of data, two sets of pipelines, and two sets of skills.
The data lakehouse eliminates this divide. It applies warehouse-style structure — schemas, ACID transactions, time travel, indexing — directly to data sitting in open file formats on cloud object storage. You get the reliability and query performance of a warehouse with the scalability, openness, and cost of a lake. Databricks coined the term in 2020, but by 2026 the lakehouse pattern has been adopted by virtually every major data platform vendor. It is no longer a concept to debate; it is the default architecture for new enterprise data platforms.
The enabling technology behind this shift is the open table format — a metadata layer that sits between compute engines and storage files. Two formats dominate: Apache Iceberg and Delta Lake. Their competition, and the ecosystem consolidation happening around them, is the most consequential architectural story in data engineering today.
Apache Iceberg: The Open Standard Winning the Industry
Apache Iceberg was created at Netflix in 2017 by Ryan Blue and Dan Weeks to solve a specific problem: managing petabyte-scale tables in S3 with reliable, concurrent access from multiple compute engines. Netflix open-sourced it and donated it to the Apache Software Foundation in November 2018, and it graduated as an Apache Top-Level Project in May 2020. What happened next was an industry tidal wave.
Apple adopted Iceberg across all its divisions, managing petabyte-scale tables spanning a wide range of use cases — from real-time streaming and micro-batches to traditional ETL workloads. LinkedIn, Airbnb, and Expedia followed. Then, in 2022, Snowflake announced native Iceberg Tables support — allowing Snowflake to read and write Iceberg-format data in the customer’s own cloud storage rather than Snowflake’s proprietary internal format. This was a seismic shift: the world’s most prominent data warehouse company was endorsing an open format that reduced vendor lock-in to its own platform. By 2025, Snowflake had elevated Iceberg to a first-class table format with full lifecycle management, automatic compaction, and write support for externally managed Iceberg tables reaching general availability in October 2025.
Iceberg’s technical design explains its adoption velocity. It uses a tree of metadata files that track every change to a table, enabling snapshot isolation (multiple readers and writers without conflicts), time travel (querying the table as it existed at any past point), and schema evolution (adding, removing, or renaming columns without rewriting data). Crucially, Iceberg supports multiple compute engines simultaneously: Spark, Trino, Flink, Dremio, Snowflake, and Athena can all read and write the same Iceberg table. This engine interoperability is Iceberg’s killer feature — it prevents any single vendor from owning the data layer, giving enterprises genuine portability and the ability to use the best engine for each workload.
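Iceberg's real metadata layout involves manifest lists and manifest files, but the core idea — an append-only chain of immutable snapshots, each a complete view of the table's data files — can be sketched in a few lines of plain Python. This is an illustrative toy model, not the Iceberg API; all names are invented:

```python
from dataclasses import dataclass
import time

@dataclass(frozen=True)
class Snapshot:
    # An immutable view of the table: the set of data files valid at commit time.
    snapshot_id: int
    timestamp: float
    data_files: frozenset

class ToyIcebergTable:
    """Toy model of Iceberg-style snapshot metadata (illustrative, not the real API)."""
    def __init__(self):
        self.snapshots = []   # append-only metadata log; old snapshots are never mutated
        self._next_id = 0

    def commit(self, added=(), removed=()):
        # Each write produces a NEW snapshot, so in-flight readers keep a
        # consistent view of the previous one (snapshot isolation).
        current = self.snapshots[-1].data_files if self.snapshots else frozenset()
        new_files = (current - frozenset(removed)) | frozenset(added)
        self._next_id += 1
        self.snapshots.append(Snapshot(self._next_id, time.time(), new_files))
        return self._next_id

    def read(self, snapshot_id=None):
        # Time travel: read any historical snapshot by id; default is latest.
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        return next(s.data_files for s in self.snapshots if s.snapshot_id == snapshot_id)

t = ToyIcebergTable()
v1 = t.commit(added={"part-001.parquet"})
v2 = t.commit(added={"part-002.parquet"})
assert t.read(v1) == {"part-001.parquet"}  # the table "as of" the first commit
assert t.read() == {"part-001.parquet", "part-002.parquet"}
```

Because every snapshot is immutable and self-describing, any engine that can read the metadata chain sees the same table state — which is what makes the multi-engine story work.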
The Iceberg V3 specification, rolling out through releases 1.8.0 to 1.10.0 in 2025, adds significant new capabilities: binary deletion vectors that dramatically improve row-level update performance, default column values for instant schema evolution without rewriting data, a variant type for semi-structured data, native geospatial types, and row lineage tracking with unique row IDs. These features cement Iceberg’s position as the most technically advanced open table format.
Market data confirms the momentum. Recent industry surveys put current Iceberg adoption at roughly 31% of organizations, with planned three-year adoption rates of 29% for Iceberg versus 23% for Delta Lake — indicating Iceberg is set to pull further ahead. The Iceberg project also has nearly double the unique contributors compared to Delta Lake. AWS, Google Cloud, and Microsoft Azure have all integrated Iceberg deeply into their data platforms, with Amazon Redshift reaching general availability for writing to Iceberg tables in 2025.
Delta Lake: Databricks’ Foundation Under Pressure
Delta Lake was created by Databricks and open-sourced in 2019 as the storage layer for the Databricks Lakehouse Platform. It provides capabilities similar to Iceberg's — ACID transactions, time travel, schema enforcement — and has been the default format for Databricks' massive customer base. Delta Lake processes over 10 exabytes of data daily, and the Databricks platform serves over 60% of the Fortune 500.
For several years, Delta Lake and Iceberg coexisted with minimal direct competition — Delta was the Databricks ecosystem format, Iceberg was the multi-engine open format. But the lines blurred as the industry converged on lakehouse architecture. Databricks responded to Iceberg’s momentum with Delta Lake UniForm, a compatibility layer that allows Delta tables to be read as Iceberg tables (and Apache Hudi tables) by external engines. UniForm reached general availability in 2025, and benchmarks show that read performance for Delta tables via UniForm is comparable to native Snowflake managed Iceberg — near-zero performance overhead. UniForm has been validated with Snowflake, BigQuery, Redshift, and Athena. The message is pragmatic: Delta’s write-side optimizations and Databricks integration make it the better primary format for Databricks-centric shops, while UniForm ensures interoperability with the rest of the ecosystem.
The competitive dynamics intensified in June 2024 when Databricks acquired Tabular — the company founded by Iceberg’s original creators Ryan Blue, Daniel Weeks, and Jason Reid — for approximately $2 billion. This acquisition sent shockwaves through the data industry. Databricks now stewards Delta Lake outright and holds significant influence over Iceberg’s development. The company has signaled that it will support both formats and work toward convergence, but the acquisition raised concerns about whether Iceberg’s truly vendor-neutral governance would survive. The Apache Software Foundation’s governance model provides some protection, but the key engineering talent now sits inside Databricks.
Snowflake vs. Databricks: The Platform War Behind the Format War
The open table format competition is inseparable from the broader platform war between Snowflake and Databricks — the two companies that have defined modern data architecture and together command over $10 billion in combined annual revenue run-rate. Databricks reached approximately $5.4 billion ARR growing at 65% year-over-year, while Snowflake sits at approximately $4.84 billion ARR. Understanding the format competition requires understanding the platform strategies.
Snowflake’s embrace of Iceberg is strategic offense. By adopting the open format, Snowflake positions itself as the premium query engine that works with your data wherever it lives. Customers can start with Iceberg tables in their own S3 or Azure Storage, query them through Snowflake, and switch engines without migrating data. This counters the historical criticism that Snowflake’s proprietary storage format created lock-in. Snowflake reinforced this strategy by open-sourcing Polaris Catalog in July 2024 — an implementation of the Iceberg REST Catalog specification — and donating it to the Apache Foundation. Apache Polaris is heading toward graduation in 2026 and is gaining support for multi-format cataloging, with planned compatibility for Hudi and Delta tables alongside Iceberg. The catalog layer is becoming the new strategic battleground.
Databricks’ strategy is more complex. The company pioneered the lakehouse concept and has the strongest ML/AI integration story — its acquisition of MosaicML for $1.3 billion in 2023 and its development of the DBRX large language model (132 billion total parameters, mixture-of-experts architecture) demonstrate a vision where data engineering and AI model training converge on the same platform. Delta Lake’s tighter integration with Spark (Databricks’ execution engine) and the Photon query engine gives it performance advantages for Databricks-native workloads. Databricks responded to Snowflake’s catalog move by open-sourcing Unity Catalog in June 2024, which supports the Iceberg REST Catalog API and enables external engines to read (GA) and write (public preview) to Unity Catalog-managed Iceberg tables. With the Tabular acquisition, Databricks can influence both formats while arguing for convergence. The risk is that customers perceive Databricks as co-opting the open ecosystem — the same criticism that clouded Hadoop vendor strategies a decade ago.
For enterprise data teams, the format war has a clear winner: openness. Whether an organization chooses Iceberg or Delta as its primary format, both are open-source, both store data in Parquet files on cloud object storage, and both support multi-engine access through either native support or UniForm compatibility. The days of data being locked inside a proprietary warehouse are ending. The practical advice for 2026 is to pick the format that aligns with your primary compute engine — Iceberg for Snowflake, Trino, or multi-engine environments, Delta for Databricks-centric shops — and invest in an open catalog (Polaris or Unity Catalog) that provides a governance layer above the format itself.
What the Lakehouse Means for AI/ML Pipelines
The lakehouse architecture’s impact on AI and machine learning is arguably more significant than its impact on traditional analytics — and this is where the story connects to the dominant technology trend of the decade. AI/ML pipelines have fundamentally different data requirements than BI dashboards. They need structured training data alongside unstructured data (images, text, audio). They need feature stores that serve both batch training and real-time inference. They need data versioning to reproduce model training runs. And they need to process massive volumes without the cost of loading everything into a warehouse.
The lakehouse handles all of these requirements natively. Iceberg and Delta both support table versioning and time travel, making training data reproducibility straightforward. Both store data in Parquet — a columnar format that ML frameworks like PyTorch and TensorFlow can read efficiently. Both support schema evolution, which matters when feature engineering adds new columns to training datasets. And both sit on cloud object storage, which scales to petabytes at a fraction of warehouse storage costs. Iceberg V3’s new variant type further strengthens the AI story by providing native handling of semi-structured data — the kind of mixed-format payloads common in ML feature pipelines.
The practical convergence is happening in real time. Databricks’ Feature Store is built on Delta tables. Snowflake’s Cortex AI services operate on Iceberg tables. Both platforms offer integrations with MLflow, Weights & Biases, and other ML experiment tracking tools. For data teams, this means the same infrastructure that serves their dashboards and reports also serves their AI model training — eliminating the data movement and duplication that historically slowed ML projects. The lakehouse is not just a better data warehouse; it is the data platform for the AI era.
🧭 Decision Radar (Algeria Lens)
| Dimension | Assessment |
|---|---|
| Relevance for Algeria | Medium — lakehouse architecture is the future of enterprise data platforms; Algerian enterprises adopting cloud analytics will encounter this pattern |
| Infrastructure Ready? | Partial — lakehouse tools (Iceberg, Delta, Spark) are open-source and cloud-accessible, but Algeria lacks local cloud regions; nearest are France, Spain, and Bahrain |
| Skills Available? | Partial — Algerian data professionals have SQL and Python skills that transfer; lakehouse-specific expertise (Iceberg, Delta, Spark) requires targeted upskilling |
| Action Timeline | 12-24 months — relevant for organizations planning data platform modernization; immediate for individual data engineers building future-ready skills |
| Key Stakeholders | Enterprise data teams, Sonatrach and large state enterprises with analytics needs, cloud platform architects, data engineering professionals |
| Decision Type | Strategic |
Quick Take: The data lakehouse is becoming the default architecture for enterprise data platforms, replacing both traditional warehouses and raw data lakes. For Algerian organizations planning data infrastructure, adopting open table formats (Iceberg or Delta) avoids vendor lock-in and future-proofs investments. For individual data engineers, lakehouse skills are increasingly mandatory for competitive positioning in the global job market.
Sources & Further Reading
- Apache Iceberg Documentation — Apache Software Foundation
- Delta Lake UniForm for Iceberg Compatibility, Now in GA — Databricks Blog
- Databricks Agrees to Acquire Tabular — Databricks Newsroom
- Databricks Reportedly Paid $2 Billion in Tabular Acquisition — TechCrunch
- What’s New in Apache Iceberg V3 — Google Open Source Blog
- Introducing Polaris Catalog: An Open Source Catalog for Apache Iceberg — Snowflake Blog
- Databricks Open Sources Unity Catalog — Databricks Newsroom
- Databricks vs Snowflake at $5B ARR — SaaStr
- Snowflake Managed Iceberg Tables: Interop Performance — Snowflake Engineering Blog
- The Lakehouse: A New Generation of Open Platforms — Databricks Research