Snowflake embraces open source with full support for Iceberg

Three years after providing limited support for open data storage, the data platform vendor is signaling its embrace of open source with full support for the open table format.

Snowflake on Tuesday unveiled full support for Apache Iceberg tables, extending the same query performance, data sharing and governance capabilities it applies natively to data stored in the Snowflake platform to externally stored Iceberg tables.

Apache Iceberg is an open source table format for storing large data sets in open data lakes and lakehouses. Because it is a table format -- a data structure for organizing information -- rather than a file format, Iceberg can provide a metadata layer on top of data files, making it easier to manage data and discover relevant information to inform analytics and AI applications.

Snowflake previously enabled customers to use its platform in conjunction with Iceberg tables. However, there were limitations.

Until now, only some Snowflake core capabilities, such as governance and security, were available for Iceberg tables, which forced users to choose between the flexibility enabled by storing data in open tables and the full breadth of the Snowflake platform enabled by storing data in Snowflake.

Given that joint Snowflake and Iceberg users no longer have to make that choice, Snowflake's added support for Iceberg tables is significant, according to Michael Ni, an analyst at Constellation Research.

"This is Snowflake resolving the open versus proprietary tradeoff," he said. "[Snowflake has] been inching toward performance parity with Iceberg. Now they're saying you no longer have to compromise. It's not just a feature update. It's a clear statement that Snowflake is all-in on open formats."

Based in Bozeman, Mont., but with no central headquarters, Snowflake is a data cloud vendor that, like many data management specialists, has expanded into AI development over the past two years.

Full support

As decentralized data architectures such as data mesh gain popularity as a way to connect an organization's data operations across different domains, open table formats that work with different systems without forcing users to make copies of data are catching on as well.

In addition, with enterprises fearing vendor lock-in, open source tools provide a way to develop data infrastructures without aligning closely with any one data management provider.

Popular open source table formats include Iceberg, Delta Lake and Apache Hudi, with Iceberg perhaps gaining the most momentum. As a result, many data management vendors are adding support for the open table format, which was designed at Netflix in 2017 and later donated to the Apache Software Foundation.

For example, Dremio, SingleStore and Starburst all enable users to store data in Iceberg tables. Even Snowflake rival Databricks, which helped develop Delta Lake and continues to advance its capabilities, has added support for Iceberg.

Snowflake introduced partial support for Iceberg tables in 2022. Now, with the vendor's open source Polaris Catalog, users can apply all of Snowflake's compute power, query performance improvements, data governance, data sharing, data security and disaster recovery capabilities to Iceberg data where it's stored.

"The real win is freedom," Ni said. "You don't need to get locked into Snowflake's format to leverage some of Snowflake's best features. … That's exactly what modern data teams are demanding."

Matt Aslett, an analyst at ISG Software Research, noted that Snowflake's addition of full support for Iceberg tables essentially elevates data stored in Iceberg tables to the same status as data stored in Snowflake, providing both with the same features and functionality.

As a result, it's an important addition for joint Snowflake and Iceberg users.

"While the additional functionality is an incremental improvement on Snowflake's existing support for Iceberg, it is significant in providing Snowflake users with greater flexibility and reduced complexity," Aslett said.

Customer feedback was a motivating factor behind Snowflake's expanded support for Iceberg tables, according to Saurin Shah, a senior product manager at Snowflake.

"One thing is crystal clear -- openness is what customers want," he said. "It gives them cost efficiency, simplicity and, in many ways, serves as an insurance policy by offering the flexibility to choose what's best for their business without vendor lock-in."

Some enterprises prefer storing data in Snowflake with the vendor's native tables because it simplifies their data architecture. Others, however, prefer open source storage so they can centralize data across domains and better enable interoperability across systems, Shah continued.

"Our goal isn't to push one or the other," he said. "It's to empower customers with choice so they can architect in the way that best serves their needs."

While important for joint Snowflake and Iceberg users, applying a platform's full set of data management capabilities to Iceberg tables and other open table formats is not unique to Snowflake.

Vendors such as Dremio and Starburst are closely aligned with Iceberg, essentially developing their platforms on top of Iceberg lakehouses, while Databricks is similarly aligned with Delta Lake.

"This is Snowflake joining the other leaders in the market in going beyond supporting and embracing open formats," Ni said. "They're saying, 'We're not just a warehouse; we're part of your open ecosystem.' And that's the market's expectation now -- open by default."

Aslett likewise said that Snowflake is not unique among data management vendors in supporting Iceberg tables. However, the support among vendors varies, with some providing partial support -- as Snowflake did until now -- and others more complete support.

"All data platform providers are adding support for Apache Iceberg, but the breadth and depth of support available does vary," Aslett said. "Snowflake is among the more progressive in terms of making Iceberg tables a core element of the platform, alongside its native table formats."

Next steps

Snowflake's full support of Iceberg tables is one way the vendor is attempting to make enterprise data easier to use for analytics and AI-driven analysis, according to Shah.

Snowflake's annual user conference is June 2-5. At the conference, the vendor plans to introduce new features aimed at enabling customers to develop and maintain AI models and applications trained on trusted data. 

"We're focused on helping enterprises maximize the potential of their AI investments, and this starts with the ability to tap into all of their data to drive these initiatives forward," Shah said.

Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.

Dig Deeper on Data integration