Scaling Iceberg Ingestion with Apache Beam

Presented at Beam College 2026

This session explores the technical evolution of Apache Iceberg integration within the Apache Beam ecosystem. We dive into a suite of recent performance enhancements designed to streamline data lake ingestion at scale. Key topics include the adoption of table-defined compression for improved processing and storage efficiency, and the implementation of metadata caching to minimize lookups and prevent metadata service quota exhaustion. We also examine direct write capabilities that bypass expensive processing for large bundles, and autosharding mechanisms that optimize file sizes and ensure horizontal scalability.

Join us to learn how these enhancements establish an efficient, cost-effective performance baseline for streaming pipelines.

Instructor(s):

Tom Stepp