Beam College 2025 Sessions

Check out the schedule of sessions for Beam College 2025.

All times in UTC.

Title
Track
Time
How Apache Beam sets you up for a generative AI world
by Mehran Nazir
This session provides an overview of Apache Beam and its place in the generative AI space. What problem does Beam solve? What are its advantages? Why and how are companies using it?
Fundamentals
May-15 15:00
Getting Started: Intro to Creating a Beam Pipeline
by Sascha Kerbler
In this introductory session, Sascha provides a technical overview of Apache Beam and how it enables developers to build data pipelines that run on various frameworks like Spark and Flink. It supports both batch and stream processing, providing flexibility for diverse data needs. Beam’s SDKs are available in Go, Python, and Java, allowing developers to choose their preferred language.
Fundamentals
May-15 15:30
Implementing a Complex ML Pipeline in Beam
by Kerry Donny-Clark & Danny McCormick
In this session, we explain how to implement a complex ML pipeline with Apache Beam. The pipeline we will build takes audio data, converts it to text, classifies it to identify the topic or subject, feeds it to an LLM, and then turns the model's output back into voice.
Fundamentals
May-15 16:00
Implementing a ML Pipeline with Google AI Studio
by Israel Herraiz
This tutorial demonstrates how to perform streaming inference with Apache Beam and Google AI Studio’s Gemini model, using a geography-based example that retrieves country capitals.
Fundamentals
May-15 17:00
Making the Jump from Batch to Streaming
by Yi Hu
This session dives into Apache Beam’s streaming primitives, focusing on reading from unbounded sources using Splittable DoFn and Unbounded Source, windowing strategies, and triggers with accumulation modes.
Fundamentals
May-15 17:45
Building Scalable Semantic Search and RAG Pipelines
by Claude van der Merwe
This presentation introduces vector-based semantic search and Retrieval Augmented Generation (RAG), demonstrating how to build scalable pipelines for both using Apache Beam. We’ll start by explaining fundamental concepts like chunking, embeddings, and vector similarity. Then we’ll explore semantic search applications before extending to full RAG systems.
New features
May-16 15:00
Real-Time Anomaly Detection with Apache Beam
by Shunping Huang
Real-time anomaly detection is essential for identifying unexpected patterns and critical events in streaming data. This talk addresses the unique algorithmic challenges of anomaly detection in streaming environments and introduces a new feature within Apache Beam designed for this purpose. We will demonstrate how to seamlessly integrate both online and pre-trained offline anomaly detection models into Beam pipelines, empowering users to build robust, scalable, and real-time anomaly detection systems.
New features
May-16 15:30
The Dataflow Job Builder
by Ryan Madden
Learn how you can create low-code and no-code Beam YAML jobs in the Cloud Dataflow UI.
New features
May-16 16:00
YAML: a new SDK to author your pipelines.
by Svetak Sundhar
In this talk, we explore a new way of authoring and running your Beam pipelines: the YAML SDK! Learn how you can split your pipeline infrastructure from your complex processing logic.
New features
May-16 16:30
Real-Time Streaming with Kafka
by Tom Stepp
Explore real-time streaming pipelines with Kafka I/O. This session will share best practices for optimizing Kafka I/O performance and cost-efficiency, including strategies like redistribute transforms and offset-based deduplication. We will also cover integrating Dataflow with Google Managed Kafka for scalable data processing.
New features
May-16 17:30
Stateful processing In Apache Beam
by Rakesh Kumar
The stateful processing interface in Apache Beam serves as a versatile tool for data processing, empowering users with advanced capabilities to handle complex workflows. This session will delve into the diverse functionalities provided by stateful processing, illustrating their practical applications through clear and concise code examples.
New features
May-16 18:00