Authoring your first pipeline

Presented at Beam College 2026

This hands-on session guides beginners through creating their first Apache Beam pipeline from scratch. We’ll start with core Beam concepts (PCollections, PTransforms, and the Pipeline object), then walk through a practical example, building a data processing pipeline step by step. You’ll learn how to read data from sources, apply transformations like Map, FlatMap, and GroupByKey, and write results to sinks. The session covers common patterns, debugging techniques, and best practices for structuring your pipeline code. We’ll also explore how these foundational concepts translate to real-world MLOps scenarios like feature engineering pipelines and batch inference workflows. Whether you’re new to Beam or looking to integrate it into your ML platform, you’ll leave with the confidence to start building production-ready pipelines on runners like Dataflow.

Key points addressed:

  • Core Apache Beam concepts (PCollections, PTransforms, Pipeline)
  • Reading from sources and writing to sinks
  • Essential transformations (Map, FlatMap, GroupByKey, Combine)
  • Pipeline structure and best practices
  • Local testing and debugging strategies
  • Real-world applications in MLOps and production deployment

Instructor(s):