1
Realizing the promise of portability
with Apache Beam
https://s.apache.org/beam-portability-slides-jonthebeach
Tyler Akidau
Senior Staff Software Engineer at Google
Apache Beam PMC
@takidau
With many slides by Frances Perry (@francesjperry)
J On the Beach 2017
2
Apache Beam: Open Source data processing APIs
Expresses data-parallel batch and streaming
algorithms using one unified API
Cleanly separates data processing logic from
runtime requirements
Supports execution on multiple distributed
processing runtime environments
3
The evolution of Apache Beam
[Diagram: lineage of Google data systems leading to Apache Beam. Labels: MapReduce, BigTable, Dremel, Colossus, Flume, Megastore, Spanner, PubSub, Millwheel, Cloud Dataflow, Apache Beam]
4
Table of Contents
01 Expressing data-parallel pipelines with the Beam model
02 The Beam vision for portability
03 Parallel and portable pipelines in practice
04 Getting Started with Apache Beam
5
01 Expressing data-parallel pipelines
with the Beam Model
A unified model for batch and streaming
6
Processing time vs. event time
7
The Beam Model: asking the right questions
What results are calculated?
Where in event time are results calculated?
When in processing time are results materialized?
How do refinements of results relate?
8
The Beam Model: What is being computed?
PCollection<KV<String, Integer>> input = IO.read(...)
    .apply(ParDo.of(new ParseFn()))
    // What: sum the integer values for each key.
    .apply(Sum.integersPerKey());
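As a rough illustration of the "what" question alone, the per-key sum can be sketched in plain Java without the Beam SDK. The class name, the sumPerKey helper, and the sample keys are hypothetical, and this is not how a Beam runner implements Sum.integersPerKey().

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SumPerKeySketch {
    // Hypothetical helper: adds up the integer values seen for each key.
    static Map<String, Integer> sumPerKey(List<Map.Entry<String, Integer>> input) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : input) {
            sums.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        List<Map.Entry<String, Integer>> input = List.of(
            Map.entry("team-a", 5),
            Map.entry("team-b", 3),
            Map.entry("team-a", 7));
        System.out.println(sumPerKey(input)); // {team-a=12, team-b=3}
    }
}
```

The sketch ignores time entirely, which is what classic batch processing over a bounded input amounts to.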
10
The Beam Model: Where in event time?
PCollection<KV<String, Integer>> input = IO.read(...)
    .apply(ParDo.of(new ParseFn()))
    // Where: group elements into fixed two-minute windows of event time.
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))))
    .apply(Sum.integersPerKey());
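To make the "where" question concrete, here is a minimal plain-Java sketch of assigning elements to fixed two-minute event-time windows and summing per key within each window. The Event class, the helper names, and the timestamps are assumptions for illustration; it does not use or reproduce the Beam SDK.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FixedWindowSketch {
    // Hypothetical element type: a key, a value, and an event-time timestamp in milliseconds.
    static final class Event {
        final String key;
        final int value;
        final long eventTimeMillis;

        Event(String key, int value, long eventTimeMillis) {
            this.key = key;
            this.value = value;
            this.eventTimeMillis = eventTimeMillis;
        }
    }

    // Start of the fixed window that contains the given event time.
    static long windowStart(long eventTimeMillis, Duration size) {
        return eventTimeMillis - Math.floorMod(eventTimeMillis, size.toMillis());
    }

    // Sums values per key inside each fixed event-time window, keyed by window start.
    static Map<Long, Map<String, Integer>> sumPerKeyPerWindow(List<Event> events, Duration size) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (Event e : events) {
            long start = windowStart(e.eventTimeMillis, size);
            result.computeIfAbsent(start, s -> new TreeMap<>())
                  .merge(e.key, e.value, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("team-a", 5, 30_000L),    // 00:00:30, window [00:00, 00:02)
            new Event("team-a", 7, 90_000L),    // 00:01:30, window [00:00, 00:02)
            new Event("team-a", 3, 150_000L));  // 00:02:30, window [00:02, 00:04)
        // {0={team-a=12}, 120000={team-a=3}}
        System.out.println(sumPerKeyPerWindow(events, Duration.ofMinutes(2)));
    }
}
```

The window an element lands in depends only on its event time, not on when it happens to arrive.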
12
The Beam Model: When in processing time?
PCollection<KV<String, Integer>> input = IO.read(...)
    .apply(ParDo.of(new ParseFn()))
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)))
        // When: materialize each window's result once the watermark passes it.
        .triggering(AtWatermark()))
    .apply(Sum.integersPerKey());
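The "when" question can be sketched in plain Java as a buffer that only emits a window's sum once a watermark reaches the end of that window. Everything here (class name, methods, the string output format) is a hypothetical simplification, not the Beam trigger API, and it ignores keys and late data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WatermarkTriggerSketch {
    private final long windowSizeMillis;
    // Running sum per window start, for windows that have not fired yet.
    private final TreeMap<Long, Integer> pending = new TreeMap<>();

    WatermarkTriggerSketch(long windowSizeMillis) {
        this.windowSizeMillis = windowSizeMillis;
    }

    // Buffers a value in the fixed window that contains its event time.
    void add(int value, long eventTimeMillis) {
        long start = eventTimeMillis - Math.floorMod(eventTimeMillis, windowSizeMillis);
        pending.merge(start, value, Integer::sum);
    }

    // Emits "windowStart=sum" for every buffered window whose end is at or before the watermark.
    List<String> advanceWatermark(long watermarkMillis) {
        List<String> fired = new ArrayList<>();
        Iterator<Map.Entry<Long, Integer>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Integer> e = it.next();
            if (e.getKey() + windowSizeMillis <= watermarkMillis) {
                fired.add(e.getKey() + "=" + e.getValue());
                it.remove();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        WatermarkTriggerSketch sketch = new WatermarkTriggerSketch(120_000L);
        sketch.add(5, 30_000L);
        sketch.add(7, 90_000L);
        sketch.add(3, 150_000L);
        System.out.println(sketch.advanceWatermark(100_000L)); // []
        System.out.println(sketch.advanceWatermark(120_000L)); // [0=12]
        System.out.println(sketch.advanceWatermark(240_000L)); // [120000=3]
    }
}
```

Nothing is emitted while the watermark is still inside a window, which is the latency cost of waiting for a complete result.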
14
The Beam Model: How do refinements relate?
PCollection<KV<String, Integer>> input = IO.read(...)
    .apply(ParDo.of(new ParseFn()))
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)))
        .triggering(AtWatermark()
            .withEarlyFirings(AtPeriod(Duration.standardMinutes(1)))
            .withLateFirings(AtCount(1)))
        // How: each new pane includes everything fired so far for the window.
        .accumulatingFiredPanes())
    .apply(Sum.integersPerKey());
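For the "how" question, the following plain-Java sketch contrasts accumulating panes with the discarding alternative for a single window that fires several times. The class, method names, and input values are hypothetical; the inner lists stand in for whatever arrived between two trigger firings (for example an early, an on-time, and a late firing).

```java
import java.util.ArrayList;
import java.util.List;

public class PaneAccumulationSketch {
    // Accumulating mode: every pane reports the running total for the window.
    static List<Integer> accumulatingPanes(List<List<Integer>> arrivalsPerFiring) {
        List<Integer> panes = new ArrayList<>();
        int runningSum = 0;
        for (List<Integer> arrivals : arrivalsPerFiring) {
            for (int v : arrivals) {
                runningSum += v;
            }
            panes.add(runningSum);
        }
        return panes;
    }

    // Discarding mode: every pane reports only what arrived since the previous firing.
    static List<Integer> discardingPanes(List<List<Integer>> arrivalsPerFiring) {
        List<Integer> panes = new ArrayList<>();
        for (List<Integer> arrivals : arrivalsPerFiring) {
            int delta = 0;
            for (int v : arrivals) {
                delta += v;
            }
            panes.add(delta);
        }
        return panes;
    }

    public static void main(String[] args) {
        // Made-up arrivals for one window: early firing, on-time firing, late firing.
        List<List<Integer>> arrivals = List.of(List.of(5), List.of(7, 3), List.of(4));
        System.out.println(accumulatingPanes(arrivals)); // [5, 15, 19]
        System.out.println(discardingPanes(arrivals));   // [5, 10, 4]
    }
}
```

With accumulating panes a consumer can simply overwrite the previous result; with discarding panes it has to add the deltas itself.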
16
Customizing What/Where/When/How
1. Classic Batch
2. Windowed Batch
3. Streaming
4. Streaming + Accumulation
17
02 The Beam vision for portability
“Write once, run anywhere”
18
Beam Vision: mix and match SDKs and runtimes
● The Beam Model: the abstractions at the core of Apache Beam
● Choice of SDK: Users write their pipelines in a language that’s familiar and integrated with their other tooling
● Choice of Runners: Users choose the right runtime for their current needs -- on-prem / cloud, open source / not, fully managed / not
● Scalability for Developers: Clean APIs allow developers to contribute modules independently
[Diagram labels: Language A SDK, Language B SDK, Language C SDK; The Beam Model (shown twice, with Language A, Language B, Language C between); Runner 1, Runner 2, Runner 3]
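The separation between pipeline logic and runtime described on this slide can be sketched in plain Java. The Runner interface and the two implementations below are hypothetical stand-ins, not Beam classes: the same step is handed to different "runners", which decide how it executes but not what it computes.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PortabilitySketch {
    // Hypothetical stand-in for a runner: it controls execution, not the computation itself.
    interface Runner {
        <A, B> List<B> run(List<A> input, Function<A, B> step);
    }

    // Executes the step on the calling thread, one element at a time.
    static class SequentialRunner implements Runner {
        public <A, B> List<B> run(List<A> input, Function<A, B> step) {
            return input.stream().map(step).collect(Collectors.toList());
        }
    }

    // Executes the step on a parallel stream; encounter order is preserved by the collector.
    static class ParallelRunner implements Runner {
        public <A, B> List<B> run(List<A> input, Function<A, B> step) {
            return input.parallelStream().map(step).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        Function<Integer, Integer> doubleIt = x -> x * 2; // the "pipeline logic"
        List<Integer> input = List.of(1, 2, 3);
        for (Runner runner : List.of(new SequentialRunner(), new ParallelRunner())) {
            System.out.println(runner.run(input, doubleIt)); // [2, 4, 6] for both runners
        }
    }
}
```

The pipeline author only writes doubleIt; swapping the runner changes nothing in that code.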
19
Beam Vision: as of May 2017
First stable release: Beam 2.0.0
Beam’s Java SDK runs on multiple
runtime environments, including:
Apache Apex
Apache Flink
Apache Spark
Google Cloud Dataflow
[in development] Apache Gearpump
Cross-language infrastructure is in
progress.
Beam’s Python SDK currently runs
on Google Cloud Dataflow
[Diagram labels: Java and Python SDKs; Beam Model: Pipeline Construction; Beam Model: Fn Runners; runners Apache Apex, Apache Flink, Apache Gearpump, Apache Spark, Cloud Dataflow]
20
Example Beam Runners
Apache Spark
● Open-source
cluster-computing
framework
● Large ecosystem of
APIs and tools
● Runs on premise or in
the cloud
Apache Flink
● Open-source
distributed data
processing engine
● High-throughput and
low-latency stream
processing
● Runs on premise or in
the cloud
Google Cloud Dataflow
● Fully-managed service
for batch and stream
data processing
● Provides dynamic
auto-scaling,
monitoring tools, and
tight integration with
Google Cloud
Platform
21
How do you build an abstraction layer?
[Diagram labels: Apache Spark, Cloud Dataflow, Apache Flink, with the abstraction layer marked "????????"]
22
Beam: the intersection of runner functionality?
23
Beam: the union of runner functionality?
24
Beam: the future!
25
Categorizing Runner Capabilities
https://s.apache.org/beam-capability-matrix
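The intersection-versus-union question from the preceding slides can be illustrated with a small plain-Java sketch over sets of capabilities. The runner names and capability strings below are invented placeholders and do not reflect the real capability matrix or any actual runner.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CapabilitySketch {
    // Capabilities that every runner supports.
    static Set<String> intersection(Collection<Set<String>> perRunner) {
        TreeSet<String> result = null;
        for (Set<String> caps : perRunner) {
            if (result == null) {
                result = new TreeSet<>(caps);
            } else {
                result.retainAll(caps);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Capabilities that at least one runner supports.
    static Set<String> union(Collection<Set<String>> perRunner) {
        TreeSet<String> result = new TreeSet<>();
        for (Set<String> caps : perRunner) {
            result.addAll(caps);
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder data only; not taken from the real capability matrix.
        Map<String, Set<String>> matrix = Map.of(
            "Runner 1", Set.of("fixed windows", "triggers"),
            "Runner 2", Set.of("fixed windows", "sessions"),
            "Runner 3", Set.of("fixed windows", "triggers", "sessions"));
        System.out.println(intersection(matrix.values())); // [fixed windows]
        System.out.println(union(matrix.values()));        // [fixed windows, sessions, triggers]
    }
}
```

A model limited to the intersection is portable but weak; one defined as the union is expressive but not every pipeline runs everywhere, which is why a per-runner capability matrix is useful.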
26
03 Parallel and portable pipelines
in practice
Demo time
27
Demo!
(sort of)
43
04 Getting started with Apache Beam
Beaming into the future
44
Learn more!
Apache Beam
beam.apache.org
Demo code
github.com/davorbonaci/beam-portability-demo
The World Beyond Batch: Streaming 101 and 102
www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
The Dataflow/Beam Model paper, VLDB 2015
vldb.org/pvldb/vol8/p1792-Akidau.pdf
Streaming Systems book
www.streamingsystems.net
@takidau on Twitter
45
05 Demo Screenshots
Because if I make them, I won’t need them (famous last words)

Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdfMuhammad Subhan
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Hiroshi SHIBATA
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 

Recently uploaded (20)

Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 

Realizing the promise of portability with Apache Beam

  • 1. Realizing the promise of portability with Apache Beam. https://s.apache.org/beam-portability-slides-jonthebeach. Tyler Akidau, Senior Staff Software Engineer at Google, Apache Beam PMC, @takidau. With many slides by Frances Perry (@francesjperry). J On the Beach 2017.
  • 2. Apache Beam: Open Source data processing APIs. Expresses data-parallel batch and streaming algorithms using one unified API. Cleanly separates data processing logic from runtime requirements. Supports execution on multiple distributed processing runtime environments.
  • 3. The evolution of Apache Beam (diagram labels): MapReduce, BigTable, Dremel, Colossus, Flume, Megastore, Spanner, PubSub, MillWheel, Cloud Dataflow, Apache Beam.
  • 4. Table of Contents. 01 Expressing data-parallel pipelines with the Beam model. 02 The Beam vision for portability. 03 Parallel and portable pipelines in practice. 04 Getting Started with Apache Beam.
  • 5. 01 Expressing data-parallel pipelines with the Beam Model: a unified model for batch and streaming.
  • 7. The Beam Model: asking the right questions. What results are calculated? Where in event time are results calculated? When in processing time are results materialized? How do refinements of results relate?
  • 8. The Beam Model: What is being computed? PCollection<KV<String, Integer>> input = IO.read(...).apply(ParDo.of(new ParseFn())).apply(Sum.integersPerKey());
  • 10. The Beam Model: Where in event time? PCollection<KV<String, Integer>> input = IO.read(...).apply(ParDo.of(new ParseFn())).apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)))).apply(Sum.integersPerKey());
  • 12. The Beam Model: When in processing time? PCollection<KV<String, Integer>> input = IO.read(...).apply(ParDo.of(new ParseFn())).apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))).triggering(AtWatermark())).apply(Sum.integersPerKey());
  • 14. The Beam Model: How do refinements relate? PCollection<KV<String, Integer>> input = IO.read(...).apply(ParDo.of(new ParseFn())).apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))).triggering(AtWatermark().withEarlyFirings(AtPeriod(Duration.standardMinutes(1))).withLateFirings(AtCount(1))).accumulatingFiredPanes()).apply(Sum.integersPerKey());
  • 16. Customizing What/Where/When/How: 1. Classic Batch, 2. Windowed Batch, 3. Streaming, 4. Streaming + Accumulation.
  • 17. 02 The Beam vision for portability: "Write once, run anywhere".
  • 18. Beam Vision: mix and match SDKs and runtimes. ● The Beam Model: the abstractions at the core of Apache Beam. ● Choice of SDK: users write their pipelines in a language that's familiar and integrated with their other tooling. ● Choice of Runners: users choose the right runtime for their current needs: on-prem / cloud, open source / not, fully managed / not. ● Scalability for Developers: clean APIs allow developers to contribute modules independently. (Diagram: Language A, Language B, and Language C SDKs sit on top of the Beam Model, which sits on top of Runner 1, Runner 2, and Runner 3.)
  • 19. Beam Vision: as of May 2017. First stable release: Beam 2.0.0. Beam's Java SDK runs on multiple runtime environments, including: Apache Apex, Apache Flink, Apache Spark, Google Cloud Dataflow, [in development] Apache Gearpump. Cross-language infrastructure is in progress. Beam's Python SDK currently runs on Google Cloud Dataflow. (Diagram: "Beam Model: Pipeline Construction" with Java and Python; "Beam Model: Fn Runners" with Apache Apex, Apache Flink, Apache Gearpump, Apache Spark, and Cloud Dataflow.)
  • 20. Example Beam Runners. Apache Spark: ● Open-source cluster-computing framework ● Large ecosystem of APIs and tools ● Runs on premises or in the cloud. Apache Flink: ● Open-source distributed data processing engine ● High-throughput and low-latency stream processing ● Runs on premises or in the cloud. Google Cloud Dataflow: ● Fully-managed service for batch and stream data processing ● Provides dynamic auto-scaling, monitoring tools, and tight integration with Google Cloud Platform.
  • 21. How do you build an abstraction layer? (Diagram: Apache Spark, Cloud Dataflow, and Apache Flink beneath layers marked "????????".)
  • 22. Beam: the intersection of runner functionality?
  • 23. Beam: the union of runner functionality?
  • 26. 03 Parallel and portable pipelines in practice: demo time.
  • 43. 04 Getting started with Apache Beam: Beaming into the future.
  • 44. Learn more! Apache Beam: beam.apache.org. Demo code: github.com/davorbonaci/beam-portability-demo. The World Beyond Batch, Streaming 101 and 102: www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 and www.oreilly.com/ideas/the-world-beyond-batch-streaming-102. The Dataflow (Beam) Model paper, VLDB 2015: vldb.org/pvldb/vol8/p1792-Akidau.pdf. Streaming Systems book: www.streamingsystems.net. @takidau on Twitter.
  • 45. 05 Demo Screenshots: because if I make them, I won't need them (famous last words).
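
The "What" and "Where" questions from slides 8 and 10 (sum integers per key, within two-minute fixed windows of event time) can be illustrated without any Beam dependency. The following is only a minimal sketch in plain Java, not the Beam SDK and not how any runner implements windowing: the class FixedWindowSum, the Event type, and both helper methods are hypothetical names introduced here, and the sample events are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names come from the Beam SDK.
class FixedWindowSum {

    /** One keyed, timestamped input element (hypothetical type). */
    static final class Event {
        final String key;
        final long eventTimeMillis;
        final int value;

        Event(String key, long eventTimeMillis, int value) {
            this.key = key;
            this.eventTimeMillis = eventTimeMillis;
            this.value = value;
        }
    }

    /** "Where": start of the fixed window that contains the given event time. */
    static long windowStart(long eventTimeMillis, long windowSizeMillis) {
        // floorDiv keeps the result correct for negative timestamps as well.
        return Math.floorDiv(eventTimeMillis, windowSizeMillis) * windowSizeMillis;
    }

    /** "What": integer sum per key, computed separately for each window. */
    static Map<Long, Map<String, Integer>> sumPerKeyAndWindow(
            List<Event> events, long windowSizeMillis) {
        Map<Long, Map<String, Integer>> sums = new TreeMap<>();
        for (Event e : events) {
            long start = windowStart(e.eventTimeMillis, windowSizeMillis);
            sums.computeIfAbsent(start, s -> new TreeMap<>())
                .merge(e.key, e.value, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long twoMinutes = 2 * 60 * 1000L;
        List<Event> events = new ArrayList<>();
        events.add(new Event("team-a", 30_000L, 5));   // event time 00:30
        events.add(new Event("team-a", 150_000L, 3));  // event time 02:30
        events.add(new Event("team-b", 100_000L, 4));  // event time 01:40
        events.add(new Event("team-a", 90_000L, 7));   // event time 01:30, arrives out of order
        // prints {0={team-a=12, team-b=4}, 120000={team-a=3}}
        System.out.println(sumPerKeyAndWindow(events, twoMinutes));
    }
}
```

Because each element is grouped by its own event timestamp rather than by arrival order, the out-of-order element still lands in the first window. The sketch assumes all input is available up front; deciding when to emit results for unbounded input is what the triggering and accumulation settings on slides 12 and 14 address.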