PORTABLE BATCH AND
STREAM DATA PROCESSING
WITH APACHE BEAM
William Vambenepe, Google
@vambenepe
SPEAKER INFO
WILLIAM VAMBENEPE
Group Product Manager
Data Processing and Analytics
Google Cloud Platform
@vambenepe
APACHE BEAM: THE KEY TO MODERN DATA PROCESSING
- Open source (top-level Apache project)
- Portable
- Unifies batch and stream
- Cloud-native
- Built on 15 years of large-scale data processing at Google
You don’t need to be a developer to benefit from Beam
THE EVOLUTION OF DATA PIPELINES
[Figure: timeline of the data systems that led to Apache Beam: MapReduce, BigTable, Colossus, Dremel, Flume, Megastore, Spanner, PubSub, MillWheel, Cloud Dataflow, Apache Beam]
BEAM = Batch + StrEAM
BENEFIT OF BATCH / STREAM UNIFICATION
Progressive evolution from batch to stream
- Stream as the new default
Cost/performance trade-offs without re-architecting
- Just turn the knob
ML: consistent data preparation between training and scoring
- Same pipeline to train in batch and score in stream
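The ML point above relies on one piece of preparation logic serving both a bounded (batch) input and an unbounded (stream) input. The following is a minimal plain-Python sketch of that idea, not Beam's API: `prepare`, `run_batch` and `run_stream` are hypothetical names, and a Python iterator stands in for a stream.

```python
from typing import Dict, Iterable, Iterator, List


def prepare(record: Dict) -> Dict:
    """Hypothetical data-preparation step shared by training and scoring."""
    # Normalize the text field and derive a simple length feature.
    text = record["text"].strip().lower()
    return {"text": text, "length": len(text)}


def run_batch(records: List[Dict]) -> List[Dict]:
    # Bounded input: the whole dataset is available up front (training).
    return [prepare(r) for r in records]


def run_stream(records: Iterable[Dict]) -> Iterator[Dict]:
    # Unbounded input: records are handled one at a time as they arrive (scoring).
    for r in records:
        yield prepare(r)


training_data = [{"text": "  Hello Beam "}, {"text": "Stream"}]
batch_out = run_batch(training_data)
stream_out = list(run_stream(iter(training_data)))
```

Because both paths call the same `prepare` function, the features seen at scoring time cannot drift from those used in training; Beam applies the same principle at the level of a whole pipeline.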
PROCESSING TIME VS. EVENT TIME
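Event time is when something actually happened; processing time is when the pipeline observes it. This plain-Python sketch (not Beam's API; the `Event` type and all timestamps are hypothetical) shows how the two orderings diverge when one event is delayed upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    event_time: int       # seconds: when the event actually happened
    processing_time: int  # seconds: when the pipeline observed it


events = [
    Event("a", event_time=10, processing_time=12),
    Event("b", event_time=5, processing_time=15),  # delayed upstream, arrives late
    Event("c", event_time=20, processing_time=21),
]

# Order of arrival (processing time) differs from order of occurrence (event time).
arrival_order = [e.key for e in sorted(events, key=lambda e: e.processing_time)]
occurrence_order = [e.key for e in sorted(events, key=lambda e: e.event_time)]

# Skew: how far processing time lags behind event time for each event.
skew = {e.key: e.processing_time - e.event_time for e in events}
```

Here the arrival order is a, b, c while the occurrence order is b, a, c, and the skew varies per event (2, 10 and 1 seconds), which is why results grouped by processing time can misrepresent what happened in event time.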
THE BEAM MODEL: ASKING THE RIGHT QUESTIONS
What results are calculated?
Where in event time are results calculated?
When in processing time are results materialized?
How do refinements of results relate?
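The first two questions can be answered independently: WHAT is a per-key sum, and WHERE is a fixed event-time window. This is a plain-Python sketch under assumed inputs, not Beam's API; the 60-second window size, the `(key, value, event_time)` tuples and the function names are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # hypothetical fixed-window length, in seconds of event time


def window_start(event_time: int, size: int = WINDOW_SIZE) -> int:
    # WHERE: assign each element to the fixed event-time window that contains it.
    return event_time - event_time % size


def sum_per_key_and_window(elements):
    # WHAT: sum the values for each (key, window) pair.
    totals = defaultdict(int)
    for key, value, event_time in elements:
        totals[(key, window_start(event_time))] += value
    return dict(totals)


elements = [
    ("team1", 5, 12),
    ("team1", 3, 75),
    ("team2", 4, 30),
    ("team1", 7, 59),
]
result = sum_per_key_and_window(elements)
# {("team1", 0): 12, ("team1", 60): 3, ("team2", 0): 4}
```

Changing the windowing (WHERE) leaves the aggregation (WHAT) untouched. In the Beam Python SDK the same idea is expressed roughly as `beam.WindowInto(window.FixedWindows(60))` followed by `beam.CombinePerKey(sum)`, with timestamps carried on the elements rather than inside the tuples.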
The Beam Model: WHAT is being computed?
The Beam Model: WHERE in event time?
The Beam Model: WHEN in processing time?
The Beam Model: HOW do refinements relate?
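WHEN and HOW interact: a trigger decides when a window emits a result (a pane), and the accumulation mode decides how successive panes relate. This plain-Python sketch (not Beam's API; `emit_panes` and the firing data are hypothetical) takes the new values that arrived before each firing of a single window and shows both modes.

```python
from typing import List


def emit_panes(firings: List[List[int]], accumulating: bool) -> List[int]:
    """Return the sum emitted at each trigger firing for a single window.

    firings: new values that arrived since the previous firing (WHEN).
    accumulating: True makes each pane include all earlier values;
                  False makes each pane contain only the new values (HOW).
    """
    outputs = []
    running = 0
    for new_values in firings:
        if accumulating:
            running += sum(new_values)
            outputs.append(running)
        else:
            outputs.append(sum(new_values))
    return outputs


# Hypothetical firings for one window: an early (speculative) pane,
# the on-time pane when the watermark passes the end of the window, and a late pane.
firings = [[5], [3, 4], [2]]
accumulated = emit_panes(firings, accumulating=True)  # [5, 12, 14]
discarded = emit_panes(firings, accumulating=False)   # [5, 7, 2]
```

With accumulating panes, each result replaces the previous one; with discarding panes, a downstream consumer must add the deltas itself. In Beam, the WHEN is configured with triggers and the HOW with the accumulation mode (accumulating or discarding).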
PORTABLE
Write once, run anywhere
BEAM VISION: MIX AND MATCH SDKS AND RUNTIMES
- The Beam Model: the abstractions at the core of Apache Beam
- Choice of API: users write their pipelines in a language that’s familiar and integrated with their other tooling
- Choice of runtime: users choose the right runner for their current needs (on-premises or cloud, open source or not, fully managed or not)
- Scalability for developers: clean APIs allow developers to contribute modules independently
[Figure: SDKs for Language A, Language B and Language C all target the Beam Model, which can execute on Runner 1, Runner 2 or Runner 3]
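"Write once, run anywhere" rests on separating what a pipeline does from how it is executed. This toy plain-Python sketch is not Beam's API: the pipeline is a hypothetical list of element-wise steps, and `local_runner` and `threaded_runner` are hypothetical stand-ins for different runners that must produce the same result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# The pipeline is described as data: an ordered list of element-wise steps.
Step = Callable[[int], int]
pipeline: List[Step] = [lambda x: x * 2, lambda x: x + 1]


def apply_steps(x: int, steps: List[Step]) -> int:
    # Apply every step of the pipeline to one element, in order.
    for step in steps:
        x = step(x)
    return x


def local_runner(steps: List[Step], data: List[int]) -> List[int]:
    # Runs every element sequentially in the current thread.
    return [apply_steps(x, steps) for x in data]


def threaded_runner(steps: List[Step], data: List[int], workers: int = 4) -> List[int]:
    # Runs elements in parallel; executor.map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: apply_steps(x, steps), data))


data = [1, 2, 3]
```

Both runners return `[3, 5, 7]` for the same pipeline definition. In Beam itself, the runner is selected through pipeline options (for example the `--runner` option) rather than by changing pipeline code.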
EXAMPLE BEAM RUNNERS
APACHE SPARK
- Open-source cluster-computing framework
- Large ecosystem of APIs and tools
- Runs on-premises or in the cloud
APACHE FLINK
- Open-source distributed data processing engine
- High-throughput and low-latency stream processing
- Runs on-premises or in the cloud
GOOGLE CLOUD DATAFLOW
- Fully managed service for batch and stream data processing
- Provides dynamic autoscaling, monitoring tools, and tight integration with Google Cloud Platform
BEAM ON GOOGLE CLOUD: SERVERLESS DATA PROCESSING
[Figure: Google Cloud data-processing architecture organized into Capture, Process, Store, Analyze and Use stages, with separate stream and batch paths. Components shown: GA 360, AdWords, DoubleClick, YouTube, Firebase, Cloud Pub/Sub, Storage Transfer Service, Cloud Dataflow, Cloud Dataproc, BigQuery Storage (tables), Cloud Bigtable (NoSQL), Cloud Storage (files), BigQuery Analytics (SQL), Cloud Datalab, ML Engine and Stackdriver; outputs include real-time analytics, real-time dashboards and real-time alerts.]
BEAM: MORE INFO
- Apache Beam: https://beam.apache.org
- Google Cloud Platform: https://cloud.google.com
- The Dataflow Model paper from VLDB 2015: http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
- Streaming 101 and 102: The World Beyond Batch
  https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
  https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
THANK YOU!
ApacheBeam_Google_Theater_TalendConnect2017.pptx