SlideShare a Scribd company logo
1 of 12
Download to read offline
Sep 2015
Google Dataflow
introduction
iglushkov@machinezone.com
What is Google Dataflow
❖ Data processing system: batch and streaming
❖ Set of SDKs
❖ Google Cloud Platform managed services:
❖ Google Compute Engine (VMs)
❖ Google Cloud Storage (r/w data)
❖ BigQuery (r/w data)
Programming Model
❖ Pipeline - entire series of computations
❖ PCollection - set of data in a pipeline
❖ Transform - any data processing operation
❖ Pipeline I/O - data source and data sink APIs
Pipeline
❖ Data + Transforms
❖ Branching + merging
❖ Multiple sources
❖ Unit testing + Integration testing
❖ Pipeline Execution Parameters (local/prod)
❖ Where from, what it looks like, what to do, where store
PCollection
❖ Represent data in a pipeline from any source
❖ Potentially unlimited (stream)
❖ Serializable, immutable, no random access to elements
❖ Deferred data (may have yet to be computed)
❖ Windowing, triggers
Windowing
❖ Window - subdivided logical parts of a PCollection
❖ Each element is assigned to one or more windows
❖ Fixed time windows
❖ Sliding time windows
❖ Per-session windows
❖ Single global windows
Late Data
❖ Event time / Processing time
❖ No order guarantee
❖ No consistent delta b/w Event and Processing time
❖ Watermark
❖ Late data
❖ Triggers to refine windowing, data reporting time
Triggers
❖ Enough data for the window -> aggregate result: “pane”
❖ Help handle late data
❖ Time-based triggers
❖ Data-driven triggers (e.g. certain amount is enough)
❖ Composite triggers: OR, AND - operations on triggers
❖ Window Accumulation modes: accumulate/discard the
previous “panes”
Transforms
❖ Math, convert format, grouping, filtering, combining
❖ [PCollection] -> [PCollection]
❖ Core Transforms: ParDo, GroupByKey, Combine, …
❖ Functions with business logic to apply:

Serializable, Thread-compatible, Idempotent
❖ Composite Transforms
Pipeline I/O
❖ Read/Write from/to external sources
❖ Text Files in Google Cloud Storage or local FS
❖ BigQuery tables
❖ Google Cloud PubSub
❖ Custom Sources and Sinks
Extra
❖ Parallelization, distribution, optimization, scaling
❖ Dataflow monitoring UI and CLI
❖ Logging
❖ Unit testing (locally) any Fn, end-to-end
❖ Introspection toolchain
❖ Update toolchain: for code, windowing configs
Questions?

More Related Content

What's hot

Google Cloud Fundamentals
Google Cloud Fundamentals Google Cloud Fundamentals
Google Cloud Fundamentals Omar Fathy
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSDerrick Qin
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowSid Anand
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Using LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowUsing LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowDataWorks Summit
 
Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itBruno Faria
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engineWalter Liu
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsLynn Langit
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflowmutt_data
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
"Building Data Warehouse with Google Cloud Platform",  Artem Nikulchenko"Building Data Warehouse with Google Cloud Platform",  Artem Nikulchenko
"Building Data Warehouse with Google Cloud Platform", Artem NikulchenkoFwdays
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 

What's hot (20)

Google Cloud Fundamentals
Google Cloud Fundamentals Google Cloud Fundamentals
Google Cloud Fundamentals
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Using LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowUsing LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache Arrow
 
Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using it
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline Patterns
 
Airflow for Beginners
Airflow for BeginnersAirflow for Beginners
Airflow for Beginners
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Apache airflow
Apache airflowApache airflow
Apache airflow
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
"Building Data Warehouse with Google Cloud Platform",  Artem Nikulchenko"Building Data Warehouse with Google Cloud Platform",  Artem Nikulchenko
"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 

Similar to Introduction to Google Dataflow Platform

Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Hortonworks
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunk
 
Bigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainBigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainKamal A
 
Time to say goodbye to your Nagios based setup
Time to say goodbye to your Nagios based setupTime to say goodbye to your Nagios based setup
Time to say goodbye to your Nagios based setupCheck my Website
 
OSMC 2014: Time to say goodbye to your Nagios setup | Oliver Jan
OSMC 2014: Time to say goodbye to your Nagios setup | Oliver JanOSMC 2014: Time to say goodbye to your Nagios setup | Oliver Jan
OSMC 2014: Time to say goodbye to your Nagios setup | Oliver JanNETWAYS
 
Uni w pachube 111108
Uni w pachube 111108Uni w pachube 111108
Uni w pachube 111108Paul Tanner
 
Data Onboarding
Data Onboarding Data Onboarding
Data Onboarding Splunk
 
Data Onboarding
Data Onboarding Data Onboarding
Data Onboarding Splunk
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...Flink Forward
 
OSMC 2014 | Time to say goodbye to your Nagios based setup? by Oliver Jan
OSMC 2014 | Time to say goodbye to your Nagios based setup? by Oliver JanOSMC 2014 | Time to say goodbye to your Nagios based setup? by Oliver Jan
OSMC 2014 | Time to say goodbye to your Nagios based setup? by Oliver JanNETWAYS
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 

Similar to Introduction to Google Dataflow Platform (20)

Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 
Pentaho ppt up
Pentaho ppt upPentaho ppt up
Pentaho ppt up
 
Bigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainBigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domain
 
Time to say goodbye to your Nagios based setup
Time to say goodbye to your Nagios based setupTime to say goodbye to your Nagios based setup
Time to say goodbye to your Nagios based setup
 
OSMC 2014: Time to say goodbye to your Nagios setup | Oliver Jan
OSMC 2014: Time to say goodbye to your Nagios setup | Oliver JanOSMC 2014: Time to say goodbye to your Nagios setup | Oliver Jan
OSMC 2014: Time to say goodbye to your Nagios setup | Oliver Jan
 
Uni w pachube 111108
Uni w pachube 111108Uni w pachube 111108
Uni w pachube 111108
 
Data Onboarding
Data Onboarding Data Onboarding
Data Onboarding
 
Data Onboarding
Data Onboarding Data Onboarding
Data Onboarding
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
 
OSMC 2014 | Time to say goodbye to your Nagios based setup? by Oliver Jan
OSMC 2014 | Time to say goodbye to your Nagios based setup? by Oliver JanOSMC 2014 | Time to say goodbye to your Nagios based setup? by Oliver Jan
OSMC 2014 | Time to say goodbye to your Nagios based setup? by Oliver Jan
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 

More from Ivan Glushkov

Distributed tracing with erlang/elixir
Distributed tracing with erlang/elixirDistributed tracing with erlang/elixir
Distributed tracing with erlang/elixirIvan Glushkov
 
Kubernetes is not needed to 90 percents of the companies.rus
Kubernetes is not needed to 90 percents of the companies.rusKubernetes is not needed to 90 percents of the companies.rus
Kubernetes is not needed to 90 percents of the companies.rusIvan Glushkov
 
Mystery Machine Overview
Mystery Machine OverviewMystery Machine Overview
Mystery Machine OverviewIvan Glushkov
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015Ivan Glushkov
 
Comparing ZooKeeper and Consul
Comparing ZooKeeper and ConsulComparing ZooKeeper and Consul
Comparing ZooKeeper and ConsulIvan Glushkov
 

More from Ivan Glushkov (8)

Distributed tracing with erlang/elixir
Distributed tracing with erlang/elixirDistributed tracing with erlang/elixir
Distributed tracing with erlang/elixir
 
Kubernetes is not needed to 90 percents of the companies.rus
Kubernetes is not needed to 90 percents of the companies.rusKubernetes is not needed to 90 percents of the companies.rus
Kubernetes is not needed to 90 percents of the companies.rus
 
Mystery Machine Overview
Mystery Machine OverviewMystery Machine Overview
Mystery Machine Overview
 
Raft in details
Raft in detailsRaft in details
Raft in details
 
Hashicorp Nomad
Hashicorp NomadHashicorp Nomad
Hashicorp Nomad
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015
 
Comparing ZooKeeper and Consul
Comparing ZooKeeper and ConsulComparing ZooKeeper and Consul
Comparing ZooKeeper and Consul
 
fp intro
fp introfp intro
fp intro
 

Recently uploaded

Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 

Recently uploaded (20)

Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 

Introduction to Google Dataflow Platform

  • 2. What is Google Dataflow ❖ Data processing system: batch and streaming ❖ Set of SDKs ❖ Google Cloud Platform managed services: ❖ Google Compute Engine (VMs) ❖ Google Cloud Storage (r/w data) ❖ BigQuery (r/w data)
  • 3. Programming Model ❖ Pipeline - entire series of computations ❖ PCollection - set of data in a pipeline ❖ Transform - any data processing operation ❖ Pipeline I/O - data source and data sink APIs
  • 4. Pipeline ❖ Data + Transforms ❖ Branching + merging ❖ Multiple sources ❖ Unit testing + Integration testing ❖ Pipeline Execution Parameters (local/prod) ❖ Where from, what it looks like, what to do, where store
  • 5. PCollection ❖ Represent data in a pipeline from any source ❖ Potentially unlimited (stream) ❖ Serializable, immutable, no random access to elements ❖ Deferred data (may have yet to be computed) ❖ Windowing, triggers
  • 6. Windowing ❖ Window - subdivided logical parts of a PCollection ❖ Each element is assigned to one or more windows ❖ Fixed time windows ❖ Sliding time windows ❖ Per-session windows ❖ Single global windows
  • 7. Late Data ❖ Event time / Processing time ❖ No order guarantee ❖ No consistent delta b/w Event and Processing time ❖ Watermark ❖ Late data ❖ Triggers to refine windowing, data reporting time
  • 8. Triggers ❖ Enough data for the window -> aggregate result: “pane” ❖ Help handle late data ❖ Time-based triggers ❖ Data-driven triggers (e.g. certain amount is enough) ❖ Composite triggers: OR, AND - operations on triggers ❖ Window Accumulation modes: accumulate/discard the previous “panes”
  • 9. Transforms ❖ Math, convert format, grouping, filtering, combining ❖ [PCollection] -> [PCollection] ❖ Core Transforms: ParDo, GroupByKey, Combine, … ❖ Functions with business logic to apply:
 Serializable, Thread-compatible, Idempotent ❖ Composite Transforms
  • 10. Pipeline I/O ❖ Read/Write from/to external sources ❖ Text Files in Google Cloud Storage or local FS ❖ BigQuery tables ❖ Google Cloud PubSub ❖ Custom Sources and Sinks
  • 11. Extra ❖ Parallelization, distribution, optimization, scaling ❖ Dataflow monitoring UI and CLI ❖ Logging ❖ Unit testing (locally) any Fn, end-to-end ❖ Introspection toolchain ❖ Update toolchain: for code, windowing configs