SlideShare a Scribd company logo
1 of 32
Download to read offline
Portable Batch and Streaming Processing
APACHE BEAM
ABOUT ME
● Data Architect at Globant
● Certified Google Data
Engineer
● Applying software engineering
practices to Big Data
development
AGENDA
1. Introduction
2. What is Apache Beam?
3. Why Apache Beam?
4. Crash course
5. Beam on Flink
6. Beam on DataFlow
7. Summary
INTRODUCTION
Computation
Result
Computation Computation Computation Computation
Result
INTRODUCTION
Computation Computation Computation Computation
Result
Computation Computation Computation Computation
Result
Batch Streaming
WHAT IS APACHE BEAM?
● Original code from Google (Cloud Dataflow SDK).
● Apache open-source project since 2016.
● Parallel/distributed data processing.
● Unified programming model for batch and streaming.
● Portable, execution engine and programming language
of your choice.
WHY APACHE BEAM?
Write Translate
SDK INSTALLATION
CRASH COURSE
Input Transform PCollection Transform PCollection Sink
Pipeline
Output
PIPELINE
A Pipeline encapsulates your entire data processing task, from start to finish. This
includes reading input data, transforming that data, and writing output data.
PIPELINE
Pipeline
PCollection
The PCollection abstraction represents a potentially distributed, multi-element data
set. You can think of a PCollection as “pipeline” data; Beam transforms use
PCollection objects as inputs and outputs. As such, if you want to work with data in
your pipeline, it must be in the form of a PCollection.
PCollection
PCollection
● The elements of a PCollection may be of any type, but
must all be of the same type.
● A PCollection is immutable.
● A PCollection can be either bounded or unbounded in
size.
● Each element in a PCollection has an associated intrinsic
timestamp.
PCollection
PCollection PCollection
Pipeline
Transforms
Transforms
Transforms
Transforms
● Read - parallel connectors to external systems
● ParDo - per element processing
● GroupByKey - aggregating elements per key and window
● Flatten - union of PCollections
● Window - set the windowing strategy for a PCollection
Transforms
Transform PCollection Transform PCollection Sink
Pipeline
Pipeline I/O
Pipeline I/O
Pipeline I/O
Input Transform PCollection Transform PCollection Sink
Pipeline
Output
Beam on Flink
Beam on Google Cloud DataFlow
BE AWARE OF
Write Translate
BE AWARE OF
Portable protocol
JAVA Objects
What is being computed?
Where in event time?
Summary
● Apache Beam is not an execution engine, is a programming model.
● Apache Flink and Google Cloud Dataflow are the execution engines
that supports all the features of Beam.
● The JAVA SDK is the the most up to date.
● PCollections are the main Data Structure in Beam.
● Transformations receive PCollections as inputs and produces the
same as outputs.
● Check the compatibility table before jump into the development.
Q&A
Thank you for listening!

More Related Content

What's hot

Open source tools for logic synthesis and soc design an overview
Open source tools for logic synthesis and soc design  an overviewOpen source tools for logic synthesis and soc design  an overview
Open source tools for logic synthesis and soc design an overview
Vaibhav R
 

What's hot (20)

Cloud Native CI/CD with Spring Cloud Pipelines
Cloud Native CI/CD with Spring Cloud PipelinesCloud Native CI/CD with Spring Cloud Pipelines
Cloud Native CI/CD with Spring Cloud Pipelines
 
Go programming language
Go programming languageGo programming language
Go programming language
 
A Functional Approach to Architecture - Kafka & Kafka Streams - Kevin Mas Rui...
A Functional Approach to Architecture - Kafka & Kafka Streams - Kevin Mas Rui...A Functional Approach to Architecture - Kafka & Kafka Streams - Kevin Mas Rui...
A Functional Approach to Architecture - Kafka & Kafka Streams - Kevin Mas Rui...
 
Five Real-World Strategies for Perforce Streams
Five Real-World Strategies for Perforce StreamsFive Real-World Strategies for Perforce Streams
Five Real-World Strategies for Perforce Streams
 
Building your first aplication using Apache Apex
Building your first aplication using Apache ApexBuilding your first aplication using Apache Apex
Building your first aplication using Apache Apex
 
Bodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine LearningBodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine Learning
 
Go lambda-presentation
Go lambda-presentationGo lambda-presentation
Go lambda-presentation
 
Php Inspections (EA Extended): realtime static code analysis
Php Inspections (EA Extended): realtime static code analysisPhp Inspections (EA Extended): realtime static code analysis
Php Inspections (EA Extended): realtime static code analysis
 
Multitenant SaaS Apps In Rails By Iqbal Hasnan
Multitenant SaaS Apps In Rails By Iqbal HasnanMultitenant SaaS Apps In Rails By Iqbal Hasnan
Multitenant SaaS Apps In Rails By Iqbal Hasnan
 
Open source tools for logic synthesis and soc design an overview
Open source tools for logic synthesis and soc design  an overviewOpen source tools for logic synthesis and soc design  an overview
Open source tools for logic synthesis and soc design an overview
 
Config management for kubernetes: GitOps + Helm
Config management for kubernetes: GitOps + HelmConfig management for kubernetes: GitOps + Helm
Config management for kubernetes: GitOps + Helm
 
Trailblazer Rails Architecture
Trailblazer Rails ArchitectureTrailblazer Rails Architecture
Trailblazer Rails Architecture
 
RxSwift
RxSwiftRxSwift
RxSwift
 
Sebastian Spaink [InfluxData] | Layer by Layer: Printing Your Own External In...
Sebastian Spaink [InfluxData] | Layer by Layer: Printing Your Own External In...Sebastian Spaink [InfluxData] | Layer by Layer: Printing Your Own External In...
Sebastian Spaink [InfluxData] | Layer by Layer: Printing Your Own External In...
 
Designing a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd productsDesigning a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd products
 
Berlin AWS meetup: here.com on AWS
Berlin AWS meetup: here.com on AWSBerlin AWS meetup: here.com on AWS
Berlin AWS meetup: here.com on AWS
 
helm, the real world
helm, the real worldhelm, the real world
helm, the real world
 
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
 
.Net Introduction
.Net Introduction.Net Introduction
.Net Introduction
 

Similar to Apache Beam: Lote portátil y procesamiento de transmisión

Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
Sub Szabolcs Feczak
 

Similar to Apache Beam: Lote portátil y procesamiento de transmisión (20)

Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
 
Introduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OKIntroduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OK
 
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
 
Portable batch and streaming pipelines with Apache Beam (Big Data Application...
Portable batch and streaming pipelines with Apache Beam (Big Data Application...Portable batch and streaming pipelines with Apache Beam (Big Data Application...
Portable batch and streaming pipelines with Apache Beam (Big Data Application...
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 
Prefect Paris Airflow Meetup Jeff Hale April 2023.pdf
Prefect Paris Airflow Meetup Jeff Hale April 2023.pdfPrefect Paris Airflow Meetup Jeff Hale April 2023.pdf
Prefect Paris Airflow Meetup Jeff Hale April 2023.pdf
 
Apache Beam @ GCPUG.TW Flink.TW 20161006
Apache Beam @ GCPUG.TW Flink.TW 20161006Apache Beam @ GCPUG.TW Flink.TW 20161006
Apache Beam @ GCPUG.TW Flink.TW 20161006
 
SpringOne 2016 in a nutshell
SpringOne 2016 in a nutshellSpringOne 2016 in a nutshell
SpringOne 2016 in a nutshell
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beam
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in Spark
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
 
OpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit LondonOpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit London
 
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
CCT (Check and Calculate Transfer)
CCT (Check and Calculate Transfer)CCT (Check and Calculate Transfer)
CCT (Check and Calculate Transfer)
 
Presentation CCT
Presentation CCTPresentation CCT
Presentation CCT
 

More from Globant

Navegando el desafío de transformación digital de los servicios financieros
Navegando el desafío de transformación digital de los servicios financierosNavegando el desafío de transformación digital de los servicios financieros
Navegando el desafío de transformación digital de los servicios financieros
Globant
 

More from Globant (20)

Webinar MLOps: When AA gets serious.
Webinar MLOps: When AA gets serious.Webinar MLOps: When AA gets serious.
Webinar MLOps: When AA gets serious.
 
Google Cloud Spanner y NewSQL
Google Cloud Spanner y NewSQLGoogle Cloud Spanner y NewSQL
Google Cloud Spanner y NewSQL
 
Eventos Asíncronos como estrategia virtual
Eventos Asíncronos como estrategia virtualEventos Asíncronos como estrategia virtual
Eventos Asíncronos como estrategia virtual
 
Cultura y valores 4.0 para líderes 4.0
Cultura y valores 4.0 para líderes 4.0Cultura y valores 4.0 para líderes 4.0
Cultura y valores 4.0 para líderes 4.0
 
Tech Insiders Salesforce: SFDX e Integración Continua
Tech Insiders Salesforce: SFDX e Integración ContinuaTech Insiders Salesforce: SFDX e Integración Continua
Tech Insiders Salesforce: SFDX e Integración Continua
 
Como impulsar tu carrera Salesforce
Como impulsar tu carrera SalesforceComo impulsar tu carrera Salesforce
Como impulsar tu carrera Salesforce
 
3D Programming Basics: WebGL
3D Programming Basics: WebGL3D Programming Basics: WebGL
3D Programming Basics: WebGL
 
Converge augmented report
Converge augmented reportConverge augmented report
Converge augmented report
 
Sistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta LakeSistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta Lake
 
Kubeflow: Machine Learning en Cloud para todos
Kubeflow: Machine Learning en Cloud para todosKubeflow: Machine Learning en Cloud para todos
Kubeflow: Machine Learning en Cloud para todos
 
Orquestando Pipelines de Datosen AWS con Step Function y AWS Glue
Orquestando Pipelines de Datosen AWS con Step Function y AWS GlueOrquestando Pipelines de Datosen AWS con Step Function y AWS Glue
Orquestando Pipelines de Datosen AWS con Step Function y AWS Glue
 
Navegando el desafío de transformación digital de los servicios financieros
Navegando el desafío de transformación digital de los servicios financierosNavegando el desafío de transformación digital de los servicios financieros
Navegando el desafío de transformación digital de los servicios financieros
 
Converge 2020
Converge 2020 Converge 2020
Converge 2020
 
Converge 2020
Converge 2020Converge 2020
Converge 2020
 
Tendencias de tecnología para el recién egresado
Tendencias de tecnología para el recién egresadoTendencias de tecnología para el recién egresado
Tendencias de tecnología para el recién egresado
 
SRE: ¿Qué es y cómo gestionar el Toil?
SRE: ¿Qué es y cómo gestionar el Toil?SRE: ¿Qué es y cómo gestionar el Toil?
SRE: ¿Qué es y cómo gestionar el Toil?
 
Monitoreo en tiempo real para la mejora continua de una aplicación
Monitoreo en tiempo real para la mejora continua de una aplicaciónMonitoreo en tiempo real para la mejora continua de una aplicación
Monitoreo en tiempo real para la mejora continua de una aplicación
 
¿Cómo automatizar pruebas de infraestructura y no morir en el intento?
¿Cómo automatizar pruebas de infraestructura y no morir en el intento?¿Cómo automatizar pruebas de infraestructura y no morir en el intento?
¿Cómo automatizar pruebas de infraestructura y no morir en el intento?
 
Automatización en AWS con Chatbot Serverless (Amazon Lex)
Automatización en AWS con Chatbot Serverless (Amazon Lex)Automatización en AWS con Chatbot Serverless (Amazon Lex)
Automatización en AWS con Chatbot Serverless (Amazon Lex)
 
¿Cómo importar funcionalidades de C++ a NodeJS?
¿Cómo importar funcionalidades de C++ a NodeJS?¿Cómo importar funcionalidades de C++ a NodeJS?
¿Cómo importar funcionalidades de C++ a NodeJS?
 

Recently uploaded

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Apache Beam: Lote portátil y procesamiento de transmisión