INTRODUCTION TO
APACHE BEAM
Leveraging unified analytics
in stateful processing.
MARCGONZALEZ.EU
InnoIT: a disruptive IT company!
INNOIT IS A CONSULTING COMPANY SPECIALISED IN IT
We work with:
Web & Mobile Development
Systems & DevOPS
Quality Assurance Testing
Big Data & Machine Learning
Methodologies: Agile, Lean, Product Owner
We are specialists!
9 YEARS IN FRANCE, 3 YEARS IN SPAIN:
In France, we are > 200 consultants
In Spain, we have built a team of > 35 consultants in 3 years
We have earned the trust of more than 25 multinational clients
We have organized >20 technological meetups with >900 attendees
We are simply different
 
WE COACH OUR CONSULTANTS LIKE A “FOOTBALL AGENT SCOUT”: WE WORK “FROM THE CANDIDATE TO THE MARKET”.
OUR CLIENT IS THE CANDIDATE.
SOFT SKILLS: WE LOOK FOR THE BEST TALENTS WE CAN TRUST. WE FOCUS ON THE POTENTIAL OF A PERSON.
TRANSPARENCY: WE ONLY PROMISE WHAT WE ARE GOING TO DELIVER, AND WE KEEP OUR PROMISES.
WE WON THE TRUST OF THE CANDIDATES:
1 IN 2 CVS WE RECEIVE COMES FROM A “REFERRAL”
THE SAME EFFECT WITH OUR CLIENTS
OUR CONSULTANTS GIVE US REVIEWS & QUOTES ON SOCIAL MEDIA
They trust InnoiT
We are hiring people like you!
YOU CAN TAKE A LOOK AT OUR OFFERS AND APPLY ON OUR WEBSITE:
WWW.INNO-IT.ES
YOU CAN ALSO SEND YOUR CV TO THE EMAIL:
APPLY@INNO-IT.ES
YOU CAN ALSO SIMPLY COME AND SPEAK WITH US ☺
MONIKA, MELISSA, GABRIEL AND I CAN EXPLAIN OUR OPPORTUNITIES TO YOU!
Next Event in InnoIT
JANUARY
MEETUP: “INTRODUCTION TO APACHE BEAM. APACHE BEAM AND HOW TO LEVERAGE UNIFIED PROCESSING TO TACKLE NEW DEVELOPMENTS”
WEDNESDAY 29/01, 19:00 TO 21:00 (URQUINAONA OFFICE)
ORGANIZED BY INNOIT CONSULTING & UBEEQO / EUROPCAR NEW MOBILITIES
FEBRUARY
WORKING BREAKFAST: “LEADERSHIP: HOW TO MANAGE CONFLICTS IN FAST-GROWING COMPANIES?”
TUESDAY 11/02, 9:00 TO 11:00 (POBLE NOU OFFICE)
ORGANIZED BY INNOIT AND THE COACH (EX-CTO) ALFONS FOUBERT
Hi, I’m Marc!
Freelance Data Engineer
Developer, Consultant, Speaker
5+ years of big data experience,
applied to classifieds market.
Q? sli.do #BEAM
About Ubeeqo
Building the new data platform:
• 7.7 M customers
• 40 M bookings
• 350 K cars
Audience
• Experience with Spark.
• Streams and tables theory.
• Beam model.
• Apache Beam in production.
Motivation
• Processing pipeline requirements change all the time.
Change: the “What” and the “How”
KPI A -> KPI B
MySQL -> Redshift
JSON -> Parquet
Batch -> Streaming
Method: DDD, Functional?
Motivation
[Figure: 24-hour timelines (00:00 to 00:00) for the schedules below, with hourly ticks from 01:00, marks at 14:13, 16:46 and 19:14, an X at 12:00, and six-hourly ticks at 06:00, 12:00 and 18:00.]
crontab: 0 1 * * * spark-submit
crontab: 0 */1 * * * spark-submit
airflow: 0 */1 * * * spark-submit
crontab: 0 */6 * * * spark-submit
Challenges
• Processing out-of-order data based on application timestamps (also called event time)
• Maintaining large amounts of state
• Supporting high data throughput
• Processing each event exactly once despite machine failures
• Handling load imbalance and stragglers
• Responding to events at low latency
• Joining with external data in other storage systems
• Determining how to update output sinks as new events arrive
• Writing data transactionally to output systems
• Updating your application’s business logic at runtime
TALK STRUCTURE
1. Beam model through Streams & Tables theory.
2. Getting started with Apache Beam.
3. Apache Beam Barcelona meetup.
Q&A: sli.do, code BEAM
Notes
• Most material is from Tyler Akidau, either from his blog, talks or book.
TALK STRUCTURE
1. Beam model through Streams & Tables theory.
2. Getting started with Apache Beam.
3. Apache Beam Barcelona meetup.
Beam model
• What results are calculated?
• Where in event time are results calculated?
• When in processing time are results materialized?
• How do refinements of results relate?
What results are calculated?
Event vs Processing Times
What results are calculated?
Event vs Processing Times Example
What results are calculated?
Importance of Order
Beam model
• What results are calculated? Insights
• Where in event time are results calculated?
• When in processing time are results materialized?
• How do refinements of results relate?
Where in event time are results calculated?
Windowing
• Partitioning a data set along temporal boundaries.
• Window types (in event time): Fixed, Sliding, Session.
Where in event time are results calculated?
2 Minute Windowing Example
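Windowing can be sketched in plain Java. This is only an illustration of how an event timestamp maps to a 2-minute fixed window and to overlapping sliding windows; it is not the Beam API, and the class name, method names and sample timestamp are assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain-Java window assignment, not the Beam API. */
final class WindowingSketch {

    /** Start of the fixed window of the given size that contains eventTime. */
    static Instant fixedWindowStart(Instant eventTime, Duration size) {
        long ts = eventTime.toEpochMilli();
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, size.toMillis()));
    }

    /** Starts of all sliding windows (size, period) that contain eventTime, latest first. */
    static List<Instant> slidingWindowStarts(Instant eventTime, Duration size, Duration period) {
        long ts = eventTime.toEpochMilli();
        long periodMs = period.toMillis();
        long lastStart = ts - Math.floorMod(ts, periodMs);
        List<Instant> starts = new ArrayList<>();
        // A window [start, start + size) contains ts while start > ts - size.
        for (long start = lastStart; start > ts - size.toMillis(); start -= periodMs) {
            starts.add(Instant.ofEpochMilli(start));
        }
        return starts;
    }

    public static void main(String[] args) {
        Instant event = Instant.parse("2020-01-29T12:03:30Z"); // hypothetical event time
        System.out.println(fixedWindowStart(event, Duration.ofMinutes(2)));
        System.out.println(slidingWindowStarts(event, Duration.ofMinutes(2), Duration.ofMinutes(1)));
    }
}
```

Session windows are left out on purpose: they depend on the gaps between events of the same key, so they cannot be computed from a single timestamp.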
Beam model
• What results are calculated? Insights
• Where in event time are results calculated? Windowing
• When in processing time are results materialized?
• How do refinements of results relate?
When in processing time are results materialized?
Triggers
• Mechanism for declaring when the output for a window should be materialized (relative to some external signal).
• Per element
• Window completion
• Fixed
When in processing time are results materialized?
2 Minute Triggers Example
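A rough way to compare the three trigger kinds is to simulate one window and list what each trigger would emit. The sketch below is plain Java, not Beam; it assumes that "Fixed" means a fixed period in processing time, and the names and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: what a running sum emits for one window under each trigger kind. */
final class TriggerSketch {
    enum Trigger { PER_ELEMENT, WINDOW_COMPLETION, FIXED_PERIOD }

    /**
     * values[i] arrives at processing time arrivalSec[i] (seconds since the window opened).
     * periodSec is only used by FIXED_PERIOD; closeSec is when the window completes.
     */
    static List<Integer> emittedSums(int[] values, long[] arrivalSec, Trigger trigger,
                                     long periodSec, long closeSec) {
        List<Integer> emitted = new ArrayList<>();
        switch (trigger) {
            case PER_ELEMENT: { // fire after every element
                int sum = 0;
                for (int v : values) {
                    sum += v;
                    emitted.add(sum);
                }
                break;
            }
            case WINDOW_COMPLETION: { // fire once, when the window is complete
                int sum = 0;
                for (int v : values) {
                    sum += v;
                }
                emitted.add(sum);
                break;
            }
            case FIXED_PERIOD: { // fire at every period tick until the window completes
                for (long tick = periodSec; tick <= closeSec; tick += periodSec) {
                    int sum = 0;
                    for (int i = 0; i < values.length; i++) {
                        if (arrivalSec[i] <= tick) {
                            sum += values[i];
                        }
                    }
                    emitted.add(sum);
                }
                break;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        int[] values = {5, 7, 3};
        long[] arrivals = {10, 50, 100};
        for (Trigger t : Trigger.values()) {
            System.out.println(t + " -> " + emittedSums(values, arrivals, t, 60, 120));
        }
    }
}
```

Per-element firing gives the lowest latency and the most output; firing on window completion gives a single result but only at the end.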
Beam model
• What results are calculated? Insights
• Where in event time are results calculated? Windowing
• When in processing time are results materialized? Triggers
• How do refinements of results relate?
How do refinements of results relate?
State
• Amount of context stored between runs.
How do refinements of results relate?
Watermarks
• Temporal notions of input completeness in the event-time domain.
How do refinements of results relate?
Watermarks Example
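One simple way to picture a watermark is as a heuristic: "no event older than the newest event seen, minus some slack, is expected any more". The sketch below assumes exactly that heuristic (bounded out-of-orderness); real runners compute watermarks from their sources, so this is an illustration with made-up names, not how Beam does it.

```java
/** Illustrative only: a heuristic watermark that trails the newest event time by a fixed bound. */
final class WatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long maxEventTimeMs = Long.MIN_VALUE;

    WatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Current estimate of input completeness in event time. */
    long watermark() {
        return maxEventTimeMs == Long.MIN_VALUE
            ? Long.MIN_VALUE
            : maxEventTimeMs - maxOutOfOrdernessMs;
    }

    /** Records an event and reports whether it arrived behind the watermark, i.e. late. */
    boolean observe(long eventTimeMs) {
        boolean late = eventTimeMs < watermark();
        maxEventTimeMs = Math.max(maxEventTimeMs, eventTimeMs);
        return late;
    }

    public static void main(String[] args) {
        WatermarkSketch w = new WatermarkSketch(1_000);
        System.out.println(w.observe(5_000)); // on time, watermark moves to 4000
        System.out.println(w.observe(4_500)); // out of order but still ahead of the watermark
        System.out.println(w.observe(3_000)); // behind the watermark: late data
    }
}
```

A window can be treated as complete once the watermark passes its end; anything that shows up for that window afterwards is late data.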
How do refinements of results relate?
Handling late data
• Firing functions when events are observed outside the state.
Technique: Side-effect
Discarding: Approximate
Accumulation: Duplicates
Accumulation & Retraction: Late updates
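The three techniques can be compared by listing what a single window emits on each firing. This sketch assumes they correspond to the discarding, accumulating, and accumulating-and-retracting modes described in Tyler Akidau's material; the class, the string format of a retraction and the sample numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: what one window emits per firing under each accumulation mode. */
final class AccumulationSketch {
    enum Mode { DISCARDING, ACCUMULATING, ACCUMULATING_AND_RETRACTING }

    /** paneInputs[i] is the sum of the new values that arrived before firing i. */
    static List<String> firings(int[] paneInputs, Mode mode) {
        List<String> out = new ArrayList<>();
        int total = 0;
        for (int delta : paneInputs) {
            int previous = total;
            total += delta;
            switch (mode) {
                case DISCARDING: // only the new data; earlier panes are forgotten
                    out.add(String.valueOf(delta));
                    break;
                case ACCUMULATING: // the full result so far, repeated on every firing
                    out.add(String.valueOf(total));
                    break;
                case ACCUMULATING_AND_RETRACTING: // new result plus a retraction of the old one
                    out.add(out.isEmpty()
                        ? String.valueOf(total)
                        : total + ", retract " + previous);
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] panes = {3, 4, 2}; // early firing, on-time firing, late firing
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + firings(panes, m));
        }
    }
}
```

Read against the table: with discarding, no single pane holds the full answer; with accumulation, a consumer that adds the panes up counts earlier data twice; with retractions, a late firing can correct what was emitted before.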
How do refinements of results relate?
Tagging late data Example
Beam model
• What results are calculated? Insights
• Where in event time are results calculated? Windowing
• When in processing time are results materialized? Triggers
• How do refinements of results relate? Watermarks & Exactly Once
“Every Stream can yield a Table at a certain time,
& every Table can be observed into a Stream.”
Streams & Tables theory
Streams & Tables theory: General approach
Operations:
Stream => Stream: Mapping
Stream => Table: Grouping
Table => Stream: Partitioning
Table => Table: Part+Group
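The stream and table conversions can be sketched with plain Java collections, where a List stands in for a stream and a Map for a table. This is a toy model under those assumptions, with hypothetical names; it says nothing about how a real engine stores state.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Illustrative only: a List plays the stream, a Map plays the table. */
final class StreamTableSketch {

    /** Stream => Stream: mapping transforms each record and keeps the data in motion. */
    static <A, B> List<B> map(List<A> stream, Function<A, B> fn) {
        List<B> out = new ArrayList<>();
        for (A record : stream) {
            out.add(fn.apply(record));
        }
        return out;
    }

    /** Stream => Table: grouping brings records to rest, here by summing values per key. */
    static Map<String, Integer> group(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> table = new TreeMap<>();
        for (Map.Entry<String, Integer> record : stream) {
            table.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return table;
    }

    /** Table => Stream: the rows at rest are put back in motion as records. */
    static List<Map.Entry<String, Integer>> toStream(Map<String, Integer> table) {
        List<Map.Entry<String, Integer>> stream = new ArrayList<>();
        for (Map.Entry<String, Integer> row : table.entrySet()) {
            stream.add(new SimpleImmutableEntry<>(row.getKey(), row.getValue()));
        }
        return stream;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> events = new ArrayList<>();
        events.add(new SimpleImmutableEntry<>("red", 3));
        events.add(new SimpleImmutableEntry<>("blue", 5));
        events.add(new SimpleImmutableEntry<>("red", 4));
        Map<String, Integer> table = group(events);
        System.out.println(table);           // table at rest
        System.out.println(toStream(table)); // back in motion
    }
}
```

A Table => Table step is then just `toStream` followed by `group`, which is the Part+Group row.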
Table => Table operations are Part+Group
• Spark Dataframes
• MapReduce
• SQL Engines
“Semantically batch is really just
a (strict) subset of streaming.”
Streams & Tables theory:
Batch & Streaming Engines
Streams & Tables theory: Bounded & Unbounded Tables
[Figure: a data stream as an unbounded table; struct => Insights]
Streams & Tables theory: Bounded & Unbounded Tables
[Figure: data stream; struct => Insights]
Batch+strEAM model
• What results are calculated? Insights
• Where in event time are results calculated? Windowing
• When in processing time are results materialized? Triggers
• How do refinements of results relate? Watermarks & Exactly Once
Recap Part 1
• The Beam model is useful for processing both Bounded & Unbounded Tables.
• Event vs Processing time & how it relates to Windowing and Triggering.
• Stateful processing is useful when working to guarantee correctness.
• State is managed with Watermarks, Late Data firings & Fault Tolerant Exactly Once semantics.
TALK STRUCTURE
1. Beam model through Streams & Tables theory.
2. Getting started with Apache Beam.
3. Apache Beam Barcelona meetup.
Apache Beam
• Unified model
• Multiple languages
• Portable runners!
SQL
Pipelines = PCollections + PTransforms
PCollection
• Distributed dictionary (inspired by RDDs and DataFrames), but it can be bounded or unbounded.
• Source Readers
• Sink Writers
IOConnectors for PCollections
File: FileIO (HDFS*), Text, Avro, Parquet, ML…
Messaging: Kafka, Kinesis, Pub/Sub, -MQ…
Database: JDBC, Redis, Cassandra, Hive, HBase, BQ, BT, Spanner, Solr…
PTransform
• Immutable transformations:
[Output PCollection] = [Input PCollection].apply([Transform])
• 6 primitives:
Mapping: ParDo, Flatten
Grouping: GroupByKey, CoGroupByKey, Combine
Partitioning: Partition
ParDo
• ParDo applies a DoFn in a distributed fashion.
• DoFns are user-defined functions, which must be:
• Serializable
• Thread-compatible
• Idempotent
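What those requirements look like in practice can be sketched without Beam. Below, `ElementFn` is a hypothetical stand-in for a DoFn and `parDo` a stand-in for ParDo that runs on a parallel stream; the "key,score" line format is an assumption, not the deck's actual event format.

```java
import java.io.Serializable;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative only: a DoFn-like function applied in parallel, without the Beam SDK. */
final class ParDoSketch {

    /** Hypothetical stand-in for a DoFn: serializable, emits zero or more outputs per element. */
    interface ElementFn<I, O> extends Serializable {
        List<O> process(I element);
    }

    /** Stateless and deterministic, so re-running it on a retried element gives the same output. */
    static final ElementFn<String, Map.Entry<String, Integer>> PARSE_EVENT = line -> {
        String[] parts = line.split(",");
        if (parts.length != 2) {
            return Collections.emptyList(); // malformed line: emit nothing
        }
        try {
            Map.Entry<String, Integer> kv =
                new SimpleImmutableEntry<>(parts[0].trim(), Integer.parseInt(parts[1].trim()));
            return Collections.singletonList(kv);
        } catch (NumberFormatException e) {
            return Collections.emptyList(); // non-numeric score: emit nothing
        }
    };

    /** Applies fn to every element, possibly on several threads, and flattens the outputs. */
    static <I, O> List<O> parDo(List<I> input, ElementFn<I, O> fn) {
        return input.parallelStream()
            .flatMap(element -> fn.process(element).stream())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parDo(Arrays.asList("red,3", "garbage", "blue,5"), PARSE_EVENT));
    }
}
```

Because the function keeps no mutable state, it does not matter how many copies run, on which thread, or how often an element is retried.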
High-level PTransforms
Filter, FlatMapElements, Keys, KvSwap, MapElements, ParDo, Partition, Regex, Reify, ToString, WithKeys, WithTimestamps, Values
ApproximateQuantiles, ApproximateUnique, CoGroupByKey, Combine, CombineWithContext, Count, Distinct, GroupByKey, GroupIntoBatches, HllCount, Latest, Max, Mean
Min, Sample, Sum, Top, Create, Flatten, PAssert, View, Window
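import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.*;
import org.apache.beam.sdk.transforms.windowing.*;
import org.apache.beam.sdk.values.*;
import org.joda.time.Duration;

// Options (with getInput/getOutput) and ParseEventFn are assumed to be defined elsewhere.
public class ScoreSum {
  public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args)
        .withValidation().as(Options.class);
    Pipeline pipeline = Pipeline.create(options);
    // Read events from a text file and parse them.
    PCollection<KV<String, Integer>> input = pipeline
        .apply(TextIO.read().from(options.getInput()))
        .apply("ParseGameEvent", ParDo.of(new ParseEventFn()));
    // Where: 2-minute fixed windows. When: at the watermark, with early and late firings.
    PCollection<KV<String, Integer>> scores = input
        .apply(Window.<KV<String, Integer>>into(FixedWindows.of(Duration.standardMinutes(2)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardMinutes(2)))
                .withLateFirings(AfterPane.elementCountAtLeast(1)))
            .withAllowedLateness(Duration.standardMinutes(2)) // set with a trigger; value is an assumption
            // How: the slide asks for accumulating and retracting panes; the Java SDK's Window
            // exposes accumulatingFiredPanes() and discardingFiredPanes(), so accumulating is used.
            .accumulatingFiredPanes())
        .apply(Sum.integersPerKey()); // What: sum the integers per key.
    // TextIO writes strings, so format each result first.
    scores.apply(MapElements.into(TypeDescriptors.strings())
            .via((KV<String, Integer> kv) -> kv.getKey() + "," + kv.getValue()))
        .apply(TextIO.write().to(options.getOutput()));
    pipeline.run().waitUntilFinish();
  }
}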
Code example
• Generic pipeline creation (Spark context)
• Reader (spark.read)
• Where: 2 min fixed window transformation
• When: Fixed Trigger
• How: Correctness
• What: Sum integers
• Writer (spark.write)
• Execute lazy eval
Recap Part 2
• Beam pipelines are combinations of PCollections + PTransforms.
• A lot of out-of-the-box IO transforms.
• Separate your Readers and Writers for reusability & testability.
• Use high-level Transforms for a jump start.
• Identify the What, Where, When & How of the model to design complex transforms well.
• Language & Runner independent FTW
TALK STRUCTURE
1. Beam model through Streams & Tables theory.
2. Getting started with Apache Beam.
3. Apache Beam Barcelona meetup.
Apache Beam
Barcelona Meetup
Join for slides
& more!
Community
• This is all about the community!
• Regardless of level & background!
• Call for speakers!
Thank you &
see you soon!
sli.do

More Related Content

Similar to Introduction to Apache Beam

Architectural Considerations for Startups
Architectural Considerations for StartupsArchitectural Considerations for Startups
Architectural Considerations for StartupsNiall Roche
 
Praxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudPraxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudRoman Weber
 
Natural born conversion killers - Conversion Jam
Natural born conversion killers - Conversion JamNatural born conversion killers - Conversion Jam
Natural born conversion killers - Conversion JamCraig Sullivan
 
Racing for the Flexibility Integrating Aras into the IT Landscape
Racing for the Flexibility Integrating Aras into the IT LandscapeRacing for the Flexibility Integrating Aras into the IT Landscape
Racing for the Flexibility Integrating Aras into the IT LandscapeAras
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp
 
Better Business Process - PBI World Tour Charlotte '18
Better Business Process - PBI World Tour Charlotte '18Better Business Process - PBI World Tour Charlotte '18
Better Business Process - PBI World Tour Charlotte '18eamador1
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsInside Analysis
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
View Page Update Presentation Close Bangalore Executive Seminar 2015: Welcom...
 View Page Update Presentation Close Bangalore Executive Seminar 2015: Welcom... View Page Update Presentation Close Bangalore Executive Seminar 2015: Welcom...
View Page Update Presentation Close Bangalore Executive Seminar 2015: Welcom...MongoDB
 
Hacking your ConnectWise Manage by Stack Advisors - Automation Nation 2018
Hacking your ConnectWise Manage by Stack Advisors - Automation Nation 2018Hacking your ConnectWise Manage by Stack Advisors - Automation Nation 2018
Hacking your ConnectWise Manage by Stack Advisors - Automation Nation 2018Scott Wilson
 
Architecting a next generation data platform
Architecting a next generation data platformArchitecting a next generation data platform
Architecting a next generation data platformhadooparchbook
 
WSO2Con EU 2015: Opening Keynote - Helping You Connect the World
WSO2Con EU 2015: Opening Keynote - Helping You Connect the WorldWSO2Con EU 2015: Opening Keynote - Helping You Connect the World
WSO2Con EU 2015: Opening Keynote - Helping You Connect the WorldWSO2
 
Beyond DevOps: How Netflix Bridges the Gap?
Beyond DevOps: How Netflix Bridges the Gap?Beyond DevOps: How Netflix Bridges the Gap?
Beyond DevOps: How Netflix Bridges the Gap?C4Media
 
Lecture about "Enterprise Architecture @ ING" given at Solvay Brussels School...
Lecture about "Enterprise Architecture @ ING" given at Solvay Brussels School...Lecture about "Enterprise Architecture @ ING" given at Solvay Brussels School...
Lecture about "Enterprise Architecture @ ING" given at Solvay Brussels School...Alain Heremans
 
RightScale Roadtrip - Accelerate to Cloud
RightScale Roadtrip - Accelerate to CloudRightScale Roadtrip - Accelerate to Cloud
RightScale Roadtrip - Accelerate to CloudRightScale
 
DOES15 - Mirco Hering - Adopting DevOps Practices for Systems of Record – An ...
DOES15 - Mirco Hering - Adopting DevOps Practices for Systems of Record – An ...DOES15 - Mirco Hering - Adopting DevOps Practices for Systems of Record – An ...
DOES15 - Mirco Hering - Adopting DevOps Practices for Systems of Record – An ...Gene Kim
 
Mirco hering devops for systems of record final
Mirco hering devops for systems of record finalMirco hering devops for systems of record final
Mirco hering devops for systems of record finalMirco Hering
 
From Spreadsheet Hell to Streamlined Automation with QuickBase
From Spreadsheet Hell to Streamlined Automation with QuickBaseFrom Spreadsheet Hell to Streamlined Automation with QuickBase
From Spreadsheet Hell to Streamlined Automation with QuickBaseQuickBase, Inc.
 
The Anchor Store: Four Confluence Examples to Root Your Deployment
The Anchor Store: Four Confluence Examples to Root Your DeploymentThe Anchor Store: Four Confluence Examples to Root Your Deployment
The Anchor Store: Four Confluence Examples to Root Your DeploymentAtlassian
 
Productionizing Data Science at Experience
Productionizing Data Science at ExperienceProductionizing Data Science at Experience
Productionizing Data Science at ExperienceMatt Mills
 

Similar to Introduction to Apache Beam (20)

Architectural Considerations for Startups
Architectural Considerations for StartupsArchitectural Considerations for Startups
Architectural Considerations for Startups
 
Praxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudPraxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloud
 
Natural born conversion killers - Conversion Jam
Natural born conversion killers - Conversion JamNatural born conversion killers - Conversion Jam
Natural born conversion killers - Conversion Jam
 
Racing for the Flexibility Integrating Aras into the IT Landscape
Racing for the Flexibility Integrating Aras into the IT LandscapeRacing for the Flexibility Integrating Aras into the IT Landscape
Racing for the Flexibility Integrating Aras into the IT Landscape
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
 
Better Business Process - PBI World Tour Charlotte '18
Better Business Process - PBI World Tour Charlotte '18Better Business Process - PBI World Tour Charlotte '18
Better Business Process - PBI World Tour Charlotte '18
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both Worlds
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
View Page Update Presentation Close Bangalore Executive Seminar 2015: Welcom...
 View Page Update Presentation Close Bangalore Executive Seminar 2015: Welcom... View Page Update Presentation Close Bangalore Executive Seminar 2015: Welcom...
View Page Update Presentation Close Bangalore Executive Seminar 2015: Welcom...
 
Hacking your ConnectWise Manage by Stack Advisors - Automation Nation 2018
Hacking your ConnectWise Manage by Stack Advisors - Automation Nation 2018Hacking your ConnectWise Manage by Stack Advisors - Automation Nation 2018
Hacking your ConnectWise Manage by Stack Advisors - Automation Nation 2018
 
Architecting a next generation data platform
Architecting a next generation data platformArchitecting a next generation data platform
Architecting a next generation data platform
 
WSO2Con EU 2015: Opening Keynote - Helping You Connect the World
WSO2Con EU 2015: Opening Keynote - Helping You Connect the WorldWSO2Con EU 2015: Opening Keynote - Helping You Connect the World
WSO2Con EU 2015: Opening Keynote - Helping You Connect the World
 
Beyond DevOps: How Netflix Bridges the Gap?
Beyond DevOps: How Netflix Bridges the Gap?Beyond DevOps: How Netflix Bridges the Gap?
Beyond DevOps: How Netflix Bridges the Gap?
 
Lecture about "Enterprise Architecture @ ING" given at Solvay Brussels School...
Lecture about "Enterprise Architecture @ ING" given at Solvay Brussels School...Lecture about "Enterprise Architecture @ ING" given at Solvay Brussels School...
Lecture about "Enterprise Architecture @ ING" given at Solvay Brussels School...
 
RightScale Roadtrip - Accelerate to Cloud
RightScale Roadtrip - Accelerate to CloudRightScale Roadtrip - Accelerate to Cloud
RightScale Roadtrip - Accelerate to Cloud
 
DOES15 - Mirco Hering - Adopting DevOps Practices for Systems of Record – An ...
DOES15 - Mirco Hering - Adopting DevOps Practices for Systems of Record – An ...DOES15 - Mirco Hering - Adopting DevOps Practices for Systems of Record – An ...
DOES15 - Mirco Hering - Adopting DevOps Practices for Systems of Record – An ...
 
Mirco hering devops for systems of record final
Mirco hering devops for systems of record finalMirco hering devops for systems of record final
Mirco hering devops for systems of record final
 
From Spreadsheet Hell to Streamlined Automation with QuickBase
From Spreadsheet Hell to Streamlined Automation with QuickBaseFrom Spreadsheet Hell to Streamlined Automation with QuickBase
From Spreadsheet Hell to Streamlined Automation with QuickBase
 
The Anchor Store: Four Confluence Examples to Root Your Deployment
The Anchor Store: Four Confluence Examples to Root Your DeploymentThe Anchor Store: Four Confluence Examples to Root Your Deployment
The Anchor Store: Four Confluence Examples to Root Your Deployment
 
Productionizing Data Science at Experience
Productionizing Data Science at ExperienceProductionizing Data Science at Experience
Productionizing Data Science at Experience
 

Recently uploaded

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

Introduction to Apache Beam

  • 1. INTRODUCTION TO APACHE Leveraging unified analytics in stateful processing. MARCGONZALEZ.EU
  • 2. InnoIT: a disruptive IT company! INNOIT IS A CONSULTING COMPANY SPECIALISED IN IT We work with: Web & Mobile Development Systems & DevOPS Quality Assurance Testing Big Data & Machine Learning Methodologies: Agile, Lean, Product Owner We are specialists! 9 YEARS IN FRANCE, 3 YEARS IN SPAIN: In France, we are > 200 consultants In Spain, we have built a team of > 35 consultants in 3 years We reached the confidence of more than 25 multinational clients We have organized >20 technological meetups with >900 attendees
  • 3. We are simply different   COACHING OF OUR CONSULTANTS AS A “FOOTBALL AGENT SCOUT”: WE WORK “FROM THE CANDIDATE TO THE MARKET”. OUR CLIENT IS THE CANDIDATE. SOFT SKILLS: WE LOOK FOR THE BEST TALENTS WE CAN TRUST IN. WE FOCUS ON THE POTENTIAL OF A PERSON TRANSPARENCY: WE ONLY SAY WHAT WE ARE GOING TO COMPLETE AND WE COMPLY OUR PROMISE WE WON THE TRUST OF THE CANDIDATES: 1 / 2 CVS WE RECEIVED COMES FROM A “REFERRAL” THE SAME EFFECT WITH OUR CLIENTS OUR CONSULTANTS GIVE US REVIEWS & QUOTES ON SOCIAL MEDIA
  • 5. We are hiring people like you! YOU CAN A LOOK TO OUR OFFERS AND APPLICATE IN OUR WEBSITE: WWW.INNO-IT.ES YOU CAN ALSO SEND YOUR CV TO THE EMAIL: APPLY@INNO-IT.ES YOU CAN ALSO SIMPLY COME AND SPEAK WITH US ☺ MONIKA, MELISSA, GABRIEL AND MYSELF CAN EXPLAIN YOU OUR OPPORTUNITIES!
  • 6. Next Event in InnoIT JANUARY MEETUP: “INTRODUCTION TO APACHE BEAM. APACHE BEAM AND HOW TO LEVERAGE UNIFIED PROCESSING TO TACKLE NEW DEVELOPMENTS”? MIÉRCOLES 29/01 DE 19H A 21H (OFICINA URQUINAONA) ORGANIZADO POR INNOIT CONSULTING & UBEEQO/ EUROPCAR NEW MOBILITIES FEBRUARY DESAYUNO DE TRABAJO: “LEADERSHIP: CÓMO GESTIONAR CONFLICTOS EN EMPRESAS EN GRAN CRECIMIENTO?” MARTES 11/02 DE 9H A 11H (OFICINA POBLE NOU) ORGANIZADO POR INNOIT Y EL COACH (EX CTO) ALFONS FOUBERT
  • 7. INTRODUCTION TO APACHE BEAM: Leveraging unified analytics in stateful processing. MARCGONZALEZ.EU
  • 8. Hi, I’m Marc! Freelance Data Engineer Developer, Consultant, Speaker 5+ years of big data experience, applied to classifieds market.
  • 9.
  • 10. Q? sli.do #BEAM About Ubeeqo goes here Building the new data platform: • 7.7 M customers • 40 M bookings • 350 K cars
  • 11. Audience • Experience with Spark. • Streams and tables theory. • Beam model. • Apache Beam in production.
  • 12. Motivation • Processing pipeline requirements change all the time. Change: the “What” (KPI A -> KPI B); the “How” (MySQL -> Redshift, JSON -> Parquet, Batch -> Streaming). Method: DDD, Functional ?
  • 13. Motivation [Timeline figures: batch jobs scheduled at different intervals] • crontab: 0 1 * * * spark-submit • crontab: 0 */1 * * * spark-submit • airflow: 0 */1 * * * spark-submit • crontab: 0 */6 * * * spark-submit
  • 14. Challenges • Processing out-of-order data based on application timestamps (also called event time) • Maintaining large amounts of state • Supporting high-data throughput • Processing each event exactly once despite machine failures • Handling load imbalance and stragglers • Responding to events at low latency • Joining with external data in other storage systems • Determining how to update output sinks as new events arrive • Writing data transactionally to output systems • Updating your application’s business logic at runtime
  • 15. TALK STRUCTURE 1. Beam model through Streams & Tables theory. 2. Getting started with Apache Beam. 3. Apache Beam Barcelona meetup. Q&A: sli.do code BEAM
  • 16. Notes • Most material is from Tyler Akidau, either from his blog, talks or book.
  • 17. TALK STRUCTURE 1. Beam model through Streams & Tables theory. 2. Getting started with Apache Beam. 3. Apache Beam Barcelona meetup.
  • 18. Beam model • What results are calculated? • Where in event time are results calculated? • When in processing time are results materialized? • How do refinements of results relate?
  • 19. What results are calculated? Event vs Processing Times
  • 20. What results are calculated? Event vs Processing Times Example
  • 21. What results are calculated? Importance of Order
  • 22. What results are calculated? Importance of Order
  • 23. Beam model • What results are calculated? Insights • Where in event time are results calculated? • When in processing time are results materialized? • How do refinements of results relate?
  • 24. Where in event time are results calculated? Windowing • Partitioning a data set along temporal boundaries. • Window types (along the event-time axis): Fixed, Sliding, Session.
  • 25. Where in event time are results calculated? 2 Minute Windowing Example
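To make the 2-minute fixed windowing idea concrete, here is a minimal plain-Java sketch. It deliberately does not use the Beam SDK: the class `FixedWindowSketch`, its methods, and the sample events are all hypothetical names chosen for illustration. It assumes event-time timestamps in milliseconds and assigns each event to the window whose boundaries contain its timestamp, regardless of arrival order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of fixed (tumbling) event-time windows; not the Beam SDK API.
final class FixedWindowSketch {

    // Start of the fixed window that contains the given event-time timestamp.
    static long windowStart(long eventTimeMillis, long windowSizeMillis) {
        return Math.floorDiv(eventTimeMillis, windowSizeMillis) * windowSizeMillis;
    }

    // Sums values per window, keyed by window start.
    // Each event is a pair {eventTimeMillis, value}; arrival order does not matter.
    static Map<Long, Long> sumPerWindow(long[][] events, long windowSizeMillis) {
        Map<Long, Long> sums = new TreeMap<>();
        for (long[] event : events) {
            sums.merge(windowStart(event[0], windowSizeMillis), event[1], Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long twoMinutes = 2 * 60 * 1000L;
        // The third event arrives last but belongs to the first window.
        long[][] events = { {30_000L, 5}, {150_000L, 7}, {90_000L, 3} };
        System.out.println(sumPerWindow(events, twoMinutes)); // prints {0=8, 120000=7}
    }
}
```

Sliding and session windows differ only in the assignment rule: a sliding window can place one event in several overlapping windows, and a session window grows with the gaps between events.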
  • 26. Beam model • What results are calculated? Insights • Where in event time are results calculated? Windowing • When in processing time are results materialized? • How do refinements of results relate?
  • 27. When in processing time are results materialized? Triggers • Mechanism for declaring when the output for a window should be materialized (relative to some external signal). • Per element • Window completion • Fixed
  • 28. When in processing time are results materialized? 2 Minute Triggers Example
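The three trigger styles named above can be compared with a small plain-Java sketch. It is not the Beam SDK: `TriggerSketch` and its methods are hypothetical, and it assumes a single window whose result is a running sum, with arrivals given as {processingTimeMillis, value} pairs already sorted by processing time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of trigger policies for one window; not the Beam SDK API.
final class TriggerSketch {

    // Per element: materialize the running sum after every arrival.
    static List<Long> perElement(long[][] arrivals) {
        List<Long> panes = new ArrayList<>();
        long sum = 0;
        for (long[] arrival : arrivals) {
            sum += arrival[1];
            panes.add(sum);
        }
        return panes;
    }

    // Fixed: materialize at every processing-time period boundary that saw new
    // data, plus once more when the input ends.
    static List<Long> fixedPeriod(long[][] arrivals, long periodMillis) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("periodMillis must be positive");
        }
        List<Long> panes = new ArrayList<>();
        long sum = 0;
        long nextFire = periodMillis;
        boolean dirty = false;
        for (long[] arrival : arrivals) {
            // Fire for every boundary crossed before this arrival.
            while (arrival[0] >= nextFire) {
                if (dirty) {
                    panes.add(sum);
                    dirty = false;
                }
                nextFire += periodMillis;
            }
            sum += arrival[1];
            dirty = true;
        }
        if (dirty) {
            panes.add(sum);
        }
        return panes;
    }

    // Window completion: materialize a single result once all input is in.
    static List<Long> atCompletion(long[][] arrivals) {
        long sum = 0;
        for (long[] arrival : arrivals) {
            sum += arrival[1];
        }
        List<Long> panes = new ArrayList<>();
        panes.add(sum);
        return panes;
    }
}
```

The trade-off is latency against output volume: per-element firing gives the freshest results and the most output, while firing on completion gives one result per window but only after the input is considered complete.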
  • 29. Beam model • What results are calculated? Insights • Where in event time are results calculated? Windowing • When in processing time are results materialized? Triggers • How do refinements of results relate?
  • 30. How do refinements of results relate? State • Amount of context stored between runs.
  • 31. How do refinements of results relate? Watermarks • Temporal notions of input completeness in the event-time domain.
  • 32. How do refinements of results relate? Watermarks Example
  • 33. How do refinements of results relate? Handling late data • Firing functions when events are observed outside the state. • Technique (side-effect): Discarding (approximate results); Accumulation (duplicates); Accumulation & Retraction (late updates).
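The three techniques for relating refinements can be sketched in plain Java. This is an illustration under assumptions, not Beam code: `AccumulationSketch` is a hypothetical class, the aggregation is a sum, each inner array holds the values that arrived between two firings of the same window, and a retraction is modelled as the negated previous total.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of accumulation modes for one window; not the Beam SDK API.
final class AccumulationSketch {

    // Discarding: each pane reports only the values seen since the last firing.
    static List<Long> discarding(long[][] panes) {
        List<Long> out = new ArrayList<>();
        for (long[] pane : panes) {
            long sum = 0;
            for (long v : pane) {
                sum += v;
            }
            out.add(sum);
        }
        return out;
    }

    // Accumulating: each pane reports the total so far, so a consumer that
    // adds panes together counts earlier values more than once.
    static List<Long> accumulating(long[][] panes) {
        List<Long> out = new ArrayList<>();
        long total = 0;
        for (long[] pane : panes) {
            for (long v : pane) {
                total += v;
            }
            out.add(total);
        }
        return out;
    }

    // Accumulating & retracting: before each new total, emit the negated
    // previous total so that consumers can undo it.
    static List<Long> accumulatingAndRetracting(long[][] panes) {
        List<Long> out = new ArrayList<>();
        long total = 0;
        boolean fired = false;
        for (long[] pane : panes) {
            if (fired) {
                out.add(-total);
            }
            for (long v : pane) {
                total += v;
            }
            out.add(total);
            fired = true;
        }
        return out;
    }

    // What a downstream consumer sees if it simply sums everything it receives.
    static long downstreamSum(List<Long> emitted) {
        long sum = 0;
        for (long v : emitted) {
            sum += v;
        }
        return sum;
    }
}
```

With panes {5, 3}, {7}, {2} the true total is 17. A downstream sum over the discarding output or over the accumulating-and-retracting output gives 17, while a downstream sum over the accumulating output gives 40, which is the duplicate side-effect listed above.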
  • 34. How do refinements of results relate? Tagging late data Example
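One way to picture watermarks and late-data tagging is the following plain-Java sketch. It is not how any particular runner computes watermarks: `WatermarkSketch` is a hypothetical class, and it assumes a simple heuristic watermark (the newest event time seen minus a fixed allowed delay) together with fixed windows. An event is tagged late when its window had already been declared complete before it arrived.

```java
// Hypothetical illustration of a heuristic watermark and late-data tagging; not the Beam SDK API.
final class WatermarkSketch {
    private final long windowSizeMillis;
    private final long maxDelayMillis;
    private long maxEventTimeSeen = Long.MIN_VALUE;

    WatermarkSketch(long windowSizeMillis, long maxDelayMillis) {
        this.windowSizeMillis = windowSizeMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    // Heuristic watermark: assume no event trails the newest one by more than maxDelayMillis.
    long watermark() {
        return maxEventTimeSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxEventTimeSeen - maxDelayMillis;
    }

    // Observes one event and returns true when it is late, i.e. the watermark
    // had already passed the end of the event's window when it arrived.
    boolean observeIsLate(long eventTimeMillis) {
        long windowEnd =
                Math.floorDiv(eventTimeMillis, windowSizeMillis) * windowSizeMillis
                        + windowSizeMillis;
        boolean late = windowEnd <= watermark();
        maxEventTimeSeen = Math.max(maxEventTimeSeen, eventTimeMillis);
        return late;
    }
}
```

Because the watermark here is only a guess about completeness, late events are expected; the tag lets a pipeline decide whether to discard them or to emit a refinement.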
  • 35. Beam model • What results are calculated? Insights • Where in event time are results calculated? Windowing • When in processing time are results materialized? Triggers • How do refinements of results relate? Watermarks & Exactly Once
  • 36. Streams & Tables theory: “Every Stream can yield a Table at a certain time, & every Table can be observed into a Stream.”
  • 37. Streams & Tables theory: General approach • Stream => Stream: Mapping • Stream => Table: Grouping • Table => Stream: Partitioning
  • 38. Streams & Tables theory: General approach • Stream => Stream: Mapping • Stream => Table: Grouping • Table => Stream: Partitioning • Table => Table: Part+Group
  • 39. Table => Table operations are Part+Group • Spark DataFrames • MapReduce • SQL engines
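The stream/table duality can be illustrated with a short plain-Java sketch. The names are hypothetical (`StreamTableSketch`, `kv`, `toTable`, `toChangelog`), the grouping is a per-key sum, and an in-memory list stands in for a stream. Grouping folds the stream into a table; observing every change made to that table yields a stream again.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of stream/table duality; not the Beam SDK API.
final class StreamTableSketch {

    // Convenience constructor for a key/value stream element.
    static Map.Entry<String, Integer> kv(String key, int value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    // Stream => Table: grouping by key folds the events into one total per key.
    static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> event : stream) {
            table.merge(event.getKey(), event.getValue(), Integer::sum);
        }
        return table;
    }

    // Table => Stream: emitting each row as it changes turns the table back into a stream.
    static List<Map.Entry<String, Integer>> toChangelog(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> table = new LinkedHashMap<>();
        List<Map.Entry<String, Integer>> changes = new ArrayList<>();
        for (Map.Entry<String, Integer> event : stream) {
            int updated = table.merge(event.getKey(), event.getValue(), Integer::sum);
            changes.add(kv(event.getKey(), updated));
        }
        return changes;
    }
}
```

For the stream (a, 1), (b, 2), (a, 3), `toTable` ends with a = 4 and b = 2, and `toChangelog` emits (a, 1), (b, 2), (a, 4): the same table, seen as the sequence of updates that produced it.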
  • 40. Streams & Tables theory: Batch & Streaming Engines. “Semantically batch is really just a (strict) subset of streaming.”
  • 41. Streams & Tables theory: Bounded & Unbounded Tables [Figure: a data stream viewed as an unbounded table; struct => Insights]
  • 42. Streams & Tables theory: Bounded & Unbounded Tables [Figure: data stream; struct => Insights]
  • 43. Batch+strEAM model • What results are calculated? Insights • Where in event time are results calculated? Windowing • When in processing time are results materialized? Triggers • How do refinements of results relate? Watermarks & Exactly Once
  • 44. Recap Part 1 • The Beam model is useful for processing Bounded & Unbounded Tables. • Event vs Processing time & how it relates to Windowing and Triggering. • Stateful processing is useful when working to guarantee correctness. • State is managed with Watermarks, Late Data firings & Fault Tolerant Exactly Once semantics.
  • 45. TALK STRUCTURE 1. Beam model through Streams & Tables theory. 2. Getting started with Apache Beam. 3. Apache Beam Barcelona meetup.
  • 46. Apache Beam • Unified model • Multiple languages • Portable runners! SQL
  • 47. Pipelines = PCollections + PTransforms
  • 48. PCollection • Distributed Dictionary (inspired by RDDs and DataFrames) but can be bounded or unbounded. • Source Readers • Sink Writers
  • 49. IOConnectors for PCollections • File: FileIO (HDFS*), Text, Avro, Parquet, ML… • Messaging: Kafka, Kinesis, Pub/Sub, -MQ… • Database: JDBC, Redis, Cassandra, Hive, HBase, BQ, BT, Spanner, Solr…
  • 50. PTransform • Immutable transformations: [Output PCollection] = [Input PCollection].apply([Transform]) • 6 primitives: Mapping (ParDo, Flatten); Grouping (GroupByKey, CoGroupByKey, Combine); Partitioning (Partition)
  • 51. ParDo • ParDo applies a DoFn in distributed fashion. • DoFns are User Defined Functions, which must be: • Serializable • Thread safe • Idempotent
  • 52. High-level PTransforms: Filter, ApproximateQuantiles, Min, FlatMapElements, ApproximateUnique, Sample, Keys, CoGroupByKey, Sum, KvSwap, Combine, Top, MapElements, CombineWithContext, Create, ParDo, Count, Flatten, Partition, Distinct, PAssert, Regex, GroupByKey, View, Reify, GroupIntoBatches, Window, ToString, HllCount, WithKeys, Latest, WithTimestamps, Max, Values, Mean
  • 53. Code example
      // Options and ParseEventFn are user-defined classes not shown on the slide.
      public class ScoreSum {
        public static void main(String[] args) {
          // Generic pipeline creation (comparable to a Spark context).
          Options options = PipelineOptionsFactory.fromArgs(args)
              .withValidation().as(Options.class);
          Pipeline pipeline = Pipeline.create(options);

          // Reader (comparable to spark.read): read events from a text file and parse them.
          PCollection<KV<String, Integer>> input = pipeline
              .apply(TextIO.read().from(options.getInput()))
              .apply("ParseGameEvent", ParDo.of(new ParseEventFn()));

          // Apply your complex transformation.
          // AtWatermark, AtPeriod, AtCount and accumulatingAndRetractingFiredPanes are
          // slide shorthand, not literal Beam Java SDK calls.
          PCollection<KV<String, Integer>> scores = input
              // Where: 2 min fixed window transformation.
              .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)))
                  // When: fixed trigger.
                  .triggering(
                      AtWatermark()
                          .withEarlyFirings(AtPeriod(Duration.standardMinutes(2)))
                          .withLateFirings(AtCount(1)))
                  // How: correctness.
                  .accumulatingAndRetractingFiredPanes())
              // What: sum integers.
              .apply(Sum.integersPerKey());

          // Writer (comparable to spark.write).
          scores.apply(TextIO.write().to(options.getOutput()));
          // Execute the lazily evaluated pipeline.
          pipeline.run().waitUntilFinish();
        }
      }
  • 55. Recap Part 2 • Beam pipelines are combinations of PCollections + PTransforms. • A lot of out-of-the-box IOTransforms. • Separate your Readers and Writers for reusability & testability. • Use high-level Transforms for a jump start. • Identify the What, Where, When and How of the model to design complex transforms well. • Language & Runner independent FTW
  • 56. TALK STRUCTURE 1. Beam model through Streams & Tables theory. 2. Getting started with Apache Beam. 3. Apache Beam Barcelona meetup.
  • 57. Apache Beam Barcelona Meetup: Join for slides & more!
  • 58. Community • This is all about the community! • Regardless of level & background! • Call for speakers!
  • 59. Thank you & see you soon! sli.do