
Arctic15 keynote

Jerry Jalava, Cloud Consultant, Google Developer Expert, Authorized Google Cloud Trainer (Nordics) at Jerryn Paja

Slides from my talk at Arctic15 2017

  • 1. REAL-TIME DATA PROCESSING AND ANALYSIS IN THE CLOUD. JERRY JALAVA - QVIK (Senior System Architect, Google Developer Expert, Authorised Trainer). JERRY@QVIK.FI | @W_I
  • 2. WE PRODUCE MASSIVE AMOUNTS OF DATA
  • 3. OVER 90% OF ALL THE DATA HAS BEEN GENERATED IN THE PAST FEW YEARS
  • 4. BEING COMPETITIVE REQUIRES YOU TO BE ABLE TO ANALYSE THAT FAST
  • 5. BUT, BUILDING THESE KINDS OF INFRASTRUCTURES IS EXPENSIVE
  • 6. THERE ARE MANY GREAT OPEN-SOURCE PROJECTS AVAILABLE
  • 7. REFERENCE ARCHITECTURE
     ‣ Ingest: Devices / Systems generating events, Message Queue
     ‣ Processing: Data Processing
     ‣ Storage: Time-series Database, Data Warehouse
  • 8. MANAGED SERVICES
     ‣ Cloud Pub/Sub - A fully managed, global and scalable publish and subscribe service with guaranteed at-least-once message delivery
     ‣ Cloud Dataflow - A fully managed, auto-scalable service for pipeline data processing in batch or streaming mode
     ‣ BigQuery - A fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics
  • 9. REFERENCE ARCHITECTURE
     ‣ Ingest: Devices / Systems generating events, Cloud Pub/Sub
     ‣ Processing: Data Processing
     ‣ Storage: Time-series Database, Data Warehouse
  • 10. REFERENCE ARCHITECTURE
     ‣ Ingest: Devices / Systems generating events, Cloud Pub/Sub
     ‣ Processing: Data Processing
     ‣ Storage: BigQuery, Cloud Bigtable
  • 11. REFERENCE ARCHITECTURE
     ‣ Ingest: Devices / Systems generating events, Cloud Pub/Sub
     ‣ Processing: Cloud Dataflow
     ‣ Storage: BigQuery, Cloud Bigtable
  • 12. DEMO
     ‣ Analyse "real-time" Taxi data from NYC
     ‣ >20000 events/s incoming
     ‣ 3M Taxi rides (1 week of data)
     ‣ Get insights
     ‣ Live visualisation of the rides
     ‣ How do the taxi rides from airports compare to taxi rides overall
     ‣ Analyse archived data
  • 13. DEMO ARCHITECTURE: Taxis (Ingest, Telemetry) → Cloud Pub/Sub (Messaging, Rides) → Cloud Dataflow (Processing, Aggregate) → Cloud Pub/Sub (Messaging, Display Data) → Dashboard Application
  • 16. MULTIPLE DATA PROCESSING REQUIREMENTS
     ‣ Correctness, completeness, reliability, scalability, and performance
     ‣ Continuous event processing
     ‣ Continuous result delivery
     ‣ Scalable ETL for continuous archival
     ‣ Analyst-ready big data sets
  • 17. COUNT RIDES (Taxi Data → Output)
     Taxi Data: (Lat X, Lon Y) @1:00, (Lat X, Lon Y) @1:01, (Lat K, Lon M) @1:03, (Lat K, Lon M) @2:30
  • 18. COUNT RIDES: Window In Time
     { [1:00, 2:00) → (X, Y) @1:00, (X, Y) @1:01, (K, M) @1:03 }
     { [2:00, 3:00) → (K, M) @2:30 }
  • 19. COUNT RIDES: Group In Space
     { (X, Y), [1:00, 2:00) → (X, Y) @1:00, (X, Y) @1:01 }
     { (K, M), [1:00, 2:00) → (K, M) @1:03 }
     { (K, M), [2:00, 3:00) → (K, M) @2:30 }
  • 20. COUNT RIDES: Count
     { (X, Y), [1:00, 2:00) → 2 }
     { (K, M), [1:00, 2:00) → 1 }
     { (K, M), [2:00, 3:00) → 1 }
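The three steps on slides 18 to 20 (window in time, group in space, count) can be tried out without a streaming runner. The following is a minimal plain-Java sketch, not the Dataflow pipeline from the talk: the class `WindowedCount`, its `Event` type and the `"cell@windowStart"` key format are hypothetical names chosen for illustration, and timestamps are assumed to be non-negative minutes since midnight.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WindowedCount {
    /** One taxi position report: a grid-cell label and a timestamp in minutes since midnight. */
    static final class Event {
        final String cell;
        final int minute;

        Event(String cell, int minute) {
            this.cell = cell;
            this.minute = minute;
        }
    }

    /** Start of the fixed window [start, start + windowMinutes) that contains the given minute. */
    static int windowStart(int minute, int windowMinutes) {
        return (minute / windowMinutes) * windowMinutes;
    }

    /** Counts events per (cell, window start) pair; keys look like "X,Y@60". */
    static Map<String, Integer> countPerCellAndWindow(List<Event> events, int windowMinutes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Event e : events) {
            // Window in time and group in space collapse into one composite key.
            String key = e.cell + "@" + windowStart(e.minute, windowMinutes);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

Feeding it the four events from slide 17 with a 60-minute window gives the same three groups as slide 20: two rides for (X, Y) and one for (K, M) in the first hour, and one for (K, M) in the second.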
  • 21. COUNT RIDES (Taxi Data → Window In Time → Group In Space → Count → Output)
     p.apply(PubsubIO.Read.topic(taxiInTopic))
      .apply("window 1s", Window.into(FixedWindows.of(Duration.standardSeconds(1))))
      .apply("condense rides", MapElements.via(new CondenseRides()))
      .apply("count similar", Count.perKey())
      .apply(PubsubIO.Write.topic(taxiOutTopic));
  • 22. COUNT RIDES: the CondenseRides function used in the "condense rides" step
     private static class CondenseRides
         extends SimpleFunction<TableRow, KV<LatLon, TableRow>> {
       @Override
       public KV<LatLon, TableRow> apply(TableRow t) {
         final float box = 0.001f; // very approximately 100m
         // TableRow values are Objects, so parse them before doing arithmetic
         float lat = Float.parseFloat(t.get("latitude").toString());
         float lon = Float.parseFloat(t.get("longitude").toString());
         // Snap each coordinate to the centre of its grid cell
         float roundedLat = (float) (Math.floor(lat / box) * box + box / 2);
         float roundedLon = (float) (Math.floor(lon / box) * box + box / 2);
         LatLon key = new LatLon(roundedLat, roundedLon);
         return KV.of(key, t);
       }
     }
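The grid snapping inside CondenseRides can be checked in isolation. This is a small sketch under stated assumptions: `GridCell` and `snap` are hypothetical names, and it uses `double` rather than the slide's `float` so the arithmetic is easier to verify; only the box size of 0.001 degrees comes from the slide.

```java
final class GridCell {
    // Box size in degrees, taken from the slide ("very approximately 100m").
    static final double BOX = 0.001;

    /** Snaps a coordinate to the centre of the grid cell that contains it. */
    static double snap(double coordinate) {
        return Math.floor(coordinate / BOX) * BOX + BOX / 2;
    }
}
```

Two points that fall into the same box, such as latitudes 40.7128 and 40.7121, snap to the same centre value and therefore share a key, which is what lets the following count step merge them.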
  • 23. Running the pipeline:
     # java com.google.codelabs.dataflow.CountRides \
         --streaming=true --project=arctic15-demo \
         --sourceProject=arctic15-demo --sourceTopic=taxifeed1 \
         --sinkProject=arctic15-demo --runner=DataflowPipelineRunner \
         --zone=eu-west1-c --numWorkers=3 \
         --stagingLocation=gs://arctic15-demo \
         --sinkTopic=visualisation-sink-1
  • 25. 10X Reduction In Messages per Second
  • 28. GETTING INSIGHTS: HOW DO THE TAXI RIDES FROM AIRPORTS COMPARE TO OVERALL TAXI RIDES
  • 29. AIRPORT RIDES: Read from PubSub
     p.apply(PubsubIO.Read.topic(inputTopic))
      .apply("Key By Ride ID", MapElements.via(
          (TableRow ride) -> KV.of(ride.get("ride_id"), ride)))
      .apply(Window.into(Sessions.withGapDuration(TEN_MIN)))
      .apply(Window.triggering(Repeatedly.forever(
          AfterPane.elementCountAtLeast(1)))
        .accumulatingFiredPanes())
      .apply(Combine.perKey(new AccumulatePoints()))
      .apply(ParDo.of(new FilterAtAirport()))
      .apply(ParDo.of(new ExtractLatest()))
      .apply(PubsubIO.Write.topic(outputTopic));
  • 30. AIRPORT RIDES: the "Key By Ride ID" step keys each element by its ride_id to group together the ride points from the same ride.
  • 31. AIRPORT RIDES: Sessions (Sessions.withGapDuration(TEN_MIN)) allow us to create a window with all the points of the same ride grouped together, and then garbage-collect the ride data once no more ride points show up for ten minutes.
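To make the session idea concrete, here is a simplified plain-Java sketch of gap-based sessions for the points of a single ride. It is not the Dataflow `Sessions` implementation: `SessionWindows` and `sessions` are hypothetical names, timestamps are assumed to be sorted seconds, and a gap of at least `gapSeconds` is assumed to start a new session.

```java
import java.util.ArrayList;
import java.util.List;

final class SessionWindows {
    /**
     * Splits sorted timestamps (in seconds) into sessions. A new session starts
     * whenever the gap to the previous timestamp is at least gapSeconds.
     */
    static List<List<Long>> sessions(List<Long> sortedTimestamps, long gapSeconds) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            boolean gapReached = !current.isEmpty()
                    && ts - current.get(current.size() - 1) >= gapSeconds;
            if (gapReached) {
                // The previous session is closed; its data could now be discarded.
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With a ten-minute gap (600 seconds), points at 0, 60 and 300 seconds form one session, and points at 1000 and 1100 seconds form a second one, because 700 seconds pass between 300 and 1000.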
  • 32. AIRPORT RIDES: Triggering delivers the contents of the ride window early and often:
     ‣ elementCountAtLeast(1) ensures that we get the first values as soon as even a single element shows up
     ‣ Repeatedly.forever ensures we keep getting updates
     ‣ accumulatingFiredPanes ensures we get the full view of the data
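The effect of firing on every element with accumulating panes can be illustrated with a toy model. This sketch assumes exactly one firing per arriving element, which is a simplification (a trigger of "at least one element" may fire with more than one); `AccumulatingPanes` and `firePerElement` are hypothetical names and not part of any SDK.

```java
import java.util.ArrayList;
import java.util.List;

final class AccumulatingPanes {
    /**
     * Simulates a trigger that fires after every element and accumulates:
     * each emitted pane contains all elements seen so far in the window.
     */
    static List<List<String>> firePerElement(List<String> elements) {
        List<List<String>> panes = new ArrayList<>();
        List<String> seen = new ArrayList<>();
        for (String element : elements) {
            seen.add(element);
            // Copy the state so later additions do not change earlier panes.
            panes.add(new ArrayList<>(seen));
        }
        return panes;
    }
}
```

For three ride points a, b and c, the emitted panes are [a], [a, b] and [a, b, c]: every update carries the whole ride so far, which is what lets the downstream combine step always see the pickup point.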
  • 33. AIRPORT RIDES: Every time our window is triggered, the accumulator determines how the data points in the window are combined. AccumulatePoints():
     ‣ Keeps the pickup location, necessary to know if the ride started at the airport
     ‣ Keeps the most recent value, to continuously emit updates about the ride
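The slides do not show the body of AccumulatePoints, so the following is only a guess at its shape in plain Java. `RideAccumulator` and `Point` are hypothetical names, and the sketch assumes that the pickup is simply the point with the earliest timestamp, which may differ from how the talk's code identifies it.

```java
final class RideAccumulator {
    /** A single ride point: timestamp in seconds plus a position. */
    static final class Point {
        final long timestamp;
        final double lat;
        final double lon;

        Point(long timestamp, double lat, double lon) {
            this.timestamp = timestamp;
            this.lat = lat;
            this.lon = lon;
        }
    }

    private Point pickup; // assumed: the earliest point seen so far
    private Point latest; // the most recent point seen so far

    /** Folds one more ride point into the accumulator; arrival order does not matter. */
    void add(Point p) {
        if (pickup == null || p.timestamp < pickup.timestamp) {
            pickup = p;
        }
        if (latest == null || p.timestamp >= latest.timestamp) {
            latest = p;
        }
    }

    Point pickup() {
        return pickup;
    }

    Point latest() {
        return latest;
    }
}
```

Keeping only two points per ride keeps the state small no matter how many ride points arrive, while still supporting both the airport check and the latest-position output.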
  • 34. AIRPORT RIDES: FilterAtAirport looks at the pickup point in the accumulator and compares it with Lat/Long coordinates to determine if it is an airport pickup.
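One simple way to compare a pickup point with airport coordinates is a bounding-box test. The sketch below is an assumption about how such a filter could work, not the talk's FilterAtAirport: `AirportFilter` is a hypothetical name, and the box values are rough illustrative numbers around JFK that do not come from the slides.

```java
final class AirportFilter {
    // Illustrative bounding box roughly around JFK; these values are assumptions.
    static final double MIN_LAT = 40.62;
    static final double MAX_LAT = 40.67;
    static final double MIN_LON = -73.83;
    static final double MAX_LON = -73.74;

    /** Returns true when the pickup position lies inside the airport bounding box. */
    static boolean isAirportPickup(double lat, double lon) {
        return lat >= MIN_LAT && lat <= MAX_LAT
                && lon >= MIN_LON && lon <= MAX_LON;
    }
}
```

A pickup at about (40.6413, -73.7781) passes the check, while one in midtown Manhattan at about (40.7580, -73.9855) does not. A real filter would likely need one box or polygon per airport.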
  • 35. AIRPORT RIDES: For writing output, we only care about the latest point from the accumulator, which ExtractLatest picks out.
  • 36. AIRPORT RIDES: We write the resulting latest point to Pub/Sub.
  • 38. UPDATED DEMO ARCHITECTURE: Taxis (Ingest, Telemetry) → Cloud Pub/Sub (Messaging, Rides) → Cloud Dataflow (Processing, Aggregate) → Cloud Pub/Sub (Messaging, Display Data) → Dashboard Application; plus Cloud Dataflow (ETL Pipeline, Archival-grade aggregates) → BigQuery (Data Warehouse, Analytics, Insights)
     ‣ Create another ETL pipeline PubSub <-> BigQuery
     ‣ Composition: save output of regular taxi data and filtered airport data
  • 41. RIDE DATA OVER TIME
  • 42. APACHE BEAM
     ‣ In early 2016, Google announced their intention to move the Dataflow programming model and SDKs to the Apache Software Foundation
     ‣ Apache Beam is now a top-level project
  • 43. SOME RESOURCES
     ‣ cloud.google.com/dataflow
     ‣ beam.apache.org
     ‣ codelabs.developers.google.com
     ‣ Big Data facts (forbes.com)
  • 44. THANK YOU! LET’S CREATE IT TOGETHER jerry@qvik.fi | @W_I