©2017 Cloudreach
Learn Big Data with Uber
Presented by Mark Thebault | 19/01/2018
Agenda
Big Data, real use case
How to get an Uber?
Use case Uber
IoT, Handle high throughput
Process data, real time, batch
Let’s build a Big Data Platform
Let’s talk about Uber
● 8 Million users
● 160k drivers
● 400 cities around the world
● 1 million rides per day
● 2 billion rides recorded
Dozens of cities opened each year
Drivers' locations
Ride requests
Payments
User feedback
Metrics...
Driver position every 4 seconds
Rider requests
Millions of requests per second
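To get a feel for the load, the figures quoted on the earlier slide (160k drivers, 1 million rides per day) can be turned into per-second rates. This is only a back-of-the-envelope sketch: it assumes every driver is online at the same time, which the slides do not state, and the function names are made up for the illustration.

```python
# Figures quoted in the slides; "all drivers online at once" is an assumption.
DRIVERS = 160_000
POSITION_INTERVAL_S = 4
RIDES_PER_DAY = 1_000_000


def position_updates_per_second(drivers: int, interval_s: int) -> float:
    """Upper bound: every driver is online and reports on schedule."""
    return drivers / interval_s


def rides_per_second(rides_per_day: int) -> float:
    """Daily average only; peaks are not captured by this number."""
    return rides_per_day / (24 * 60 * 60)


if __name__ == "__main__":
    print(position_updates_per_second(DRIVERS, POSITION_INTERVAL_S))  # 40000.0
    print(round(rides_per_second(RIDES_PER_DAY), 1))                  # 11.6
```

Position updates alone would dominate ride bookings by several orders of magnitude under these assumptions, which is why the ingestion layer matters so much.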
The architecture
● Handle throughput variation
● Fraud detection
● Low latency
● Scalability
IoT - Imbibed of Tequila
IoT: how to handle the data?
Uber use case:
● Different sources and destinations
● Different throughputs
● Different formats
● Several consumers for the same data
How to get good performance?
● Apache Kafka: a distributed messaging platform
● Created at LinkedIn and open-sourced in 2011
● Known for its
○ Performance
○ Scalability
● Goal: centralise all data exchange
Clean the mess: Centralize data!
Why Kafka?
● Horizontally scalable by adding new servers
● Enables very high throughput, which allows real-time data flows
● Advantages over traditional brokers such as RabbitMQ and ActiveMQ
○ Less expensive
○ Better performance
○ Easily scalable
How does it work?
● Publish / subscribe model
● Messages are sent to topics
● Producers publish data
● Consumers read data from a given offset
● A node is called a broker
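The publish / subscribe model above can be sketched with a toy in-memory log. This is not the Kafka client API: `Topic`, `Consumer`, `publish` and `poll` are hypothetical names chosen for the illustration, and a single Python list stands in for a broker.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy in-memory stand-in for a topic (not the Kafka API)."""
    name: str
    log: list = field(default_factory=list)

    def publish(self, message: str) -> int:
        """Append a message and return its offset (its index in the log)."""
        self.log.append(message)
        return len(self.log) - 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        """Return messages starting at `offset`; the log itself is unchanged."""
        return self.log[offset:offset + max_messages]


@dataclass
class Consumer:
    """Each consumer remembers its own position in the log."""
    name: str
    offset: int = 0

    def poll(self, topic: Topic, max_messages: int = 10) -> list:
        batch = topic.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    positions = Topic("driver-positions")  # hypothetical topic name
    for msg in ["d1@48.85,2.35", "d2@48.86,2.34", "d1@48.85,2.36"]:
        positions.publish(msg)

    matching = Consumer("matching-service")
    analytics = Consumer("analytics")
    print(matching.poll(positions, max_messages=2))  # first two messages
    print(analytics.poll(positions))                 # all three messages
    print(matching.poll(positions))                  # the remaining one
```

Because reading never removes anything from the log, several consumers can process the same data at their own pace.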
Stream data with Kafka
● Messages are stored on disk
● Writes land in the operating system's page cache (RAM) before being flushed to disk
● By default a message is kept for 7 days
● Topics are split into partitions, and partitions are replicated across brokers
● Messages are stored in a log file
○ Each message has an offset
○ Each consumer tracks its own offset
(better read performance)
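Time-based retention and stable offsets can be illustrated with a small sketch. It is a simplification under stated assumptions: the class `RetainedLog` is hypothetical, expiry here works message by message whereas Kafka removes whole log segments, and a consumer that fell behind simply resumes at the oldest retained message.

```python
from collections import deque

RETENTION_SECONDS = 7 * 24 * 60 * 60  # the 7-day default mentioned in the slide


class RetainedLog:
    """Toy append-only log that forgets old messages but never renumbers offsets."""

    def __init__(self, retention_s: int = RETENTION_SECONDS):
        self.retention_s = retention_s
        self.base_offset = 0    # offset of the oldest retained message
        self.entries = deque()  # (timestamp, message) pairs

    def append(self, message: str, now: float) -> int:
        """Store a message and return its offset."""
        self.entries.append((now, message))
        return self.base_offset + len(self.entries) - 1

    def expire(self, now: float) -> None:
        """Drop messages older than the retention window."""
        while self.entries and now - self.entries[0][0] > self.retention_s:
            self.entries.popleft()
            self.base_offset += 1

    def read(self, offset: int) -> list:
        """Read from `offset`, or from the oldest retained message if it is gone."""
        start = max(offset, self.base_offset) - self.base_offset
        return [message for _, message in list(self.entries)[start:]]


if __name__ == "__main__":
    day = 24 * 60 * 60
    log = RetainedLog()
    log.append("old", now=0)
    log.append("new", now=6 * day)
    log.expire(now=8 * day)   # "old" is now more than 7 days old
    print(log.read(0))        # ['new']
```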
Kafka topic anatomy
● Topics are split into partitions
● Partitions spread the load; replicating them provides fault tolerance
● Each partition has a leader server and zero or more follower servers
● Writes are handled by the leader
● Reads are served by the leader by default; since Kafka 2.4, consumers can optionally read from followers
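How a message key maps to a partition, and how partitions could be laid out over brokers, can be sketched as follows. Both functions are hypothetical helpers: CRC32 is used only as a convenient stable hash (Kafka's clients use their own hash function), and the round-robin placement is an assumption for illustration, not the broker's actual assignment algorithm.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from a stable hash of the key (CRC32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """Round-robin placement: the first broker listed for a partition is its leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        layout[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return layout


if __name__ == "__main__":
    # The same key always lands in the same partition, which preserves per-key ordering.
    print(partition_for("driver-42", 3) == partition_for("driver-42", 3))  # True
    print(assign_replicas(3, ["b1", "b2", "b3"], replication_factor=2))
```

With this layout each broker leads one partition and follows another, so losing a single broker leaves a replica of every partition available.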
How to get an Uber?
NoSQL Databases
Uber’s database use case: read user data, read drivers’ positions…
● Fast response time, high throughput
● As your application scales, your database needs to scale too
● Always available, no downtime
● Stores large amounts of data
● Costs less than using an RDBMS for the same purpose
Apache Cassandra
● Initially developed at Facebook and open-sourced in 2008
● Column-oriented; by default, tables use schemas
● Built to be deployed at very large scale across different data centers
● Each value is identified by a unique key and stored as a cell made of
○ Row key (unique ID)
○ Column name
○ Column value
○ Timestamp, set by Cassandra by default
○ Expiration date (optional)
Cassandra behind the scenes
● Peer-to-peer (P2P)
○ No master, no slaves
● Multi-datacenter
○ Geographical distribution
○ Separation of operational and analytic workloads
● Gossip protocol
○ Runs once per second
○ Exchanges cluster state information
○ With up to 3 other nodes
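A small simulation shows why gossip spreads cluster state quickly. It is a simplified push-only model, not Cassandra's implementation: the function `gossip_rounds` is hypothetical, each informed node contacts a fixed fan-out of 3 random peers per round, and one round stands for one second.

```python
import random


def gossip_rounds(num_nodes: int, fanout: int = 3, seed: int = 0) -> int:
    """Count rounds until every node has heard a piece of state that starts on node 0."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        newly_informed = set()
        for node in informed:
            peers = [n for n in range(num_nodes) if n != node]
            for peer in rng.sample(peers, min(fanout, len(peers))):
                newly_informed.add(peer)
        informed |= newly_informed
    return rounds


if __name__ == "__main__":
    for size in (4, 100, 1000):
        print(size, "nodes:", gossip_rounds(size), "rounds")
```

The number of informed nodes can at most quadruple each round, so the round count grows roughly with the logarithm of the cluster size rather than with the size itself.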
Cassandra Data Model
● Columns with a defined schema
● Partition key
○ Rows with the same partition key live on the same node
○ Choose the partition key wisely: performance depends on it!
● Clustering key
○ Orders rows inside a partition, so a range of them can be read
○ Used in WHERE clauses
● Static columns
○ Values shared across all rows of a partition
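The roles of the three kinds of columns can be mimicked in memory. This is a sketch of the idea only, not how Cassandra stores data: `ToyTable` and its methods are hypothetical, the partition key is a driver id, the clustering key is a timestamp, and the static value is the driver's name.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class ToyTable:
    """In-memory sketch: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        self.rows = defaultdict(list)  # partition key -> sorted [(clustering key, value)]
        self.static = {}               # partition key -> value shared by all its rows

    def insert(self, partition_key, clustering_key, value):
        """Add a row; insort keeps the partition ordered by clustering key."""
        insort(self.rows[partition_key], (clustering_key, value))

    def set_static(self, partition_key, value):
        self.static[partition_key] = value

    def slice(self, partition_key, start, end):
        """Rows of one partition with start <= clustering key <= end."""
        rows = self.rows.get(partition_key, [])
        keys = [k for k, _ in rows]
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        return [(k, v, self.static.get(partition_key)) for k, v in rows[lo:hi]]


if __name__ == "__main__":
    table = ToyTable()
    table.set_static("driver-1", "Alice")
    table.insert("driver-1", 30, "48.87,2.33")
    table.insert("driver-1", 10, "48.85,2.35")
    table.insert("driver-1", 20, "48.86,2.34")
    print(table.slice("driver-1", 10, 20))  # the two oldest positions, in order
```

A query that names the partition key touches one partition only, and the sorted clustering key turns a time-range filter into a contiguous read.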
Cassandra Query Language (CQL)
Cassandra Consistency Management
● Tunable consistency
○ Writes: choose how many replicas must acknowledge
○ Reads: choose how many replicas must respond
● Levels
○ ONE
○ QUORUM
○ ALL…
● The consistency level is set per query
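The arithmetic behind the levels can be written down in a few lines. The helper names are hypothetical and only the three levels listed above are handled; the rule used is the usual one that a read is guaranteed to overlap the latest write when the read and write replica counts add up to more than the replication factor.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for a given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")


def reads_see_latest_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = required_replicas(write_level, replication_factor)
    r = required_replicas(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    print(required_replicas("QUORUM", 3))                    # 2
    print(reads_see_latest_write("QUORUM", "QUORUM", 3))     # True
    print(reads_see_latest_write("ONE", "ONE", 3))           # False
```

Choosing ONE for both reads and writes favours latency and availability; QUORUM on both sides trades some of that for consistent reads.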
Process data, real time, batch
What is the data telling us?
Real-time processing
● Apply a price multiplier when there is an event (football match…)
● Find the best riders to share one uberPOOL
Batch processing
● Get daily analytics
● Find fraudulent drivers
Problem: gigabytes of data per second, terabytes of data to process each day
Distributed processing
● Involves a large number of computers
○ Computers on the same local network
○ High bandwidth is required
● Parallel processing
○ Split the processing into separate tasks
○ Each computer does its own calculation
○ Results are merged
● Not every job can be parallelized; keep this in mind when you design one
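The split / compute / merge cycle can be tried on a single machine. In this sketch worker threads stand in for the separate computers, which is an assumption made to keep the example self-contained (CPython threads will not speed up this arithmetic); the function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items: list, num_chunks: int) -> list:
    """Cut the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(items) // num_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def partial_sum_of_squares(chunk: list) -> int:
    """The work each worker does on its own chunk, independently of the others."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items: list, workers: int = 4) -> int:
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)  # merge step


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))  # 285
```

The job parallelizes cleanly because each partial sum depends only on its own chunk; a computation where every step needs the previous result would not split this way.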
What is Spark?
● Open source framework (Apache)
● Processes large volumes of data
● Often faster than Hadoop MapReduce, largely because it keeps data in memory
● Distributed processing framework
● Main focuses
○ Streaming
○ Machine learning
○ Extract, Transform, Load (ETL)
Spark Architecture
Cluster Management
● Spark Standalone
● Mesos
● Yarn
● Kubernetes (beta)
What is MapReduce?
● Pattern published by Google in 2004
● A dataset is split into partitions:
○ Map: applies a transformation to each partition
○ Reduce: aggregates the results of the partitions
● Partitions are distributed across the network: distributed processing
● Hadoop MapReduce is an implementation of this pattern
What is MapReduce? - Word count
To be, or not to be, that is the Question:
Whether 'tis Nobler in the minde to suffer
The Slings and Arrowes of outragious Fortune,
Or to take Armes against a Sea of troubles,
And by opposing end them: to dye, to sleepe
No more; and by a sleepe, to say we end
The Heart-ake, and the thousand Naturall shockes
That Flesh is heyre too? 'Tis a consummation
Deuoutly to be wish'd. To dye to sleepe,
To sleepe, perchance to Dreame; I, there's the rub,
For in that sleepe of death, what dreames may come,
When we haue shuffel'd off this mortall coile,
Must giue vs pawse. There's the respect
That makes Calamity of so long life
Map (one list per partition)
Partition 1: [to, 1] [be, 1] [or, 1] [not, 1] [to, 1] …
Partition 2: [and, 1] [by, 1] [opposing, 1] [end, 1] …
Partition 3: [that, 1] [flesh, 1] [is, 1] [heyre, 1] …
Partition 4: [for, 1] [in, 1] [that, 1] [sleepe, 1] [of, 1] …
Reduce (per partition)
Partition 1: [to, 4] [be, 2] [or, 2] [not, 1] …
Partition 2: [to, 3] …
Partition 3: [to, 5] [be, 1] …
Partition 4: …
Reduce (per node)
Node 1: [to, 7] [be, 2] [or, 2] [not, 1] …
Node 2: [to, 5] [be, 1] …
Reduce (final result)
[to, 12] [be, 3] [or, 2] [not, 1] …
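The word count above can be reproduced with a few plain Python functions. This is a single-process sketch of the pattern, not Hadoop or Spark code: the function names are illustrative, the word-splitting regular expression is an assumption, and the partitions are processed one after another instead of on separate nodes.

```python
import re
from collections import Counter
from functools import reduce


def map_partition(lines: list) -> list:
    """Map: emit a (word, 1) pair for every word in the partition."""
    pairs = []
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            pairs.append((word, 1))
    return pairs


def reduce_pairs(pairs: list) -> Counter:
    """Reduce: add up the counts emitted for each word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


def merge(a: Counter, b: Counter) -> Counter:
    """Reduce again: combine two partial results."""
    return a + b


def word_count(partitions: list) -> Counter:
    partials = [reduce_pairs(map_partition(p)) for p in partitions]
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    partitions = [["To be, or not to be"], ["to dye, to sleepe"]]
    print(word_count(partitions)["to"])  # 4
```

Because addition is associative, partial counts can be merged per partition, then per node, then globally, in any grouping, and the totals stay the same.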
Let’s build Cablito
Cablito - Personal Data
User personal information
Ride history
Driver positions (per area)
Cablito - Event Processing
Booking requests
Ride info
Logs
IN: Booking requests
OUT: Booking acceptances
Logs
Ride info
Request
Machine learning models for fraud detection
Cablito - Analytics
Real time Metrics
Historical Data
Aggregated data
BigBoss
Crazy Data Scientist
Building ML Models
Aggregating data...
Questions?

Learn big data with Uber

  • 1. ©2017 Cloudreach Learn Big Data with Uber Presented by Mark Thebault | 19/01/2018
  • 2. ©2017 Cloudreach Agenda Big Data, real use case How to get an Uber ? Use case Uber IoT, Handle high throughput Process data, real time, batch Let’s build a Big Data Platform
  • 3. ©2017 Cloudreach 3 Let’s talk about Uber
  • 4. ©2017 Cloudreach ● 8 Million users ● 160k drivers ● 400 cities around the world ● 1 million rides per day ● 2 billion rides recorded Big data, real use case 4
  • 5. ©2017 Cloudreach Dozens of cities opened each year Big data, real use case 5
  • 6. ©2017 Cloudreach Big data, real use case 6 Drivers location Courses requests Payments User feedback Metrics... Driver position every 4 seconds Riders requests Million requests per seconds
  • 7. ©2017 Cloudreach Big data, real use case 7 The architecture ● Handle throughput variation ● Fraud detection ● Low latency ● Scalability
  • 8. ©2017 Cloudreach 8 IoT - Imbibed of Tequila
  • 9. ©2017 Cloudreach Big data, real use case 9 IoT, How to Handle data? Uber use case: ● Different sources / Destination ● Different throughput ● Different format ● Several consumers for the same data
  • 10. ©2017 Cloudreach Big data, real use case 10 How to have good performances ? ● Distributed messaging Platform ● Created in 2009 by LinkedIn ● Known for its ○ Performances ○ Scalability ● Target: Centralise all data exchange
  • 11. ©2017 Cloudreach Big data, real use case 11 Clean the mess: Centralize data!
  • 12. ©2017 Cloudreach Big data, real use case 12 Why Kafka ? ● Horizontally scalable by adding new servers ● Enable very high throughput: Allow real time data flows ● Better than traditional brokers RabbitMQ, ActiveMQ ○ Less expensive ○ Better performance ○ Easily scalable
  • 13. ©2017 Cloudreach Big data, real use case 13 How it works ? ● Publisher / subscriber model ● Message are sent in Topics ● Producers inject data ● Consumers read data with a given offset ● A node is called a Broker
  • 14. Stream data with Kafka
      ● Messages are stored on the hard drive
      ● Writes land in RAM first (the OS page cache)
      ● By default a message is retained for 7 days
      ● Topics are split into partitions, and partitions are replicated
      ● Messages are stored in a log file
          ○ Each message has an offset
          ○ Each consumer tracks its own offset (better read performance)
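The log, offset and retention ideas on this slide can be sketched in a few lines of Python. This is a toy in-memory model, not how Kafka is implemented; `TopicLog`, `Consumer` and their methods are hypothetical names used only for illustration.

```python
import time

SEVEN_DAYS = 7 * 24 * 3600  # default retention mentioned on the slide, in seconds


class TopicLog:
    """Toy append-only log: every message receives the next offset."""

    def __init__(self):
        self.next_offset = 0
        self.entries = []  # list of (offset, timestamp, message)

    def append(self, message, now=None):
        now = time.time() if now is None else now
        offset = self.next_offset
        self.entries.append((offset, now, message))
        self.next_offset += 1
        return offset

    def read_from(self, offset, max_messages=10):
        """Return up to max_messages (offset, message) pairs starting at offset."""
        batch = [(o, m) for (o, _, m) in self.entries if o >= offset]
        return batch[:max_messages]

    def expire(self, now, retention=SEVEN_DAYS):
        """Drop messages older than the retention period; offsets are never reused."""
        self.entries = [e for e in self.entries if now - e[1] <= retention]


class Consumer:
    """Each consumer remembers its own position; the log keeps no per-consumer state."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.log.read_from(self.offset, max_messages)
        if batch:
            self.offset = batch[-1][0] + 1  # resume after the last message read
        return [m for _, m in batch]
```

Two consumers created on the same `TopicLog` advance independently, which is why several consumers can read the same data at their own pace.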
  • 15. Kafka topic anatomy
      ● Topics are split into partitions
      ● Partitions are replicated for fault tolerance
      ● Each partition has a leader server and zero or more follower servers
      ● Writes are handled by the leader
      ● Reads are served by the leader by default (followers can serve reads only from Kafka 2.4 onwards)
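A rough sketch of the leader / follower idea, assuming a simplified model in which every write is copied to all followers straight away. `partition_for`, `Partition` and the broker names are hypothetical, and CRC32 stands in for Kafka's real partitioner only because it is deterministic and in the standard library.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition from the message key (CRC32 here, not Kafka's own hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Partition:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = list(followers)
        # One copy of the log per broker that holds this partition.
        self.replicas = {broker: [] for broker in [leader, *followers]}

    def write(self, message):
        """Writes go to the leader, then are copied to every follower."""
        self.replicas[self.leader].append(message)
        for broker in self.followers:
            self.replicas[broker].append(message)
        return len(self.replicas[self.leader]) - 1  # offset of the new message

    def fail_leader(self):
        """Simulate losing the leader: the first follower takes over."""
        del self.replicas[self.leader]
        self.leader = self.followers.pop(0)
```

Because the followers already hold a copy of the log, losing the leader does not lose the messages, which is the fault-tolerance property the slide refers to.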
  • 16. How to get an Uber?
  • 17. NoSQL Databases
      Uber’s database use case: read user data, read drivers’ positions…
      ● Fast response time, high throughput
      ● Your application scales; your database needs to scale too
      ● Always available, no downtime
      ● Store large amounts of data
      ● Cheaper than using an RDBMS for the same purpose
  • 18. Apache Cassandra
      ● Initially developed by Facebook in 2008
      ● Column-oriented; by default, tables use schemas
      ● Built to be deployed at very large scale across different data centers
      ● Values are identified by a unique key
          ○ RowKey (unique ID)
          ○ Column name
          ○ Column value
          ○ Default timestamp created by Cassandra
          ○ Expiration date (optional)
  • 19. Cassandra behind the scenes
      ● Peer-to-peer (P2P)
          ○ No master, no slaves
      ● Multi-datacenter
          ○ Geographical distribution
          ○ Separation of operational and analytic workloads
      ● Gossip protocol
          ○ Once per second
          ○ Exchanges cluster information
          ○ With up to 3 randomly chosen nodes
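To give a feel for how gossip spreads cluster information, here is a small simulation. It assumes a simplified model where each round stands for one second and every node swaps its full view with 3 random peers. `gossip_round` and `rounds_until_converged` are hypothetical names, and real Cassandra gossip carries more state than a version number.

```python
import random


def gossip_round(states, rng, fanout=3):
    """One round: every node exchanges what it knows with `fanout` random peers.

    `states` maps a node name to that node's view: {node name: heartbeat version}.
    """
    nodes = list(states)
    for node in nodes:
        others = [n for n in nodes if n != node]
        peers = rng.sample(others, k=min(fanout, len(others)))
        for peer in peers:
            # Both sides keep the highest version seen for every known node.
            merged = dict(states[node])
            for name, version in states[peer].items():
                merged[name] = max(version, merged.get(name, -1))
            states[node] = dict(merged)
            states[peer] = dict(merged)


def rounds_until_converged(num_nodes=10, seed=0, max_rounds=100):
    """Return how many rounds it takes until every node knows every other node."""
    rng = random.Random(seed)
    # Initially each node only knows about itself.
    states = {f"node-{i}": {f"node-{i}": 1} for i in range(num_nodes)}
    for round_number in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(len(view) == num_nodes for view in states.values()):
            return round_number
    return None  # did not converge within max_rounds
```

No node plays the role of a master here: information spreads because every node keeps talking to a few random peers.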
  • 20. Cassandra Data Model
      ● Columns with a defined schema
      ● Partition key
          ○ Rows with the same partition key are stored on the same node
          ○ Choose the partition key wisely: performance depends on it!
      ● Clustering key
          ○ Get an extract (a slice) of columns
          ○ Used in WHERE clauses
      ● Static columns
          ○ Values shared across all rows of a partition
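The roles of the partition key, clustering key and static columns can be illustrated with a toy in-memory table. This is only a sketch of the data model, not of Cassandra's storage engine; the `Table` class and the `ckey` field are hypothetical.

```python
class Table:
    """Toy table: rows grouped by partition key, read back in clustering-key order."""

    def __init__(self):
        self.partitions = {}  # partition key -> {clustering key: row}
        self.static = {}      # partition key -> values shared by all rows of the partition

    def insert(self, pkey, ckey, row):
        self.partitions.setdefault(pkey, {})[ckey] = row

    def set_static(self, pkey, **values):
        self.static.setdefault(pkey, {}).update(values)

    def select(self, pkey, ckey_from=None, ckey_to=None):
        """Rows of one partition whose clustering key lies in [ckey_from, ckey_to]."""
        rows = self.partitions.get(pkey, {})
        result = []
        for ckey in sorted(rows):
            if ckey_from is not None and ckey < ckey_from:
                continue
            if ckey_to is not None and ckey > ckey_to:
                continue
            # Static values are merged into every row of the partition.
            result.append({**self.static.get(pkey, {}), "ckey": ckey, **rows[ckey]})
        return result
```

With an area as partition key and a timestamp as clustering key, `select("area-1", ckey_from=10, ckey_to=20)` touches a single partition and returns a time-ordered slice of it.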
  • 21. Cassandra CQL Language
  • 22. Cassandra Consistency Management
      ● Customisable consistency
          ○ Writes: customise the number of acknowledgments
          ○ Reads: customise the number of replicas read
      ● Levels
          ○ ONE
          ○ QUORUM
          ○ ALL…
      ● The consistency level is defined per query
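The arithmetic behind these levels is short enough to write down. The sketch below covers only ONE, QUORUM and ALL, and uses the usual rule of thumb that a read is guaranteed to overlap the latest write when read replicas plus write replicas exceed the replication factor. The function names are hypothetical.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must answer for a given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read set and the write set must share at least one replica."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM means 2 replicas, so QUORUM writes combined with QUORUM reads overlap (2 + 2 > 3), while ONE with ONE does not (1 + 1 ≤ 3).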
  • 23. Process data, real time, batch
  • 24. What is the data telling us?
      Real-time processing
      ● Apply a price multiplier when there is an event (football game…)
      ● Find the best riders to share one UberPool
      Batch processing
      ● Get daily analytics
      ● Find fraudulent drivers
      Problem: gigabytes of data per second, terabytes of data processed each day
  • 25. Distributed processing
      ● Involves a large number of computers
          ○ Computers on the same local network
          ○ High bandwidth is required
      ● Parallel processing
          ○ Split the processing into separate tasks
          ○ Each computer does its own calculation
          ○ Results are merged
      ● Not every job can be parallelized; keep this in mind when you design it
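The split / compute / merge pattern can be tried on a single machine. In this sketch a thread pool stands in for the computers of a cluster, which is an assumption made for illustration only. `split` and `total_fare` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Cut a list into at most `parts` chunks of nearly equal size."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def total_fare(fares, workers=4):
    """Sum a list of fares by summing chunks in parallel, then merging the partial sums."""
    chunks = split(fares, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))  # each worker does its own calculation
    return sum(partials)  # results are merged
```

A sum parallelizes well because partial sums can be merged in any order; a computation where each step needs the result of the previous one would not split this way.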
  • 26. What is Spark?
      ● Open source framework (Apache)
      ● Processing of large volumes of data
      ● Faster than Hadoop MapReduce
      ● Distributed processing framework
      ● Main focuses
          ○ Streaming
          ○ Machine Learning
          ○ Extract Transform Load (ETL)
  • 27. Spark Architecture
      Cluster management
      ● Spark Standalone
      ● Mesos
      ● YARN
      ● Kubernetes (beta)
  • 28. What is MapReduce?
      ● Pattern invented in 2004 by Google
      ● A dataset is split into partitions:
          ○ Map: applies a transformation
          ○ Reduce: aggregates the partitions
      ● Items are distributed across the network: distributed processing
      ● Hadoop MapReduce is an implementation of this pattern
  • 29. What is MapReduce? - Word count
      Input text:
        To be, or not to be, that is the Question:
        Whether 'tis Nobler in the minde to suffer
        The Slings and Arrowes of outragious Fortune,
        Or to take Armes against a Sea of troubles,
        And by opposing end them: to dye, to sleepe
        No more; and by a sleepe, to say we end
        The Heart-ake, and the thousand Naturall shockes
        That Flesh is heyre too? 'Tis a consummation
        Deuoutly to be wish'd. To dye to sleepe,
        To sleepe, perchance to Dreame; I, there's the rub,
        For in that sleepe of death, what dreames may come,
        When we haue shuffel'd off this mortall coile,
        Must giue vs pawse. There's the respect
        That makes Calamity of so long life
      Partition / Map (one list per partition):
        [to, 1] [be, 1] [or, 1] [not, 1] [to, 1] …
        [and, 1] [by, 1] [opposing, 1] [end, 1] …
        [that, 1] [flesh, 1] [is, 1] [heyre, 1] …
        [for, 1] [in, 1] [that, 1] [sleepe, 1] [of, 1] …
      Reduce (per partition):
        [to, 4] [be, 2] [or, 2] [not, 1] …
        [to, 3] …
        [to, 5] [be, 1] …
        …
      Reduce (per node):
        Node1: [to, 7] [be, 2] [or, 2] [not, 1] …
        Node2: [to, 5] [be, 1] …
      Reduce (final):
        [to, 12] [be, 3] [or, 2] [not, 1] …
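The word count above can be reproduced with plain Python as a minimal sketch of the pattern. It runs in a single process and is neither Hadoop nor Spark code; `partition`, `map_words`, `reduce_pairs` and `word_count` are hypothetical names.

```python
import re
from collections import Counter
from functools import reduce


def partition(lines, parts):
    """Split the lines of the text into at most `parts` partitions."""
    size = max(1, -(-len(lines) // parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_words(lines):
    """Map: emit a (word, 1) pair for every word of the partition."""
    return [(word, 1) for line in lines for word in re.findall(r"[a-z']+", line.lower())]


def reduce_pairs(pairs):
    """Reduce: add up the counts of identical words."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


def word_count(text, parts=4):
    partitions = partition(text.splitlines(), parts)
    # First reduce inside each partition, then merge the partial results.
    partials = [reduce_pairs(map_words(p)) for p in partitions]
    return reduce(lambda a, b: a + b, partials, Counter())
```

The merge step works in any order because addition is associative, which is what lets the partial counts be combined per node first and globally afterwards.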
  • 31. Cablito - Personal Data
      ● User personal information
      ● Drives history
      ● Drivers position (per area)
  • 32. Cablito - Event Processing
      ● Booking requests
      ● Rides info
      ● Logs
      ● IN: booking requests
      ● OUT: booking accepts, logs, rides info
      ● Request Machine Learning models for fraud detection
  • 33. Cablito - Analytics
      ● Real-time metrics
      ● Historical data
      ● Aggregated data
      ● BigBoss
      ● Crazy Data Scientist
      ● Building ML models
      ● Aggregating data...