Scaling up Uber’s Real-time Data Analytics
Xiang Fu
James Shao
Agenda
● Use Cases
● Streaming Data Infrastructure
● Streaming Processing Platform
● Streaming Analytics Platform
● Future Work
[Diagram: Kafka carries app views and vehicle information; stream processing powers driver-rider match and ETA]
Real-time Driver-Rider Matching
UberEATS - Real-Time ETAs
UberEATS - Real-Time Analytics
A bunch more...
● Fraud Detection
● Share My ETA
● Safety
● Etc.
Agenda
● Use Cases
● Streaming Data Infrastructure
● Streaming Processing Platform
● Streaming Analytics Platform
● Future Work
Scale
● Trillion+ messages/day
● ~PBs of data volume (excluding replication)
● Tens of thousands of topics
Requirements
● High Throughput
● Low latency for most use cases (<1 ms)
● Reliability - At least 99.99%, and 100% for critical use cases
● At-least-once/cross-DC pipeline for business-critical use cases
● Multi-Language Support (Go/Java/Python/C++)
● Tens of thousands of simultaneous clients
● Reliable data replication across DC
Kafka Ecosystem @ Uber
[Diagram: PRODUCERS (Rider App, Driver App, API / Services, Etc.) publish to Kafka. CONSUMERS: Samza / Flink (real-time analytics, alerts, dashboards), Applications, Vertica / Hive (data science, analytics, reporting), Hadoop (ad-hoc exploration), ELK (debugging), payment processing, AWS S3. Also shown: DATABASES (Cassandra, Schemaless, MySQL)]
Kafka Pipeline
[Diagram: two data centers, DC1 and DC2. In each, Applications [ProxyClient] -> Kafka REST Proxy -> Regional Kafka, and uReplicator copies regional data into Aggregate Kafka. Also shown: Local Agent, Offset Sync Service]
Regular Kafka Data Flow
● Provides a 99.99% data durability guarantee, with latency < 1 ms
● Relies heavily on batching/buffering
● Cost-effective storage, with guaranteed performance for critical components
● Targets the majority of use cases:
○ ETA
○ Logging
○ Business events
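The batching/buffering that the regular flow relies on can be pictured as a flush-on-size-or-time policy. The following is a minimal, library-free Scala sketch under that assumption; `BatchingBuffer`, `maxBatchSize` and `lingerMs` are hypothetical names (similar in spirit to the Kafka producer's `batch.size`, which is in bytes, and `linger.ms` settings), not Uber's client code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical producer-side buffer: messages accumulate until either the batch
// reaches maxBatchSize or lingerMs has elapsed since the first buffered message.
final class BatchingBuffer[A](maxBatchSize: Int, lingerMs: Long) {
  require(maxBatchSize > 0, "maxBatchSize must be positive")

  private val pending = ArrayBuffer.empty[A]
  private var firstAppendMs: Long = -1L

  /** Buffers a message; returns a batch to send if the size limit is reached. */
  def append(msg: A, nowMs: Long): Option[Vector[A]] = {
    if (pending.isEmpty) firstAppendMs = nowMs
    pending += msg
    if (pending.size >= maxBatchSize) Some(drain()) else None
  }

  /** Called periodically; returns a batch if the linger time has expired. */
  def poll(nowMs: Long): Option[Vector[A]] =
    if (pending.nonEmpty && nowMs - firstAppendMs >= lingerMs) Some(drain()) else None

  // Hands back everything buffered so far and resets the linger clock.
  private def drain(): Vector[A] = {
    val batch = pending.toVector
    pending.clear()
    firstAppendMs = -1L
    batch
  }
}
```

Larger batches amortize per-request overhead (higher throughput), while the linger bound caps how long any single message can wait in the buffer.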
At-least-once/XDC Kafka Data flow
● At-least-once Kafka cluster
○ 100% durability, 10-20 ms produce latency
○ Expensive to operate (harder to scale)
○ Use cases: critical business events, driver/rider signup, DB change logs
● Cross-DC Kafka cluster
○ Cluster consists of machines from multiple data centers
○ 100% durability even if one DC is lost
○ Most expensive to operate
○ Use cases: payments, insurance sign-up, etc.
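At-least-once delivery means a sender keeps retrying until it sees an acknowledgement, so a lost ack can deliver the same message twice; consumers of critical events therefore need to be idempotent. The Scala sketch below simulates only those semantics, assuming every message carries a unique `id`; `Message`, `IdempotentConsumer` and `AtLeastOnceSender` are hypothetical names, and the cluster-level durability mechanisms described above are not modeled.

```scala
import scala.collection.mutable

final case class Message(id: Long, payload: String)

// Drops repeats by message id, so redelivered messages are processed only once.
final class IdempotentConsumer {
  private val seen = mutable.Set.empty[Long]
  val processed = mutable.ArrayBuffer.empty[Message]

  /** Returns true if the message was new and processed, false if it was a duplicate. */
  def handle(m: Message): Boolean =
    if (seen.add(m.id)) { processed += m; true } else false
}

object AtLeastOnceSender {
  /** Delivers `m`, retrying while `ackLost(attempt)` reports a missing ack.
    * Returns the number of delivery attempts made. */
  def send(m: Message, consumer: IdempotentConsumer, ackLost: Int => Boolean, maxAttempts: Int = 5): Int = {
    var attempt = 0
    var acked = false
    while (!acked && attempt < maxAttempts) {
      attempt += 1
      consumer.handle(m)        // the message arrives on every attempt...
      acked = !ackLost(attempt) // ...but the sender stops only once an ack comes back
    }
    attempt
  }
}
```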
Auditing - Chaperone
● Small embedded client in each layer of Kafka components
● Collect and aggregate data for each Kafka topic
● Provide report on data completeness and latency
● Alert developers if completeness or latency metrics violate the SLA
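One way to picture the completeness check is to count messages per topic and time bucket at two tiers and compare the counts. The Scala sketch below assumes 10-minute buckets, event timestamps in milliseconds and an SLA expressed as a ratio (0.9999 for 99.99%); `AuditKey` and `CompletenessAudit` are hypothetical names, and this is not Chaperone's actual implementation.

```scala
// Hypothetical audit key: message counts are aggregated per (topic, time bucket).
final case class AuditKey(topic: String, bucketStartMs: Long)

object CompletenessAudit {
  val BucketMs: Long = 10 * 60 * 1000L // assumed bucket width: 10 minutes

  def bucketOf(eventTimeMs: Long): Long = eventTimeMs - (eventTimeMs % BucketMs)

  /** Counts (topic, eventTimeMs) records per bucket, as observed at one tier. */
  def count(events: Seq[(String, Long)]): Map[AuditKey, Long] =
    events
      .groupBy { case (topic, ts) => AuditKey(topic, bucketOf(ts)) }
      .map { case (k, v) => k -> v.size.toLong }

  /** Returns buckets whose downstream/upstream count ratio falls below `sla`. */
  def violations(upstream: Map[AuditKey, Long], downstream: Map[AuditKey, Long], sla: Double): Map[AuditKey, Double] =
    upstream.flatMap { case (k, up) =>
      val ratio = downstream.getOrElse(k, 0L).toDouble / up
      if (up > 0 && ratio < sla) Some(k -> ratio) else None
    }
}
```

For example, if the upstream tier saw 4 `trips` messages in a bucket and the downstream tier saw 3, that bucket is reported with completeness 0.75.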
Agenda
● Use Cases
● Streaming Data Infrastructure
● Streaming Processing Platform
● Streaming Analytics Platform
● Future Work
Uber’s Business is Real-Time
Challenges
Infrastructure
● 100s of billions of messages/day
● At-least-once processing
● Exactly-once state processing
● 99.99% SLA on availability
● 99.99% SLA on latency
Operation
● 200+ streaming jobs
● Multiple data centers
Productivity
● Target Audience
○ Ops
○ Data Scientists
○ Engineers
● Integration
○ Logging
○ Backend Services
○ Storage Systems
○ Data Management
○ Monitoring
○ Reporting
Streaming Job Lifecycle
[Diagram: stages "Job Definition" and "Job Deployment and Maintenance"; components: Streaming Job Config, Business Logic, Job Resource Estimation, Job Metadata Config, Job Profiling, Monitoring and Alerts, Logging, Deployment, Maintenance, Upgrade, All Active/Failovers, Security, Testing & Debugging; legend: blue = job specific, orange = common modules]
SQL as the savior
60-70% of jobs could be expressed as SQL
AthenaX Approach
Write SQLs to build streaming applications
Why Flink
● Apache Calcite (SQL) Integration
● Easy to manage and scale
● Stateful and fault tolerant
● Accurate (exactly-once semantics)
● HDFS integration
● Not dependent on Kafka
● Active Community
Case Study: UberEATS
Predict the ETD
● Key metric: time to prepare a meal (t_prep)
● Periodically learn a function f: (order status) → t_prep
● Predict the ETD for current orders using f
● AthenaX extracts features for both learning and prediction
Architecture of ETD services
Job Definition
● User-defined functions
● Window-based aggregation
● Input connector
● Output connector
● Environments
Job Resource Estimation
Job Validation & Resource Estimation
[Diagram: AthenaX components: UI, Job Validation, Resource Estimation, Job Generator, Deployment, WatchDog]
● Validations
○ SQL Validation
■ Syntax
■ Semantics
○ Input Source Validation
○ Destination Validation
● Resource Estimation
○ Kafka input rate
○ Kafka peak rate
○ Kafka partitions
○ Type of Query
○ Output connector type
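A resource estimate can combine several of these inputs into a parallelism number. The Scala sketch below is purely hypothetical: it sizes for the peak rate with 20% headroom, uses made-up per-slot throughputs per query type, and caps the result at the Kafka partition count (extra Kafka source subtasks would receive no partitions). The average input rate and output connector type are not modeled, and the names `QueryType`, `ResourceEstimator` and `perSlotThroughput` are illustrative, not AthenaX's.

```scala
sealed trait QueryType
case object Filter extends QueryType
case object WindowedAggregation extends QueryType

object ResourceEstimator {
  // Assumed per-slot capacities in messages/sec; real numbers would come from profiling.
  def perSlotThroughput(q: QueryType): Double = q match {
    case Filter              => 20000.0
    case WindowedAggregation => 5000.0
  }

  /** Slots needed for the peak rate plus headroom, bounded by [1, partitions]. */
  def parallelism(peakRate: Double, partitions: Int, q: QueryType, headroom: Double = 1.2): Int = {
    val needed = math.ceil(peakRate * headroom / perSlotThroughput(q)).toInt
    math.max(1, math.min(needed, partitions))
  }
}
```

With these assumed numbers, a windowed aggregation over a 64-partition topic peaking at 48,000 msg/s gets 12 slots; the same query over an 8-partition topic is capped at 8.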
Executing AthenaX Applications
Compile SQL into Flink jobs
● Compilation & Job Generation
■ Compiler: SQL -> Logical plan -> Flink app
■ Optimizer: Flink app -> Optimized logical plan -> Physical plan -> Job graph
SELECT AVG(meal_prep_time)
FROM eats_order
GROUP BY HOP(proctime(),
             INTERVAL '1' MINUTE,
             INTERVAL '15' MINUTE)
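import org.apache.flink.table.api.scala._ // Scala Table API DSL: Slide, 15.minutes, 'field
// getEatsOrder() is the slide's placeholder for the eats_order table (with a 'proctime attribute)
val eats = getEatsOrder()
val result = eats
  .window(Slide over 15.minutes every 1.minute on 'proctime as 'w) // 15-min windows, every 1 min
  .groupBy('w)
  .select('meal_prep_time.avg)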
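The HOP window above (15-minute windows that start every minute) means each order contributes to 15 overlapping windows. The library-free Scala sketch below recomputes those averages in memory to make the window assignment concrete; `EatsOrder` and `HopWindow` are hypothetical names, and Flink maintains this state incrementally rather than recomputing it like this.

```scala
final case class EatsOrder(timeMs: Long, mealPrepTime: Double)

object HopWindow {
  val Minute: Long = 60 * 1000L

  /** Start times of every window [start, start + size), sliding by `slide`, that contains t. */
  def windowStarts(t: Long, slide: Long, size: Long): Seq[Long] = {
    val lastStart = t - Math.floorMod(t, slide) // latest aligned start at or before t
    Iterator.iterate(lastStart)(_ - slide).takeWhile(_ > t - size).toList
  }

  /** Average meal prep time per window start, like HOP(time, 1 MINUTE, 15 MINUTE). */
  def averages(orders: Seq[EatsOrder], slide: Long = Minute, size: Long = 15 * Minute): Map[Long, Double] =
    orders
      .flatMap(o => windowStarts(o.timeMs, slide, size).map(_ -> o.mealPrepTime))
      .groupBy(_._1)
      .map { case (start, xs) => start -> xs.map(_._2).sum / xs.size }
}
```

An order at minute 0 with prep time 10 and an order at minute 2 with prep time 20 share the window starting at minute 0 (average 15), while the window starting at minute 2 contains only the second order (average 20).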
AthenaX Deployment
● Job data store (MySQL)
○ Job Instances
○ Job Config
○ Instance Config
● Resource Management
○ Isolation
○ Validation
○ Utilization
● Job Promotion
○ Self-serve Flink on YARN
[Diagram: Flink on YARN, HDFS]
WatchDog
Operational Work
● Monitoring and Alerting
● Auto Scaling
○ Organic growth
○ Bounded resource increase
● Failover handling
● Failure recovery
100s of jobs - Operational nightmare
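"Organic growth" with a "bounded resource increase" can be read as: scale toward the observed load, but cap how fast and how far a job grows. The Scala sketch below is a hypothetical rule under that reading, not AthenaX's WatchDog logic; `AutoScaler`, `perTaskCapacity`, `maxStepFactor` and `maxParallelism` are illustrative names, and scale-down is deliberately left undamped to keep it short.

```scala
object AutoScaler {
  /** Hypothetical rule: target enough tasks for the observed rate, but grow by at most
    * `maxStepFactor` (and at least +1) per decision, never beyond `maxParallelism`. */
  def nextParallelism(current: Int, observedRate: Double, perTaskCapacity: Double,
                      maxStepFactor: Double, maxParallelism: Int): Int = {
    val desired = math.ceil(observedRate / perTaskCapacity).toInt
    val stepCap = math.max(current + 1, math.floor(current * maxStepFactor).toInt)
    math.max(1, math.min(desired, math.min(stepCap, maxParallelism)))
  }
}
```

With a load needing 20 tasks and a step factor of 2, a job at parallelism 4 moves to 8, then 16, then 20, rather than jumping straight to 20.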
Conclusion
● AthenaX: write SQL to build streaming applications
○ Treat tables as a generic concept
○ Productivity: development -> production in hours
● The AthenaX Approach
○ SQL on streams as a platform
○ Self-serve, end-to-end production support
Agenda
● Use Cases & Scale
● Streaming Data Infrastructure
● Streaming Processing Platform
● Streaming Analytics Platform
● Future Work
Real-Time Analytics Use Cases - Dashboarding
● Target Users:
○ CityOps
○ Executives
● Ingestion latency
○ secs to mins
● Query latency
○ < 1s
● QPS: medium
Use Cases - Adhoc Queries
● Target Users:
○ Data Scientists
○ CityOps
● Ingestion latency
○ mins
● Query latency
○ A few seconds
● QPS: low
Use Cases - Machine Decisions
● Target Users:
○ Applications
● Ingestion latency
○ secs to mins
● Query latency
○ ms
● QPS: high
Challenges
Infrastructure
● 100+ TB storage
● Multi-tenancy
● 99.99% SLA on availability
● 99.9% SLA on data accuracy
● ms- to sec-level query latency
● sec- to min-level ingestion latency
● Geo-spatial query
● GDPR
Accessibility
● Query Language
● Table DDL
● Table SLA
Operation
● 100+ Tables
● Multiple Data Centers
● Schema Evolution
● Data Backfill
Productivity
● Target Audience
○ Ops
○ Data Scientists
○ Engineers
● Integration
○ Data Management
○ Dashboarding
○ Reporting
○ Monitoring
RTA Architectural Overview
RTA Query
● Ad-hoc
○ Presto as Federation layer
○ Joins
● Pre-defined
○ Optimization
○ Multi-tenancy
○ Rate-Limiting
○ Caching
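Rate-limiting predefined queries in a multi-tenant setting is commonly done with a per-tenant token bucket; the slides do not say which algorithm RTA uses, so the Scala sketch below is an assumed technique. `TenantRateLimiter`, `ratePerSec` and `burst` are hypothetical names: each tenant refills at `ratePerSec` tokens per second up to `burst`, and a query is admitted only if a whole token is available.

```scala
import scala.collection.mutable

final class TenantRateLimiter(ratePerSec: Double, burst: Double) {
  private final class Bucket(var tokens: Double, var lastMs: Long)
  private val buckets = mutable.Map.empty[String, Bucket]

  /** Refills the tenant's bucket for elapsed time, then tries to take one token. */
  def tryAcquire(tenant: String, nowMs: Long): Boolean = {
    val b = buckets.getOrElseUpdate(tenant, new Bucket(burst, nowMs)) // new tenants start full
    val elapsedSec = (nowMs - b.lastMs) / 1000.0
    b.tokens = math.min(burst, b.tokens + elapsedSec * ratePerSec)
    b.lastMs = nowMs
    if (b.tokens >= 1.0) { b.tokens -= 1.0; true } else false
  }
}
```

Keeping one bucket per tenant means a burst from one team's dashboard exhausts only that team's budget, which is the multi-tenancy property the bullets above call for.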
Facets of Analytical Data
Data Freshness
Query Latency
Data Retention
Accuracy
Cost
Primary Facets
Secondary Facet
Facets of Analytical Data
Cost
Fresh Data + Accuracy + High Retention
RTA Storage
Pinot
● Columnar OLAP store open-sourced by LinkedIn
● Intended for low QPS and large data volumes, with low-to-medium query latency
● Use cases: ad-hoc queries that are not highly latency-sensitive
gForceDB
● GPU-based analytical database built in-house
● Intended for high QPS and low data volumes, with very low query latency
● Use cases: predefined queries that are latency-sensitive
RTA UMS (Unified Metadata Service)
[Diagram: one Logical Schema and Logical DDL mapped to Pinot Schema / Pinot DDL and gForceDB Schema / gForceDB DDL]
● Onboarding
● Query Routing
● Federation
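Query routing in UMS can be read against the storage trade-offs listed earlier: predefined, latency-sensitive queries suit gForceDB, while ad-hoc queries suit Pinot. The Scala sketch below encodes that reading as a hypothetical rule; `Engine`, `LogicalTable`, `Query`, `QueryRouter` and the `inGForceDB` flag (whether a logical table is also provisioned in gForceDB) are illustrative assumptions, not UMS's actual interface.

```scala
sealed trait Engine
case object Pinot extends Engine
case object GForceDB extends Engine

// One logical table definition, independent of the engines that physically store it.
final case class LogicalTable(name: String, columns: Seq[(String, String)], inGForceDB: Boolean)
final case class Query(table: LogicalTable, predefined: Boolean)

object QueryRouter {
  /** Predefined queries on gForceDB-backed tables go to gForceDB; everything else to Pinot. */
  def route(q: Query): Engine =
    if (q.predefined && q.table.inGForceDB) GForceDB else Pinot
}
```

Because callers only name the logical table, the routing decision (and any later change to it) stays hidden from end users, matching the "hide storage complexity" goal.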
RTA Ingestion
● Leverage the existing AthenaX framework
● One SQL for Streaming & Batch
Conclusion
● One onboarding story
○ Unified ingestion pipeline
○ SQL as only query language
○ Hide storage complexity from the end users
● Cost efficiency
Agenda
● Use Cases
● Streaming Data Infrastructure
● Streaming Processing Platform
● Streaming Analytics Platform
● Future Work
Future Work
● Multi-Zone
● Streaming-Batch Unification
● Open Source
Links
Blog Posts
● uReplicator: Uber Engineering’s Robust Kafka Replicator
● Introducing Chaperone: How Uber Engineering Audits Kafka End-to-End
● Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform
● Engineering Restaurant Manager, our UberEATS Analytics Dashboard
Open Source
● Kafka uReplicator open-sourced in Aug 2016
● Kafka Chaperone open-sourced in Dec 2016
● AthenaX open-sourced in Oct 2017
Thank you
More open-source projects at eng.uber.com
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 

Scaling up Uber's Real-time Data Analytics

  • 1. Scaling up Uber’s Real-time Data Analytics Xiang Fu James Shao
  • 2. Agenda ● Use Cases ● Streaming Data Infrastructure ● Streaming Processing Platform ● Streaming Analytics Platform ● Future Work
  • 4. Real-time Driver-Rider Matching ● App views and vehicle information → Kafka → stream processing (driver-rider match, ETA)
  • 7. A bunch more... ● Fraud Detection ● Share My ETA ● Safety ● Etc.
  • 8. Agenda ● Use Cases ● Streaming Data Infrastructure ● Streaming Processing Platform ● Streaming Analytics Platform ● Future Work
  • 9. Scale ● Trillion+ messages/day ● ~PBs data volume (excluding replication) ● Tens of thousands of topics
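The "Trillion+ messages/day" figure averages out to more than ten million messages per second. This back-of-the-envelope check assumes exactly 10^12 messages spread evenly over a day. Real peak rates are higher than this mean.

```latex
\frac{10^{12}\ \text{messages}}{24 \times 3600\ \text{s}}
  = \frac{10^{12}}{86\,400}\ \frac{\text{messages}}{\text{s}}
  \approx 1.16 \times 10^{7}\ \frac{\text{messages}}{\text{s}}
```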
  • 10. Requirements ● High Throughput ● Low Latency for most use cases (<1 ms) ● Reliability: at least 99.99%, and 100% for critical use cases ● At-least-once/Cross-DC pipeline for business-critical use cases ● Multi-Language Support (Go/Java/Python/C++) ● Tens of thousands of simultaneous clients ● Reliable data replication across DCs
  • 11. Kafka Ecosystem @ Uber ● Producers ○ Rider App, Driver App, API / Services, Etc. ○ Databases: Cassandra, Schemaless, MySQL ● Kafka ● Consumers ○ Samza / Flink applications: real-time analytics, alerts, dashboards ○ Hadoop, Vertica / Hive: data science, analytics, reporting ○ ELK: ad-hoc exploration, debugging ○ Payment processing ○ AWS S3
  • 12. Kafka Pipeline (DC1 and DC2) ● In each DC: Applications [ProxyClient] → Kafka REST Proxy → Regional Kafka → uReplicator → Aggregate Kafka ● Also shown: Local Agent, Offset Sync Service
  • 13. Regular Kafka Data Flow ● Provides a 99.99% data durability guarantee, latency < 1 ms ● Relies heavily on batching/buffering ● Cost-effective storage, with guaranteed performance for critical components ● Targets the majority of use cases: ○ ETA ○ Logging ○ Business events
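The regular data flow "relies heavily on batching/buffering." The sketch below shows that idea under one assumption: a batch is flushed on a size trigger or a time trigger, whichever comes first. This is similar in spirit to the Kafka producer's `batch.size` and `linger.ms` settings. The `BatchingBuffer` class and its parameters are hypothetical and are not Uber's client code.

```python
import time
from typing import Callable, List, Optional


class BatchingBuffer:
    """Hypothetical producer-side buffer: queues messages and sends them as
    one batch when max_batch messages are queued or when linger_s seconds
    have passed since the first queued message, whichever comes first."""

    def __init__(self, send_batch: Callable[[List[bytes]], None],
                 max_batch: int = 100, linger_s: float = 0.005,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_batch = send_batch
        self.max_batch = max_batch
        self.linger_s = linger_s
        self.clock = clock
        self.buffer: List[bytes] = []
        self.first_ts: Optional[float] = None

    def append(self, msg: bytes) -> None:
        if not self.buffer:
            self.first_ts = self.clock()  # start the linger timer
        self.buffer.append(msg)
        if len(self.buffer) >= self.max_batch:  # size trigger
            self.flush()

    def poll(self) -> None:
        """Call periodically; flushes a partial batch once linger_s expires."""
        if self.buffer and self.clock() - self.first_ts >= self.linger_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []  # new list, so the sent batch is not mutated
            self.first_ts = None


# Usage with a fake clock so the example is deterministic.
sent: List[List[bytes]] = []
now = [0.0]
buf = BatchingBuffer(sent.append, max_batch=3, linger_s=0.01,
                     clock=lambda: now[0])
for m in [b"a", b"b", b"c", b"d"]:
    buf.append(m)   # the third append triggers a size-based flush
now[0] = 0.02
buf.poll()          # linger expired: b"d" goes out as a partial batch
```

Batching amortizes the per-request cost over many messages. The price is a small bounded delay: a lone message waits at most `linger_s` before it is sent.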
  • 14. At-least-once/XDC Kafka Data Flow ● At-least-once Kafka cluster ○ 100% durability, 10-20 ms produce latency ○ Expensive to operate (harder to scale) ○ Use cases: critical business events, driver/rider signup, DB change-log ● Cross-DC Kafka cluster ○ Cluster consists of machines from multiple data centers ○ 100% durability even if one DC is lost ○ Most expensive to operate ○ Use cases: payments, insurance sign-up, etc.
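At-least-once delivery means no message is lost, but some may arrive more than once. The sketch below illustrates that contract under two assumptions. First, the producer resends until it receives an acknowledgement. Second, consumers de-duplicate by message key. `FlakyBroker`, `send_at_least_once`, and `dedup` are hypothetical names. Real Kafka durability also depends on replication settings such as the producer's `acks=all`, which this toy does not model.

```python
import random
from typing import Callable, List, Set, Tuple

Message = Tuple[str, bytes]  # (message key, payload)


def send_at_least_once(send: Callable[[str, bytes], bool], key: str,
                       payload: bytes, max_attempts: int = 20) -> int:
    """Resend until the broker acknowledges; return the attempts used.
    If only the ack is lost, the resend stores a second copy, so delivery
    is at-least-once: nothing is lost, but duplicates are possible."""
    for attempt in range(1, max_attempts + 1):
        if send(key, payload):
            return attempt
    raise RuntimeError(f"no ack for {key} after {max_attempts} attempts")


class FlakyBroker:
    """Hypothetical broker that always persists the write but drops the
    acknowledgement with probability p_drop_ack."""

    def __init__(self, p_drop_ack: float, seed: int = 7) -> None:
        self.log: List[Message] = []
        self.p_drop_ack = p_drop_ack
        self.rng = random.Random(seed)

    def send(self, key: str, payload: bytes) -> bool:
        self.log.append((key, payload))
        return self.rng.random() >= self.p_drop_ack


def dedup(log: List[Message]) -> List[Message]:
    """Idempotent consumer: keep only the first copy of each key."""
    seen: Set[str] = set()
    out: List[Message] = []
    for key, payload in log:
        if key not in seen:
            seen.add(key)
            out.append((key, payload))
    return out


broker = FlakyBroker(p_drop_ack=0.3)
for i in range(100):
    send_at_least_once(broker.send, f"signup-{i}", b"driver signup")
unique = dedup(broker.log)
```

Waiting for an acknowledgement on every write, and retrying, helps explain why this path has higher produce latency than the batched regular flow.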
  • 16. Auditing - Chaperone ● Small embedded client in each layer of Kafka components ● Collects and aggregates data for each Kafka topic ● Reports on data completeness and latency ● Alerts developers if completeness/latency metrics miss the SLA
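The completeness check can be sketched as follows. The sketch assumes each pipeline tier reports per-topic message counts in fixed time buckets. The 10-minute bucket and the 0.9999 threshold are illustrative assumptions. `bucket_counts` and `audit` are hypothetical names, not Chaperone's API, and latency reporting is left out.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bucket_counts(events: Iterable[Tuple[str, float]],
                  bucket_s: int = 600) -> Counter:
    """Count messages per (topic, time bucket) as seen by one tier."""
    counts: Counter = Counter()
    for topic, ts in events:
        counts[(topic, int(ts // bucket_s))] += 1
    return counts


def audit(upstream: Counter, downstream: Counter,
          sla: float = 0.9999) -> List[Tuple[str, int, float]]:
    """Compare two tiers and return (topic, bucket, completeness) for
    every bucket whose completeness falls below the SLA."""
    alerts = []
    for (topic, bucket), expected in sorted(upstream.items()):
        completeness = downstream.get((topic, bucket), 0) / expected
        if completeness < sla:
            alerts.append((topic, bucket, completeness))
    return alerts


# Usage: 20 minutes of 1 msg/s on one topic; the message at t=700 is lost.
produced = [("trips", float(t)) for t in range(1200)]
replicated = [("trips", float(t)) for t in range(1200) if t != 700]
alerts = audit(bucket_counts(produced), bucket_counts(replicated))
```

Here only the second bucket is flagged, at 599/600 completeness. Counting per bucket, rather than in total, tells the alert which time range to investigate.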
  • 17. Agenda ● Use Cases ● Streaming Data Infrastructure ● Streaming Processing Platform ● Streaming Analytics Platform ● Future Work
  • 18. Uber’s Business is Real-Time
  • 19. Challenges ● Infrastructure ○ 100s of billions of messages/day ○ At-least-once processing ○ Exactly-once state processing ○ 99.99% SLA on availability ○ 99.99% SLA on latency ● Operation ○ ~200+ streaming jobs ○ Multiple data centers ● Productivity ○ Target audience: Ops, Data Scientists, Engineers ○ Integration: Logging, Backend Services, Storage Systems, Data Management, Monitoring, Reporting
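At-least-once processing can coexist with exactly-once state. The toy, single-process sketch below shows how, assuming that operator state and the source offset are snapshotted together, as in Flink's checkpointing. `CountingJob` and its methods are hypothetical. The sketch does not reproduce Flink's distributed snapshot algorithm.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (source offset, driver_id)


class CountingJob:
    """Toy stateful job that counts trips per driver. The state and the
    next source offset are snapshotted together, so after a failure both
    roll back to the same point: some events are delivered twice
    (at-least-once processing), yet the final counts reflect each event
    exactly once (exactly-once state)."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.next_offset = 0
        self.checkpoint: Tuple[Dict[str, int], int] = ({}, 0)

    def process(self, event: Event) -> None:
        offset, driver = event
        self.counts[driver] = self.counts.get(driver, 0) + 1
        self.next_offset = offset + 1

    def snapshot(self) -> None:
        self.checkpoint = (dict(self.counts), self.next_offset)

    def restore(self) -> int:
        counts, offset = self.checkpoint
        self.counts, self.next_offset = dict(counts), offset
        return offset  # the source rewinds here and replays


log: List[Event] = [(i, f"driver-{i % 3}") for i in range(9)]
job = CountingJob()
for e in log[:4]:
    job.process(e)
job.snapshot()                 # checkpoint covers offsets 0..3
for e in log[4:7]:
    job.process(e)             # offsets 4..6 processed, then a crash
resume = job.restore()         # state rolls back to the checkpoint
for e in log[resume:]:
    job.process(e)             # offsets 4..6 are delivered a second time
```

If state and offset were saved separately, the replay of offsets 4..6 would double-count them. Snapshotting the pair atomically is what keeps every driver at exactly 3.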
  • 20. Streaming Job Lifecycle ● Job Definition: Business Logic, Streaming Job Config, Job Metadata Config, Job Resource Estimation, Job Profiling, Testing & Debugging ● Job Deployment and Maintenance: Deployment, Monitoring and Alerts, Logging, Maintenance, Upgrade, All Active / Failovers, Security ● Legend: blue = job-specific, orange = common modules
  • 21. SQL as the Savior: 60-70% of jobs could be expressed in SQL
  • 22. AthenaX Approach Write SQLs to build streaming applications
  • 23. Why Flink ● Apache Calcite (SQL) Integration ● Easy to manage and scale ● Stateful and fault tolerant ● Accurate (Exactly Once Semantics) ● HDFS integration ● Not dependent on Kafka ● Active Community
  • 24. Case study of UberEats
  • 25. Predict the ETD (estimated time of delivery) ● Key metric: time to prepare a meal (t_prep) ● Periodically learn a function f: (order status) → t_prep ● Predict the ETD for current orders using f ● AthenaX extracts features for both learning and prediction
  • 27. Job Definition ● User-Defined Functions ● Window-based aggregation ● Input Connector ● Output Connector ● Environments ● Job Resource Estimation
  • 28. Job Validation & Resource Estimation (AthenaX components: UI, Job Validation, Resource Estimation, Job Generator, Deployment, WatchDog) ● Validations ○ SQL Validation ■ Syntax ■ Semantics ○ Input Source Validation ○ Destination Validation ● Resource Estimation ○ Kafka input rate ○ Kafka peak rate ○ Kafka partitions ○ Type of query ○ Output connector type
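The resource-estimation inputs listed above (input rate, peak rate, partitions, query type, output connector) could feed a sizing rule like the minimal sketch here. The formula, the per-task throughput figure, the headroom factor, and the names `JobProfile` and `estimate_parallelism` are all assumptions for illustration. The slide does not say how AthenaX combines these inputs, and the sketch also assumes a single parallelism for the whole job.

```python
import math
from dataclasses import dataclass


@dataclass
class JobProfile:
    peak_rate: float        # Kafka peak input rate, messages/s
    partitions: int         # partitions of the input topic
    per_task_rate: float    # assumed msgs/s one task sustains for this
                            # query type and output connector
    headroom: float = 1.25  # assumed safety margin over the peak


def estimate_parallelism(p: JobProfile) -> int:
    """Hypothetical sizing rule: enough tasks to absorb the peak rate with
    headroom, capped at the partition count (extra Kafka source tasks
    would have no partition to read) and never below one."""
    needed = math.ceil(p.peak_rate * p.headroom / p.per_task_rate)
    return max(1, min(needed, p.partitions))


small = estimate_parallelism(JobProfile(peak_rate=50_000, partitions=32,
                                        per_task_rate=5_000))
capped = estimate_parallelism(JobProfile(peak_rate=500_000, partitions=32,
                                         per_task_rate=5_000))
```

In this example the first job needs 13 tasks. The second would need 125, but it is capped at its 32 partitions, a sign that the topic itself would need more partitions.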
  • 29. Executing AthenaX Applications: Compile SQLs to Flink Job ● Compilation & Job Generation ■ Compiler: SQL -> Logical plan -> Flink app ■ Optimizer: Flink app -> Optimized Logical plan -> Physical plan -> Job Graph ● Example SQL:
      SELECT AVG(meal_prep_time)
      FROM eats_order
      GROUP BY HOP(proctime(), INTERVAL '1' MINUTE, INTERVAL '15' MINUTE)
    ● Equivalent Flink 1.x Table API (Scala expression DSL; getEatsOrder() is the slide's placeholder source, assumed to declare a 'proctime attribute):
      val eats = getEatsOrder()
      eats
        .window(Slide over 15.minutes every 1.minute on 'proctime as 'w)
        .groupBy('w)
        .select('meal_prep_time.avg)
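To make the HOP semantics concrete, here is a plain-Python sketch of what the query computes. Each 15-minute window starts on a 1-minute boundary, so every event falls into 15 overlapping windows. Explicit integer timestamps stand in for processing time. `hop_avg` is a hypothetical helper and the sample values are made up. Flink itself uses window assigners and incremental aggregation rather than this batch loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hop_avg(events: Iterable[Tuple[int, float]], slide_s: int = 60,
            size_s: int = 900) -> Dict[int, float]:
    """Average values over hopping windows of length size_s that start every
    slide_s seconds, i.e. HOP(t, INTERVAL '1' MINUTE, INTERVAL '15' MINUTE).
    Window w covers [w, w + size_s). Returns {window_start: average}."""
    if size_s % slide_s != 0:
        raise ValueError("size_s must be a multiple of slide_s")
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in events:
        newest = ts - ts % slide_s            # latest window start <= ts
        # the event belongs to size_s // slide_s windows (15 by default)
        for k in range(size_s // slide_s):
            start = newest - k * slide_s
            sums[start] += value
            counts[start] += 1
    return {w: sums[w] / counts[w] for w in sorted(sums)}


# (seconds since start, meal prep time in minutes); values are made up.
orders = [(0, 10.0), (300, 20.0), (1000, 30.0)]
windows = hop_avg(orders)
```

The window starting at 0 s averages the first two orders. The one starting at 120 s sees the last two, so consecutive results form a moving average refreshed every minute.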
  • 30. AthenaX Deployment ● Job data store (MySQL) ○ Job Instances ○ Job Config ○ Instance Config ● Resource Management ○ Isolation ○ Validation ○ Utilization ● Job Promotion ○ Self-Serve ● Runtime: Flink on YARN, HDFS
  • 31. WatchDog ● Operational Work ○ Monitoring and Alerting ○ Auto Scaling ■ Organic growth ■ Bounded resource increase ○ Failover handling ○ Failure recovery ● 100s of jobs: an operational nightmare
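The auto-scaling item pairs "organic growth" with a "bounded resource increase." A minimal sketch of such a rule follows. It assumes growing consumer lag is the scaling signal and that each decision may grow the job by at most a fixed factor, up to a hard cap. `next_parallelism`, `max_step`, and the lag inputs are hypothetical; the slide does not state WatchDog's actual policy.

```python
import math


def next_parallelism(current: int, lag_now: int, lag_before: int,
                     max_parallelism: int, max_step: float = 1.5) -> int:
    """Hypothetical WatchDog scaling rule. Growing consumer lag is taken as
    the signal of organic traffic growth; the job then scales up by at most
    max_step x per decision and never beyond max_parallelism, so each
    resource increase is bounded. Otherwise the size is left unchanged."""
    if lag_now <= lag_before:
        return current                       # keeping up: no change
    target = math.ceil(current * max_step)   # bounded multiplicative step
    return min(max(target, current + 1), max_parallelism)
```

For example, a job at parallelism 8 whose lag grew from 1,000 to 5,000 messages would move to 12. A job at 60 with a cap of 64 stops at 64. Bounding each step keeps a bad signal from claiming a large share of a shared cluster in one decision.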
  • 32. Conclusion ● AthenaX: write SQL to build streaming applications ○ Treat table as a generic concept ○ Productivity: development -> production in hours ● The AthenaX Approach ○ SQL on streams as a platform ○ Self-serve production support, end-to-end
  • 33. Agenda ● Use Cases & Scale ● Streaming Data Infrastructure ● Streaming Processing Platform ● Streaming Analytics Platform ● Future Work
  • 34. Real-Time Analytics Use Cases - Dashboarding ● Target Users: ○ CityOps ○ Executives ● Ingestion latency ○ secs to mins ● Query latency ○ < 1s ● QPS: medium
  • 35. Use Cases - Ad-hoc Queries ● Target Users: ○ Data Scientists ○ CityOps ● Ingestion latency ○ mins ● Query latency ○ A few seconds ● QPS: low
  • 36. Use Cases - Machine Decisions ● Target Users: ○ Applications ● Ingestion latency ○ secs to mins ● Query latency ○ ms ● QPS: high
  • 37. Challenges ● Infrastructure ○ 100+ TB storage ○ Multi-tenancy ○ 99.99% SLA on availability ○ 99.9% SLA on data accuracy ○ ms- to sec-level query latency ○ sec- to min-level ingestion latency ○ Geo-spatial queries ○ GDPR ● Accessibility ○ Query language ○ Table DDL ○ Table SLA ● Operation ○ 100+ tables ○ Multiple data centers ○ Schema evolution ○ Data backfill ● Productivity ○ Target audience: Ops, Data Scientists, Engineers ○ Integration: Data Management, Dashboarding, Reporting, Monitoring
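For a sense of how tight the 99.99% availability SLA is, here is the yearly downtime budget it leaves, assuming a 365-day year:

```latex
(1 - 0.9999) \times 365 \times 24 \times 60\ \text{min}
  = 10^{-4} \times 525\,600\ \text{min}
  \approx 52.6\ \text{min/year}
```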
  • 39. RTA Query ● Ad-hoc ○ Presto as federation layer ○ Joins ● Pre-defined ○ Optimization ○ Multi-tenancy ○ Rate-Limiting ○ Caching
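Rate-limiting pre-defined queries per tenant is commonly done with a token bucket. The slide does not say which algorithm Uber uses, so this is a sketch of that one standard technique. The tenant name and limits are hypothetical. `now` is passed explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Token-bucket rate limiter for one tenant: refills `rate` tokens per
    second up to `capacity`; each query spends one token or is rejected."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full, allowing an initial burst
        self.last = now

    def allow(self, now: float) -> bool:
        # refill for the time elapsed since the previous call
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-tenant limits: 2 queries/s with bursts of up to 2.
limiters = {"dashboard-team": TokenBucket(rate=2, capacity=2)}
bucket = limiters["dashboard-team"]
burst = [bucket.allow(now=0.0) for _ in range(3)]
later = bucket.allow(now=0.5)  # half a second refills one token
```

Keeping one bucket per tenant is what makes the limiter serve multi-tenancy: a noisy dashboard exhausts only its own tokens.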
  • 40. Facets of Analytical Data ● Primary facets: Data Freshness, Query Latency, Data Retention, Accuracy ● Secondary facet: Cost
  • 41. Facets of Analytical Data Cost Fresh Data + Accuracy + High Retention
  • 42. RTA Storage ● Pinot: columnar OLAP store open-sourced by LinkedIn ○ Intended for low QPS and large data volume, with low-to-medium query latency ○ Use cases: ad-hoc queries that are not highly latency-sensitive ● gForceDB: GPU-based analytical database built in-house ○ Intended for high QPS and low data volume, with very low query latency ○ Use cases: pre-defined queries that are latency-sensitive
  • 43. RTA UMS (Unified Metadata Service) ● Maps the logical schema and logical DDL to Pinot schema/DDL and gForceDB schema/DDL ● Onboarding ● Query Routing ● Federation
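UMS-style query routing can be sketched as a catalog lookup from a logical table to its physical tables, plus a rule that picks a store. The rule below simply follows the storage slide: latency-sensitive pre-defined queries go to gForceDB and ad-hoc queries go to Pinot. It is an assumption, not UMS's documented logic. `LogicalTable`, `route`, and the table names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LogicalTable:
    """One logical table in a hypothetical metadata catalog, with the
    physical table backing it in each store (None if not stored there)."""
    name: str
    pinot_table: Optional[str]
    gforce_table: Optional[str]


def route(table: LogicalTable, predefined: bool) -> str:
    """Hypothetical routing rule based on the storage slide: pre-defined,
    latency-sensitive queries go to gForceDB when the table lives there;
    everything else (e.g. ad-hoc queries) goes to Pinot."""
    if predefined and table.gforce_table:
        return f"gforcedb:{table.gforce_table}"
    if table.pinot_table:
        return f"pinot:{table.pinot_table}"
    raise LookupError(f"no backend can serve {table.name}")


catalog: Dict[str, LogicalTable] = {
    "eats_orders": LogicalTable("eats_orders", "eats_orders_rt",
                                "eats_orders_gpu"),
    "trips": LogicalTable("trips", "trips_rt", None),
}
```

Because users only name the logical table, the catalog can move or duplicate a table between stores without changing any query. This is the "hide storage complexity" goal stated in the RTA conclusion.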
  • 44. RTA Ingestion ● Leverage Existing AthenaX Framework ● One SQL for Streaming & Batch
  • 45. Conclusion ● One onboarding story ○ Unified ingestion pipeline ○ SQL as only query language ○ Hide storage complexity from the end users ● Cost efficiency
  • 46. Agenda ● Use Cases ● Streaming Data Infrastructure ● Streaming Processing Platform ● Streaming Analytics Platform ● Future Work
  • 47. Future Work ● Multi-Zone ● Streaming-Batch Unification ● Open Source
  • 48. Links ● Blog Posts ○ uReplicator: Uber Engineering’s Robust Kafka Replicator ○ Introducing Chaperone: How Uber Engineering Audits Kafka End-to-End ○ Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform ○ Engineering Restaurant Manager, our UberEATS Analytics Dashboard ● Open Source ○ Kafka uReplicator open-sourced in Aug 2016 ○ Kafka Chaperone open-sourced in Dec 2016 ○ AthenaX open-sourced in Oct 2017
  • 49. Thank you More open-source projects at eng.uber.com