Zeus: Uber’s Highly Scalable and
Distributed Shuffle as a Service
Mayank Bansal, Data Infra, Uber
Bo Yang, Data Infra, Uber
Igniting opportunity by setting the world in motion
15 billion trips
18M trips per day
6 continents, 69 countries and 10,000 cities
103M active monthly users
5M active drivers
22,000 employees worldwide
3,700 developers worldwide
Data and ML Use Cases at Uber
○ Uber Eats
○ ETAs
○ Self-Driving Vehicles
○ Customer Support
○ Driver/Rider Match
○ Personalization
○ Demand Modeling
○ Dynamic Pricing
○ Forecasting
○ Maps
○ Fraud
○ Anomaly Detection
○ Capacity Planning
○ And many more...
Data and ML at Uber - ETAs
○ ETAs are core to the Uber customer experience
○ ETAs are used by myriad internal systems
○ ETAs are generated by route-based algorithms
○ ML models predict the route-based ETA error
○ Uber uses the predicted error to correct the ETA
○ ETAs are now dramatically more accurate
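The correction step can be sketched as a single addition. This is only an illustration: the slides do not give the sign convention, so the sketch assumes the predicted error means "actual minus route-based ETA", and the function and parameter names are hypothetical.

```python
def correct_eta(route_eta_s: float, predicted_error_s: float) -> float:
    """Apply an ML-predicted error to a route-based ETA (both in seconds).

    Assumption: predicted_error_s estimates (actual arrival time - route ETA).
    """
    # Clamp at zero so an over-correction never yields a negative ETA.
    return max(0.0, route_eta_s + predicted_error_s)
```

For example, a 600 s route-based ETA with a predicted error of +45 s would be reported as 645 s under this convention.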
Data and ML at Uber - Driver/Rider Match
○ Optimize matchings of riders and drivers
on the Uber platform
○ Predict whether a rider who opens the app will request a trip
Data and ML at Uber - Eats
○ Models used for
○ Ranking of restaurants and
dishes
○ Delivery times
○ Search ranking
○ 100s of ML models called to
render Eats homepage
Data and ML at Uber - Self-Driving Vehicles
Uber’s Data Stack
○ Sources: Mobile App Events, Device Telemetry, Micro-Service Events, Database Events, 3rd Party Feeds, Bulk Uploads
○ Incremental Ingestion, Kafka
○ Tiered Data Lake: In-memory (Pinot, AresDB), Hot (HDFS), Warm (HDFS), Archival (Cloud)
○ Compute Fabric (YARN / Mesos + Peloton)
○ Data Processing Engines: Stream Processing (Flink), Batch Processing (Spark, Tez, Map Reduce)
○ Query Engines: Realtime, Pre-Aggregated (AthenaX); Ad hoc, Interactive (Presto, Vertica); Complex, Batch (Hive)
○ Data Analytics Tools: Dashboards (Summary, Dashbuilder), Ad hoc Query (QueryBuilder), Data Preparation (Piper, uWorc), BI Tools (Tableau, DSW)
Uber’s ML Stack - Michelangelo
○ Kafka
○ Compute Fabric (YARN / Peloton+Mesos)
○ Data Analytics Tools
○ Query Engines
○ Data Preparation: Stream Processing (Flink), Batch Processing (Hive, Spark, Tez)
○ Prototype: Jupyter Notebook, Spark Magic
○ Training: Tensorflow, Pytorch, XGBoost, SparkML
○ Feature Store, Model Store, Metrics Store
○ DataLake (HDFS)
○ Inference: Realtime Prediction Service, Batch Prediction Jobs
Apache Spark
@Uber
Image Source: www.mindproject.io
Apache Spark @ Uber
○ Apache Spark is the primary analytics execution engine used by teams at Uber
○ At Uber, 95% of batch and ML jobs run on Spark
○ We run Spark on YARN and Peloton/Mesos
○ We use the external shuffle service for shuffle data
* Apache Hadoop, Spark, and Mesos logos are either registered trademarks or trademarks of the Apache
Software Foundation in the United States and/or other countries. No endorsement by The Apache
Software Foundation is implied by the use of these marks. TensorFlow and the TensorFlow logo are
trademarks of Google Inc. Redis is a trademark of Redis Labs Ltd. Any rights therein are reserved to Redis
Labs Ltd. Any use by Uber Technologies is for referential purposes only and does not indicate any
sponsorship, endorsement or affiliation between Redis and Uber Technologies.
How Does Apache Spark Shuffle Service Work?
Limitations of Apache Spark Shuffle Service
● SSD wear-out issues
● Reliability
● Kubernetes dynamic allocation
● Collocation
Different Approaches
● Shuffle manager to external storage
○ Synchronous writes
■ NFS
● 2x slower
■ HDFS
● 5x slower
Different Approaches
● Shuffle manager to external storage
○ Semi-asynchronous writes
■ HDFS
● 4x slower
Different Approaches
● Remote Shuffle Service
○ Streaming writes to HDFS
■ 1.5x slower than writing to local storage
○ Streaming writes to local storage
■ Roughly the same performance as the external shuffle service
Remote Shuffle Service
● Remote Shuffle Service
○ Streaming Writes to Local Storage
■ Changes the MapReduce paradigm
■ Record Stream -> Shuffle Server -> Disk
■ No temporary spill files on the executor side
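The "Record Stream -> Shuffle Server -> Disk" flow can be sketched in a few lines. This is a minimal in-process illustration, not the service's implementation: `ShuffleServerSketch`, `run_map_task`, the file naming, and the length-prefixed record format are all hypothetical, and a real map task would send records over the network rather than call the server directly.

```python
import os


class ShuffleServerSketch:
    """Hypothetical in-process stand-in for a remote shuffle server."""

    def __init__(self, root_dir: str) -> None:
        self.root_dir = root_dir
        self._files = {}  # partition id -> open file handle

    def append(self, partition: int, record: bytes) -> None:
        # Append-only, sequential write to one file per partition.
        f = self._files.get(partition)
        if f is None:
            path = os.path.join(self.root_dir, f"partition-{partition}.data")
            f = open(path, "ab")
            self._files[partition] = f
        # Length-prefix each record so a reader can split the stream again.
        f.write(len(record).to_bytes(4, "big") + record)

    def close(self) -> None:
        for f in self._files.values():
            f.close()


def run_map_task(records, num_partitions: int, server: ShuffleServerSketch) -> None:
    # Each record goes straight to the server; nothing is spilled locally.
    for key, value in records:
        partition = hash(key) % num_partitions
        server.append(partition, f"{key}={value}".encode())
```

The point of the sketch is that the executor never sorts or spills to local files; partitioned data accumulates only on the server side.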
Architecture - Remote Spark Shuffle Service
Deep Dive
Design Principles
● Scale out horizontally
○ Each server instance works independently
○ Avoid centralized state/storage
● Tackle network latency
○ Reduce waiting times for server response
○ Stream data
● Performance optimization
○ Most Spark Apps optimized for similar performance
○ Rely on YARN/Apache Spark retry for failure recovery
Scale Out
Tackle Network Latency
Performance Optimization
Horizontally Scalable
● Spark applications share/use different shuffle servers
● No shared state among shuffle servers
● More shuffle servers to scale out
Shuffle Server Distribution
● Mappers: m=4
● Reducers: r=5
● Shuffle Servers: s=3
Shuffle Server Distribution in General
● Mappers: m
● Reducers: r
● Shuffle Servers: s
● Network Connections
○ Mappers: m*s connections
○ Reducers: r connections
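As a worked instance of the formulas above, the following sketch computes the connection counts for the earlier example (m=4, r=5, s=3). The partition-to-server rule in `server_for_partition` is an assumption made for illustration; the slides do not state how partitions are assigned.

```python
def connection_counts(m: int, r: int, s: int) -> dict:
    """Connection counts for m mappers, r reducers, and s shuffle servers."""
    return {
        # A mapper may produce records for any partition, so it connects to all s servers.
        "mapper_connections": m * s,
        # Each reducer reads its partition from one server.
        "reducer_connections": r,
    }


def server_for_partition(partition: int, s: int) -> int:
    # Assumed assignment rule (not from the slides): round-robin by partition id.
    return partition % s
```

With m=4, r=5, s=3 this gives 12 mapper connections and 5 reducer connections, so adding servers grows only the mapper side.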
Scale Out
Tackle Network Latency
Performance Optimization
Server Implementation
● Use Netty
○ High performance asynchronous server framework
● Two thread groups
○ Group 1: Accept new socket connections
○ Group 2: Read socket data
○ Thread groups do not block each other
● Binary network protocol
○ Efficient encoding/compression
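To illustrate what a compact binary protocol looks like, here is a small framing sketch. The frame layout (map id, partition id, payload length, payload) is hypothetical and is not the service's actual wire format; the server itself is built on Netty in the JVM, while this sketch uses Python's `struct` module only to show the idea.

```python
import struct

# Hypothetical frame header: map id (int32), partition id (int32), payload length (uint32).
HEADER = struct.Struct(">iiI")


def encode_frame(map_id: int, partition: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size big-endian header."""
    return HEADER.pack(map_id, partition, len(payload)) + payload


def decode_frames(buffer: bytes):
    """Yield (map_id, partition, payload) for every complete frame in buffer."""
    offset = 0
    while offset + HEADER.size <= len(buffer):
        map_id, partition, length = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buffer):
            break  # Incomplete frame: wait for more bytes from the socket.
        yield map_id, partition, buffer[start:end]
        offset = end
```

A fixed 12-byte header keeps per-record overhead small and lets the reader find frame boundaries without parsing the payload.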
Direct Write/Read on Disk File
● Write to OS file directly
○ No application level buffering
● Zero copy
○ Transfer data from disk file to shuffle reader without copying through user-space memory
● Sequential write/read
○ No random disk IO
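The zero-copy idea can be shown with Python's `socket.sendfile`, which delegates to the operating system's `sendfile` call where available. This is a sketch of the technique only; the helper name is hypothetical and the actual server is JVM-based.

```python
import socket


def send_file_zero_copy(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile uses os.sendfile where the platform supports it, so the
        # kernel moves file data to the socket without a user-space buffer.
        # On other platforms it falls back to ordinary send() calls.
        return conn.sendfile(f)
```

Because the shuffle files are written sequentially, serving a partition to a reducer reduces to one such call per file.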
Client Side Compression
● Shuffle client compresses/decompresses data
● Reduces network transport data size
● Reduces CPU usage on shuffle server
● Supports client-side encryption
○ Encryption key inside each application
○ Encryption key not distributed to shuffle server
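A minimal sketch of the client-side compression split follows. The codec (`zlib`), the newline-delimited batch format, and the `OpaqueShuffleStore` class are assumptions for illustration; the slides do not name a codec. Encryption would sit at the same place on the client and is omitted here.

```python
import zlib


def client_encode(records: list) -> bytes:
    """Compress a batch of serialized records on the client before sending.

    Assumption: records are bytes objects that contain no newline.
    """
    return zlib.compress(b"\n".join(records))


def client_decode(blob: bytes) -> list:
    """Decompress a batch fetched from the shuffle server, again on the client."""
    return zlib.decompress(blob).split(b"\n")


class OpaqueShuffleStore:
    """Hypothetical server side: stores bytes as received and never inspects them."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, partition: int, blob: bytes) -> None:
        self._blobs.setdefault(partition, []).append(blob)

    def get(self, partition: int) -> list:
        return list(self._blobs.get(partition, []))
```

Since the server only moves opaque bytes, it spends no CPU on compression and never needs the application's encryption key.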
Parallel Serialization and Network IO
● Shuffle data serialization takes time
● Serialization in executor thread
● Network IO in another thread
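The overlap of serialization and network IO can be sketched as a producer/consumer pair. The names are hypothetical, `pickle` stands in for Spark's serializer, and `send` is any callable that writes bytes to the network.

```python
import pickle
import queue
import threading


def stream_records(records, send) -> None:
    """Serialize on the calling thread while a second thread does the sending."""
    outbox = queue.Queue(maxsize=64)  # Bounded, so a slow network applies backpressure.
    done = object()  # Sentinel marking the end of the stream.

    def network_loop() -> None:
        while True:
            item = outbox.get()
            if item is done:
                return
            send(item)

    sender = threading.Thread(target=network_loop)
    sender.start()
    for record in records:
        outbox.put(pickle.dumps(record))  # Serialization stays on this thread.
    outbox.put(done)
    sender.join()
```

While the sender thread is blocked on the socket, the executor thread keeps serializing the next records, so neither cost fully stalls the other.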
Connection Pool
● Socket connect latency is not trivial
● Reuse client/server connections
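Connection reuse can be sketched with a small pool. This is an illustrative design with hypothetical names, not the client's actual pooling code; `connect` is any factory that returns an object with a `close()` method.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keeps idle connections so callers can skip the connect latency."""

    def __init__(self, connect, max_idle: int = 8) -> None:
        self._connect = connect  # Factory that opens a new connection.
        self._idle = queue.LifoQueue(maxsize=max_idle)

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()  # Reuse an idle connection if any.
        except queue.Empty:
            conn = self._connect()  # Otherwise pay the connect cost once.
        try:
            yield conn
        except Exception:
            conn.close()  # Do not return a possibly broken connection.
            raise
        else:
            try:
                self._idle.put_nowait(conn)
            except queue.Full:
                conn.close()
```

In practice the factory could be something like `lambda: socket.create_connection((host, port))`, with one pool per shuffle server.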
Scale Out
Tackle Network Latency
Performance Optimization
Asynchronous Shuffle Data Commit
● Map task
○ Stream data to server
○ Does not wait for a response
● Server flushes (commits) data asynchronously
● Reduce task queries data availability when fetching data
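The asynchronous commit flow above can be sketched as follows. Everything here is a hypothetical in-process model: the class and method names are invented for illustration, and the real server commits by flushing files rather than moving data between dictionaries.

```python
import threading
import time


class AsyncCommitServer:
    """Hypothetical server: accepts writes immediately, commits in the background."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending = {}    # map_id -> list of received chunks
        self._committed = {}  # map_id -> flushed data

    def write(self, map_id: int, chunk: bytes) -> None:
        # Fire-and-forget from the map task's point of view: no response is sent.
        with self._lock:
            self._pending.setdefault(map_id, []).append(chunk)

    def finish_map_task(self, map_id: int) -> None:
        # Returns right away; the flush (commit) runs on a background thread.
        threading.Thread(target=self._commit, args=(map_id,), daemon=True).start()

    def _commit(self, map_id: int) -> None:
        with self._lock:
            self._committed[map_id] = b"".join(self._pending.pop(map_id, []))

    def is_available(self, map_id: int) -> bool:
        with self._lock:
            return map_id in self._committed

    def fetch(self, map_id: int) -> bytes:
        with self._lock:
            return self._committed[map_id]


def fetch_when_available(server, map_id: int, timeout_s: float = 5.0) -> bytes:
    """Reduce side: poll for availability, then fetch."""
    deadline = time.monotonic() + timeout_s
    while not server.is_available(map_id):
        if time.monotonic() > deadline:
            raise TimeoutError(f"map output {map_id} was not committed in time")
        time.sleep(0.01)
    return server.fetch(map_id)
```

The map task never blocks on the commit; the cost of waiting moves to the reduce side, which only pays it if the data is not yet flushed when it asks.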
Fault Tolerance
Shuffle Server Discovery/Health Check
● ZooKeeper as server registry
Data Replica
● Server Replication Group
● Duplicate Write in Parallel
● Read from a single server; switch to another server on failure
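The replication scheme can be sketched with in-memory stand-ins for the servers. The class and function names are hypothetical, and failure is modeled with a simple `alive` flag rather than real network errors.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryServer:
    """Hypothetical shuffle server stand-in that can be marked as failed."""

    def __init__(self) -> None:
        self.data = {}
        self.alive = True

    def write(self, partition: int, chunk: bytes) -> None:
        if not self.alive:
            raise ConnectionError("server down")
        self.data[partition] = self.data.get(partition, b"") + chunk

    def read(self, partition: int) -> bytes:
        if not self.alive:
            raise ConnectionError("server down")
        return self.data[partition]


def replicated_write(group, partition: int, chunk: bytes) -> None:
    # Send the same chunk to every server in the replication group in parallel.
    with ThreadPoolExecutor(max_workers=len(group)) as pool:
        futures = [pool.submit(s.write, partition, chunk) for s in group]
        for f in futures:
            f.result()  # Surface any write failure.


def failover_read(group, partition: int) -> bytes:
    # Read from one server; move on to the next replica only on failure.
    last_error = None
    for server in group:
        try:
            return server.read(partition)
        except ConnectionError as err:
            last_error = err
    raise ConnectionError("all replicas failed") from last_error
```

Writing in parallel keeps the added latency close to that of the slowest replica, while readers pay no extra cost unless a server has failed.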
Local State Flush
● Local state persistence in batch
○ Avoid flushing state for each map task
○ Flush when shuffle stage finishes
● Client does not wait for server-side state flush
Production Status
Compatible with Open Source Apache Spark
● Shuffle Manager Plugin
○ spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
● MapStatus / MapOutputTracker
○ Embed remote shuffle service related data inside MapStatus
○ Query MapOutputTracker to retrieve needed information
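As a sketch of how an application might opt in, the helper below builds a `spark-submit` command line that passes the shuffle manager setting from this slide via `--conf`. The helper name, JAR path, and main class are hypothetical; the plugin's classes would also need to be on the driver and executor classpath, and any further service-specific settings are not covered here.

```python
def spark_submit_args(app_jar: str, main_class: str) -> list:
    """Build a spark-submit command line that selects the remote shuffle manager."""
    return [
        "spark-submit",
        "--class", main_class,
        # Setting taken from the slide; it swaps in the plugin's ShuffleManager.
        "--conf", "spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager",
        app_jar,
    ]
```

The resulting list can be handed to `subprocess.run` unchanged, which avoids shell quoting issues.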
Metrics/Monitoring
● Uber’s open source M3 metrics library
● Important metrics
○ Network connections
○ File descriptors
○ Disk utilization
Test Strategy
● Unit Test
● Stress/Random Test
● Production Query Sampling
Remote Spark Shuffle Service - Production Status
● In production on YARN for the last 8+ months
● Thousands of applications running every day
● Job latencies are on par with external shuffle
● Open sourcing it soon!
Roadmap
● Support all Spark workloads including HiveOnSpark
● Multi-tenancy (quota)
● Load balancing
● Integrate with incoming Spark shuffle metadata APIs
Proprietary and confidential © 2020 Uber Technologies, Inc. All rights reserved. No part of this
document may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, recording, or by any information storage or retrieval systems, without
permission in writing from Uber. This document is intended only for the use of the individual or entity
to whom it is addressed and contains information that is privileged, confidential or otherwise exempt
from disclosure under applicable law. All recipients of this document are notified that the information
contained herein includes proprietary and confidential information of Uber, and recipient may not
make use of, disseminate, or in any way disclose this document or any of the enclosed information
to any person other than employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.
Thank you !!!

More Related Content

What's hot

Run Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdf
Run Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdfRun Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdf
Run Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdfAnya Bida
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...Databricks
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystifiedOmid Vahdaty
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan BlueDatabricks
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDatabricks
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeDremio Corporation
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkVMware Tanzu
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 

What's hot (20)

Run Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdf
Run Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdfRun Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdf
Run Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdf
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan Blue
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 

Similar to Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service

Uber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitUber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitZhenxiao Luo
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
 
Even Faster: When Presto meets Parquet @ Uber
Even Faster: When Presto meets Parquet @ UberEven Faster: When Presto meets Parquet @ Uber
Even Faster: When Presto meets Parquet @ UberDataWorks Summit
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Zhenxiao Luo
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at UberDataWorks Summit
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 
Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Zhenxiao Luo
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simpleDori Waldman
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017Zhenxiao Luo
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the CloudAmihay Zer-Kavod
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Zhenxiao Luo
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uberconfluent
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla ClusterScyllaDB
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Spark Summit
 
Big data meet_up_08042016
Big data meet_up_08042016Big data meet_up_08042016
Big data meet_up_08042016Mark Smith
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking VN
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3 Omid Vahdaty
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 

Similar to Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service (20)

Uber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitUber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks Summit
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Even Faster: When Presto meets Parquet @ Uber
Even Faster: When Presto meets Parquet @ UberEven Faster: When Presto meets Parquet @ Uber
Even Faster: When Presto meets Parquet @ Uber
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at Uber
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
 
Big data meet_up_08042016
Big data meet_up_08042016Big data meet_up_08042016
Big data meet_up_08042016
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2



Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service

  • 1. Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
      Mayank Bansal, Data Infra, Uber
      Bo Yang, Data Infra, Uber
      Igniting opportunity by setting the world in motion
  • 2. Uber at a glance
      ○ 15 billion trips
      ○ 18M trips per day
      ○ 6 continents, 69 countries and 10,000 cities
      ○ 103M active monthly users
      ○ 5M active drivers
      ○ 22,000 employees worldwide
      ○ 3,700 developers worldwide
  • 3. Data and ML Use Cases at Uber
      ○ Uber Eats
      ○ ETAs
      ○ Self-Driving Vehicles
      ○ Customer Support
      ○ Driver/Rider Match
      ○ Personalization
      ○ Demand Modeling
      ○ Dynamic Pricing
      ○ Forecasting
      ○ Maps
      ○ Fraud
      ○ Anomaly Detection
      ○ Capacity Planning
      ○ And many more...
  • 4. Data and ML at Uber - ETAs
      ○ ETAs are core to the Uber customer experience
      ○ ETAs are used by myriad internal systems
      ○ ETAs are generated by route-based algorithms
      ○ ML models predict the route-based ETA error
      ○ Uber uses the predicted error to correct the ETA
      ○ ETAs are now dramatically more accurate
  • 5. Data and ML at Uber - Driver/Rider Match
      ○ Optimize matching of riders and drivers on the Uber platform
      ○ Predict whether a rider who has the app open will make a trip request
  • 6. Data and ML at Uber - Eats
      ○ Models used for
        ○ Ranking of restaurants and dishes
        ○ Delivery times
        ○ Search ranking
      ○ 100s of ML models called to render the Eats homepage
  • 7. Data and ML at Uber - Self-Driving Vehicles
  • 8. Uber’s Data Stack
      ● Mobile App Events, Device Telemetry, Micro-Service Events, Database Events, 3rd Party Feeds, Bulk Uploads
      ● Incremental Ingestion, Kafka
      ● Tiered Data Lake: In-memory (Pinot, AresDB), Hot (HDFS), Warm (HDFS), Archival (Cloud)
      ● Data Processing Engines: Stream Processing (Flink), Batch Processing (Spark, Tez, Map Reduce)
      ● Query Engines: Realtime, Pre-Aggregated (AthenaX); Ad hoc, Interactive (Presto, Vertica); Complex, Batch (Hive)
      ● Data Analytics Tools: Dashboards (Summary, Dashbuilder), Ad hoc Query (QueryBuilder), Data Preparation (Piper, uWorc), BI Tools (Tableau, DSW)
      ● Compute Fabric (YARN / Mesos + Peloton)
  • 9. Uber’s ML Stack - Michelangelo
      ● Kafka
      ● Compute Fabric (YARN / Peloton+Mesos)
      ● Data Analytics Tools, Query Engines
      ● Stream Processing (Flink), Batch Processing (Hive, Spark, Tez)
      ● Data Preparation
      ● Prototype: Jupyter Notebook, Spark Magic
      ● Training: TensorFlow, PyTorch, XGBoost, SparkML
      ● Feature Store, Model Store, Metrics Store
      ● DataLake (HDFS)
      ● Inference: Realtime Prediction Service, Batch Prediction Jobs
  • 10. Apache Spark @Uber Image Source: www.mindproject.io
  • 11. Apache Spark @ Uber
      ○ Apache Spark is the primary analytics execution engine used by teams at Uber
      ○ At Uber, 95% of batch and ML jobs run on Spark
      ○ We run Spark on YARN and Peloton/Mesos
      ○ We use the external shuffle service for shuffle data
      * Apache Hadoop, Spark, and Mesos logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks. TensorFlow and the TensorFlow logo are trademarks of Google Inc. Redis is a trademark of Redis Labs Ltd. Any rights therein are reserved to Redis Labs Ltd. Any use by Uber Technologies is for referential purposes only and does not indicate any sponsorship, endorsement or affiliation between Redis and Uber Technologies.
  • 12. How Does Apache Spark Shuffle Service Work?
  • 13. Limitations of Apache Spark Shuffle Service
      ● SSD wear-out issues
      ● Reliability
      ● Kubernetes dynamic allocation
      ● Collocation
  • 14. Different Approaches
      ● Shuffle manager to external storage
        ○ Synchronous writes
          ■ NFS: 2x slower
          ■ HDFS: 5x slower
  • 15. Different Approaches
      ● Shuffle manager to external storage
        ○ Semi-asynchronous writes
          ■ HDFS: 4x slower
  • 16. Different Approaches
      ● Remote Shuffle Service
        ○ Streaming writes to HDFS
          ■ 1.5x slower than writing to local storage
        ○ Streaming writes to local storage
          ■ Roughly the same performance as the external shuffle service
  • 17. Remote Shuffle Service
      ● Remote Shuffle Service
        ○ Streaming writes to local storage
          ■ Changed the MapReduce paradigm
          ■ Record Stream -> Shuffle Server -> Disk
          ■ No temporary spill files on the executor side
  • 18. Architecture - Remote Spark Shuffle Service
  • 19. Deep Dive Image Source: www.mindproject.io
  • 20. Design Principles
      ● Scale out horizontally
        ○ Each server instance works independently
        ○ Avoid centralized state/storage
      ● Tackle network latency
        ○ Reduce waiting times for server response
        ○ Stream data
      ● Performance optimization
        ○ Most Spark apps optimized for similar performance
        ○ Rely on YARN/Apache Spark retry for failure recovery
  • 21. Scale Out, Tackle Network Latency, Performance Optimization
  • 22. Horizontally Scalable
      ● Spark applications share/use different shuffle servers
      ● No shared state among shuffle servers
      ● Add more shuffle servers to scale out
  • 23. Shuffle Server Distribution
      ● Mappers: m=4
      ● Reducers: r=5
      ● Shuffle Servers: s=3
  • 24. Shuffle Server Distribution in General
      ● Mappers: m
      ● Reducers: r
      ● Shuffle Servers: s
      ● Network Connections
        ○ Mappers: m*s connections
        ○ Reducers: r connections
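The connection counts on this slide can be checked with a small calculation using the example values from the previous slide (m=4, r=5, s=3). This is only a sketch: the slides give the totals (m*s and r) but not how partitions are assigned to servers, so the `partition_id % s` rule in `server_for_partition` is an illustrative assumption.

```python
def connection_counts(m: int, r: int, s: int) -> dict:
    """Connection totals as stated on the slide.

    Each mapper connects to every shuffle server (m * s connections);
    each reducer connects to a single server (r connections).
    """
    return {"mapper_connections": m * s, "reducer_connections": r}


def server_for_partition(partition_id: int, s: int) -> int:
    # Hypothetical assignment rule (an assumption, not from the slides):
    # spread reducer partitions over servers round-robin.
    return partition_id % s


counts = connection_counts(m=4, r=5, s=3)
print(counts)  # {'mapper_connections': 12, 'reducer_connections': 5}
print([server_for_partition(p, 3) for p in range(5)])  # [0, 1, 2, 0, 1]
```

Because each reducer reads its whole partition from one server, the reducer side grows with r only, while the mapper side grows with both m and s.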
  • 25. Scale Out, Tackle Network Latency, Performance Optimization
  • 26. Server Implementation
      ● Use Netty
        ○ High performance asynchronous server framework
      ● Two thread groups
        ○ Group 1: Accept new socket connections
        ○ Group 2: Read socket data
        ○ Thread groups do not block each other
      ● Binary network protocol
        ○ Efficient encoding/compression
  • 27. Direct Write/Read on Disk File
      ● Write to OS file directly
        ○ No application level buffering
      ● Zero copy
        ○ Transfer data from disk file to shuffle reader without user space memory
      ● Sequential write/read
        ○ No random disk IO
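The zero-copy idea on this slide can be illustrated with Python's `socket.sendfile()`, which hands the file-to-socket transfer to the kernel where `os.sendfile` is available. This is a sketch of the general technique, not the service's implementation (the slides say the server is built on Netty); the socket pair stands in for the connection between a shuffle server and a shuffle reader.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected socket and return the bytes sent.

    Where os.sendfile is available the kernel moves bytes from the file to
    the socket without copying them through user-space buffers; elsewhere
    Python falls back to ordinary send() calls.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


# Demo: a local socket pair stands in for server -> shuffle reader.
payload = b"partition-0-records"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

server_side, reader_side = socket.socketpair()
sent = send_file_zero_copy(path, server_side)
server_side.close()  # signals end-of-stream to the reader

chunks = []
while True:
    chunk = reader_side.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
received = b"".join(chunks)
reader_side.close()
os.unlink(path)
```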
  • 28. Client Side Compression
      ● Shuffle client compresses/decompresses data
      ● Reduces network transport data size
      ● Reduces CPU usage on shuffle server
      ● Supports client side encryption
        ○ Encryption key inside each application
        ○ Encryption key not distributed to shuffle server
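A minimal sketch of the client-side compression pattern, assuming `zlib` as the codec (the slides do not name one) and a hypothetical in-memory `OpaqueShuffleServer` that stores bytes without inspecting them. Encryption would sit at the same point in the client and is not shown.

```python
import zlib


class OpaqueShuffleServer:
    """Hypothetical stand-in for a shuffle server: stores blobs, never decodes them."""

    def __init__(self):
        self.partitions = {}

    def append(self, partition_id: int, blob: bytes) -> None:
        self.partitions.setdefault(partition_id, []).append(blob)

    def fetch(self, partition_id: int) -> list:
        return list(self.partitions.get(partition_id, []))


def write_records(server, partition_id: int, records: list) -> int:
    """Compress on the client, then ship; returns the uncompressed size."""
    raw = "\n".join(records).encode("utf-8")
    server.append(partition_id, zlib.compress(raw))
    return len(raw)


def read_records(server, partition_id: int) -> list:
    """Fetch blobs and decompress on the client."""
    out = []
    for blob in server.fetch(partition_id):
        out.extend(zlib.decompress(blob).decode("utf-8").split("\n"))
    return out


server = OpaqueShuffleServer()
records = [f"key{i % 10}=value" for i in range(1000)]
raw_size = write_records(server, 0, records)
stored_size = sum(len(b) for b in server.fetch(0))
round_trip = read_records(server, 0)
```

The server only ever sees compressed bytes, so it spends no CPU on encoding and moves less data over the network.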
  • 29. Parallel Serialization and Network IO
      ● Shuffle data serialization takes time
      ● Serialization in executor thread
      ● Network IO in another thread
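The split described on this slide is a producer/consumer arrangement: the task thread serializes records while a second thread performs the network writes. The sketch below assumes `pickle` as the serializer and uses a plain list (`sink`) in place of a socket; both are illustrative choices, not details from the talk.

```python
import pickle
import queue
import threading

_DONE = object()  # sentinel that tells the writer thread to stop


def network_writer(outbox: queue.Queue, sink: list) -> None:
    """Runs in its own thread; 'sink' stands in for a socket to the shuffle server."""
    while True:
        item = outbox.get()
        if item is _DONE:
            break
        sink.append(item)  # a real client would call sock.sendall(item) here


def run_map_task(records, sink: list, max_pending: int = 64) -> None:
    # Bounded queue: if the network falls behind, serialization is throttled.
    outbox = queue.Queue(maxsize=max_pending)
    writer = threading.Thread(target=network_writer, args=(outbox, sink))
    writer.start()
    for record in records:
        outbox.put(pickle.dumps(record))  # serialization stays on the task thread
    outbox.put(_DONE)
    writer.join()


sink = []
run_map_task([("key", i) for i in range(100)], sink)
decoded = [pickle.loads(blob) for blob in sink]
```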
  • 30. Connection Pool
      ● Socket connect latency is not trivial
      ● Reuse client/server connections
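Connection reuse can be sketched as a small pool keyed by server address. The class and method names here are hypothetical; the `connect` parameter defaults to `socket.create_connection` but is injectable so the demo does not need a live server.

```python
import socket
import threading
from collections import defaultdict


class ConnectionPool:
    """Keeps idle connections per (host, port) so later tasks skip the connect cost."""

    def __init__(self, connect=socket.create_connection):
        self._connect = connect
        self._idle = defaultdict(list)
        self._lock = threading.Lock()
        self.created = 0  # number of new connections opened so far

    def acquire(self, host: str, port: int):
        with self._lock:
            idle = self._idle[(host, port)]
            if idle:
                return idle.pop()  # reuse an existing connection
            self.created += 1
        return self._connect((host, port))

    def release(self, host: str, port: int, conn) -> None:
        with self._lock:
            self._idle[(host, port)].append(conn)


# Demo with a fake connector (no network needed).
pool = ConnectionPool(connect=lambda address: object())
first = pool.acquire("shuffle-1", 9338)
pool.release("shuffle-1", 9338, first)
second = pool.acquire("shuffle-1", 9338)
print(first is second, pool.created)  # True 1
```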
  • 31. Scale Out, Tackle Network Latency, Performance Optimization
  • 32. Asynchronous Shuffle Data Commit
      ● Map task
        ○ Streams data to server
        ○ Does not wait for a response
      ● Server flushes (commits) data asynchronously
      ● Reduce task queries data availability when fetching data
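The asynchronous commit flow can be modeled in a few lines. In this sketch, which uses hypothetical names and an in-memory "disk", the map side appends and returns immediately, a background thread performs the commit, and the reduce side polls for availability before reading.

```python
import threading
import time


class AsyncCommitServer:
    """Toy model: appends return at once, commits happen in the background."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}    # map_id -> chunks received but not yet committed
        self._committed = {}  # map_id -> committed bytes (stands in for disk)

    def append(self, map_id: int, chunk: bytes) -> None:
        with self._lock:
            self._pending.setdefault(map_id, []).append(chunk)

    def finish_map_task(self, map_id: int) -> None:
        # The map task does not wait: the flush runs on another thread.
        threading.Thread(target=self._flush, args=(map_id,)).start()

    def _flush(self, map_id: int) -> None:
        with self._lock:
            chunks = self._pending.pop(map_id, [])
            self._committed[map_id] = b"".join(chunks)

    def is_available(self, map_id: int) -> bool:
        with self._lock:
            return map_id in self._committed

    def fetch(self, map_ids, timeout: float = 5.0, poll: float = 0.01) -> bytes:
        """Reduce side: wait until every requested map output is committed."""
        deadline = time.monotonic() + timeout
        while not all(self.is_available(m) for m in map_ids):
            if time.monotonic() > deadline:
                raise TimeoutError("shuffle data not committed in time")
            time.sleep(poll)
        with self._lock:
            return b"".join(self._committed[m] for m in map_ids)


server = AsyncCommitServer()
server.append(0, b"a")
server.append(0, b"b")
server.append(1, b"c")
server.finish_map_task(0)
server.finish_map_task(1)
result = server.fetch([0, 1])
```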
  • 34. Shuffle Server Discovery/Health Check
      ● ZooKeeper as Server Registry
  • 35. Data Replica
      ● Server Replication Group
      ● Duplicate Write in Parallel
      ● Read from Single Server, switch to another server on failure
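The replication behavior on this slide (write to every server in the group in parallel, read from one, fail over on error) can be sketched with in-memory stand-ins. `InMemoryServer` and both helper functions are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryServer:
    """Hypothetical stand-in for one shuffle server in a replication group."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True

    def write(self, partition_id: int, blob: bytes) -> None:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data.setdefault(partition_id, []).append(blob)

    def read(self, partition_id: int) -> bytes:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return b"".join(self.data.get(partition_id, []))


def replicated_write(group, partition_id: int, blob: bytes) -> None:
    """Send the same blob to every server in the group in parallel."""
    with ThreadPoolExecutor(max_workers=len(group)) as pool:
        futures = [pool.submit(s.write, partition_id, blob) for s in group]
        for future in futures:
            future.result()  # surface any write error


def read_with_failover(group, partition_id: int) -> bytes:
    """Read from the first server; switch to the next one on failure."""
    last_error = None
    for server in group:
        try:
            return server.read(partition_id)
        except ConnectionError as err:
            last_error = err
    raise ConnectionError("all replicas failed") from last_error


group = [InMemoryServer("server-a"), InMemoryServer("server-b")]
replicated_write(group, 0, b"records")
group[0].alive = False  # simulate losing the first replica
recovered = read_with_failover(group, 0)
```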
  • 36. Local State Flush
      ● Local state persistence in batch
        ○ Avoid flushing state for each map task
        ○ Flush when shuffle stage finishes
      ● Client does not wait for server side state flush
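Batched state persistence can be sketched as buffering per-task state in memory and writing it out once per stage. The class name, the JSON-lines file format, and the state fields below are all assumptions made for illustration; the slides do not describe the on-disk format.

```python
import json
import os
import tempfile


class BatchedStateStore:
    """Buffers map-task state in memory; persists it once per shuffle stage."""

    def __init__(self, path: str):
        self.path = path
        self._buffer = []
        self.flush_count = 0

    def record_map_task(self, stage_id: int, map_id: int) -> None:
        # No disk IO here: avoids one flush per map task.
        self._buffer.append({"stage": stage_id, "map": map_id})

    def finish_stage(self) -> None:
        # One append + fsync for the whole stage.
        with open(self.path, "a", encoding="utf-8") as f:
            for entry in self._buffer:
                f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._buffer.clear()
        self.flush_count += 1


with tempfile.TemporaryDirectory() as tmp_dir:
    store = BatchedStateStore(os.path.join(tmp_dir, "state.jsonl"))
    for map_id in range(3):
        store.record_map_task(stage_id=7, map_id=map_id)
    flushes_before = store.flush_count
    store.finish_stage()
    with open(store.path, encoding="utf-8") as f:
        persisted = [json.loads(line) for line in f]
print(flushes_before, store.flush_count, len(persisted))  # 0 1 3
```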
  • 38. Compatible with Open Source Apache Spark
      ● Shuffle Manager Plugin
        ○ spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
      ● MapStatus / MapOutputTracker
        ○ Embed remote shuffle service related data inside MapStatus
        ○ Query MapOutputTracker to retrieve needed information
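The plugin is selected through a single Spark property, so enabling it amounts to passing that property at submit time. The helper below is a hypothetical convenience that turns a property map into `spark-submit --conf` arguments; only the property name and class name come from the slide, and any further settings the service may need are not covered by the slides.

```python
def to_spark_submit_args(conf: dict) -> list:
    """Render Spark properties as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


shuffle_conf = {
    # Property and class name as given on the slide.
    "spark.shuffle.manager": "org.apache.spark.shuffle.RssShuffleManager",
}
submit_args = to_spark_submit_args(shuffle_conf)
print(" ".join(submit_args))
# --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
```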
  • 39. Metrics/Monitoring
      ● Uber’s open source M3 metrics library
      ● Important metrics
        ○ Network connections
        ○ File descriptors
        ○ Disk utilization
  • 40. Test Strategy
      ● Unit Test
      ● Stress/Random Test
      ● Production Query Sampling
  • 41. Remote Spark Shuffle Service - Production Status
      ● In production on YARN for the last 8+ months
      ● Thousands of applications running every day
      ● Job latencies are on par with the external shuffle service
      ● Open sourcing it soon!
  • 42. Roadmap
      ● Support all Spark workloads including HiveOnSpark
      ● Multi-tenancy (quota)
      ● Load balancing
      ● Integrate with incoming Spark shuffle metadata APIs
  • 43. Proprietary and confidential © 2020 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.
      Thank you!