ML and Data Science at Uber
Sudhir Tonse, Engineering Lead, Uber
Feb 18, 2017
GITPro 2017
Where do we want to go today?
Agenda
• Introduction: Who am I and what are we talking about today?
• Problem Space: Why does Uber need ML, and what are some of the problems we tackle?
• Tools of the Trade: What does Uber’s tech stack look like?
• Challenges & Opportunities: Challenges likely unique to Uber, and interesting opportunities
Agenda
Hop on the Uber ML Ride … destination please?
Uber, this talk and me the speaker
Introduction
• Engineering Leader @ Uber
• Marketplace Data
• Realtime Data Processing
• Analytics
• Forecasting
• Previously: Microservices/Cloud Platform at Netflix
• Twitter: @stonse
Who am I?
Uber’s logistics platform: the Marketplace
• Driver Partners: our partners in the ride-sharing business
• Riders: folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool
• Merchants: restaurants or shops that have signed on to the Uber platform
Introduction
Uber
“Transportation as reliable as running water, everywhere, for everyone”
Uber Mission
• Mapping (Routes, ETAs, …)
• Fraud and Security
• uberEATS Recommendations
• Marketplace Optimizations
• Forecasting
• Driver Positioning
• Health, Trends, Issues, ...
• And more …
ML Problems
Why do we need Machine Learning?
ETA, Route Optimization, Pickup Points, Pool rider matches
Marketplace
Build the platform, products, and algorithms responsible for the real-time execution and online optimization of Uber's marketplace.
We are building the brain of Uber, solving NP-hard and economic optimization problems at scale.
Uber | Marketplace Mission
Request Event → Driver Accept Event → Trip Started Event → more events …
Overall Flow
Match Services
Trip States
Scale
~400 Cities
Many billions of events per day
Scale
• Geo Space
• Vehicle Types
• Time
• Indexing, Lookup, Rendering
• Symmetric Neighbors
• Convex & Compact Regions
• Equal Areas
• Equal Shape
Space -> Hexagons
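To make the listed properties concrete, here is a minimal flat-grid sketch of hexagonal indexing in axial coordinates: it maps a point to its cell, lists the six symmetric neighbors, and computes grid distance. It assumes points are already projected to planar x/y (for example meters) and uses pointy-top hexagons of circumradius `size`; this is a textbook construction, not Uber's production indexing scheme, and all function names are hypothetical.

```python
import math

# Axial-coordinate offsets of the six neighbors of a pointy-top hexagon.
NEIGHBOR_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def point_to_hex(x, y, size):
    """Map a projected planar point to the axial (q, r) of the hexagon containing it."""
    fq = (math.sqrt(3) / 3 * x - y / 3) / size
    fr = (2 / 3 * y) / size
    fs = -fq - fr
    q, r, s = round(fq), round(fr), round(fs)
    # Rounding can break q + r + s == 0; recompute the coordinate with the largest error.
    dq, dr, ds = abs(q - fq), abs(r - fr), abs(s - fs)
    if dq > dr and dq > ds:
        q = -r - s
    elif dr > ds:
        r = -q - s
    return q, r


def hex_center(q, r, size):
    """Planar center of hexagon (q, r)."""
    x = size * (math.sqrt(3) * q + math.sqrt(3) / 2 * r)
    y = size * 1.5 * r
    return x, y


def neighbors(q, r):
    """The six cells sharing an edge with (q, r)."""
    return [(q + dq, r + dr) for dq, dr in NEIGHBOR_OFFSETS]


def hex_distance(a, b):
    """Number of single-cell steps between two hexagons."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


print(point_to_hex(120.0, 40.0, size=100.0))  # (1, 0)
```

Every neighbor's center lies exactly `size * sqrt(3)` from the cell's own center, which is the "symmetric neighbors" property; a square grid lacks it because diagonal neighbors are farther away than edge neighbors.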
Granular Data
Multi-resolution Realtime Forecasting, Airport ETR
ML Examples
Real-time spatiotemporal forecasting at a variable resolution of time and space
Example 1
Rider Demand Forecasting
Predict the number of riders per hexagon for various time horizons
Spatial granularity & Multiresolution Forecasting
The more you aggregate, or zoom out, the more clearly trends emerge.
Sparsity at the hexagon level: many hexagons have little signal.
1. Forecast at the hex-cluster level.
2. Using past activity for a similar time window, apportion the total activity of the hex-cluster to its component hexagons.
Multiresolution Forecasting
Forecasting at different spatial granularities
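The two steps above can be sketched as follows, assuming the cluster-level forecast already exists and that "past activity" is a per-hexagon count from a comparable historical window. The function name and the optional `smoothing` pseudo-count are hypothetical illustrations, not part of the system described in the talk.

```python
def apportion(cluster_forecast, past_counts, smoothing=0.0):
    """Split a hex-cluster forecast across its component hexagons.

    past_counts maps hexagon id -> activity observed in a comparable past
    time window. Each hexagon receives a share proportional to that count,
    plus an optional smoothing pseudo-count so cells with no history can
    still receive a small share.
    """
    if not past_counts:
        raise ValueError("cluster has no component hexagons")
    weights = {hex_id: count + smoothing for hex_id, count in past_counts.items()}
    total = sum(weights.values())
    if total == 0:
        # No history at all: fall back to an even split.
        even = cluster_forecast / len(weights)
        return {hex_id: even for hex_id in weights}
    return {hex_id: cluster_forecast * w / total for hex_id, w in weights.items()}


# Usage: 120 forecast requests for a cluster of three hexagons.
print(apportion(120.0, {"hex_a": 30, "hex_b": 10, "hex_c": 0}))
# {'hex_a': 90.0, 'hex_b': 30.0, 'hex_c': 0.0}
```

Because the shares sum to one, the hexagon-level forecasts always add back up to the cluster forecast, so the sparse per-hexagon data only has to supply relative weights.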
Airport ETR
ML Example No. 2
Airport Taxi Line Uber Airport Lot
Flight Arrival (t1) → Client Eyeball (t2) → Pickup Request (t3)
Airport Demand (ETR)
Mean Delay: ~30 minutes
Half Life: ~1.0 minute
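One way to turn the t1-to-t3 timeline into a demand forecast is to spread each flight's expected riders over the minutes in which they are likely to request, using a delay distribution. This is a sketch under stated assumptions, not the model from the talk: the toy delay distribution (25, 30 or 35 minutes, chosen only so its mean matches the ~30-minute mean delay above), the function name, and the premise that arrivals are already converted to expected Uber riders are all hypothetical.

```python
def expected_requests(arrivals, delay_pmf, horizon):
    """Expected pickup requests per minute over [0, horizon).

    arrivals: minute of flight arrival (t1) -> expected riders on that flight
    delay_pmf: delay in minutes (t3 - t1) -> probability of requesting then
    """
    demand = [0.0] * horizon
    for t1, riders in arrivals.items():
        for delay, p in delay_pmf.items():
            t3 = t1 + delay
            if 0 <= t3 < horizon:
                demand[t3] += riders * p
    return demand


# Two hypothetical flights: 100 expected riders at minute 0, 40 at minute 10.
demand = expected_requests({0: 100, 10: 40}, {25: 0.25, 30: 0.5, 35: 0.25}, horizon=60)
print(demand[25], demand[30], demand[35], demand[40], demand[45])
# 25.0 50.0 35.0 20.0 10.0
```

This is a discrete convolution of the arrival schedule with the delay distribution; a fitted delay distribution would replace the toy one.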
“ETR too much. I bail out ..”
Solution: Time Meter Banner
“Only about 20 minutes. I would wait!”
A 20-minute wait to get a $40 trip, oh yeah!
Data Science Flow
A Typical Data Scientist Workflow
1. Analyze/Prepare: data exploration, cleansing, transformations, etc.
2. Feature Selection: evaluate the strength of various signals
3. Model Fitting: use Python/R etc. to fit the model
4. Evaluation: evaluate model performance
5. Storage: store the model with versioning
6. Serving/Dissemination: apply the model and serve predictions
7. Monitoring: evaluate runtime performance
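The Feature Selection step ("Evaluate strength of various signals") can be illustrated with a simple screen that ranks candidate features by absolute Pearson correlation with the target. This is one common heuristic chosen for illustration, not necessarily what Uber's data scientists use; it captures only linear association, and the feature names in the usage example are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signal carries no linear information
    return cov / math.sqrt(vx * vy)


def rank_signals(features, target):
    """Rank candidate features by |correlation| with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


target = [10, 12, 15, 20, 22]  # e.g. requests in the next hour
features = {
    "past_hour_requests": [9, 12, 14, 21, 23],
    "temperature": [20, 18, 21, 19, 20],
    "constant": [1, 1, 1, 1, 1],
}
print([name for name, _ in rank_signals(features, target)])
# ['past_hour_requests', 'temperature', 'constant']
```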
Data Preparation: Analyze/Prepare (data exploration, cleansing, transformations, etc.)
Data Processing
Data Scientists (Analytics): Feature Selection, Model Fitting, Evaluation, Storage
Overview
Streamline the forecasting process from conception to production
• Streams with flexible geo-temporal resolution
• Valuable external data feeds
• Modular, reusable components at each stage
• Same code for offline model fitting and production, to enable fast model iteration
Operators & Computation DAGs
• Feature Generation
• Online Models
• Offline Model Fitting
• Predictions, Metrics & Visualizations
• Streams
• External Data: Airport feed, Weather feed, Concerts feed
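The "same code for offline model fitting and production" point can be illustrated with a feature defined as a pure function of an event log and an instant: the offline path evaluates it at many historical instants to build training rows, and the online path evaluates it once at the current instant. The feature (request count in a trailing window) and all names are hypothetical.

```python
from bisect import bisect_left


def requests_in_window(event_times, t, window):
    """Feature: number of request events in [t - window, t).

    event_times must be sorted ascending (e.g. epoch seconds). The function
    depends only on its inputs, so offline and online paths share it.
    """
    lo = bisect_left(event_times, t - window)
    hi = bisect_left(event_times, t)
    return hi - lo


def build_training_rows(event_times, timestamps, window):
    """Offline path: evaluate the feature at many historical instants."""
    return [(t, requests_in_window(event_times, t, window)) for t in timestamps]


history = [0, 30, 60, 90, 120, 400]  # sorted request timestamps (seconds)
print(build_training_rows(history, [100, 300, 500], window=300))
# [(100, 4), (300, 5), (500, 1)]

# Online path: a new request arrives, then the same function runs at "now".
history.append(450)
print(requests_in_window(history, 460, window=300))  # 2
```

Sharing one function removes a common source of training/serving skew, where a feature is reimplemented slightly differently in production than in the notebook used to fit the model.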
Realtime Models
- Something happened at a time and a place; now we evaluate the DAG.
- The DAG is evaluated for a single instant in time.
Real-time spatiotemporal forecasting at a variable resolution of time and space
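Evaluating the DAG "for a single instant in time" can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9+): each operator runs once, after its upstream operators, for the timestamp of the triggering event. The operators and inputs below are hypothetical placeholders for real feature and model nodes.

```python
from graphlib import TopologicalSorter


def evaluate_dag(operators, deps, t):
    """Evaluate each operator once for the single instant t.

    operators: name -> callable(t, *upstream_values)
    deps: name -> list of upstream operator names, in argument order
    """
    graph = {name: deps.get(name, []) for name in operators}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = [values[d] for d in deps.get(name, [])]
        values[name] = operators[name](t, *upstream)
    return values


# Hypothetical inputs for one hexagon, keyed by timestamp.
requests_at = {100: 12}
drivers_at = {100: 8}

operators = {
    "requests": lambda t: requests_at[t],
    "drivers": lambda t: drivers_at[t],
    "demand_supply_ratio": lambda t, r, d: r / d if d else float("inf"),
}
deps = {"demand_supply_ratio": ["requests", "drivers"]}

# An event arrives at t=100: evaluate the whole DAG for that instant.
print(evaluate_dag(operators, deps, t=100)["demand_supply_ratio"])  # 1.5
```

`TopologicalSorter` also raises `graphlib.CycleError` if the operator graph accidentally contains a cycle, which is a cheap sanity check when composing reusable operators.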
Under the hood ..
Tools & Framework
• Curated set of algorithms
• Model Versioning
• Model Performance & Visualizations
• Automated Deployment Workflow
• …
Machine Learning as a Service
ML workflow at Uber
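The "Model Versioning" item can be illustrated with a tiny in-memory registry that stores each fitted model as a numbered version alongside its evaluation metrics. This is a hypothetical sketch (class, method, and metric names are invented), not the API of Uber's ML service; a real registry would persist artifacts and metadata rather than keep them in memory.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry of versioned models."""

    versions: dict = field(default_factory=dict)  # name -> list of entries

    def register(self, name, model, metrics):
        """Store a copy of the model as the next version; return its 1-based number."""
        entries = self.versions.setdefault(name, [])
        entries.append({"model": copy.deepcopy(model), "metrics": dict(metrics)})
        return len(entries)

    def get(self, name, version=None):
        """Return the requested version of a model, or the latest if version is None."""
        entries = self.versions[name]
        if version is None:
            version = len(entries)
        if not 1 <= version <= len(entries):
            raise KeyError(f"{name!r} has no version {version}")
        return entries[version - 1]["model"]

    def best(self, name, metric):
        """Return (version, metrics) of the version with the lowest value of metric."""
        entries = self.versions[name]
        idx = min(range(len(entries)), key=lambda i: entries[i]["metrics"][metric])
        return idx + 1, entries[idx]["metrics"]


registry = ModelRegistry()
registry.register("rider_demand", {"coef": [0.5]}, {"mape": 0.21})
registry.register("rider_demand", {"coef": [0.7]}, {"mape": 0.18})
print(registry.get("rider_demand"))            # {'coef': [0.7]}
print(registry.best("rider_demand", "mape"))   # (2, {'mape': 0.18})
```

Keeping every version addressable makes rollback a lookup rather than a retrain, which is what an automated deployment workflow needs.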
Open Source Technologies
Samza
• Well integrated with Kafka
• Built-in state management
• Built-in checkpointing
Spark Streaming
• Micro-batch-based processing
• Good integration with HDFS & S3
• Exactly-once semantics

• Distributed indexes & queries
• Versatile aggregations
Jupyter/IPython
• Great community support
• Data scientists are familiar with Python
Challenges & Opportunities
• What’s the best model for integrating vast amounts of disparate kinds of information over space and time?
• What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable?
• About 100 or so more … :-)
ML Problems
Challenges
Links
Thank you!
• Realtime Streaming at Uber
https://www.infoq.com/presentations/real-time-streaming-uber
• Spark at Uber
(http://www.slideshare.net/databricks/spark-meetup-at-uber)
• Careers at Uber
(https://www.uber.com/careers/)
• https://join.uber.com/marketplace
Happy to discuss design/architecture
Q & A
No product/business questions please :-)
@stonse
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be
reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any
information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the
use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise
exempt from disclosure under applicable law. All recipients of this document are notified that the information contained
herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any
way disclose this document or any of the enclosed information to any person other than employees of addressee to the
extent necessary for consultations with authorized personnel of Uber.
Sudhir Tonse
@stonse
Thank you
