ML and Data Science at Uber - GITPro talk 2017

Applications of Statistical Modelling and Machine Learning at Uber

  1. ML and Data Science at Uber. Sudhir Tonse, Engineering Lead, Uber. Feb 18, 2017, GITPro 2017
  2. Agenda: Where do we want to go today?
  3. Agenda: Hop on the Uber ML ride… destination please? • Introduction: Who am I and what are we talking about today? • Problem Space: Why does Uber need ML, and what are some of the problems we tackle? • Tools of the Trade: What does Uber’s tech stack look like? • Challenges & Opportunities: Challenges likely unique to Uber… interesting opportunities
  4. Introduction: Uber, this talk, and me, the speaker
  5. Who am I? • Engineering Leader @ Uber • Marketplace Data • Realtime Data Processing • Analytics • Forecasting • Previously: Microservices/Cloud Platform at Netflix • Twitter: @stonse
  6. Introduction: Uber. Uber’s logistics platform, the Marketplace, connects: • Driver Partners: our partners in the ride-sharing business • Riders: folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool • Merchants: restaurants or shops that have signed on to the Uber platform
  7. Uber Mission: “Transportation as reliable as running water, everywhere, for everyone”
  8. ML Problems: Why do we need Machine Learning? • Mapping (routes, ETAs, …) • Fraud and Security • uberEATS Recommendations • Marketplace Optimizations • Forecasting • Driver Positioning • Health, Trends, Issues, … • And more… Examples: ETA, route optimization, pickup points, Pool rider matches
  9. Uber | Marketplace Mission: Build the platform, products, and algorithms responsible for the real-time execution and online optimization of Uber’s marketplace. We are building the brain of Uber, solving NP-hard algorithmic and economic optimization problems at scale.
  10. Overall Flow (Match Services): Request Event → Driver Accept Event → Trip Started Event → more events…
  11. Trip States
  12. Scale: ~400 cities; many billions of events per day
  13. Scale: geo space, vehicle types, time
  14. Space → Hexagons: • Indexing, lookup, rendering • Symmetric neighbors • Convex & compact regions • Equal areas • Equal shape
  15. Granular Data
  16. ML Examples: Multi-resolution Realtime Forecasting; Airport ETR
  17. Example 1: Real-time spatiotemporal forecasting at a variable resolution of time and space
  18. Rider Demand Forecasting: predict the number of riders per hexagon for various time horizons
  19. Spatial Granularity & Multiresolution Forecasting: The more you aggregate (zoom out), the more trends emerge. Sparsity at the hexagon level: many hexagons have little signal.
  20. Multiresolution Forecasting (forecasting at different spatial granularities): 1. Forecast at the hex-cluster level. 2. Using past activity for a similar time window, apportion the total activity from the hex-cluster out to its component hexagons.
  21. ML Example No. 2: Airport ETR (Airport Taxi Line; Uber Airport Lot)
  22. Airport Demand (ETR): Flight Arrival (t1) → Client Eyeball (t2) → Pickup Request (t3). Mean delay ~30 minutes; half-life ~1.0 minute.
  23. “ETR too much. I bail out…” Solution: Time Meter Banner. “Only about 20 minutes. I would wait!” A 20-minute wait to get a $40 trip, oh yeah!
  24. Data Science Flow: A Typical Data Scientist Workflow • Analyze/Prepare: data exploration, cleansing, transformations, etc. • Feature Selection: evaluate the strength of various signals • Model Fitting: use Python/R etc. to fit the model • Evaluation: evaluate model performance • Storage: store the model with versioning • Serving/Dissemination: apply the model and serve predictions • Monitoring: evaluate runtime performance
  25. Data Preparation: A Typical Data Scientist Workflow, Analyze/Prepare stage (data exploration, cleansing, transformations, etc.)
  26. Data Processing
  27. Data Science Flow: A Typical Data Scientist Workflow, Feature Selection, Model Fitting, Evaluation, and Storage stages
  28. Data Scientists (Analytics)
  29. Data Science Flow: A Typical Data Scientist Workflow
  30. Overview: Streamline the forecasting process from conception to production. • Streams with flexible geo-temporal resolution • Valuable external data feeds • Modular, reusable components at each stage • Same code for offline model fitting and production, to enable fast model iteration. Components: Streams; External Data (Airport feed, Weather feed, Concerts feed); Operators & Computation DAGs; Feature Generation; Offline Model Fitting; Online Models; Predictions, Metrics & Visualizations
  31. Realtime Models: • Something happened at a time and a place; now we evaluate the DAG. • The DAG is evaluated for a single instant in time. Real-time spatiotemporal forecasting at a variable resolution of time and space
  32. Under the Hood: Tools & Framework
  33. Machine Learning as a Service (ML workflow at Uber): • Curated set of algorithms • Model Versioning • Model Performance & Visualizations • Automated Deployment Workflow • …
  34. Open Source Technologies: • Spark Streaming: micro-batch based processing; good integration with HDFS & S3; exactly-once semantics • Samza: well integrated with Kafka; built-in state management; built-in checkpointing • Distributed indexes & queries; versatile aggregations • Jupyter/IPython: great community support; data scientists familiar with Python
  35. Challenges & Opportunities
  36. Challenges (ML Problems): • What’s the best model for integrating vast amounts of disparate kinds of information over space and time? • What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable? • About 100 or so more… :-)
  37. Thank you! Links: • Realtime Streaming at Uber: https://www.infoq.com/presentations/real-time-streaming-uber • Spark at Uber: http://www.slideshare.net/databricks/spark-meetup-at-uber • Careers at Uber: https://www.uber.com/careers/ • https://join.uber.com/marketplace
  38. Q & A: Happy to discuss design/architecture. No product/business questions please :-) @stonse
  39. Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. Sudhir Tonse @stonse. Thank you
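
The event flow on slides 10 and 11 (Request Event → Driver Accept Event → Trip Started Event → more events) can be read as a state machine that each event advances. The sketch below illustrates that reading under stated assumptions: the state and event names are hypothetical, and the later events the slide elides are left out rather than guessed.

```python
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class TripState(Enum):
    # Hypothetical states named after the events on slide 10.
    REQUESTED = "requested"
    ACCEPTED = "accepted"
    STARTED = "started"


# Event -> (state the trip must be in, state after the event).
# None means "no trip yet": only a request can create one.
TRANSITIONS: Dict[str, Tuple[Optional[TripState], TripState]] = {
    "request": (None, TripState.REQUESTED),
    "driver_accept": (TripState.REQUESTED, TripState.ACCEPTED),
    "trip_started": (TripState.ACCEPTED, TripState.STARTED),
}


def apply_event(state: Optional[TripState], event: str) -> TripState:
    """Advance one trip by one event, rejecting out-of-order events."""
    required, next_state = TRANSITIONS[event]
    if state != required:
        raise ValueError(f"event {event!r} is not valid in state {state}")
    return next_state


def replay(events: Iterable[str]) -> Optional[TripState]:
    """Rebuild a trip's current state from its ordered event log."""
    state: Optional[TripState] = None
    for event in events:
        state = apply_event(state, event)
    return state


print(replay(["request", "driver_accept", "trip_started"]))  # TripState.STARTED
```

Deriving the state by replaying the event log, rather than storing it separately, keeps each trip's state reproducible from the event stream itself.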
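
Slide 14 lists why hexagons suit spatial indexing: all six neighbors are symmetric and a region built from cells stays compact. The sketch below shows those properties on a generic axial-coordinate hex grid. This is a common textbook scheme assumed for illustration, not Uber's production indexing, and the function names are hypothetical.

```python
from typing import List, Tuple

# Axial coordinates (q, r): a common textbook scheme for hexagonal grids,
# used here for illustration only. It is not Uber's production indexing.
Hex = Tuple[int, int]

# Offsets to the six edge-sharing neighbors. Every neighbor is the same
# distance from the center, unlike a square grid's edge vs. corner cells.
DIRECTIONS: List[Hex] = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def neighbors(h: Hex) -> List[Hex]:
    """The six cells adjacent to h."""
    q, r = h
    return [(q + dq, r + dr) for dq, dr in DIRECTIONS]


def hex_distance(a: Hex, b: Hex) -> int:
    """Grid steps between two cells (cube distance, with s = -q - r)."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return max(abs(dq), abs(dr), abs(dq + dr))


def disk(center: Hex, k: int) -> List[Hex]:
    """All cells within k steps of center: a compact, roughly round region."""
    cq, cr = center
    cells = []
    for dq in range(-k, k + 1):
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            cells.append((cq + dq, cr + dr))
    return cells


print(len(disk((0, 0), 2)))  # 19 cells: 1 + 6 + 12
```

A disk of radius k holds 3k(k + 1) + 1 cells; grouping cells this way is one possible (assumed) way to form hex-clusters like those on slide 20.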
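
Slides 19 and 20 describe forecasting at the hex-cluster level, where trends are visible, and then apportioning that total to the component hexagons by their past activity in a similar time window. For a cluster C with forecast Ŷ_C and past activity a_h per hexagon, hexagon h receives Ŷ_C · a_h / Σ_{h′∈C} a_{h′}. The sketch below implements that split; the cluster-level model is a hypothetical naive average, since the talk does not name one, and the even-split fallback is also an assumption.

```python
from typing import Dict, List


def cluster_forecast(past_totals: List[float]) -> float:
    """Step 1 (stand-in model): forecast the hex-cluster's total as the mean
    of its totals for the same time window in earlier periods. The talk does
    not name the model; this naive average is a hypothetical placeholder."""
    return sum(past_totals) / len(past_totals)


def apportion(total: float, past_activity: Dict[str, float]) -> Dict[str, float]:
    """Step 2: split the cluster forecast across its hexagons in proportion
    to each hexagon's past activity in a similar time window."""
    if not past_activity:
        return {}
    history = sum(past_activity.values())
    if history <= 0:
        # No past signal anywhere in the cluster: split evenly (an assumption).
        even = total / len(past_activity)
        return {hex_id: even for hex_id in past_activity}
    return {hex_id: total * a / history for hex_id, a in past_activity.items()}


# Hypothetical data: three past weeks of cluster totals for this window,
# and per-hexagon activity over the same windows.
forecast = cluster_forecast([70.0, 90.0, 80.0])
print(apportion(forecast, {"hex_a": 30.0, "hex_b": 10.0, "hex_c": 0.0}))
# {'hex_a': 60.0, 'hex_b': 20.0, 'hex_c': 0.0}
```

The sparse hexagons borrow strength from the cluster: their share comes from history, while the level comes from the better-conditioned aggregate forecast.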
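
Slides 30 and 31 describe operators composed into computation DAGs, evaluated for a single instant in time when something happens, with the same code used for offline model fitting and production. The sketch below is one minimal way to get that property under stated assumptions: `Operator`, `ComputationDAG`, the stream names, and the forecast formula are all hypothetical, and Python's `graphlib.TopologicalSorter` (3.9+) orders the operators.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


class Operator:
    """One node of the computation DAG: a function of named inputs."""

    def __init__(self, name: str, fn: Callable[..., Any], inputs: List[str]):
        self.name = name
        self.fn = fn
        self.inputs = inputs


class ComputationDAG:
    def __init__(self, operators: List[Operator]):
        self.ops = {op.name: op for op in operators}
        # graphlib expects node -> predecessors; source names become leaf
        # nodes, which are filtered out. A cycle raises graphlib.CycleError.
        graph = {op.name: op.inputs for op in operators}
        self.order = [n for n in TopologicalSorter(graph).static_order() if n in self.ops]

    def evaluate(self, t: int, sources: Dict[str, Callable[[int], Any]]) -> Dict[str, Any]:
        """Evaluate every operator for the single instant t."""
        values = {name: read(t) for name, read in sources.items()}
        for name in self.order:
            op = self.ops[name]
            values[name] = op.fn(*(values[i] for i in op.inputs))
        return values


# Hypothetical streams indexed by time step: ride requests in one hexagon
# and an external airport feed of flight arrivals.
requests = [10, 12, 14]
flights = [0, 4, 2]
sources = {"requests": lambda t: requests[t], "flights": lambda t: flights[t]}

dag = ComputationDAG([
    Operator("surge_flag", lambda x: x > 18, ["forecast"]),
    Operator("forecast", lambda r, f: r + 2 * f, ["requests", "flights"]),
])

# Real time: something happened at t = 1, so evaluate the DAG for that instant.
print(dag.evaluate(1, sources)["forecast"])                      # 20
# Offline: the same code replayed over history, e.g. to fit or backtest.
print([dag.evaluate(t, sources)["forecast"] for t in range(3)])  # [10, 20, 18]
```

Because offline runs are just repeated single-instant evaluations, a model iterated on in backtests runs unchanged in production, which is the point of the "same code" bullet.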
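
Slide 33 lists model versioning and an automated deployment workflow as parts of Machine Learning as a Service, and slide 24 ends with "store model with versioning". The sketch below is a hypothetical in-memory registry showing one way those pieces fit together; the class names, the `mae` metric, and the rule "deploy only if the error improves" are assumptions, not a description of Uber's system.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    model: Any
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    """In-memory stand-in for a versioned model store (hypothetical)."""
    versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)
    deployed: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str, model: Any, metrics: Dict[str, float]) -> int:
        """Store a snapshot of the model under the next version number."""
        history = self.versions.setdefault(name, [])
        entry = ModelVersion(len(history) + 1, copy.deepcopy(model), dict(metrics))
        history.append(entry)
        return entry.version

    def get(self, name: str, version: Optional[int] = None) -> ModelVersion:
        """Fetch a specific version, or the latest when version is None."""
        history = self.versions[name]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]

    def promote_if_better(self, name: str, version: int, metric: str = "mae") -> bool:
        """Deployment gate: serve `version` only if its error metric (lower is
        better) beats the version currently deployed."""
        candidate = self.get(name, version)
        current = self.deployed.get(name)
        if current is not None and self.get(name, current).metrics[metric] <= candidate.metrics[metric]:
            return False
        self.deployed[name] = version
        return True


registry = ModelRegistry()
v1 = registry.register("rider_demand", {"slope": 2.0}, {"mae": 0.8})
print(registry.promote_if_better("rider_demand", v1))  # True: nothing deployed yet
v2 = registry.register("rider_demand", {"slope": 2.3}, {"mae": 1.1})
print(registry.promote_if_better("rider_demand", v2))  # False: worse than v1
print(registry.deployed["rider_demand"])               # 1
```

Snapshotting with `copy.deepcopy` keeps every stored version immutable, so a rollback is simply pointing `deployed` back at an earlier version number.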
