Nazar Sheremeta, Olena Kasianenko "Building Machine Learning Models using real data from the vehicles"

BigData & Data Engineering

Published in: Data & Analytics

  1. 1. Put topic in here
  2. 2. © 2018 CloudMade. Proprietary and Confidential. Meet the Team: CloudMade has an R&D office in Kyiv with a 130-person engineering team, its own car fleet, and a design studio in London. Nazar Sheremeta, Senior Data Science Engineer; Elena Kasianenko, Data Scientist.
  3. 3. Self-driving car
  4. 4. Self-driving car
  5. 5. © 2017 CloudMade. All Rights Reserved. Proprietary and Confidential. Golf wheel
  6. 6. One Driver Profile, Many Use Cases: Smart Onboarding, Personalized Autonomy, Predictive Navigation, Personalized Search, Predictive Call List, Personalized Coaching, Intelligent Cabin, Intelligent Climate, Refueling & Recharging, Personalized Parking Options, Predictive Drive Mode, Predictive Media, Predictive Occupant ID
  7. 7. Agenda: (1) Sudden big data; (2) Personalized learning; (3) Many events and features, but few observations (use complicated models to build features for a simple one); (4) Only 2 weeks to learn; (5) 10 tips on how to build an ML model
  8. 8. Personalized learning: small number of observations, strong user patterns, computationally friendly
  9. 9. Fleet learning: a ton of observations, no user patterns, computationally complex
  10. 10. Problem Definition
  11. 11. Where do small data come from? Time series, rare phenomena, enterprise solutions, aggregate modeling
  12. 12. Small data problems: ● Overfitting becomes much harder to avoid. ● Outliers become much more dangerous.
  13. 13. So what do we do in this situation?
  14. 14. №1. Stick to simple models
  15. 15. №2. Pool data when possible: train a personalised model on top of a universal model trained on all users.
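One simple way to read tip №2 is partial pooling: start from a fleet-wide estimate and move toward the individual driver's own average as that driver's observations accumulate. The sketch below is an illustration under assumptions, not necessarily the method used in the talk; `personalized_estimate`, the pseudo-count `prior_strength`, and the trip-length numbers are all hypothetical.

```python
def personalized_estimate(user_values, global_mean, prior_strength=5.0):
    """Blend a user's own mean with the fleet-wide mean.

    prior_strength is a hypothetical pseudo-count: the number of user
    observations at which the user mean and the global mean get equal weight.
    """
    n = len(user_values)
    if n == 0:
        return global_mean  # no personal data yet: fall back to the fleet
    user_mean = sum(user_values) / n
    weight = n / (n + prior_strength)
    return weight * user_mean + (1.0 - weight) * global_mean


# Hypothetical example: average trip length in km.
fleet_mean = 12.0
new_driver = personalized_estimate([30.0], fleet_mean)           # one trip: stays near the fleet
regular_driver = personalized_estimate([30.0] * 50, fleet_mean)  # fifty trips: near the driver's own mean
```

With a single observation the estimate barely leaves the fleet mean, which is the protection against outliers that small personal data sets need.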
  16. 16. №3. Limit Experimentation: if you try too many different techniques, you'll overfit to your validation set.
  17. 17. №4. How much training data do you need?
  18. 18. №4. How much training data do you need? The rule of 10 is a rule of thumb: a well-performing model needs roughly 10x as many training examples as it has parameters.
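The rule of 10 is easy to turn into a quick check before training. The sketch below assumes a linear (or logistic) model, where the parameter count is one weight per feature plus an intercept; the function names and the 20-feature example are hypothetical.

```python
def min_training_rows(n_parameters, factor=10):
    """Rule-of-thumb lower bound on training rows: factor * parameters."""
    if n_parameters < 0:
        raise ValueError("n_parameters must be non-negative")
    return factor * n_parameters


def linear_model_parameters(n_features, fit_intercept=True):
    """One weight per feature, plus an optional intercept."""
    return n_features + (1 if fit_intercept else 0)


# Hypothetical example: a linear model over 20 features.
params = linear_model_parameters(20)
rows_needed = min_training_rows(params)
```

If a driver has fewer observations than `rows_needed`, that is a signal to drop features or switch to a simpler model.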
  19. 19. №5. Do clean up your data
  20. 20. №6. Do perform feature selection: if the data is truly limiting, explicit feature selection is sometimes essential.
  21. 21. №7. Do use Regularization: it reduces the effective degrees of freedom without reducing the actual number of parameters in the model.
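For ridge regression this can be made concrete. If d_i are the singular values of the centered design matrix and λ is the penalty, the effective degrees of freedom are the sum of d_i² / (d_i² + λ). The sketch below computes that quantity; ridge is chosen here only as an example, and the singular values are made-up illustration numbers.

```python
def ridge_effective_dof(singular_values, lam):
    """Effective degrees of freedom of ridge regression: sum of d^2 / (d^2 + lam).

    Assumes non-zero singular values when lam == 0.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return sum(d * d / (d * d + lam) for d in singular_values)


# Hypothetical singular values of a 3-feature design matrix.
singular_values = [3.0, 2.0, 1.0]
dof_unregularized = ridge_effective_dof(singular_values, 0.0)  # equals the parameter count
dof_regularized = ridge_effective_dof(singular_values, 1.0)    # strictly smaller
```

The model still has three weights in both cases; only the amount of freedom the fit can actually use shrinks as λ grows.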
  22. 22. №8. Do use Model Averaging: each of the red curves is a model fitted on a few data points, but averaging all these high-variance models gives a smooth output that is remarkably close to the original.
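A minimal sketch of this idea is bootstrap averaging (bagging): fit many small models on random resamples and average their predictions. The slide's figure uses curves; the version below swaps in straight-line fits to stay short, and `fit_line`, `bagged_predict`, and their defaults are hypothetical names and values.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; None if all x are equal."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(xs, ys, x_new, n_models=50, sample_size=3, seed=0):
    """Average the predictions of lines fitted on small bootstrap samples."""
    if len(set(xs)) < 2 or sample_size < 2:
        raise ValueError("need two distinct x values and sample_size >= 2")
    rng = random.Random(seed)
    predictions = []
    while len(predictions) < n_models:
        idx = [rng.randrange(len(xs)) for _ in range(sample_size)]
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        if model is None:
            continue  # degenerate sample (a single distinct x): draw again
        slope, intercept = model
        predictions.append(slope * x_new + intercept)
    return sum(predictions) / len(predictions)


# Hypothetical noisy data around y = 2x + 1.
noise = random.Random(1)
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 1.0) for x in xs]
averaged = bagged_predict(xs, ys, x_new=5.0)
```

Each individual three-point fit can be far off, while the average is typically much more stable.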
  23. 23. №9. Try Bayesian Modeling: Bayesian inference may be well suited to smaller data sets, especially if you can use domain expertise to construct sensible priors.
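The simplest worked case is a Beta prior on a yes/no habit, updated with a handful of observations. The sketch below uses the standard Beta-Binomial conjugate update; the seat-heating scenario and the Beta(2, 2) prior are hypothetical choices for illustration.

```python
def beta_posterior(prior_alpha, prior_beta, successes, failures):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical: the driver turned on seat heating on 3 of 3 cold mornings.
# The Beta(2, 2) prior encodes "probably around 50%, but not sure".
alpha, beta = beta_posterior(2.0, 2.0, successes=3, failures=0)
posterior_mean = beta_mean(alpha, beta)  # 5/7, pulled toward the prior
naive_frequency = 3 / 3                  # 1.0, overconfident after three trips
```

After three observations the raw frequency claims certainty, while the posterior mean stays cautious until more evidence arrives.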
  24. 24. №10. Prefer Confidence Intervals: parts of the feature space are likely to be less covered by your data, and prediction confidence within these regions should reflect that.
  25. 25. №10. Prefer Confidence Intervals
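One way to make tip №10 tangible is to report an interval for a predicted rate instead of a point value. The sketch below uses the Wilson score interval for a binomial proportion, which is only one of several possible interval constructions; the two coverage scenarios are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Same observed rate (80%), very different coverage of the feature space.
sparse_region = wilson_interval(4, 5)      # few observations: wide interval
dense_region = wilson_interval(400, 500)   # many observations: narrow interval
```

Both regions show the same observed rate, but only the interval reveals how little the sparse region actually supports it.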
  26. 26. Please ask your questions! Thanks for your attention!
  27. 27. nsheremeta@cloudmade.com olena.kasianenko@cloudmade.com
