Barcelona ML Meetup - Lessons Learned

Lessons learned from building real-life machine learning systems

Published in: Technology

  1. Lessons Learned
  3. More data vs. better models
  4. Really? Anand Rajaraman: former Stanford Prof. & Senior VP at Walmart
  5. Sometimes, it’s not about more data
  6. Norvig: “Google does not have better algorithms, only more data.” Many features / low-bias models
  7. Sometimes, it’s not about more data
  8. You might not need all your “big data”
  10. Sometimes you do need a complex model
  11. It pays off to be smart about hyperparameters
  13. Supervised plus (not vs.) unsupervised learning
  16. Everything is an ensemble
  19. The output of your model will be the input of another one (and other system design problems)
  22. The pains & gains of feature engineering
  25. Implicit signals beat explicit ones (almost always)
  28. Be thoughtful about your training data
  31. Your model will learn what you teach it to learn
  34. Learn to deal with presentation bias
  35. (Figure labels: “More likely to see” / “Less likely”)
  36. Data and models are great. You know what’s even better? The right evaluation approach!
  38. You don’t need to distribute your ML algorithm
  41. But if you do, you should understand at what level to do it
  42. The three levels of distribution/parallelization:
      ● For each subset of the population (e.g., region)
      ● For each combination of the hyperparameters
      ● For each subset of the training data
      Each level has different requirements. (Figure: ANN training over distributed GPUs)
  43. Some things are better done online and others offline… and there is nearline for everything in between
  44. System overview:
      ● Blueprint for multiple personalization algorithm services
          ● Ranking
          ● Row selection
          ● Ratings
          ● …
      ● Recommendation involving multi-layered machine learning
  45. Matrix factorization example
  46. The two faces of your ML infrastructure
  51. Why you should care about answering questions (about your model)
  53. The untold story of data science and/vs. ML engineering
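
Slides 34 and 35 contrast positions users are more likely to see with positions they are less likely to see. One common way to correct click data for this kind of presentation (position) bias, not necessarily the one used in the talk, is inverse propensity weighting: each click is divided by the probability that its position was examined. The sketch below assumes those examination probabilities (`propensity`) are already known; the function name `ips_click_rates`, the log, and the propensity values are all hypothetical.

```python
from collections import defaultdict


def ips_click_rates(impressions, propensity):
    """Position-debiased click rate per item via inverse propensity weighting.

    impressions: iterable of (item, position, clicked) tuples.
    propensity: maps a position to the (assumed known) probability that a
    user examines that position at all.
    """
    weighted_clicks = defaultdict(float)
    shown = defaultdict(int)
    for item, position, clicked in impressions:
        shown[item] += 1
        if clicked:
            # A click at a rarely examined position counts for more.
            weighted_clicks[item] += 1.0 / propensity[position]
    return {item: weighted_clicks[item] / shown[item] for item in shown}


# Hypothetical log: item "A" is always shown at the top, item "B" always third.
propensity = {0: 1.0, 1: 0.6, 2: 0.3}
log = [("A", 0, i < 3) for i in range(10)] + [("B", 2, i < 2) for i in range(10)]

# Naive click-through rate ignores where each item was shown.
naive = {item: sum(c for it, _, c in log if it == item) / 10 for item in ("A", "B")}
print(naive)  # {'A': 0.3, 'B': 0.2}
print(ips_click_rates(log, propensity))  # A ≈ 0.3, B ≈ 0.667
```

Naively, A looks better (0.3 vs. 0.2 click rate); after weighting, B's clicks at a position examined only 30% of the time suggest it is the more relevant item. In practice the propensities must themselves be estimated, for example from randomized ranking experiments, which this sketch does not cover.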

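Slide 42 lists three levels at which training can be parallelized. The first two (one model per population subset, and one model per hyperparameter combination) are embarrassingly parallel: every (subset, hyperparameters) cell is an independent job. A minimal sketch of those two levels, assuming a toy one-dimensional ridge regression, hypothetical per-region data and regularization grid, and a thread pool standing in for a real cluster scheduler:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy data per region: (train pairs, validation pairs) of (x, y).
DATA = {
    "EU": ([(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)], [(4.0, 8.1)]),
    "US": ([(1.0, 0.9), (2.0, 2.2), (3.0, 2.8)], [(4.0, 4.1)]),
}
LAMBDAS = [0.0, 0.1, 1.0, 10.0]  # hypothetical regularization grid


def fit_ridge_slope(train, lam):
    """Closed-form 1-D ridge regression through the origin: w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def evaluate(task):
    """Train one model for one (region, lambda) cell and return its validation MSE."""
    region, lam = task
    train, valid = DATA[region]
    w = fit_ridge_slope(train, lam)
    mse = sum((y - w * x) ** 2 for x, y in valid) / len(valid)
    return region, lam, mse


def tune_all_regions(max_workers=4):
    # Levels 1 and 2: every (region, hyperparameter) pair is an independent job.
    tasks = list(product(DATA, LAMBDAS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate, tasks))
    # Keep the lowest-validation-error hyperparameter for each region.
    best = {}
    for region, lam, mse in results:
        if region not in best or mse < best[region][1]:
            best[region] = (lam, mse)
    return best


if __name__ == "__main__":
    for region, (lam, mse) in sorted(tune_all_regions().items()):
        print(f"{region}: best lambda={lam}, validation MSE={mse:.4f}")
```

For CPU-heavy training, `ProcessPoolExecutor` or a cluster scheduler would take the place of the thread pool. The third level, splitting one model's training data across machines (as in ANN training over distributed GPUs), requires workers to exchange gradients or parameters and is not shown here.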
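
Slide 45's matrix factorization example did not survive extraction. As a general illustration only (not the example from the talk), the following sketch learns user and item factor vectors with stochastic gradient descent so that their dot product approximates each observed rating. The ratings, the rank `k`, the learning rate `lr`, the regularization weight `reg`, and the helper names are all hypothetical choices.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Fit P (users x k) and Q (items x k) so that predict(P, Q, u, i) ~= r for each (u, i, r)."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    order = list(ratings)  # shuffle a copy, leaving the caller's list untouched
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root-mean-squared error of the factor model on the given ratings."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Hypothetical observed ratings (user, item, rating); (0, 2) and (2, 1) are unobserved.
ratings = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0),
           (1, 2, 3.0), (2, 0, 5.0), (2, 2, 3.75)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(f"training RMSE: {rmse(P, Q, ratings):.3f}")
print(f"predicted rating for user 0, item 2: {predict(P, Q, 0, 2):.2f}")
```

The unobserved entries, such as user 0's rating of item 2, are then predicted from the learned factors. Production recommenders commonly add user and item bias terms and tune `k`, `lr`, and `reg` on held-out ratings, which this sketch omits.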