Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
LessonsLearned
from building real-life Machine Learning Systems
Xavier Amatriain (@xamat)
www.quora.com/profile/Xavier-Ama...
A bit about
Our Mission
“To share and grow
the world’s knowledge”
• Millions of questions & answers
• Millions of users
• Thousands of...
Demand
What we care about
Quality
Relevance
LessonsLearned
MoreDatavs.BetterModels
More data or better models?
Really?
Anand Rajaraman: VC, Founder, Stanford Professor
More data or better models?
Sometimes, it’s
not about more
data
More data or better models?
Norvig:
“Google does not have
better Algorithms only
more Data”
Many features/
low-bias models
More data or better models?
Sometimes, it’s
not about more
data
Sometimesyoudoneed
A(more)ComplexModel
Better models and features that “don’t work”
● E.g. You have a linear model and have
been selecting and optimizing feature...
Modelselectionisalsoabout
Hyperparameteroptimization
Hyperparameter optimization
● Automate hyperparameter
optimization by choosing the
right metric.
○ But, is it as simple as...
Supervisedvs.plus
UnsupervisedLearning
Supervised/Unsupervised Learning
● Unsupervised learning as dimensionality reduction
● Unsupervised learning as feature en...
Supervised/Unsupervised Learning
● One of the “tricks” in Deep Learning is how it
combines unsupervised/supervised learnin...
Everythingisanensemble
Ensembles
● Netflix Prize was won by an ensemble
○ Initially Bellkor was using GDBTs
○ BigChaos introduced ANN-based ensem...
Ensembles & Feature Engineering
● Ensembles are the way to turn any model into a feature!
● E.g. Don’t know if the way to ...
The Master Algorithm?
It definitely is the ensemble!
Thepains&gains
ofFeatureEngineering
Feature Engineering
● Main properties of a well-behaved ML feature
○ Reusable
○ Transformable
○ Interpretable
○ Reliable
●...
Feature Engineering
● Main properties of a well-behaved ML feature
○ Reusable
○ Transformable
○ Interpretable
○ Reliable
●...
Feature Engineering Example - Quora Answer Ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanati...
Feature Engineering Example - Quora Answer Ranking
How are those dimensions translated
into features?
• Features that rela...
Implicitsignalsbeat
explicitones
(almostalways)
Implicit vs. Explicit
● Many have acknowledged
that implicit feedback is more useful
● Is implicit feedback really always
...
● Implicit data is (usually):
○ More dense, and available for all users
○ Better representative of user behavior vs.
user ...
● However
○ It is not always the case that
direct implicit feedback correlates
well with long-term retention
○ E.g. clickb...
bethoughtfulaboutyour
TrainingData
Defining training/testing data
● Training a simple binary classifier for good/bad
answer
○ Defining positive and negative ...
Other training data issues: Time traveling
● Time traveling: usage of features that originated after the
event you are try...
YourModelwilllearn
whatyouteachittolearn
Training a model
● Model will learn according to:
○ Training data (e.g. implicit and explicit)
○ Target function (e.g. pro...
Example 2 - Quora’s feed
● Training data = implicit + explicit
● Target function: Value of showing a story to a
user ~ wei...
Offline testing
● Measure model performance,
using (IR) metrics
● Offline performance = indication
to make decisions on fo...
Learntodealwith
PresentationBias
2D Navigational modeling
More likely
to see
Less likely
The curse of presentation bias
● User can only click on what you decide to show
● But, what you decide to show is the resu...
Youdon’tneedtodistribute
yourMLalgorithm
Distributing ML
● Most of what people do in practice can fit into a multi-
core machine
○ Smart data sampling
○ Offline sc...
Distributing ML
● Example of optimizing computations to fit them into
one machine
○ Spark implementation: 6 hours, 15 mach...
Theuntoldstoryof
DataScienceandvs.MLengineering
Data Scientists and ML Engineers
● We all know the definition of a Data Scientist
● Where do Data Scientists fit in an org...
The data-driven ML innovation funnel
Data Research
ML Exploration -
Product Design
AB Testing
Data Scientists and ML Engineers
● Solution:
○ (1) Define different parts of the innovation funnel
■ Part 1. Data research...
Conclusions
● In data, size is not all that matters
● Understand dependencies between data, models
& systems
● Choose the right metric...
Questions?
Strata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Upcoming SlideShare
Loading in …5
×

Strata 2016 - Lessons Learned from building real-life Machine Learning Systems

5,173 views

Published on

Talk about practical machine learning lessons learned over the years. Given at the Hardcore Data Science track in Strata San Jose 2016

Published in: Technology

Strata 2016 - Lessons Learned from building real-life Machine Learning Systems

  1. 1. LessonsLearned from building real-life Machine Learning Systems Xavier Amatriain (@xamat) www.quora.com/profile/Xavier-Amatriain 3/29/16
  2. 2. A bit about
  3. 3. Our Mission “To share and grow the world’s knowledge” • Millions of questions & answers • Millions of users • Thousands of topics • ...
  4. 4. Demand What we care about Quality Relevance
  5. 5. LessonsLearned
  6. 6. MoreDatavs.BetterModels
  7. 7. More data or better models? Really? Anand Rajaraman: VC, Founder, Stanford Professor
  8. 8. More data or better models? Sometimes, it’s not about more data
  9. 9. More data or better models? Norvig: “Google does not have better Algorithms only more Data” Many features/ low-bias models
  10. 10. More data or better models? Sometimes, it’s not about more data
  11. 11. Sometimesyoudoneed A(more)ComplexModel
  12. 12. Better models and features that “don’t work” ● E.g. You have a linear model and have been selecting and optimizing features for that model ■ More complex model with the same features -> improvement not likely ■ More expressive features with the same model -> improvement not likely ● More complex features may require a more complex model ● A more complex model may not show improvements with a feature set that is too simple
  13. 13. Modelselectionisalsoabout Hyperparameteroptimization
  14. 14. Hyperparameter optimization ● Automate hyperparameter optimization by choosing the right metric. ○ But, is it as simple as choosing the max? ● Bayesian Optimization (Gaussian Processes) better than grid search ○ See spearmint, hyperopt, AutoML, MOE...
  15. 15. Supervisedvs.plus UnsupervisedLearning
  16. 16. Supervised/Unsupervised Learning ● Unsupervised learning as dimensionality reduction ● Unsupervised learning as feature engineering ● The “magic” behind combining unsupervised/supervised learning ○ E.g.1 clustering + knn ○ E.g.2 Matrix Factorization ■ MF can be interpreted as ● Unsupervised: ○ Dimensionality Reduction a la PCA ○ Clustering (e.g. NMF) ● Supervised ○ Labeled targets ~ regression
  17. 17. Supervised/Unsupervised Learning ● One of the “tricks” in Deep Learning is how it combines unsupervised/supervised learning ○ E.g. Stacked Autoencoders ○ E.g. training of convolutional nets
  18. 18. Everythingisanensemble
  19. 19. Ensembles ● Netflix Prize was won by an ensemble ○ Initially Bellkor was using GDBTs ○ BigChaos introduced ANN-based ensemble ● Most practical applications of ML run an ensemble ○ Why wouldn’t you? ○ At least as good as the best of your methods ○ Can add completely different approaches (e. g. CF and content-based) ○ You can use many different models at the ensemble layer: LR, GDBTs, RFs, ANNs...
  20. 20. Ensembles & Feature Engineering ● Ensembles are the way to turn any model into a feature! ● E.g. Don’t know if the way to go is to use Factorization Machines, Tensor Factorization, or RNNs? ○ Treat each model as a “feature” ○ Feed them into an ensemble
  21. 21. The Master Algorithm? It definitely is the ensemble!
  22. 22. Thepains&gains ofFeatureEngineering
  23. 23. Feature Engineering ● Main properties of a well-behaved ML feature ○ Reusable ○ Transformable ○ Interpretable ○ Reliable ● Reusability: You should be able to reuse features in different models, applications, and teams ● Transformability: Besides directly reusing a feature, it should be easy to use a transformation of it (e.g. log(f), max(f), ∑ft over a time window…)
  24. 24. Feature Engineering ● Main properties of a well-behaved ML feature ○ Reusable ○ Transformable ○ Interpretable ○ Reliable ● Interpretability: In order to do any of the previous, you need to be able to understand the meaning of features and interpret their values. ● Reliability: It should be easy to monitor and detect bugs/issues in features
  25. 25. Feature Engineering Example - Quora Answer Ranking What is a good Quora answer? • truthful • reusable • provides explanation • well formatted • ...
  26. 26. Feature Engineering Example - Quora Answer Ranking How are those dimensions translated into features? • Features that relate to the answer quality itself • Interaction features (upvotes/downvotes, clicks, comments…) • User features (e.g. expertise in topic)
  27. 27. Implicitsignalsbeat explicitones (almostalways)
  28. 28. Implicit vs. Explicit ● Many have acknowledged that implicit feedback is more useful ● Is implicit feedback really always more useful? ● If so, why?
  29. 29. ● Implicit data is (usually): ○ More dense, and available for all users ○ Better representative of user behavior vs. user reflection ○ More related to final objective function ○ Better correlated with AB test results ● E.g. Rating vs watching Implicit vs. Explicit
  30. 30. ● However ○ It is not always the case that direct implicit feedback correlates well with long-term retention ○ E.g. clickbait ● Solution: ○ Combine different forms of implicit + explicit to better represent long-term goal Implicit vs. Explicit
  31. 31. bethoughtfulaboutyour TrainingData
  32. 32. Defining training/testing data ● Training a simple binary classifier for good/bad answer ○ Defining positive and negative labels -> Non-trivial task ○ Is this a positive or a negative? ● funny uninformative answer with many upvotes ● short uninformative answer by a well-known expert in the field ● very long informative answer that nobody reads/upvotes ● informative answer with grammar/spelling mistakes ● ...
  33. 33. Other training data issues: Time traveling ● Time traveling: usage of features that originated after the event you are trying to predict ○ E.g. Your upvoting an answer is a pretty good prediction of you reading that answer, especially because most upvotes happen AFTER you read the answer ○ Tricky when you have many related features ○ Whenever I see an offline experiment with huge wins, I ask: “Is there time traveling?”
  34. 34. YourModelwilllearn whatyouteachittolearn
  35. 35. Training a model ● Model will learn according to: ○ Training data (e.g. implicit and explicit) ○ Target function (e.g. probability of user reading an answer) ○ Metric (e.g. precision vs. recall) ● Example 1 (made up): ○ Optimize probability of a user going to the cinema to watch a movie and rate it “highly” by using purchase history and previous ratings. Use NDCG of the ranking as final metric using only movies rated 4 or higher as positives.
  36. 36. Example 2 - Quora’s feed ● Training data = implicit + explicit ● Target function: Value of showing a story to a user ~ weighted sum of actions: v = ∑a va 1{ya = 1} ○ predict probabilities for each action, then compute expected value: v_pred = E[ V | x ] = ∑a va p(a | x) ● Metric: any ranking metric
  37. 37. Offline testing ● Measure model performance, using (IR) metrics ● Offline performance = indication to make decisions on follow-up A/B tests ● A critical (and mostly unsolved) issue is how offline metrics correlate with A/B test results.
  38. 38. Learntodealwith PresentationBias
  39. 39. 2D Navigational modeling More likely to see Less likely
  40. 40. The curse of presentation bias ● User can only click on what you decide to show ● But, what you decide to show is the result of what your model predicted is good ● Simply treating things you show as negatives is not likely to work ● Better options ● Correcting for the probability a user will click on a position -> Attention models ● Explore/exploit approaches such as MAB
  41. 41. Youdon’tneedtodistribute yourMLalgorithm
  42. 42. Distributing ML ● Most of what people do in practice can fit into a multi- core machine ○ Smart data sampling ○ Offline schemes ○ Efficient parallel code ● Dangers of “easy” distributed approaches such as Hadoop/Spark ● Do you care about costs? How about latencies?
  43. 43. Distributing ML ● Example of optimizing computations to fit them into one machine ○ Spark implementation: 6 hours, 15 machines ○ Developer time: 4 days ○ C++ implementation: 10 minutes, 1 machine ● Most practical applications of Big Data can fit into a (multicore) implementation
  44. 44. Theuntoldstoryof DataScienceandvs.MLengineering
  45. 45. Data Scientists and ML Engineers ● We all know the definition of a Data Scientist ● Where do Data Scientists fit in an organization? ○ Many companies struggling with this ● Valuable to have strong DS who can bring value from the data ● Strong DS with solid engineering skills are unicorns and finding them is not scalable ○ DS need engineers to bring things to production ○ Engineers have enough on their plate to be willing to “productionize” cool DS projects
  46. 46. The data-driven ML innovation funnel Data Research ML Exploration - Product Design AB Testing
  47. 47. Data Scientists and ML Engineers ● Solution: ○ (1) Define different parts of the innovation funnel ■ Part 1. Data research & hypothesis building -> Data Science ■ Part 2. ML solution building & implementation -> ML Engineering ■ Part 3. Online experimentation, AB Testing analysis-> Data Science ○ (2) Broaden the definition of ML Engineers to include from coding experts with high-level ML knowledge to ML experts with good software skills Data Research ML Solution AB Testing Data Science Data Science ML Engineering
  48. 48. Conclusions
  49. 49. ● In data, size is not all that matters ● Understand dependencies between data, models & systems ● Choose the right metric & optimize what matters ● Be thoughtful about ○ your ML infrastructure/tools ○ about organizing your teams
  50. 50. Questions?

×