Staying Shallow & Lean in a Deep Learning World
Xavier Amatriain (@xamat)
07/13/2016
Deep learning has accomplished impressive feats in areas such as voice recognition, image processing, and natural language processing. Deep learning enthusiasts have rushed to predict that this family of algorithms is likely to take over most other applications in the near future. This focus on deep architectures seems to have cast a shadow over more “traditional” machine learning and data science approaches, leaving researchers and practitioners alike wondering whether there is any point in investing in feature engineering or simpler models.

In this talk, I will go over what deep learning can and cannot do for you, both now and in the near future. I will also describe how different approaches will continue to be needed, and why their demand will likely grow despite the rise of deep learning. I will support my claims not only by looking at recent publications, but also by using practical examples drawn from my experience at companies at the forefront of machine learning applications, such as Quora.

  1. Staying Shallow & Lean in a Deep Learning World
     Xavier Amatriain (@xamat), 07/13/2016
  2. Our Mission: “To share and grow the world’s knowledge”
     • Millions of questions
     • Millions of answers
     • Millions of users
     • Thousands of topics
     • ...
  3. Lots of high-quality textual information
  4. Text + all those other things
  5. What we care about: Demand, Quality, Relevance
  6. ML Applications
     ● Homepage feed ranking
     ● Email digest
     ● Answer quality & ranking
     ● Spam & harassment classification
     ● Topic/User recommendation
     ● Trending Topics
     ● Automated Topic Labelling
     ● Related & Duplicate Questions
     ● User trustworthiness
     ● ...
  7. Models
     ● Deep Neural Networks
     ● Logistic Regression
     ● Elastic Nets
     ● Gradient Boosted Decision Trees
     ● Random Forests
     ● LambdaMART
     ● Matrix Factorization
     ● LDA
     ● ...
  8. Deep Learning Works
  9. Image Recognition
  10. Speech Recognition
  11. Natural Language Processing
  12. Game Playing
  13. Recommender Systems
  14. But...
  15. Deep Learning is not Magic
  16. Deep Learning is not always that “accurate”
  17. … or that “deep”
  18. Other ML Advances
      ● Factorization Machines
      ● Tensor Methods
      ● Non-parametric Bayesian models
      ● XGBoost
      ● Online Learning
      ● Reinforcement Learning
      ● Learning to rank
      ● ...
  19. Other very successful approaches
  20. Is it bad to obsess over Deep Learning?
  21. Some examples
  22. Football or Futbol?
  23. A real-life example
  24. A real-life example: improved solution
      Other feature extraction algorithms + Ensemble → Accuracy ++
  25. Another real example
      ● Goal: Supervised Classification
        ○ 40 features
        ○ 10k examples
      ● What did the ML Engineer choose?
        ○ Multi-layer ANN trained with TensorFlow
      ● What was his proposed next step?
        ○ Try ConvNets
      ● Where is the problem?
        ○ Hours to train, already looking into distributing
        ○ There are much simpler approaches
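For a problem of this shape, one of the “much simpler approaches” trains in seconds on a single core. A minimal sketch: the data below is synthetic and only matches the 40-feature / 10k-example shape from the slide, nothing about the actual dataset is assumed.

```python
# Synthetic stand-in for the 40-feature / 10k-example classification task.
# A gradient boosted model trains in seconds on one machine, no cluster needed.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {acc:.3f}")
```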
  26. Why DL is not the only/main solution
  27. Occam’s Razor
  28. Occam’s razor
      ● Given two models that perform more or less equally, you should always prefer the less complex one
      ● Deep Learning might not be preferred, even if it squeezes out an extra +1% in accuracy
  29. Occam’s razor: reasons to prefer a simpler model
  30. Occam’s razor: reasons to prefer a simpler model
      ● Beyond accuracy, there are many other reasons
        ○ System complexity
        ○ Maintenance
        ○ Explainability
        ○ ...
  31. No Free Lunch
  32. No Free Lunch Theorem
      “(...) any two optimization algorithms are equivalent when their performance is averaged across all possible problems.”
      “If an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.”
  33. Feature Engineering
  34. Feature Engineering
      Need for feature engineering
      In many cases an understanding of the domain will lead to optimal results.
  35. Feature Engineering Example - Quora Answer Ranking
      What is a good Quora answer?
      • truthful
      • reusable
      • provides explanation
      • well formatted
      • ...
  36. Feature Engineering Example - Quora Answer Ranking
      How are those dimensions translated into features?
      • Features that relate to the answer quality itself
      • Interaction features (upvotes/downvotes, clicks, comments…)
      • User features (e.g. expertise in topic)
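A sketch of how those three groups could be encoded into one feature vector. Every feature name and input field below is hypothetical, invented for illustration; this is not Quora’s actual schema.

```python
# Hypothetical feature extraction for answer ranking (illustrative only).
def answer_features(answer, author, interactions):
    return {
        # Features about the answer text itself
        "length": len(answer["text"]),
        "num_paragraphs": answer["text"].count("\n\n") + 1,
        "has_formatting": answer["has_formatting"],
        # Interaction features
        "upvote_ratio": interactions["upvotes"]
        / max(1, interactions["upvotes"] + interactions["downvotes"]),
        "clicks": interactions["clicks"],
        "comments": interactions["comments"],
        # User features
        "author_topic_expertise": author["topic_expertise"],
    }

example = answer_features(
    {"text": "Short answer.\n\nWith two paragraphs.", "has_formatting": 1},
    {"topic_expertise": 0.8},
    {"upvotes": 12, "downvotes": 3, "clicks": 140, "comments": 2},
)
```

A dict like this can then be vectorized (e.g. with a DictVectorizer) and fed to any of the models in slide 7.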
  37. Feature Engineering
      ● Properties of a well-behaved ML feature:
        ○ Reusable
        ○ Transformable
        ○ Interpretable
        ○ Reliable
  38. Deep Learning and Feature Engineering
  39. Unsupervised Learning
  40. Unsupervised Learning
      ● Unsupervised learning is a very important paradigm in theory and in practice
      ● So far, unsupervised learning has helped deep learning, but deep learning has not helped unsupervised learning
  41. Supervised/Unsupervised Learning
      ● Unsupervised learning as dimensionality reduction
      ● Unsupervised learning as feature engineering
      ● The “magic” behind combining unsupervised/supervised learning
        ○ E.g. 1: clustering + k-NN
        ○ E.g. 2: Matrix Factorization
          ■ MF can be interpreted as
            ● Unsupervised:
              ○ Dimensionality reduction a la PCA
              ○ Clustering (e.g. NMF)
            ● Supervised:
              ○ Labeled targets ~ regression
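The MF-as-feature-engineering combination can be sketched as follows: an unsupervised NMF step reduces a nonnegative matrix to a few latent factors, and those factors then become the inputs of a plain supervised model. The data is synthetic (an exact low-rank matrix), chosen only so the latent structure is recoverable.

```python
# Unsupervised step (NMF) feeding a supervised step (logistic regression).
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
W_true = rng.random((500, 4))   # e.g. latent user factors
H_true = rng.random((4, 30))    # e.g. latent item factors
X = W_true @ H_true             # observed nonnegative matrix (rank 4)
y = (W_true[:, 0] > 0.5).astype(int)  # toy target tied to a latent factor

# Unsupervised: dimensionality reduction / soft clustering via NMF
factors = NMF(n_components=4, init="nndsvda", random_state=0,
              max_iter=500).fit_transform(X)

# Supervised: the learned factors become the feature matrix
clf = LogisticRegression(max_iter=1000).fit(factors, y)
acc = clf.score(factors, y)
```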
  42. Ensembles
  43. Ensembles
      Even if all problems end up being suited for Deep Learning, there will always be a place for ensembles.
      ● Given the output of a Deep Learning prediction, you will be able to combine it with some other model or feature to improve the results.
  44. Ensembles
      ● Netflix Prize was won by an ensemble
        ○ Initially BellKor was using GBDTs
        ○ BigChaos introduced an ANN-based ensemble
      ● Most practical applications of ML run an ensemble
        ○ Why wouldn’t you?
        ○ At least as good as the best of your methods
        ○ Can add completely different approaches
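The “at least as good as the best of your methods” point can be illustrated with soft voting over two completely different model families. The data is synthetic, and the gain from averaging is typical rather than guaranteed.

```python
# Soft-voting ensemble of a linear model and a tree ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

lr = LogisticRegression(max_iter=1000)
rf = RandomForestClassifier(n_estimators=100, random_state=1)
ens = VotingClassifier([("lr", lr), ("rf", rf)], voting="soft")

# Held-out accuracy of each member and of the averaged ensemble.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in [("lr", lr), ("rf", rf), ("ensemble", ens)]}
```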
  45. Ensembles & Feature Engineering
      ● Ensembles are the way to turn any model into a feature!
      ● E.g. don’t know whether the way to go is Factorization Machines, Tensor Factorization, or RNNs?
        ○ Treat each model as a “feature”
        ○ Feed them into an ensemble
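A minimal stacking sketch of “each model as a feature”: each candidate’s out-of-fold predictions become one column of a new feature matrix for a simple combiner. Here a random forest and a GBDT stand in for the FM / tensor-factorization / RNN candidates, and the data is synthetic.

```python
# Stacking: candidate models' out-of-fold predictions become features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=2_000, n_features=20, random_state=2)

candidates = [RandomForestClassifier(n_estimators=50, random_state=2),
              GradientBoostingClassifier(random_state=2)]

# One column ("feature") of out-of-fold probabilities per candidate model.
meta_X = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in candidates
])
combiner = LogisticRegression().fit(meta_X, y)
```

Swapping a candidate for a different model family just adds another column; the combiner does not need to change.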
  46. Distributing Algorithms
  47. Distributing ML
      ● Most of what people do in practice can fit into a multi-core machine
        ○ Smart data sampling
        ○ Offline schemes
        ○ Efficient parallel code
      ● … but not Deep ANNs
      ● Do you care about costs? How about latencies or system complexity/debuggability?
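That single-machine workflow can be sketched in a few lines: subsample the data, then let the library use every core. The data is synthetic and the uniform subsampling below is only a stand-in for smarter sampling schemes.

```python
# Sample down, then train in parallel on one multi-core machine.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100_000, n_features=20, random_state=3)

# "Smart" data sampling, sketched here as plain uniform subsampling.
rng = np.random.default_rng(3)
idx = rng.choice(len(X), size=20_000, replace=False)

# Efficient parallel code: trees are trained across all available cores.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=3)
clf.fit(X[idx], y[idx])
acc = clf.score(X, y)  # evaluated on the full set
```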
  48. Distributing ML
      ● That said…
      ● Deep Learning has managed to get away with it by promoting a “new paradigm” of parallel computing: GPUs
  49. Conclusions
  50. Conclusions
      ● Deep Learning has had some impressive results lately
      ● However, Deep Learning is not the only solution
        ○ It is dangerous to oversell Deep Learning
      ● It is important to take other things into account
        ○ Other approaches/models
        ○ Feature Engineering
        ○ Unsupervised Learning
        ○ Ensembles
        ○ Need to distribute, costs, system complexity...
  51. Questions?