Dark Knowledge
Alex Tellez & Michal Malohlava
www.h2o.ai
DARK KNOWLEDGE?
Geoff Hinton has been busy at Google. He recently published a paper on
‘Dark Knowledge’. Sounds creepy….
What problem does this refer to?
Model complexity with respect to deployment.
Ensembles (RF / GBM) and DNNs are slow to train and predict,
and they require lots of memory (READ: $$$).
What’s the solution?
Train a simpler model that extracts the ‘dark knowledge’ from the DNN
(or ensemble) we want to mimic. The simpler model can then be
deployed at a cheaper ‘cost’.
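A minimal NumPy sketch of the paper’s trick (values illustrative): soften the cumbersome model’s logits with a temperature T before the softmax, so the tiny probabilities it assigns to the ‘wrong’ classes, i.e. the dark knowledge, become visible to a student model.

import numpy as np

def soften(logits, T=1.0):
    """Softmax at temperature T; T > 1 flattens the distribution,
    exposing the small probabilities (the 'dark knowledge')."""
    z = logits / T
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([8.0, 2.0, 0.5])   # illustrative teacher logits
print(soften(logits, T=1.0))         # hard-ish: ~[0.997, 0.002, 0.001]
print(soften(logits, T=4.0))         # soft:     ~[0.73, 0.16, 0.11]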
WHY NOW?
CLEARLY, this is a good idea, BUT why hasn’t there been more
investigation into an otherwise very promising approach?
Our Perception
We equate the knowledge in a trained model with its learned parameters
(i.e. weights)…
…which is why we have trouble with this question:
How can you change the ‘form’ of the model but keep the same
knowledge?
Answer: By using soft targets to train a simpler model to extract the
‘dark knowledge’ from the more complex model.
GAME-PLAN
1. Import the Higgs-Boson dataset (~11 million rows, 5 GB)
2. Create FOUR splits of our dataset
3. Train a ‘cumbersome’ deep neural network
4. Predict targets for the transfer dataset and append them as ‘soft
targets’ for the distilled model
5. Train the ‘distilled’ model on the soft targets to learn the ‘Dark Knowledge’
6. Compare the ‘distilled’ model vs. the ‘cumbersome’ model on validation data
FOUR DATASETS?
The original Higgs-Boson dataset: 11 million rows
Split this into….
1. data.train - 8.8 million rows (trains the ‘Cumbersome’ Model)
2. data.test - 550k rows (tests the ‘Cumbersome’ Model)
3. data.transfer - 1.1 million rows (trains the ‘Distilled’ Model;
labels = predictions from the ‘Cumbersome’ Model)
4. data.valid - 550k rows (used for the final model comparison)
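A sketch of the four-way split using H2O’s Python API (file path and seed are placeholders; split_frame assigns the leftover mass to the last frame):

import h2o

h2o.init()
higgs = h2o.import_file("HIGGS.csv")   # placeholder path to the 11M-row dataset

# ratios give 80% / 5% / 10%; the remaining 5% becomes the fourth frame
train, test, transfer, valid = higgs.split_frame(ratios=[0.80, 0.05, 0.10], seed=1234)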
THE ‘CUMBERSOME’ NET
Inputs: 29 machine- and human-generated features
# of Layers: 3
# of Hidden Neurons: 1,024 / layer (3,072 total)
Activation Function: Rectifier w/ Dropout (default = 50%)
Input Dropout: 10%, L1-regularization: 0.0001
Total Training Time: 24 min. 20 sec.
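The slide’s settings map onto H2O’s deep learning estimator roughly as below; the target column name ("label") and the feature list are assumptions about the parsed frame:

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

train["label"] = train["label"].asfactor()   # binary target; column name assumed
features = [c for c in train.columns if c != "label"]

cumbersome = H2ODeepLearningEstimator(
    hidden=[1024, 1024, 1024],               # 3 layers x 1,024 neurons
    activation="RectifierWithDropout",       # Rectifier w/ Dropout
    hidden_dropout_ratios=[0.5, 0.5, 0.5],   # default 50% hidden dropout
    input_dropout_ratio=0.1,                 # 10% input dropout
    l1=1e-4,                                 # L1-regularization = 0.0001
)
cumbersome.train(x=features, y="label", training_frame=train, validation_frame=test)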
‘CUMBERSOME’ NET CONT’D.
27% Model Error
~ 0.82 AUC
SOFT VS. HARD TARGETS
Hard Targets: the actual labels of the data (e.g. 1 if a Higgs-Boson particle)
Soft Targets: the labels predicted by the cumbersome model, which will be
used to train the distilled model
[Diagram: the ‘cumbersome model’ predicts labels on the transfer
dataset; these predictions (aka ‘soft’ targets) feed the ‘distilled model’.]
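Producing the soft targets is a single predict call; here we assume the binomial model’s p1 column (predicted probability of class 1) serves as the soft target:

preds = cumbersome.predict(transfer)    # columns: predict, p0, p1
transfer["soft_target"] = preds["p1"]   # append P(class = 1) as the soft target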
TRAIN ‘DISTILLED’ NET
AFTER the cumbersome model predicts labels on the transfer data,
use these labels as ‘soft targets’ to train the distilled network.
‘Cumbersome’ Net: 3 layers x 1,024 neurons / layer; Rectifier w/ Dropout;
Input Dropout + L1-regularization
‘Distilled’ Net: 2 layers x 800 neurons / layer; Rectifier;
no Input Dropout or L1-regularization
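One way to set up the distilled net per the slide, assuming we regress on the appended soft-target probability rather than on the hard label:

distilled = H2ODeepLearningEstimator(
    hidden=[800, 800],        # 2 layers x 800 neurons
    activation="Rectifier",   # plain rectifier, no dropout
    # no input dropout, no L1 -- per the slide
)
distilled.train(x=features, y="soft_target", training_frame=transfer)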
‘DISTILLED’ NET CONT’D.
~ 3 minutes to train
High AUC on ‘soft’ targets
THE REAL ACID TEST
So we have 2 models:
Cumbersome Model: trained w/ the DReD Net
Distilled Model: trained w/ soft targets on the transfer dataset
NOW, it’s time to score each model against the validation dataset
(which has hard targets)
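A scoring sketch: the cumbersome classifier reports AUC and a confusion matrix directly, while the distilled net as sketched above is a regressor, so we threshold its prediction at 0.5 (threshold assumed) and count errors against the hard labels:

# Cumbersome model: standard binomial metrics on the validation frame
valid["label"] = valid["label"].asfactor()
perf = cumbersome.model_performance(test_data=valid)
print(perf.auc())
print(perf.confusion_matrix())

# Distilled model: threshold the regressed soft-target probability
hard = valid["label"].asnumeric()                      # back to 0/1
guess = (distilled.predict(valid)["predict"] > 0.5)    # 0/1 predictions
print((guess != hard).sum())                           # total error count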
NOT SHABBY…
Cumbersome Model Confusion Matrix:
Distilled Model Confusion Matrix:
A difference of 737 errors (!!)
WHAT NOW?
Result: Our ‘simple’ net does a very decent job compared to the
complex net, having learned its ‘dark knowledge’.
Next Test: Try some ensemble approaches (e.g. Random Forest,
Gradient Boosting Machine).
Coming Soon: “The Hinton Trick” will be added to H2O’s algo
roadmap!
If you want to know more, read:
“Distilling the Knowledge in a Neural Network” - G. Hinton, O. Vinyals & J. Dean
Alex: alex@h2o.ai Michal: michal@h2o.ai