DO DEEP NETS REALLY
NEED TO BE DEEP?
Meoni Marco – UNIPI – March 7th 2016
Lei Jimmy Ba
University of Toronto
Rich Caruana
Microsoft Research
PhD course in Deep Learning
NNs
[Figure: three network diagrams, each with Inputs and Outputs]
•  SNN: Single Hidden Layer
•  DNN: Three Hidden Layers
•  CNN: Three Hidden Layers above Convolutional/MaxPooling Layers
Introduction
•  DNNs excel over SNNs
•  e.g. with 1M labeled training points, accuracy is 91% (DNN) vs 86% (SNN)
•  Source of improvement of DNNs vs SNNs
•  Deep nets have more parameters?
•  Deep nets can learn more complex functions?
•  Convolution gives a plus?
Contribution
•  Possible to train a SNN that mimics the function of a DNN
•  Model compression method
•  Possible to mimic, but not possible to train directly
•  SNNs can be as accurate as DNNs when trained to mimic them, even though SNNs cannot be trained to the same accuracy on the original labeled data
•  Necessary to be deep?
•  If an SNN can mimic a DNN, is the function learned by the DNN really that deep?
•  Success related to the learning process
Model Compression
1.  Build a complex model (DNN, CNN, …, Ensemble)
2.  Train a simple model (SNN) to mimic the complex function
3.  Apply it
[Figure: pipeline diagram with boxes labeled Data, Scores, Labels and SNN]
•  Compress large ensembles into smaller, faster models
•  Train to learn the function learned by the larger model, not on original labels
Model Compression (Bucila, Caruana & Niculescu-Mizil, 2006)
•  Train a smaller model to mimic a larger, smarter model
•  train the smart model any way you want:
•  DNN, CNN, or ensemble of CNNs
•  pass a large unlabeled dataset through the model to collect predictions (capturing the function learned by the smart model)
•  train the “small” model to mimic the large model on this newly labeled data
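The compression steps above can be sketched in a few lines of NumPy. This is only an illustrative toy: the `make_teacher` helper, the random-feature student and all sizes are assumptions made for the sketch, not the models used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "teacher": an ensemble of fixed random two-layer nets.
# In the slides the teacher would be a trained DNN, CNN, or ensemble of CNNs.
def make_teacher(n_members=5, d_in=4, d_hidden=16, d_out=3):
    members = [(rng.normal(size=(d_hidden, d_in)),
                rng.normal(size=(d_out, d_hidden)))
               for _ in range(n_members)]

    def predict(x):
        # Average the members' output scores (one row per example).
        return np.mean([np.tanh(x @ w1.T) @ w2.T for w1, w2 in members], axis=0)

    return predict

# Step 1 (assumed already done): `teacher` is the complex model.
teacher = make_teacher()

# Step 2a: pass unlabeled data through the teacher to collect its scores.
x_unlabeled = rng.normal(size=(2000, 4))
teacher_scores = teacher(x_unlabeled)

# Step 2b: fit a small student on (input, teacher score) pairs.
# The student here has one tanh hidden layer with fixed random input weights
# and a least-squares output layer, chosen only to keep the sketch short.
w_in = rng.normal(size=(64, 4))
hidden = np.tanh(x_unlabeled @ w_in.T)
w_out, *_ = np.linalg.lstsq(hidden, teacher_scores, rcond=None)

def student(x):
    return np.tanh(x @ w_in.T) @ w_out

# Step 3: apply the student to new data and compare it with the teacher.
x_new = rng.normal(size=(500, 4))
agreement = np.mean(student(x_new).argmax(axis=1) == teacher(x_new).argmax(axis=1))
mse_student = np.mean((student(x_new) - teacher(x_new)) ** 2)
print(f"student/teacher agreement: {agreement:.2f}")
```

Note that no original labels appear anywhere: the student only ever sees the teacher's scores.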
Logits
•  Model compression
•  train mimic SNNs using data labeled by DNNs
•  DNN trained with softmax output and cross-entropy
•  SNN trained on logits (log of predicted probabilities) before softmax activation
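A small numeric sketch of why regressing on logits rather than on softmax probabilities can matter. The logit values below are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits_a = np.array([10.0, 20.0, 30.0])
logits_b = np.array([-10.0, 0.0, 10.0])   # logits_a shifted by -20
logits_c = np.array([20.0, 10.0, 30.0])   # first two classes swapped

p_a, p_b, p_c = softmax(logits_a), softmax(logits_b), softmax(logits_c)

# Softmax is shift-invariant, so different logits can give identical probabilities.
shift_invariant = np.allclose(p_a, p_b)

# Swapping the two unlikely classes is almost invisible in probability space,
# but produces a large squared error in logit space.
prob_err = np.sum((p_a - p_c) ** 2)
logit_err = np.sum((logits_a - logits_c) ** 2)
print(shift_invariant, prob_err, logit_err)
```

The relationships among the low-probability classes, which the softmax squashes towards zero, stay visible to a student trained on logits.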
SNN-MIMIC
•  Training data
•  Objective function
•  Weights updated with BP and SGD with momentum
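The slide's formulas are not reproduced in the text. As a hedged sketch, assume the training data are pairs (x, z) of inputs and teacher logits, and the objective is the L2 regression on logits described in the Ba & Caruana paper, L(W, β) = 1/(2T) Σ_t ||β f(W x_t) − z_t||². The toy sizes, the random stand-in teacher logits, the ReLU choice for f and the hyperparameters are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): D inputs, H hidden units, C logits, T examples.
D, H, C, T = 8, 32, 3, 512
x = rng.normal(size=(T, D))
# Hypothetical teacher logits z; in the slides these come from the trained DNN.
z = np.tanh(x @ rng.normal(size=(D, 5))) @ rng.normal(size=(5, C))

W = rng.normal(scale=0.3, size=(H, D))     # input -> hidden weights
beta = rng.normal(scale=0.3, size=(C, H))  # hidden -> output weights
vW, vbeta = np.zeros_like(W), np.zeros_like(beta)
lr, momentum, batch = 0.02, 0.9, 64

def loss(W, beta):
    # L(W, beta) = 1/(2T) * sum_t || beta f(W x_t) - z_t ||^2, with f = ReLU
    pred = np.maximum(x @ W.T, 0.0) @ beta.T
    return 0.5 * np.mean(np.sum((pred - z) ** 2, axis=1))

initial_loss = loss(W, beta)
for epoch in range(100):
    order = rng.permutation(T)
    for start in range(0, T, batch):
        idx = order[start:start + batch]
        xb, zb = x[idx], z[idx]
        pre = xb @ W.T                  # (B, H) pre-activations
        h = np.maximum(pre, 0.0)        # ReLU hidden layer
        err = h @ beta.T - zb           # (B, C) residual on the logits
        # Backpropagation of the mini-batch loss
        g_beta = err.T @ h / len(idx)
        g_h = (err @ beta) * (pre > 0)
        g_W = g_h.T @ xb / len(idx)
        # SGD with classical momentum
        vbeta = momentum * vbeta - lr * g_beta
        vW = momentum * vW - lr * g_W
        beta += vbeta
        W += vW
final_loss = loss(W, beta)
print(initial_loss, final_loss)
```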
Speed-up Mimic Learning
•  An SNN with the same number of parameters as the DNN learns slowly (weeks of GPU time)
•  Add bottleneck linear layer
•  k linear hidden units between input and non-linear hidden layer
•  factorize W ∈ R^(H×D) into the product of two low-rank matrices, U ∈ R^(H×k) and V ∈ R^(k×D)
Cost Function with Linear Layer
•  O(k(H+D)) memory instead of O(HD)
•  Factorization between input and hidden layers is new and improves convergence speed during training
•  Previous work factorizes the last output layer
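A minimal sketch of the memory argument and of the factorized forward pass, assuming W is replaced by U V, with U of size H×k and V of size k×D. D and H are taken from the TIMIT slides; the bottleneck width k = 250 and the toy sizes in the second half are assumptions for illustration.

```python
import numpy as np

D, H = 1845, 8000   # TIMIT input size and the smallest SNN width from the slides
k = 250             # bottleneck width: an assumed value for illustration

full_params = H * D              # dense W in R^(H x D)
factored_params = k * (H + D)    # U in R^(H x k) plus V in R^(k x D)
print(full_params, factored_params)

# Small toy problem showing that the two forms agree when W = U V.
rng = np.random.default_rng(0)
d, h, r = 20, 50, 5
U = rng.normal(size=(h, r))
V = rng.normal(size=(r, d))
x = rng.normal(size=(d,))

hidden_factored = np.maximum(U @ (V @ x), 0.0)  # r linear units, then ReLU layer
hidden_dense = np.maximum((U @ V) @ x, 0.0)     # same function with explicit W
same = np.allclose(hidden_factored, hidden_dense)
print(same)
```

Computing `U @ (V @ x)` never materializes the H×D matrix, which is where the O(k(H+D)) figure comes from.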
Use Cases
TIMIT (phoneme recognition)
•  In: lexically/phonetically labeled sentences
•  Out: phonemes
CIFAR-10 (image recognition)
•  In: images
•  Out: classes
TIMIT Phoneme Recognition
•  1845-dimensional input vector from raw waveform audio data
•  183-dimensional target label vectors (61 phonemes × 3)
•  1.1M examples in training set
•  DNN
•  3 hidden layers with 2000 ReLU units
•  CNN
•  Convolutional + maxPooling + 3 hidden (2000 ReLU) layers
•  ECNN
•  Ensemble of 9 CNNs
•  SNN
•  8k/50k/400k non-linear hidden units
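To get a feel for the model sizes, the sketch below counts the fully connected weights implied by the dimensions on this slide. These are raw counts derived from the slide's numbers only, not figures reported by the authors. Biases and the CNN's convolutional layers are ignored, and `dense_weights` is a hypothetical helper.

```python
def dense_weights(layer_sizes):
    # Number of weights in a fully connected net, ignoring biases.
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

D_IN, D_OUT = 1845, 183  # TIMIT input and target sizes from the slide

dnn = dense_weights([D_IN, 2000, 2000, 2000, D_OUT])
snn = {h: dense_weights([D_IN, h, D_OUT]) for h in (8_000, 50_000, 400_000)}

print(f"DNN (3 x 2000): {dnn:,}")
for h, n in snn.items():
    print(f"SNN ({h:,} hidden units): {n:,}")
```

Without any factorization, the widest SNN would need several hundred million input-to-hidden weights.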
TIMIT - Compression Results
TIMIT - Accuracy
CIFAR-10 Image Recognition
•  3072-dimensional input vector (32×32 pixels × 3 colors)
•  10-dimensional target label vectors
•  1.05M images in two merged training sets
CIFAR-10 - Compression Results
Discussion
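•  Why can MIMIC models be more accurate than models trained on the original labels?
•  If labels have errors, the teacher may eliminate them, making learning easier for the student
•  The teacher might resolve complex regions, passing simpler, soft targets to the student
•  Learning from probabilities is easier than learning from 0/1 labels
•  All outputs have a “reason” for the student, since the teacher's targets depend only on the input features, while the teacher may encounter unexplainable things in the original labels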
Representational Power
“We see little evidence that shallow models have limited capacity
or representational power.
Instead, the main limitation appears to be the learning and
regularization procedures used to train the shallow models”
THANK YOU!

E-Commerce Order PredictionShraddha Kamble.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Do deep nets really need to be deep?

  • 1. DO DEEP NETS REALLY NEED TO BE DEEP? Lei Jimmy Ba (University of Toronto), Rich Caruana (Microsoft Research). Meoni Marco – UNIPI – March 7th 2016, PhD course in Deep Learning
  • 2. NNs (diagrams, each from Inputs to Outputs) •  SNN: Single Hidden Layer •  DNN: Three Hidden Layers •  CNN: Three Hidden Layers above Convolutional/MaxPooling Layers
  • 3. Introduction •  DNNs excel over SNNs •  e.g. trained on 1M labeled points, accuracy is 91% (DNN) vs 86% (SNN) •  Source of improvement of DNNs vs SNNs •  Deep nets have more parameters? •  Deep nets can learn more complex functions? •  Convolution gives an advantage?
  • 4. Contribution •  Possible to train an SNN that mimics the function of a DNN •  Model compression method •  Possible to mimic, but not to train directly •  Mimic SNNs are as accurate as DNNs, even though SNNs cannot be trained to the same accuracy on the original labeled data •  Necessary to be deep? •  If an SNN can mimic a DNN, is the function learned by the DNN really that deep? •  Success related to the learning process
  • 5. Model Compression •  Diagram (labels: Data, Scores, Labels): 1. Build a complex model (DNN, CNN, …, ensemble) 2.  Train a simple model (SNN) to mimic complex function 3.  Apply it •  Compress large ensembles into smaller, faster models •  Train to learn the function learned by the larger model, not on original labels
  • 6. Model Compression (Bucila, Caruana & Niculescu-Mizil, 2006) •  Train smaller model to mimic a larger, smarter model •  train smart model any way you want: •  DNN, CNN, or ensemble of CNNs •  pass large unlabeled data through model to collect predictions (capture the function learned by smart model) •  train “small” model to mimic large model on the data it has labeled
  • 7. Logits •  Model compression •  train mimic SNNs using data labeled by DNNs •  DNN trained with softmax output and cross-entropy •  SNN trained on logits (log of predicted probabilities, up to an additive constant), taken before the softmax activation
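
A small worked illustration of the logits point on slide 7, with made-up numbers that do not come from the slides: the softmax squashes logits of very different size into probabilities where all but one are close to zero, while the logits keep those differences visible to the student. It also shows that log-probabilities and logits differ only by a per-example constant. The `softmax` helper and the `logits` values are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical teacher logits for one example with three classes.
logits = np.array([10.0, 20.0, 30.0])
probs = softmax(logits)

# The first two probabilities are tiny, so a squared error on probabilities
# would barely register them; the logits still differ by 10 and 20.
print("probabilities:", probs)
print("logits:       ", logits)

# log(p_i) = z_i - logsumexp(z): the same constant is subtracted from every
# class, so log-probabilities equal the logits up to an additive offset.
offset = np.log(probs) - logits
print("log(p) - z:   ", offset)
```

Regressing on `logits` rather than `probs` is the design choice the slide describes; the code only illustrates why the two targets carry different amounts of signal.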
  • 8. SNN-MIMIC •  Training data •  Objective function •  Weights updated with BP and SGD with momentum
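
The slide names the training data and the objective without reproducing them. In the Ba and Caruana paper, the student regresses the teacher's logits z with a squared-error loss, L(W, β) = 1/(2T) Σ_t ||β f(W x_t) − z_t||². The following is a minimal NumPy sketch of that setup with backpropagation and SGD with momentum. The layer sizes, learning rate, ReLU activation, and the random stand-in "teacher" are assumptions for illustration only, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: input dim D, hidden units H, classes C, examples T.
D, H, C, T = 8, 32, 3, 256
X = rng.normal(size=(T, D))

# Stand-in teacher: any fixed function producing "logits" for each input.
W_teacher = rng.normal(size=(D, C))
Z = np.tanh(X @ W_teacher)

# Student: single hidden layer, g(x) = beta^T relu(W^T x) in row-vector form.
W = rng.normal(scale=0.1, size=(D, H))
beta = rng.normal(scale=0.1, size=(H, C))
vel_W = np.zeros_like(W)
vel_beta = np.zeros_like(beta)
lr, momentum, batch = 0.02, 0.9, 32

def mimic_loss(W, beta):
    """Squared-error regression loss on the teacher logits, averaged over T."""
    hidden = np.maximum(X @ W, 0.0)
    return 0.5 * np.mean(np.sum((hidden @ beta - Z) ** 2, axis=1))

initial_loss = mimic_loss(W, beta)

for step in range(500):
    idx = rng.choice(T, size=batch, replace=False)
    x, z = X[idx], Z[idx]

    # Forward pass.
    pre = x @ W
    hidden = np.maximum(pre, 0.0)
    err = (hidden @ beta - z) / batch      # gradient of the loss w.r.t. outputs

    # Backward pass (backpropagation through the single hidden layer).
    grad_beta = hidden.T @ err
    grad_hidden = err @ beta.T
    grad_W = x.T @ (grad_hidden * (pre > 0))

    # SGD with momentum.
    vel_W = momentum * vel_W - lr * grad_W
    vel_beta = momentum * vel_beta - lr * grad_beta
    W += vel_W
    beta += vel_beta

final_loss = mimic_loss(W, beta)
print(f"mimic loss: {initial_loss:.3f} -> {final_loss:.3f}")
```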
  • 9. Speed-up Mimic Learning •  SNN has the same #parameters as the DNN: slow learning (GPU weeks) •  Add bottleneck linear layer •  k linear hidden units between input and non-linear hidden layer •  factorize W ∈ R^{H×D} into the product of 2 low-rank matrices, U ∈ R^{H×k} and V ∈ R^{k×D}
  • 10. Cost Function with Linear Layer •  O(k(H+D)) memory instead of O(HD) •  Factorization between input and hidden levels is new and improves convergence speed during training •  Previous work factorizes the last (output) layer
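
To make the memory claim on slides 9 and 10 concrete, here is a rough sketch that counts weights for a dense input-to-hidden matrix W ∈ R^{H×D} against its factorization W ≈ UV with U ∈ R^{H×k} and V ∈ R^{k×D}. D = 1845 and H = 400k are taken from the TIMIT slide; the bottleneck width k = 250 is an assumed value for illustration, and the function names are hypothetical. The second half checks, on toy sizes, that passing through the linear bottleneck gives the same hidden activations as multiplying by the product UV.

```python
import numpy as np

def dense_params(H, D):
    """Weights in a full H x D input-to-hidden matrix: O(HD)."""
    return H * D

def factored_params(H, D, k):
    """Weights in U (H x k) plus V (k x D): O(k(H + D))."""
    return k * (H + D)

# D and H from the TIMIT slide (SNN-400k); k is an assumed bottleneck width.
D, H, k = 1845, 400_000, 250
full = dense_params(H, D)
low_rank = factored_params(H, D, k)
print(f"dense:    {full:,} weights")
print(f"factored: {low_rank:,} weights ({full / low_rank:.1f}x fewer)")

# Forward-pass check on small toy sizes (hypothetical values).
rng = np.random.default_rng(0)
h_small, d_small, k_small = 64, 20, 5
U = rng.normal(size=(h_small, k_small))
V = rng.normal(size=(k_small, d_small))
x = rng.normal(size=d_small)

# k linear units first (V @ x), then the non-linear hidden layer (ReLU here).
via_bottleneck = np.maximum(U @ (V @ x), 0.0)
via_full_matrix = np.maximum((U @ V) @ x, 0.0)
print("same activations:", np.allclose(via_bottleneck, via_full_matrix))
```

The factored form never materializes the H×D matrix, which is where the O(k(H+D)) memory figure comes from.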
  • 11. Use Cases •  TIMIT (phoneme recognition) •  In: lexically/phonetically labeled sentences •  Out: phonemes •  CIFAR-10 (image recognition) •  In: images •  Out: classes
  • 12. TIMIT Phoneme Recognition •  1845-dimensional input vector from raw waveform audio data •  183-dimensional target label vectors (61 phonemes x 3) •  1.1M examples in training set •  DNN •  3 hidden layers with 2000 ReLU units each •  CNN •  Convolutional + maxPooling + 3 hidden (2000 ReLU) layers •  ECNN •  Ensemble of 9 CNNs •  SNN •  8k/50k/400k non-linear hidden units
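
As a back-of-the-envelope check on the sizes listed for TIMIT, the sketch below counts the weights and biases of the fully connected DNN (1845 inputs, three hidden layers of 2000 units, 183 outputs). It assumes plain dense layers with one bias per unit, which the slide does not state; `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(layer_sizes, biases=True):
    """Count parameters of a fully connected net given its layer widths."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out          # weight matrix between the two layers
        if biases:
            total += n_out             # one bias per output unit (assumption)
    return total

# Layer widths from the slide: input, 3 hidden layers, output.
timit_dnn = [1845, 2000, 2000, 2000, 183]
n_params = mlp_param_count(timit_dnn)

# Target dimension: 61 phonemes x 3, as stated on the slide.
n_targets = 61 * 3

print(f"DNN parameters: {n_params:,}")
print(f"target vector size: {n_targets}")
```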
  • 15. CIFAR-10 Image Recognition •  3072-dimensional input vector (32x32 pixels x 3 colors) •  10-dimensional target label vectors •  1.05M images in two merged training sets
  • 17. Discussion •  Why MIMIC models can be more accurate than models trained on the original labels •  If labels have errors, the teacher may eliminate them, making learning easier for the student •  Teacher might resolve complex regions •  Learning from probabilities is easier •  Every target the student sees has a “reason” in the inputs, while the teacher may have faced labels that the inputs cannot explain
  • 18. Representational Power “We see little evidence that shallow models have limited capacity or representational power. Instead, the main limitation appears to be the learning and regularization procedures used to train the shallow models”