Integration of Unsupervised and
Supervised Criteria for DNNs
Training
International Conference on Artificial Neural Networks
Francisco Zamora-Martínez, Francisco Javier
Muñoz-Almaraz, Juan Pardo
Department of Physical, Mathematical and Computer Sciences
Universidad CEU Cardenal Herrera
September 7th, 2016
Outline
1. Motivation
2. Method description
3. λ Update Policies
4. Experiments and Results
   MNIST
   SML2010 temperature forecasting
5. Conclusions and Future Work
Motivation
Greedy layer-wise unsupervised pre-training is successful for training logistic MLPs, in two stages:
1. Pre-training with unlabeled data (SAEs or RBMs)
2. Fine-tuning parameters with supervised data
Very useful when large amounts of unlabeled data are available
But…
It is a greedy approach
Not valid for on-line learning scenarios
Less useful with small data sets
Motivation
Goals
Train a supervised model
Layer-wise conditioned by an unsupervised loss
Improving gradient flow
Learning better features
Every layer's parameters should be:
Useful for the global supervised task
Able to reconstruct their input (auto-encoders)
Motivation
Related works
Is Joint Training Better for Deep Auto-Encoders?, by Y. Zhou et al. (2015), arXiv paper (fine-tuning stage for supervision)
Preliminary work by P. Vincent et al. (2010), Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
Deep Learning via Semi-Supervised Embedding, by Weston et al. (2008), ICML paper
Method description
How to do it
Risk = Supervised Loss + Σ_layers (Unsup. Loss):

R(\theta, D) = \frac{1}{|D|} \sum_{(x,y) \in D} \Big[ \lambda_0 L_s(F(x;\theta), y) + \sum_{k=1}^{H} \lambda_k U^{(k)} \Big] + \epsilon\, \Omega(\theta)

U^{(k)} = L_u(A_k(h^{(k-1)}; \theta), h^{(k-1)}) \quad \text{for } 1 \le k \le H, \qquad \lambda_k \ge 0

F(x; θ) is the MLP model
A_k(h^{(k-1)}; θ) is a denoising auto-encoder (DAE) model
H is the number of hidden layers
h^{(0)} = x
Method description
How to do it
Risk = Supervised Loss + Σ_layers (Unsup. Loss):

R(\theta, D) = \frac{1}{|D|} \sum_{(x,y) \in D} \Big[ \lambda_0 L_s(F(x;\theta), y) + \sum_{k=1}^{H} \lambda_k U^{(k)} \Big] + \epsilon\, \Omega(\theta)

The λ vector mixes all the components
It should be updated every iteration:
Starting focused on the unsupervised criteria
Ending focused on the supervised criterion
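To make the combined risk concrete, here is a minimal PyTorch sketch (not the authors' implementation): it assumes logistic activations, masking noise for the denoising auto-encoders, inputs scaled to [0, 1], and cross-entropy for both loss terms; all names are illustrative.

import torch
import torch.nn.functional as F

class LayerwiseDAEMLP(torch.nn.Module):
    """MLP whose hidden layers each carry a denoising auto-encoder
    reconstruction term U(k), combined as in R(theta, D) above."""
    def __init__(self, sizes, n_classes):
        super().__init__()
        pairs = list(zip(sizes[:-1], sizes[1:]))
        self.encoders = torch.nn.ModuleList(torch.nn.Linear(a, b) for a, b in pairs)
        self.decoders = torch.nn.ModuleList(torch.nn.Linear(b, a) for a, b in pairs)
        self.output = torch.nn.Linear(sizes[-1], n_classes)

    def forward(self, x, noise=0.2):
        h, unsup = x, []
        for enc, dec in zip(self.encoders, self.decoders):
            # U(k): corrupt h^(k-1) with masking noise, encode, reconstruct it.
            corrupted = h * (torch.rand_like(h) > noise).float()
            recon = torch.sigmoid(dec(torch.sigmoid(enc(corrupted))))
            unsup.append(F.binary_cross_entropy(recon, h.detach()))
            h = torch.sigmoid(enc(h))  # clean activations feed the next layer
        return self.output(h), unsup

def risk(model, x, y, lambdas):
    """Weighted sum: lambda_0 * L_s + sum_k lambda_k * U(k).
    The eps * Omega(theta) term is left to the optimizer's weight decay."""
    logits, unsup = model(x)
    total = lambdas[0] * F.cross_entropy(logits, y)
    for lam, u in zip(lambdas[1:], unsup):
        total = total + lam * u
    return total

With λ = (1, 0, …, 0) this reduces to plain supervised training; the update policies in the next section anneal the λ_k toward exactly that point.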
λ Update Policies I
A λ update policy indicates how to change the λ vector every iteration
The supervised part (λ0) can be fixed to 1
The unsupervised part should be important during the first iterations:
Losing focus while training
Becoming insignificant at the end
A greedy exponential decay (GED) suffices:

\lambda_0(t) = 1 \; ; \quad \lambda_k(t) = \Lambda\, \gamma^{t}

with constants Λ > 0 and γ ∈ [0, 1]
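In code, the GED policy is a single line per iteration; a small sketch (the function name and default constants are placeholders, the risk function is the one sketched earlier):

def ged_lambdas(t, n_hidden, Lambda=1.0, gamma=0.999):
    # lambda_0(t) = 1; lambda_k(t) = Lambda * gamma**t for every hidden layer k.
    return [1.0] + [Lambda * gamma ** t] * n_hidden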
λ Update Policies II
Exponential decay is the simplest approach, but other policies are possible:
Ratio between loss functions
Ratio between gradients at each layer
A combination of them
…
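The slides do not spell these alternatives out; as a purely hypothetical example, a loss-ratio policy might rescale each λ_k so that every unsupervised term stays at a fixed fraction of the supervised loss:

def loss_ratio_lambdas(sup_loss, unsup_losses, ratio=0.1):
    # Hypothetical policy (not from the paper): hold each unsupervised term
    # at a fixed fraction of the supervised loss magnitude.
    return [1.0] + [ratio * sup_loss / (u + 1e-8) for u in unsup_losses]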
Outline
...1 Motivation
...2 Method description
...3 λ Update Policies
...4 Experiments and Results
MNIST
SML2010 temperature forecasting
...5 Conclusions and Future Work
Experiments and Results (MNIST) I
Benchmark with MNIST dataset
Logistic activation functions, softmax output
Cross-entropy for supervised and unsupervised
losses
Classification error as evaluation measure
Effect of MLP topology and of the initial value Λ of λk
Sensitivity study of the γ exponential decay term
Comparison with other literature models
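A hedged sketch of how these choices fit together in a training loop, reusing LayerwiseDAEMLP, risk, and ged_lambdas from the sketches above; the topology, learning rate, batch size, and random stand-in batches are placeholders, not the experimental settings:

import torch

# Assumes LayerwiseDAEMLP, risk and ged_lambdas as sketched earlier.
model = LayerwiseDAEMLP([784, 2048, 2048, 2048], n_classes=10)
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
for step in range(1000):
    x = torch.rand(64, 784)          # stand-in batch; MNIST pixels lie in [0, 1]
    y = torch.randint(0, 10, (64,))  # stand-in labels
    loss = risk(model, x, y, ged_lambdas(step, n_hidden=3))
    opt.zero_grad()
    loss.backward()
    opt.step()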
Experiments and Results (MNIST) II
Test error (%) with 95% confidence intervals

Data Set  SAE-3      SDAE-3     GED-3
MNIST     1.40±0.23  1.28±0.22  1.22±0.22
basic     3.46±0.16  2.84±0.15  2.72±0.14

SAE-3 and SDAE-3 results taken from Vincent et al. (2010)
Experiments and Results (MNIST) III
Hyper-parameter grid search on the validation set, MNIST.
[Figure: validation error (%) vs. layer size (256, 512, 1024, 2048) for depths 1 to 5; one curve per Λ value in {0, 0.00001, 0.001, 0.2, 0.6, 1, 3, 5}.]
Experiments and Results (MNIST) IV
γ exponential decay term, on the validation set (MNIST).
[Figure: validation error (%) vs. decay γ over [0.5, 1], with a detail panel over [0.997, 1].]
Experiments and Results (MNIST) V
First layer filters (16 of 2048 units)
[Figure: three panels of filters: only supervised, γ = 0.999, and γ = 1.000.]
Experiments and Results (SML2010) I
SML2010 UCI data set: indoor temperature
forecasting
Logistic hidden activation functions, linear output
48 inputs (12 hours) and 12 outputs (3 hours); see the windowing sketch below
Mean Squared Error as the supervised loss
Cross-entropy as the unsupervised losses
Mean Absolute Error (MAE) for evaluation
Compared MLPs with/without unsupervised losses
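48 inputs covering 12 hours implies a 15-minute sampling period. A minimal NumPy sketch of how such input/output windows can be cut from the temperature series (function and variable names are illustrative):

import numpy as np

def make_windows(series, n_in=48, n_out=12):
    """48 past samples (12 h at 15-min resolution) as inputs,
    the following 12 samples (3 h) as forecasting targets."""
    X, Y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        Y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(Y)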
Experiments and Results (SML2010) II
Depth  Size  MAE (Λ = 0)  MAE (Λ = 1)
3      32    0.1322       0.1266
3      64    0.1350       0.1257
3      128   0.1308       0.1292
3      512   0.6160       0.1312

Validation set results; statistically significant improvements were marked in red on the original slide.
Λ = 0: DNN trained only with the supervised loss.
Λ = 1: DNN trained with supervised and unsupervised losses.
Experiments and Results (SML2010) III
Test results for the 3-layer model with 64 neurons per hidden layer:
0.1274 when Λ = 0
0.1177 when Λ = 1
DNNs of up to 10 layers with 64 hidden units per layer could be trained when Λ = 1
MAE in the range [0.1274, 0.1331]
Λ = 0: DNN trained only with the supervised loss.
Λ = 1: DNN trained with supervised and unsupervised losses.
Conclusions
One-stage training of deep models combining supervised and unsupervised loss functions
Comparable with greedy layer-wise unsupervised pre-training + fine-tuning
The approach is successful in training deep MLPs with logistic activations
Decaying the unsupervised loss during training is crucial
Time-series results encourage further research on this idea in on-line learning scenarios
Future Work
Better filters and models? Further research needed
Study the effect of using ReLU activations
Study alternatives to exponentially decaying the unsupervised loss: dynamic adaptation
The End
Thanks for your attention!!!
Questions?