Integration of Unsupervised and
Supervised Criteria for DNNs
Training
International Conference on Artificial Neural Networks
Francisco Zamora-Martínez, Francisco Javier
Muñoz-Almaraz, Juan Pardo
Department of Physical, Mathematical and Computer Sciences
Universidad CEU Cardenal Herrera
September 7th, 2016
Outline
1. Motivation
2. Method description
3. λ Update Policies
4. Experiments and Results
   MNIST
   SML2010 temperature forecasting
5. Conclusions and Future Work
Motivation
Greedy layer-wise unsupervised pre-training is
successful for training logistic MLPs: two training
stages
1. Pre-training with unsupervised data (SAEs or
RBMs)
2. Fine-tuning parameters with supervised data
Very useful when large amounts of unsupervised
data are available
But…
It is a greedy approach
Not valid for on-line learning scenarios
Not as useful with small data sets
Motivation
Goals
Train a supervised model
Layer-wise conditioned by unsupervised loss
Improving gradient flow
Learning better features
Every layer's parameters should be
Useful for the global supervised task
Able to reconstruct their input (Auto-Encoders)
Motivation
Related works
Is Joint Training Better for Deep
Auto-Encoders?, Y. Zhou et al. (2015), arXiv
preprint (fine-tuning stage for supervision)
Preliminary work by P. Vincent et al. (2010),
Stacked denoising autoencoders: Learning
useful representations in a deep network with
a local denoising criterion
Deep Learning via Semi-Supervised Embedding,
Weston et al. (2008), ICML paper
Method description
How to do it
Risk = Supervised Loss + Sum over layers of Unsupervised Loss:

$$R(\theta, D) = \frac{1}{|D|} \sum_{(x,y) \in D} \Big[ \lambda_0 L_s(F(x;\theta), y) + \sum_{k=1}^{H} \lambda_k U(k) \Big] + \epsilon \Omega(\theta)$$

$$U(k) = L_u(A_k(h^{(k-1)}; \theta), h^{(k-1)}) \quad \text{for } 1 \le k \le H, \qquad \lambda_k \ge 0$$

$F(x;\theta)$ is the MLP model
$A_k(h^{(k-1)};\theta)$ is a denoising auto-encoder (DAE) model
$H$ is the number of hidden layers
$h^{(0)} = x$
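The slides give no implementation; below is a minimal PyTorch sketch of this risk, assuming masking noise for the DAEs and encoder weights shared between each MLP layer and its DAE. The class and parameter names (MLPWithLayerwiseDAEs, noise) are illustrative, not from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPWithLayerwiseDAEs(nn.Module):
    """Logistic MLP where each hidden layer is paired with a denoising
    auto-encoder that reconstructs its input h(k-1); the reconstruction
    losses are the U(k) terms of the risk above."""
    def __init__(self, sizes, num_classes, noise=0.2):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])])
        self.decoders = nn.ModuleList(
            [nn.Linear(b, a) for a, b in zip(sizes[:-1], sizes[1:])])
        self.output = nn.Linear(sizes[-1], num_classes)
        self.noise = noise  # masking-noise probability for the DAEs

    def forward(self, x, y, lambdas):
        # lambdas[0] weights the supervised loss L_s,
        # lambdas[k] (k >= 1) weights the unsupervised term U(k).
        h, risk = x, x.new_zeros(())
        for k, (enc, dec) in enumerate(zip(self.encoders, self.decoders), 1):
            if lambdas[k] > 0:
                # U(k): reconstruct h(k-1) from a corrupted copy of itself
                mask = (torch.rand_like(h) > self.noise).float()
                recon = torch.sigmoid(dec(torch.sigmoid(enc(h * mask))))
                risk = risk + lambdas[k] * F.binary_cross_entropy(recon, h.detach())
            h = torch.sigmoid(enc(h))  # clean forward pass to the next layer
        return risk + lambdas[0] * F.cross_entropy(self.output(h), y)
```

Because all terms are summed into one scalar risk, a single backward pass trains the supervised path and every layer-wise auto-encoder jointly, which is the one-stage alternative to pre-training plus fine-tuning.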
Method description
How to do it
Risk = Supervised Loss + Sum over layers of Unsupervised Loss (same risk as above):

$$R(\theta, D) = \frac{1}{|D|} \sum_{(x,y) \in D} \Big[ \lambda_0 L_s(F(x;\theta), y) + \sum_{k=1}^{H} \lambda_k U(k) \Big] + \epsilon \Omega(\theta)$$

The λ vector mixes all the components
It should be updated at every iteration
Starting focused on the unsupervised criteria
Ending focused on the supervised criterion
λ Update Policies I
A λ update policy indicates how to change the λ
vector at every iteration
The supervised part (λ0) can be fixed to 1
The unsupervised part should be important
during the first iterations
Losing focus while training
Becoming insignificant at the end
A greedy exponential decay (GED) suffices:
$\lambda_0(t) = 1; \quad \lambda_k(t) = \Lambda \gamma^t$
with constants $\Lambda > 0$ and $\gamma \in [0, 1]$
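A minimal sketch of the GED policy, assuming t counts training iterations (mini-batches); the function name is illustrative:

```python
def ged_lambdas(t, H, Lambda=1.0, gamma=0.999):
    """Greedy exponential decay: lambda_0 is pinned to 1 while the H
    unsupervised weights shrink geometrically with iteration t."""
    return [1.0] + [Lambda * gamma ** t] * H
```

With γ = 0.999 the unsupervised weight halves roughly every 693 iterations (ln 2 / ln(1/γ)), so early updates are dominated by reconstruction and late ones by the supervised criterion.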
λ Update Policies II
Exponential decay is the simplest approach, but
other policies are possible
Ratio between loss functions
Ratio between gradients at each layer
A combination of them
…
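The slides do not define these alternative policies; as one possible reading of "ratio between loss functions", a hypothetical sketch that rescales each λ_k so its term stays at a fixed fraction of the supervised loss:

```python
def ratio_lambdas(sup_loss, unsup_losses, target_ratio=0.1):
    """Hypothetical loss-ratio policy (not specified in the talk):
    choose lambda_k so that lambda_k * U(k) ~= target_ratio * L_s."""
    return [1.0] + [target_ratio * sup_loss / max(u, 1e-12)
                    for u in unsup_losses]
```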
Experiments and Results (MNIST) I
Benchmark with the MNIST dataset
Logistic activation functions, softmax output
Cross-entropy for both supervised and unsupervised
losses
Classification error as evaluation measure
Effect of MLP topology and of the initial value Λ of the λk
Sensitivity study of the γ exponential decay term
Comparison with other models from the literature
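For this setup, a hedged sketch wiring together the two sketches above; the optimizer, learning rate, batch size and epoch count are placeholder choices, not values from the paper (Λ = 1 and γ = 0.999 are among the values explored in the deck):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# MNIST digits as flat 784-dim vectors in [0, 1]
train = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
loader = DataLoader(train, batch_size=128, shuffle=True)

model = MLPWithLayerwiseDAEs([784, 2048, 2048, 2048], num_classes=10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

step = 0
for epoch in range(10):
    for x, y in loader:
        # lambda vector re-computed at every iteration (GED policy)
        lambdas = ged_lambdas(step, H=3, Lambda=1.0, gamma=0.999)
        risk = model(x.view(x.size(0), -1), y, lambdas)
        opt.zero_grad()
        risk.backward()
        opt.step()
        step += 1
```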
Experiments and Results (MNIST) II
Test error (%) with 95% confidence intervals

Data Set   SAE-3        SDAE-3       GED-3
MNIST      1.40±0.23    1.28±0.22    1.22±0.22
basic      3.46±0.16    2.84±0.15    2.72±0.14

SAE-3 and SDAE-3 results are taken from Vincent et al. (2010);
"basic" denotes the reduced-size MNIST variant used in that work.
Experiments and Results (MNIST) III
Hyper-parameter grid search on the MNIST validation set (figure):
error (%) as a function of layer size (256, 512, 1024, 2048) for
depths 1 to 5, with one curve per initial value
Λ ∈ {0, 0.00001, 0.001, 0.2, 0.6, 1, 3, 5}.
Experiments and Results (MNIST) IV
γ exponential decay term on the MNIST validation set (figure):
validation error (%) as a function of the decay γ ∈ [0.5, 1],
with a zoomed detail over γ ∈ [0.997, 1].
Experiments and Results (MNIST) V
First-layer filters (16 of 2048 units), shown for three settings
(figure): only supervised, γ = 0.999, and γ = 1.000.
Experiments and Results (SML2010) I
SML2010 UCI data set: indoor temperature
forecasting
Logistic hidden activation functions, linear output
48 inputs (12 hours) and 12 outputs (3 hours);
see the windowing sketch below
Mean Squared Error (MSE) as supervised loss
Cross-entropy as unsupervised losses
Mean Absolute Error (MAE) for evaluation
Compared MLPs with/without unsupervised losses
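For context, a small sketch of how the forecasting pairs can be built, assuming the 15-minute sampling of SML2010 (so 48 steps = 12 h and 12 steps = 3 h); make_windows is an illustrative helper, not code from the paper:

```python
import numpy as np

def make_windows(series, n_in=48, n_out=12):
    """Slice a temperature series into 48-step inputs (12 hours)
    and 12-step forecast targets (3 hours)."""
    X, Y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        Y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(Y)
```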
Experiments and Results (SML2010) II
Depth   Size   MAE (Λ = 0)   MAE (Λ = 1)
3       32     0.1322        0.1266
3       64     0.1350        0.1257
3       128    0.1308        0.1292
3       512    0.6160        0.1312

Validation set results; statistically significant improvements
were marked in red in the original slide.
Λ = 0: DNN trained with the supervised loss only.
Λ = 1: DNN trained with supervised and unsupervised losses.
Experiments and Results (SML2010) III
Test results for the 3-layer model with 64 neurons
per hidden layer:
0.1274 MAE when Λ = 0
0.1177 MAE when Λ = 1
DNNs of up to 10 layers with 64 hidden units
per layer could be trained when Λ = 1
MAE in the range [0.1274, 0.1331]
Λ = 0 is training DNN only with supervised loss.
Λ = 1 is training DNN with supervised and unsupervised losses.
Conclusions
One-stage training of deep models combining
supervised and unsupervised loss functions
Comparable with greedy layer-wise
unsupervised pre-training + fine-tuning
The approach succeeds at training deep MLPs
with logistic activations
Decaying the unsupervised loss during training is
crucial
Time-series results encourage further research
on this idea in on-line learning scenarios
Future Work
Better filters and models? Further research
needed
Study the effect of using ReLU activations
Study alternatives to the exponential decay
of the unsupervised loss: dynamic
adaptation
The End
Thanks for your attention!!!
Questions?