Autoencoders
• Supervised learning uses explicit labels/correct output
in order to train a network.
• E.g., classification of images.
• Unsupervised learning relies on data only.
• E.g., CBOW and skip-gram word embeddings: the output is
determined implicitly from word order in the input data.
• Key point is to produce a useful embedding of words.
• The embedding encodes structure such as word similarity and
some relationships.
• Still need to define a loss – this acts as implicit supervision.
Autoencoders
• Autoencoders are designed to reproduce their
input, especially for images.
• Key point is to reproduce the input from a learned
encoding.
https://www.edureka.co/blog/autoencoders-tutorial/
Autoencoders
• Compare PCA/SVD
• PCA takes a collection of vectors (images) and produces
a usually smaller set of vectors that can be used to
approximate the input vectors via linear combination.
• Very efficient for certain applications.
• Fourier and wavelet compression is similar.
• Neural network autoencoders
• Can learn nonlinear dependencies
• Can use convolutional layers
• Can use transfer learning
https://www.edureka.co/blog/autoencoders-tutorial/
Autoencoders: structure
• Encoder: compresses the input into a latent space of (usually) smaller dimension: h = f(x)
• Decoder: reconstructs the input from the latent space: r = g(f(x)), with r as close to x as possible
https://towardsdatascience.com/deep-inside-autoencoders-7e41f319999f
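Training minimizes a reconstruction loss between x and r; a common choice is the squared error L(x, r) = ||x - g(f(x))||², or a per-pixel cross-entropy when the inputs are scaled to [0, 1].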
Autoencoders: applications
• Denoising: input clean image + noise and train to
reproduce the clean image.
https://www.edureka.co/blog/autoencoders-tutorial/
Autoencoders: Applications
• Image colorization: input black-and-white images and train to produce color images.
https://www.edureka.co/blog/autoencoders-tutorial/
Autoencoders: Applications
• Watermark removal
https://www.edureka.co/blog/autoencoders-tutorial/
Properties of Autoencoders
• Data-specific: Autoencoders are only able to
compress data similar to what they have been
trained on.
• Lossy: The decompressed outputs will be degraded
compared to the original inputs.
• Learned automatically from examples: It is easy to
train specialized instances of the algorithm that will
perform well on a specific type of input.
https://www.edureka.co/blog/autoencoders-tutorial/
Capacity
• As with other NNs, overfitting is a problem when
capacity is too large for the data.
• Autoencoders address this through some
combination of:
• Bottleneck layer – fewer degrees of freedom than the space of possible outputs.
• Training to denoise.
• Sparsity through regularization.
• Contractive penalty.
Bottleneck layer (undercomplete)
• Suppose input images are n×n and the latent space has dimension m < n².
• Then the latent space is not sufficient to reproduce
all images.
• The network needs to learn an encoding that captures the important features of the training data, sufficient for approximate reconstruction.
Simple bottleneck layer in Keras
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_img = Input(shape=(784,))
encoding_dim = 32
encoded = Dense(encoding_dim, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(input_img, decoded)
• Maps 28x28 images into a 32-dimensional vector.
• Can also use more layers and/or convolutions.
https://blog.keras.io/building-autoencoders-in-keras.html
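As a rough training sketch (an illustrative addition, not from the slides), the autoencoder above can be fit on MNIST with the images serving as both inputs and targets; the optimizer, loss, and epoch count are assumed choices in the spirit of the Keras blog post:

from tensorflow.keras.datasets import mnist

# Load MNIST and flatten 28x28 images into 784-d vectors scaled to [0, 1]
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

# An autoencoder reconstructs its own input, so x_train is both data and target
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
                shuffle=True, validation_data=(x_test, x_test))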
Denoising autoencoders
• Basic autoencoder trains to minimize the loss
between x and the reconstruction g(f(x)).
• Denoising autoencoders train to minimize the loss
between x and g(f(x+w)), where w is random noise.
• Same possible architectures, different training data.
• Kaggle has a dataset on damaged documents.
https://blog.keras.io/building-autoencoders-in-keras.html
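A minimal sketch of the denoising setup, reusing x_train and the autoencoder from the earlier snippet; the Gaussian corruption and the noise_factor value are illustrative assumptions:

import numpy as np

# Corrupt the inputs: x + w with Gaussian noise w, clipped back to [0, 1]
noise_factor = 0.5
x_train_noisy = np.clip(
    x_train + noise_factor * np.random.normal(size=x_train.shape), 0.0, 1.0)

# Noisy images as inputs, clean images as targets: loss between x and g(f(x + w))
autoencoder.fit(x_train_noisy, x_train, epochs=50, batch_size=256, shuffle=True)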
Denoising autoencoders
• Denoising autoencoders can't simply memorize the input-output relationship.
• Intuitively, a denoising autoencoder learns a
projection from a neighborhood of our training
data back onto the training data.
https://ift6266h17.files.wordpress.com/2017/03/14_autoencoders.pdf
Sparse autoencoders
• Construct a loss function to penalize activations
within a layer.
• Usually we regularize the weights of a network, not the activations.
• Which individual nodes of a trained model activate is data-dependent.
• Different inputs will result in activations of different
nodes through the network.
• Selectively activate regions of the network
depending on the input data.
https://www.jeremyjordan.me/autoencoders/
Sparse autoencoders
• Construct a loss function to penalize activations in the network.
• L1 regularization: penalize the absolute values of the activation vector a of layer h for observation i (see the sketch below).
• KL divergence: use the cross-entropy between the average activation and the desired target activation.
https://www.jeremyjordan.me/autoencoders/
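As a sketch of the L1 variant, a Keras layer can penalize its own activations through the activity_regularizer argument; the 1e-5 penalty weight is an illustrative value:

from tensorflow.keras import regularizers
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_img = Input(shape=(784,))
# L1 penalty on the activations (not the weights) pushes most of the code toward zero
encoded = Dense(32, activation='relu',
                activity_regularizer=regularizers.l1(1e-5))(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)
sparse_autoencoder = Model(input_img, decoded)
sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')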
Contractive autoencoders
• Arrange for similar inputs to have similar activations.
• I.e., the derivatives of the hidden layer activations are small with respect to the input.
• Denoising autoencoders make the reconstruction function (encoder + decoder) resist small perturbations of the input.
• Contractive autoencoders make the feature extraction function (i.e., the encoder) resist infinitesimal perturbations of the input.
https://www.jeremyjordan.me/autoencoders/
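A hedged sketch of how the contractive penalty might be computed with TensorFlow's GradientTape, assuming separate encoder and decoder Keras models; contractive_loss and lam are illustrative names, not part of the cited material:

import tensorflow as tf

def contractive_loss(x, encoder, decoder, lam=1e-4):
    # Watch x so the encoding h = f(x) can be differentiated with respect to it
    with tf.GradientTape() as tape:
        tape.watch(x)
        h = encoder(x)
    # Per-example Jacobian dh/dx, shape (batch, latent_dim, input_dim)
    jac = tape.batch_jacobian(h, x)
    r = decoder(h)  # reconstruction r = g(f(x))
    reconstruction = tf.reduce_mean(tf.square(x - r))
    # Squared Frobenius norm of the encoder Jacobian, averaged over the batch
    penalty = tf.reduce_mean(tf.reduce_sum(tf.square(jac), axis=[1, 2]))
    return reconstruction + lam * penalty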
Contractive autoencoders
• Contractive autoencoders make the feature extraction function (i.e., the encoder) resist infinitesimal perturbations of the input.
https://ift6266h17.files.wordpress.com/2017/03/14_autoencoders.pdf
Autoencoders
• Both the denoising and contractive autoencoders can perform well.
• Advantage of the denoising autoencoder: simpler to implement – it requires adding only one or two lines of code to a regular autoencoder, and there is no need to compute the Jacobian of the hidden layer.
• Advantage of the contractive autoencoder: the gradient is deterministic, so second-order optimizers (conjugate gradient, L-BFGS, etc.) can be used; it may be more stable than the denoising autoencoder, which uses a sampled gradient.
• To learn more about contractive autoencoders:
• Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot and Yoshua Bengio, 2011.
https://ift6266h17.files.wordpress.com/2017/03/14_autoencoders.pdf
