Deep Learning
Chapter 14 - Auto Encoders
Ashish Kumar
Twitter: @ashish_fagna
LinkedIn: https://www.linkedin.com/in/ashkmr1
Using presentation by
Ike Okonkwo - @ikeondata
14 Autoencoders
An autoencoder (AE) is a neural network that is trained to attempt to copy its input to its output.
It has a hidden layer [h] that encodes the input [x], h = f(Wx + b), and a decoder that produces a reconstruction from [h].
AEs are restricted in ways that allow them to copy the input only approximately, so the model is forced to prioritize which aspects of the input should be copied; this makes them useful for feature extraction.
AEs are traditionally used for dimensionality reduction or feature learning.
AEs can be considered a special case of feedforward networks and can be trained with the same techniques, e.g. minibatch gradient descent. They have also been trained with recirculation.
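Below is a minimal, hypothetical sketch of this structure in PyTorch: a one-layer encoder computing h = f(Wx + b), a linear decoder producing the reconstruction, and a squared-error loss backpropagated as for any feedforward network. The layer sizes, activation, and data are illustrative assumptions, not taken from the slides.

```python
# Minimal autoencoder sketch (sizes and activation are assumptions).
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_in=784, n_code=32):
        super().__init__()
        # Encoder: h = f(W x + b), with f = sigmoid here
        self.encoder = nn.Sequential(nn.Linear(n_in, n_code), nn.Sigmoid())
        # Decoder: maps the code h back to a reconstruction of x
        self.decoder = nn.Linear(n_code, n_in)

    def forward(self, x):
        h = self.encoder(x)            # hidden code
        return self.decoder(h), h      # reconstruction and code

model = AutoEncoder()
x = torch.randn(16, 784)               # a fake minibatch
recon, h = model(x)
loss = ((recon - x) ** 2).mean()       # reconstruction error
loss.backward()                        # trained like any feedforward net (minibatch GD)
```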
14.1 Undercomplete Autoencoders
An undercomplete AE is one in which the dimension of the hidden layer [h] is less than the dimension of the input layer [x].
We are typically not interested in the AE's output but in the hidden layer [h].
Constraining [h] to a smaller dimension than [x] is what makes the AE undercomplete; this forces the AE to capture only the most salient features of the training data.
If the AE is allowed too much capacity, it simply learns to copy the input without extracting useful information about the distribution of the data.
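A short sketch of undercomplete training, assuming made-up sizes (64-dim inputs, 8-dim code) and synthetic data; the bottleneck dim(h) < dim(x) is what prevents trivial copying.

```python
# Undercomplete AE: the code (8) is much smaller than the input (64).
import torch
import torch.nn as nn

n_in, n_code = 64, 8
enc = nn.Sequential(nn.Linear(n_in, n_code), nn.ReLU())
dec = nn.Linear(n_code, n_in)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x_train = torch.randn(1024, n_in)                    # stand-in training set
for step in range(200):                              # minibatch gradient descent
    x = x_train[torch.randint(0, len(x_train), (32,))]
    loss = nn.functional.mse_loss(dec(enc(x)), x)    # reconstruction is the only objective
    opt.zero_grad()
    loss.backward()
    opt.step()
# enc(x) now gives the 8-dimensional learned features for any input x.
```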
14.2 Regularized Autoencoders
An AE whose hidden layer has dimension equal to or greater than the input is called overcomplete.
Regularized autoencoders make it possible to train any autoencoder architecture successfully, choosing the code dimension and the capacity of the encoder and decoder based on the complexity of the distribution to be modeled.
Rather than limiting the model capacity by keeping the encoder and decoder shallow and the code size small, regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output.
These other properties include sparsity of the representation, smallness of the derivative of the representation, and robustness to noise or to missing inputs.
A regularized autoencoder can be nonlinear and overcomplete but still learn something useful about the data distribution, even if the model capacity is great enough to learn a trivial identity function.
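In general form (made concrete in the sparse-autoencoder discussion below), a regularized AE minimizes a reconstruction loss plus a penalty on the code:

    L(x, g(f(x))) + \Omega(h), \qquad h = f(x)

where L is a reconstruction loss such as squared error, f is the encoder, g the decoder, and Ω the regularizer (for a sparse AE, for example, Ω(h) = λ Σ_i |h_i|).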
14.2.1 Sparse Autoencoders
We can think of the sparsity penalty simply as a regularizer term added to a feedforward network whose main task is to copy its input to its output (and possibly also perform a supervised task).
Generative models are used in machine learning either for modeling data directly (i.e., modeling observations drawn from a probability density function) or as an intermediate step in forming a conditional probability density function.
Another way to think about the sparse AE framework is as approximating maximum likelihood training of a generative model with latent variables.
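A minimal sketch of how such a sparsity penalty might be added in practice, assuming an L1 penalty on the code activations; the sizes and the weight lam are illustrative assumptions:

```python
# Sparse AE training step: reconstruction loss + L1 penalty on the code h.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(64, 128), nn.ReLU())   # an overcomplete code is fine here
dec = nn.Linear(128, 64)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
lam = 1e-3                                            # sparsity weight (assumed)

x = torch.randn(32, 64)                               # fake minibatch
h = enc(x)
loss = nn.functional.mse_loss(dec(h), x) + lam * h.abs().sum(dim=1).mean()
opt.zero_grad(); loss.backward(); opt.step()
```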
14.2.2 Denoising Autoencoders
A denoising AE (DAE) is an AE that receives a corrupted input [x^hat] and is trained to reconstruct the original, uncorrupted input.
We use a corruption process C(x^hat | x), a conditional distribution over corrupted samples x^hat given the original input [x]. The AE then learns a reconstruction distribution p(x | x^hat) from training pairs (x, x^hat).
Typically we can perform gradient-based optimization; as long as the encoder is deterministic, the denoising AE is a feedforward network and can be trained with the same techniques as any other feedforward network.
Denoising AEs show how useful byproducts can emerge simply from minimizing reconstruction error. They also show how high-capacity models may be used as autoencoders and still learn useful features without learning the identity function.
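A sketch of a denoising training step, assuming additive Gaussian noise as the corruption process C and made-up sizes; the key point is that the loss compares the reconstruction with the clean x, not with the corrupted x^hat:

```python
# Denoising AE: corrupt the input, then reconstruct the ORIGINAL clean input.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
dec = nn.Linear(32, 64)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x = torch.randn(32, 64)                              # clean minibatch
x_hat = x + 0.3 * torch.randn_like(x)                # sample from the corruption process C(x_hat | x)
loss = nn.functional.mse_loss(dec(enc(x_hat)), x)    # target is the clean x
opt.zero_grad(); loss.backward(); opt.step()
```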
14.2.3 Regularizing by Penalizing Derivatives
Another strategy for regularizing autoencoders is to use a penalty Ω, as in sparse autoencoders, but with a different form (shown below).
This forces the model to be invariant to slight changes in the input vector [x]. Since the penalty applies only at training examples, it forces the AE to capture useful information about the training distribution.
An autoencoder regularized in this way is called a contractive AE.
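A common form of this derivative penalty (the one used by the contractive AE) is the squared Frobenius norm of the encoder Jacobian, weighted by a hyperparameter λ:

    \Omega(h, x) = \lambda \sum_i \lVert \nabla_x h_i \rVert^2 = \lambda \left\lVert \frac{\partial f(x)}{\partial x} \right\rVert_F^2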
14.3 Representational Power, Layer Size and Depth
AEs are usually trained with single-layer encoders and decoders, but we can also make the encoder and decoder deep.
Since the encoder and decoder are both feedforward networks, they each benefit from deep architectures.
The universal approximator theorem says that a feedforward network with at least one hidden layer can represent an approximation of any function to an arbitrary degree of accuracy, given enough hidden units.
A deep encoder can likewise approximate any mapping from the input [x] to the code [h] given enough hidden units. Depth can exponentially reduce the computational cost and the amount of training data needed to represent some functions, and experimentally deep AEs achieve better compression than shallow or linear AEs.
14.4 Stochastic Encoders and Decoders
AEs are essentially feedforward neural networks.
In a stochastic AE, the encoder and decoder are not simple deterministic functions; they define distributions that are sampled from: p_encoder(h | x) for the encoder and p_decoder(x | h) for the decoder.
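Under this view, one standard training criterion (assuming the usual formulation in which the decoder defines a likelihood over the input) is the negative log-likelihood of x under the decoder distribution:

    -\log p_{\text{decoder}}(x \mid h), \qquad h \sim p_{\text{encoder}}(h \mid x) \;\; \text{(or } h = f(x) \text{ for a deterministic encoder)}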
14.5.1 Estimating the Score
Score matching is an alternative to maximum likelihood. It provides a consistent estimator of probability distributions by encouraging the model to have the same score as the data distribution at every training point x.
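Here the score is the gradient field of the log density:

    \text{score}(x) = \nabla_x \log p(x)

so matching the model's score to the data's score fits the shape of the distribution without requiring the normalizing constant.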
For AEs, learning this gradient field is one way of learning the structure of the data distribution p_data.
Denoising training of one specific kind of AE (sigmoidal hidden units, linear reconstruction units) is equivalent to training an RBM (restricted Boltzmann machine) with Gaussian visible units.
14.5.1 Estimating the Score
Score matching applied to RBMs yields a cost function that is identical to the
reconstruction error combined with a regularization term similar to the contractive
penalty of the CAE
14.5.2 Historical Perspective
The idea of using an MLP (multilayer perceptron) for denoising goes back to the 1980s.
A multilayer perceptron (MLP) is a class of feedforward artificial neural network consisting of at least three layers of nodes; except for the input nodes, each node is a neuron with a nonlinear activation function. MLPs are trained with the supervised learning technique called backpropagation.
Denoising autoencoders are, in one sense, simply MLPs trained to denoise. But the term "denoising AE" refers to a model that learns not only to denoise its input but also, as a byproduct, to form a good internal representation (useful features).
The learned representation can be used to pretrain a deeper unsupervised or supervised network.
The motivation for denoising AEs was to allow the learning of a very high-capacity encoder while preventing the encoder/decoder pair from learning a useless identity function.
14.6 Learning Manifolds with Autoencoders
Most learning algorithms, including AEs, exploit the idea that data concentrates around a low-dimensional manifold. AEs aim to learn the structure of this manifold.
All AE training procedures involve a compromise between two forces:
- Learning a representation [h] of a training example [x] such that [x] can be (approximately) reconstructed from [h] through a decoder
- Satisfying an architectural constraint or regularization penalty that limits the capacity of the AE or prefers solutions that are less sensitive to the input
Together, these two forces push the hidden representation to capture information about the structure of the data-generating distribution.
14.6 Learning Manifolds with Autoencoders
The AE can afford to represent only the variations that are needed to reconstruct training examples.
If the data-generating distribution concentrates near a low-dimensional manifold, that manifold provides a local coordinate system, and the encoder learns a mapping from [x] to a representation space that is sensitive only to changes along the manifold, not to changes orthogonal to it.
The AE recovers the manifold structure when the reconstruction function is insensitive to small perturbations of the input (see example).
Most ML research on learning nonlinear manifolds has focused on nonparametric methods based on a nearest-neighbor graph.
14.6 Learning Manifolds with Autoencoders
- These nonparametric methods build the manifold from locally linear, Gaussian-like patches around training examples
Manifolds arising in AI problems can have very complicated structure that is difficult to capture from local interpolation alone.
14.7 Contractive Autoencoders
The contractive AE introduces an explicit regularizer on [h], the hidden representation.
There is a connection between denoising AEs and contractive AEs: in the limit of small Gaussian input noise, the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function.
That is, denoising AEs make the reconstruction function resist small but finite perturbations of the input, while contractive AEs make the feature-extraction function resist infinitesimal perturbations of the input.
The CAE maps a neighborhood of input points to a smaller neighborhood of output points, hence "contracting" the input space.
14.7 Contractive Autoencoders
Regularized AEs learn manifolds by balancing opposing forces; for CAEs these are the reconstruction error and the contractive penalty. Reconstruction error alone would encourage the CAE to learn an identity function, while the contractive penalty alone would encourage it to learn features that are constant with respect to [x].
One strategy for training deep AEs is to train a series of single-layer AEs, each trained to reconstruct the previous AE's hidden layer; the composition of these AEs forms a deep AE (see the sketch below). Because each layer was trained separately to be contractive, the deep AE is contractive as well, although this is not the same as training the full architecture with a single penalty on its overall Jacobian.
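A rough sketch of that layer-wise strategy, assuming plain reconstruction training for each single-layer AE (the contractive penalty is omitted for brevity, and all sizes are made up):

```python
# Greedy layer-wise training: the second AE reconstructs the first AE's code.
import torch
import torch.nn as nn

def train_single_ae(data, n_in, n_code, steps=200):
    enc = nn.Sequential(nn.Linear(n_in, n_code), nn.ReLU())
    dec = nn.Linear(n_code, n_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(steps):
        x = data[torch.randint(0, len(data), (32,))]
        loss = nn.functional.mse_loss(dec(enc(x)), x)
        opt.zero_grad(); loss.backward(); opt.step()
    return enc

x_train = torch.randn(1024, 64)                  # stand-in data
enc1 = train_single_ae(x_train, 64, 32)          # first single-layer AE
with torch.no_grad():
    h1 = enc1(x_train)                           # its hidden codes
enc2 = train_single_ae(h1, 32, 16)               # second AE reconstructs h1
deep_encoder = nn.Sequential(enc1, enc2)         # composition = deep encoder
```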
The contractive penalty can also yield useless results unless corrective action is taken.
14.8 Predictive Sparse Decomposition
PSD is a hybrid of sparse coding and a parametric AE: a parametric encoder is trained to predict the output of iterative inference. PSD has been applied to unsupervised feature learning for object recognition in images and video, and to audio.
PSD consists of an encoder and a decoder that are both parametric.
The training algorithm alternates between minimizing with respect to [h] and minimizing with respect to the model parameters (the criterion is sketched below).
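Following the standard PSD formulation, the criterion being alternately minimized combines reconstruction error, a sparsity penalty on the code, and a term tying the code to the encoder's prediction:

    \lVert x - g(h) \rVert^2 + \lambda \lvert h \rvert_1 + \gamma \lVert h - f(x) \rVert^2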
PSD regularizes the decoder to use parameters for which f(x) can infer good code values.
When the model is deployed, the parametric encoder f is used to compute the learned features. Evaluating f is computationally cheap compared with inferring [h] via gradient descent. PSD models can be stacked and used to initialize a deep network.
14.9 Applications of Autoencoders
AEs have been successfully applied to recommendation systems, dimensionality reduction, and information retrieval.
Learned representations in [h] were found to be qualitatively easier to interpret and to relate to the underlying categories, with those categories manifesting as clusters.
Lower-dimensional representations can improve performance on classification tasks, since they consume less memory and are cheaper to process.
One task that benefits greatly from dimensionality reduction is information retrieval, since search can become extremely efficient in low-dimensional spaces.
14.9 Applications of Autoencoders
We can use dimensionality reduction to produce a code [h] that is low-dimensional and binary, and then store database entries in a hash table that maps binary code vectors to entries (lookup).
Searching this hash table is very efficient. This approach to information retrieval via dimensionality reduction and binarization is called semantic hashing.
To produce binary codes for semantic hashing, we typically use an encoder with sigmoid activations on the final layer, as sketched below.
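A minimal sketch of the binarization and lookup step, assuming an already-trained encoder with a sigmoid final layer (the encoder below is untrained and purely illustrative):

```python
# Semantic hashing sketch: threshold sigmoid codes to bits, use bits as hash keys.
import torch
import torch.nn as nn
from collections import defaultdict

encoder = nn.Sequential(nn.Linear(64, 16), nn.Sigmoid())   # sigmoid final layer

docs = torch.randn(1000, 64)                                # stand-in document vectors
with torch.no_grad():
    codes = (encoder(docs) > 0.5).int()                     # binarize to 16-bit codes

table = defaultdict(list)                                   # binary code -> entry ids
for i, code in enumerate(codes):
    table[tuple(code.tolist())].append(i)

query = torch.randn(1, 64)
with torch.no_grad():
    q_code = tuple((encoder(query) > 0.5).int()[0].tolist())
candidates = table[q_code]                                  # constant-time bucket lookup
```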
Thanks
Special thanks to Laura Montoya and Accel.ai !
Ashish Kumar
ashish.fagna@gmail.com
Twitter: @ashish_fagna
LinkedIn: https://www.linkedin.com/in/ashkmr1
