Deep Learning - The Dream of Artificial Intelligence
Cibe Sridharan Kumaran (11pt08)
cibesridharan94@gmail.com
Indian Institute of Technology, Madras
Thanks to
Dr R.Nadarajan
Professor and head
Department of Applied Mathematics and Computational Sciences
PSG College of Technology
Dr R.Anitha
Programme Coordinator
Associate Professor
Department of Applied Mathematics and Computational Sciences
PSG College of Technology
Thanks to
Mr N.Mohan Raj
Tutor
Associate Professor
Department of Applied Mathematics and Computational Sciences
PSG College of Technology
Ms B.Malar
Internal Guide
Assistant Professor (Senior Grade)
Department of Applied Mathematics and Computational Sciences
PSG College of Technology
Thanks to
Dr B.Ravindran
External Guide
Associate Professor
Department of Computer Science and Engineering
Indian Institute of Technology Madras
Agenda
Motivation
Learning Methods
BackPropagation
Autoencoders
Restricted Boltzmann Machines
Deep learning Recipe
Problem
Implementation
Future Works
References
AI’s Dream
Startling gains in fields as diverse as computer vision, speech
recognition and the identification of promising new molecules for
designing drugs.
The Deep learning movement seeks to meld computer science
with Neuroscience — something that never quite happened in the
world of Artificial Intelligence.
This remarkable machine is capable of what amounts to thought.
Instead of doing AI, we ended up spending our lives doing curve
fitting.
Motivation
For a long time, human intelligence dominated machine learning performance.
Finally, machine learning is catching up to the dream of AI.
Learning Methods
Supervised Learning:
Supervised learning is when the data fed to the algorithm is "tagged" with labels that guide its decisions.
Example: training a neural network is very often supervised learning: you tell the network which class each feature vector you feed it belongs to.
Unsupervised Learning:
The algorithm decides on its own how to group samples into classes that share common properties.
Example: K-means clustering.
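A small illustration of the two settings (a hedged sketch using scikit-learn; the toy data and model choices are assumptions, not from the slides): the classifier is given the labels, while k-means must discover the groups on its own.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# two toy clusters in 2-D
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)               # the "tags" (labels)

clf = LogisticRegression().fit(X, y)            # supervised: labels guide learning
print(clf.predict(X[:3]))

km = KMeans(n_clusters=2, n_init=10).fit(X)     # unsupervised: structure found from X alone
print(km.labels_[:3])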
BackPropagation
BPN is based on gradient-descent learning, that is, the minimization of the error E with respect to the weights and activation function: ΔW_ij = −η ∂E/∂W_ij
It is usually considered a supervised learning method, although it is also used in some unsupervised networks such as autoencoders.
It requires the activation function to be differentiable.
Idea-BPN
Compute the error term for the output units using the observed error.
Starting from the output layer, repeatedly propagate the error term back to the previous layer and update the weights between the two layers, until the earliest hidden layer is reached.
Gradient Computation
Initialize the weights (typically at random)
Repeat (epochs):
For each example e in the training set:
- forward pass to compute
  O = neural-net-output(network, e)
  miss = (T − O) at each output unit
- backward pass to calculate the weight deltas
- update all weights
end
until the tuning-set error stops improving
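As a concrete illustration of this loop, here is a minimal NumPy sketch (assumed code, not from the slides) of a one-hidden-layer network with sigmoid units trained on toy data.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((4, 3))                       # 4 toy examples, 3 features
T = np.array([[0.], [1.], [1.], [0.]])       # toy targets
W1, b1 = rng.normal(scale=0.1, size=(3, 5)), np.zeros(5)   # input -> hidden
W2, b2 = rng.normal(scale=0.1, size=(5, 1)), np.zeros(1)   # hidden -> output
eta = 0.5

for epoch in range(1000):
    H = sigmoid(X @ W1 + b1)                 # forward pass
    O = sigmoid(H @ W2 + b2)
    miss = T - O                             # error at each output unit
    delta_out = miss * O * (1 - O)           # backward pass (sigmoid' = s(1 - s))
    delta_hid = (delta_out @ W2.T) * H * (1 - H)
    W2 += eta * H.T @ delta_out; b2 += eta * delta_out.sum(0)   # update all weights
    W1 += eta * X.T @ delta_hid; b1 += eta * delta_hid.sum(0)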
Flaws in BackPropagation Algorithm
Requires labeled data, and most data is unlabeled.
Vanishing Gradient Problem.
Overfitting (high variance, low bias).
Easy to get stuck in poor local optima.
Gets worse as we add more hidden layers.
Autoencoders
An autoencoder is another technique for dimensionality reduction [3]: the output of the encoder is the reduced representation, and the decoder is tuned to reconstruct the initial input from the encoder's representation by minimizing a cost function.
Contd..
Encoder
The encoder is a function f that maps an input x ∈ R^dx to a hidden representation h(x) ∈ R^dh:
h = f(x) = s_f(Wx + b_h)
Decoder
The decoder function g maps the hidden representation h back to a reconstruction y:
y = g(h) = s_g(W′h + b_y)
Objective Function
The cross-entropy loss, when s_g is the sigmoid and the inputs are in [0, 1]:
L(x, y) = −∑_{i=1}^{dx} [ x_i log(y_i) + (1 − x_i) log(1 − y_i) ]
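A minimal sketch of these pieces (assumed NumPy code, not from the slides; weights are tied, so W′ = Wᵀ):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_x, d_h = 8, 3
W = rng.normal(scale=0.1, size=(d_h, d_x))       # encoder weights; decoder uses W.T
b_h, b_y = np.zeros(d_h), np.zeros(d_x)

def encode(x):                                   # h = f(x) = s_f(W x + b_h)
    return sigmoid(W @ x + b_h)

def decode(h):                                   # y = g(h) = s_g(W' h + b_y)
    return sigmoid(W.T @ h + b_y)

def cross_entropy(x, y):                         # L(x, y) as above
    return -np.sum(x * np.log(y) + (1 - x) * np.log(1 - y))

x = rng.integers(0, 2, size=d_x).astype(float)
print(cross_entropy(x, decode(encode(x))))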
More on Hidden Layers
1 The hidden layer is said to be undercomplete if it is smaller than the input layer: the input's information is compressed into fewer units.
2 The hidden layer is said to be overcomplete if it is larger than the input layer: there is no compression, and each hidden unit can simply copy a different input component.
Denoising Autoencoders
The input x is simply corrupted before being sent through the autoencoder, which is trained to reconstruct the clean version [5].
A common corruption is additive Gaussian noise.
The loss function compares the reconstructed output with the noiseless input, not with the corrupted input.
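One training step of this recipe, as a self-contained sketch (assumed code, not from the slides) with a tied-weight sigmoid encoder/decoder:

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

d_x, d_h = 8, 3
W, b_h, b_y = rng.normal(scale=0.1, size=(d_h, d_x)), np.zeros(d_h), np.zeros(d_x)

x_clean = rng.random(d_x)
x_noisy = x_clean + rng.normal(scale=0.3, size=d_x)   # additive Gaussian noise
h = sigmoid(W @ x_noisy + b_h)                        # encode the corrupted input
y = sigmoid(W.T @ h + b_y)                            # reconstruct
loss = np.mean((y - x_clean) ** 2)                    # compare with the noiseless input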
Contractive Autoencoders
The contractive auto-encoder (CAE) is obtained by adding a regularization term, yielding the objective function:
J(θ) = ∑_{x∈Dn} ( L(x, g(f(x))) + λ ‖J_f(x)‖²_F )
We add an explicit term to the loss that penalizes the solution.
We wish to extract only features that reflect variations seen in the training set.
‖J_f(x)‖²_F = ∑_{ij} ( ∂h_j(x) / ∂x_i )²
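For a sigmoid encoder h = sigmoid(Wx + b), the penalty has a simple closed form, since ∂h_j/∂x_i = h_j(1 − h_j)W_ji. A small sketch (assumed code, not from the slides):

import numpy as np

def cae_penalty(W, h):
    # ||J_f(x)||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2
    return np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))

# per-example objective: J = L(x, g(f(x))) + lam * cae_penalty(W, f(x))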
Restricted Boltzmann Machines
An RBM is an energy-based, undirected graphical model for unsupervised learning.
It consists of two layers of binary units: a visible layer, to represent the data, and a hidden layer, to increase learning capacity [4].
E(v, h) = −∑_{i,j} v_i h_j w_ij − ∑_i v_i b_i − ∑_j h_j b_j
Inputs are binary, as for the autoencoders above.
There are no lateral connections within a layer, so the model is a bipartite graph.
Formulation
1 The probability distribution: P(v, h) = e^{−E(v,h)} / Z, where Z is the normalizing constant (partition function).
2 The objective function: φ = log P(v) = φ+ − φ−
3 Positive term: φ+ = log ∑_h e^{−E(v,h)}
4 Negative term: φ− = log ∑_{v,h} e^{−E(v,h)} = log Z
∂φ+/∂w_ij = v_i P(h_j = 1 | v)
∂φ−/∂w_ij = ∂ log Z / ∂w_ij = P(v_i = 1, h_j = 1)
The negative gradient is intractable.
5 The distribution is intractable because of the normalizing constant Z, so we work with the conditional distributions:
p(h | x) = ∏_j p(h_j | x), where all h_j are binary,
p(h_j = 1 | x) = sigmoid(b_j + W_j x).
Contrastive Divergence Algorithm
Idea: every unit influences all of its neighbours, since the model is undirected.
Replace the expectation with a point estimate x̃.
Obtain the points by Gibbs sampling [2].
Start the sampling chain at the training example x(t).
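One CD-1 update for a binary RBM, as a sketch (assumed code, not from the slides): the Gibbs chain starts at the training example, runs for a single step, and its end point replaces the intractable model expectation.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def cd1_update(W, b, c, v0, eta=0.1):
    # W: (n_hidden, n_visible); b: hidden biases; c: visible biases; v0: one example
    ph0 = sigmoid(W @ v0 + b)                       # p(h = 1 | v0), positive phase
    h0 = sample(ph0)
    pv1 = sigmoid(W.T @ h0 + c)                     # one Gibbs step: reconstruct v
    v1 = sample(pv1)
    ph1 = sigmoid(W @ v1 + b)                       # negative phase (point estimate)
    W += eta * (np.outer(ph0, v0) - np.outer(ph1, v1))
    b += eta * (ph0 - ph1)
    c += eta * (v0 - v1)
    return W, b, c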
Relaxing the Constraints
If the two constraints are relaxed:
Inputs x are unbounded reals.
Add a quadratic term to the energy function:
E(v, h) = −∑_{i,j} v_i h_j w_ij − ∑_i v_i b_i − ∑_j h_j b_j + ½ xᵀx
The conditional distribution over the visibles is then Gaussian with mean µ = c + Wᵀh and identity covariance matrix.
If the layers also have lateral connections:
E(v, h) = −∑_{i,j} v_i h_j w_ij − ∑_i v_i b_i − ∑_j h_j b_j − ½ xᵀVx − ½ hᵀUh
Deep Learning era
Figure: Visual Cortex
A learning model with multilayer representations.
Each layer corresponds to a distributed representation.
The units within a layer are not mutually exclusive.
Recipe
The models in these neural networks are formed of two parts.
The first part is a trained feature-extraction section, consisting of successive layers of units that process the data-set inputs.
Each layer can be pre-trained in an unsupervised way, as an autoencoder or an RBM.
The second part is a classifier, which is trained in a supervised way.
Greedy layerwise procedure
Geoff Hinton: “If you want to do computer vision, first learn computer graphics.”
Unsupervised pretraining
Train one layer at a time using the unsupervised criterion. Fix the parameters of the previous hidden layers; the previous layers are viewed as feature extraction [1].
Supervised finetuning
Add an output layer. Train the neural network with supervised learning (backpropagation). Stop once all parameters have been fine-tuned.
Greedy Layerwise Algorithm
PseudoCode
1 for l = 1 : L
  build the unsupervised training set (with h(0)(x) = x):
  D ← { h(l−1)(x_t) }, t = 1, …, T
2 train a greedy module (autoencoder or RBM) on D
3 use the hidden-layer weights and biases of the greedy module to initialize the deep-network parameters W(l), b(l)
4 initialize the output-layer parameters W(L+1), b(L+1) randomly
5 train the whole network by supervised stochastic gradient descent (backprop)
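A compact sketch of the procedure with a stack of tied-weight sigmoid autoencoders (assumed code, not from the slides; the final supervised fine-tuning pass is only indicated):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(H, n_hidden, eta=0.1, epochs=50):
    # one greedy module: tied-weight autoencoder trained on representations H
    d = H.shape[1]
    W = rng.normal(scale=0.1, size=(n_hidden, d))
    b, c = np.zeros(n_hidden), np.zeros(d)
    for _ in range(epochs):
        for x in H:
            h = sigmoid(W @ x + b)
            y = sigmoid(W.T @ h + c)
            grad_y = y - x                          # cross-entropy + sigmoid gradient
            grad_h = (W @ grad_y) * h * (1 - h)
            W -= eta * (np.outer(grad_h, x) + np.outer(h, grad_y))
            b -= eta * grad_h
            c -= eta * grad_y
    return W, b

X = rng.random((20, 16))                            # toy data
layer_sizes = [8, 4]                                # two hidden layers
params, H = [], X
for n_hidden in layer_sizes:                        # steps 1-3: one greedy module per layer
    W, b = train_autoencoder(H, n_hidden)
    params.append((W, b))
    H = sigmoid(H @ W.T + b)                        # D <- {h(l)(x_t)} for the next layer
W_out = rng.normal(scale=0.1, size=(H.shape[1], 1)) # step 4: random output layer
# step 5: fine-tune all parameters with supervised backprop (omitted here)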
Dropout Approach
This “overfitting” is greatly reduced by randomly omitting half of
the feature detectors on each training case.
For each training case, each hidden unit is randomly omitted from
the network with a probability of 0.5, so a hidden unit cannot rely
on other hidden units being present.
Instead, each neuron learns to detect a feature that is generally
helpful for producing the correct answer given the combinatorially
large variety of internal contexts in which it must operate.
h(k)(x) ← g(k)(x) · m(k), where m(k) is the layer's random binary mask.
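Applied in code, this masking step might look as follows (an assumed sketch, not from the slides); at test time the activations are scaled instead, so their expected value matches training:

import numpy as np

rng = np.random.default_rng(0)

def dropout(g, p=0.5, training=True):
    if not training:
        return g * (1 - p)                               # test time: scale instead of dropping
    m = (rng.random(g.shape) >= p).astype(g.dtype)       # binary mask m, keep probability 1 - p
    return g * m                                         # h(x) <- g(x) * m

g = rng.random(6)                                        # layer activations g(k)(x)
h = dropout(g)                                           # masked activations h(k)(x)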
Pros and Cons
Add a pretraining phase to learn the structure of the input data.
Requires no labeled data.
Less likely to get stuck in bad local optima (pretraining acts as a regularizer).
A greedy layer-wise algorithm makes this efficient and fast.
Most of the learning in deep architectures is just some form of
gradient descent.
The Convergence rate of Contrastive Divergence is not clear.
Deep learning methods are often looked at as a black box, with
most confirmations done empirically, rather than theoretically.
Implementation
Implementation in Python, using the Theano deep learning library.
MATLAB.
MNIST dataset.
MNIST Recognition
When the hidden layers are large the reconstruction is good; when they are small it is not.
Feature Separation
Given two feature vectors that are independent, can we learn a shared representation from them?
Given one of the feature vectors, the model should be able to retrieve the other.
This problem can be modelled as a cocktail-party problem.
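The classic tool for this setting is Independent Component Analysis, discussed next. A small cocktail-party sketch with scikit-learn's FastICA (assumed data and code, not from the slides):

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]   # two independent, non-Gaussian sources
A = np.array([[1.0, 0.5], [0.7, 1.2]])             # mixing matrix
X = S @ A.T                                        # observed mixtures ("microphones")

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                       # recovered sources (up to order/scale)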
Independent Component Analysis
It depends on:
Cotraining
Non-Gaussian is independent (non-Gaussianity indicates independence)
Kurtosis
Preprocessing in ICA
Steps
Drawbacks
Future Works
KLA-Tencor dataset (defect classification).
Using deep nets for feature selection.
Finding the classes without knowledge of the labels.
Building neural networks for speech recognition (Deep ICA).
Timeline
1 May 14th – June 14th: Deep learning intro, Autoencoders
2 June 14th – July 14th: Restricted Boltzmann Machines, Deep belief nets
3 June 15th – July 27th: Python-Theano
4 July 28th – August 8th: Independent Component Analysis
References
[1] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? The Journal of Machine Learning Research, 11:625–660, 2010.
[2] Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
[3] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. Pages 833–840, 2011.
[4] Tijmen Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. Pages 1064–1071, 2008.