This document summarizes a study of deep learning models and Bayesian statistics. It reviews the history of artificial intelligence and machine learning before introducing restricted Boltzmann machines, deep belief networks, and Bayesian statistics. It describes experiments that apply restricted Boltzmann machines to classify movies and generate images, and a deep belief network that classifies images from multiple datasets with 100% reported accuracy. The conclusion states that deep learning has advanced artificial intelligence by allowing algorithms to perform multiple tasks, and has taken us closer to the original goal of general artificial intelligence.
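The restricted Boltzmann machine experiments summarized above can be illustrated with a minimal sketch. The layer sizes and the single step of contrastive divergence (CD-1) below are illustrative assumptions, not the study's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 6 visible units, 3 hidden units.
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))  # weights
b = np.zeros(n_visible)                        # visible biases
c = np.zeros(n_hidden)                         # hidden biases

def cd1_update(v0, lr=0.1):
    """One CD-1 step for a binary RBM; updates the parameters in place."""
    global W, b, c
    # Positive phase: sample hidden units given the data vector.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct visibles, then hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Gradient approximation: data correlations minus model correlations.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return v1

v = np.array([1, 0, 1, 1, 0, 0], dtype=float)
recon = cd1_update(v)
print(recon.shape)  # (6,)
```

Repeating this update over many training vectors is what lets an RBM learn features that a deep belief network can then stack layer by layer.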
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017 (MLconf)
Corinna Cortes is a Danish computer scientist known for her contributions to machine learning. She is currently the Head of Google Research, New York. Cortes is a recipient of the Paris Kanellakis Theory and Practice Award for her work on theoretical foundations of support vector machines.
Cortes received her M.S. degree in physics from Copenhagen University in 1989. In the same year she joined AT&T Bell Labs as a researcher and remained there for about ten years. She received her Ph.D. in computer science from the University of Rochester in 1993. Cortes currently serves as the Head of Google Research, New York. She is an Editorial Board member of the journal Machine Learning.
Cortes’ research covers a wide range of topics in machine learning, including support vector machines and data mining. In 2008, she and Vladimir Vapnik jointly received the Paris Kanellakis Theory and Practice Award for the development of support vector machines (SVM), a highly effective algorithm for supervised learning. Today, SVM is one of the most frequently used algorithms in machine learning, with practical applications that include medical diagnosis and weather forecasting.
Abstract Summary:
Harnessing Neural Networks:
Deep learning has demonstrated impressive performance gains in many machine learning applications. However, realizing these gains is not always straightforward. Discovering the right network architecture is critical for accuracy and often requires a human in the loop. Some network architectures occasionally produce spurious outputs, which have to be restricted to meet the needs of an application. Finally, realizing the performance gain in a production system can be difficult because of long inference times.
In this talk we discuss methods for making neural networks efficient in production systems. We also discuss an efficient method for automatically learning the network architecture, called AdaNet. We provide theoretical arguments for the algorithm and present experimental evidence for its effectiveness.
Log Message Anomaly Detection with Oversampling (ijaia)
Imbalanced data is a significant challenge for classification with machine learning algorithms. This is particularly important for log message data, where negative (anomalous) logs are sparse, so the data is typically imbalanced. In this paper, a model that generates text log messages using a SeqGAN network is proposed. An autoencoder is used for feature extraction, and anomaly detection is performed with a GRU network. The proposed model is evaluated on three imbalanced log data sets, namely BGL, OpenStack, and Thunderbird. The results show that appropriate oversampling and data balancing improve anomaly detection accuracy.
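The data-balancing step at the heart of that paper can be illustrated with a simple stand-in: the paper generates synthetic logs with a SeqGAN, but the same balancing effect can be sketched with plain random oversampling of the minority class (the feature vectors and class counts below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_oversample(X, y):
    """Duplicate minority-class rows until classes are balanced.

    A stand-in for generative oversampling (the paper uses SeqGAN);
    here minority samples are simply resampled with replacement.
    """
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - count, replace=True)
            Xs.append(X[extra])
            ys.append(y[extra])
    return np.concatenate(Xs), np.concatenate(ys)

# 10 normal logs vs. 2 anomalous logs (feature vectors are illustrative).
X = rng.normal(size=(12, 4))
y = np.array([0] * 10 + [1] * 2)
Xb, yb = random_oversample(X, y)
print(np.bincount(yb))  # [10 10]
```

Training the downstream anomaly detector on the balanced set is what prevents the sparse anomalous class from being ignored.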
Using Deep Learning to Find Similar Dresses (HJ van Veen)
Report by Luís Mey ( https://www.linkedin.com/in/lu%C3%ADs-gustavo-bernardo-mey-97b38927/ ) on Udacity Machine Learning Course - Final Project: Use Deep Learning to Find Similar Dresses.
Deep Learning with Python: Getting started and getting from ideas to insights in minutes.
PyData Seattle 2015
Alex Korbonits (@korbonits)
This presentation was given July 25, 2015 at the PyData Seattle conference hosted by PyData and NumFocus.
Deep Learning: Chapter 11 Practical Methodology (Jason Tsai)
Lecture for Deep Learning 101 study group to be held on June 9th, 2017.
Reference book: https://www.deeplearningbook.org/
Past video archives: https://goo.gl/hxermB
Initiated by Taiwan AI Group (https://www.facebook.com/groups/Taiwan.AI.Group/)
Generative Adversarial Networks and Their Applications (Artifacia)
This is the presentation from our AI Meet Jan 2017 on GANs and their applications.
You can join Artifacia AI Meet Bangalore Group: https://www.meetup.com/Artifacia-AI-Meet/
Generative adversarial networks are an advanced topic and require a basic prior understanding of CNNs. Here is some pre-reading material for you.
- https://arxiv.org/pdf/1406.2661v1.pdf
- https://arxiv.org/pdf/1701.00160v1.pdf
Slides from the presentation given at the M^3 conference: http://www.mcubed.london/
The idea is to use three statements to describe and start working with the TensorFlow library.
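The slides themselves are not reproduced here, so the following is only a guess at what a three-statement start might look like, assuming TensorFlow 2.x with the Keras API (the tiny one-layer model is purely illustrative):

```python
import tensorflow as tf                                 # 1: import the library

model = tf.keras.Sequential(                            # 2: define a model
    [tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(1)])

model.compile(optimizer="adam", loss="mse")             # 3: make it trainable
```

From here, `model.fit(X, y)` on NumPy arrays is all that is needed to train.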
Capitalico / Chart Pattern Matching in Financial Trading Using RNN (Alpaca)
Capitalico is a web/mobile platform that uses deep learning to help financial traders build automated trading systems from their trading charts. In this talk I show many of the techniques we developed to achieve the best performance and accuracy in deep learning for sequence pattern matching.
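Sequence pattern matching of this kind is typically built on recurrent cells. As a rough illustration (not Capitalico's actual model; the input and hidden sizes are assumptions), a single GRU step in NumPy looks like:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU step: gates decide how much of the hidden state to update."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)              # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1 - z) * h + z * h_tilde

# Illustrative sizes: 4-dim input (e.g. an OHLC price bar), 8-dim state.
n_in, n_h = 4, 8
params = [rng.normal(0, 0.1, s) for s in
          [(n_h, n_in), (n_h, n_h), n_h] * 3]
h = np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):   # a 5-step price sequence
    h = gru_step(x, h, params)
print(h.shape)  # (8,)
```

The final hidden state summarizes the sequence and can be fed to a classifier that decides whether the chart matches a trader's pattern.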
It was about 30 years ago that AI was not only a topic for science-fiction writers but also a major research field surrounded by huge hopes and investments. The over-inflated expectations ended in a crash, followed by a period of absent funding and interest: the so-called AI winter. However, the last three years have changed everything, again. Deep learning, a machine learning technique inspired by the human brain, has crushed one benchmark after another, and tech companies like Google, Facebook, and Microsoft have started to invest billions in AI research. “The pace of progress in artificial general intelligence is incredible fast” (Elon Musk, CEO of Tesla & SpaceX), leading to an AI that “would be either the best or the worst thing ever to happen to humanity” (Stephen Hawking, physicist).
What sparked this new hype? How is deep learning different from previous approaches? Are the advancing AI technologies really a threat to humanity? Let’s look behind the curtain and unravel the reality. This talk will explore why Sundar Pichai (CEO of Google) recently announced that “machine learning is a core transformative way by which Google is rethinking everything they are doing” and explain why "Deep Learning is probably one of the most exciting things that is happening in the computer industry” (Jen-Hsun Huang, CEO of NVIDIA).
Either a new AI “winter is coming” (Ned Stark, House Stark) or this new wave of innovation might turn out to be the “last invention humans ever need to make” (Nick Bostrom, AI philosopher). Or maybe it’s just another great technology helping humans achieve more.
Deep Learning and TensorFlow Implementation (Myungyon Kim)
Deep learning and Tensorflow implementation
2016.11.16
<Contents>
Feature Engineering
Deep Neural Network
Tensorflow
Tensorflow Implementation
Future works
References
These slides cover several aspects of deep learning: the history of deep learning, its difficulties and breakthroughs, and related topics such as activation functions, perceptrons, backpropagation, pre-training, dropout, convolutional neural networks (CNN), and a simple implementation in TensorFlow and Python.
Keywords: deep learning, machine learning, TensorFlow, Python
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ... (Simplilearn)
This presentation about Deep Learning is designed for beginners who want to learn Deep Learning from scratch. We will look at where Deep Learning is applied and what exactly this term means. We'll see how Deep Learning, Machine Learning, and AI are different and why Deep Learning even came into the picture. We will then proceed to look at Neural Networks, which are the core of Deep Learning. Before we move into the working of Neural Networks, we'll cover activation and cost functions. The video will also introduce you to the most popular Deep Learning platforms. We wrap it up with a demo in TensorFlow to predict if a person receives a salary above or below 50k. Now, let us get started and understand Deep Learning in detail.
Below topics are explained in this Deep Learning presentation:
1. Applications of Deep Learning
2. What is Deep Learning
3. Why is Deep Learning important
4. What are Neural Networks
5. Activation function
6. Cost function
7. How do Neural Networks work
8. Deep Learning platforms
9. Introduction to TensorFlow
10. Use case implementation using TensorFlow
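The salary demo described above (predicting whether a person earns above or below 50k) can be approximated with a tiny stand-in. This is not the presentation's actual TensorFlow code; it is a plain NumPy logistic-regression sketch on made-up features, standing in for the neural-network demo:

```python
import numpy as np

rng = np.random.default_rng(7)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up features: [age, years of education, hours worked per week].
X = rng.normal(size=(200, 3))
# Synthetic labels: 1 = earns above 50k (the rule is purely illustrative).
y = (X @ np.array([0.5, 1.0, 0.8]) > 0).astype(float)

w, b = np.zeros(3), 0.0
for _ in range(500):                      # plain full-batch gradient descent
    p = sigmoid(X @ w + b)
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(accuracy > 0.9)  # the sketch fits this separable toy data well
```

The TensorFlow version in the demo replaces the hand-written gradient loop with a model, a loss, and an optimizer, but the underlying idea is the same.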
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for Deep Learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in Deep Learning can be seen in smartphone applications, efficiencies in the power grid, advancements in healthcare, improved agricultural yields, and solutions to climate change. With this TensorFlow course, you’ll build expertise in Deep Learning models, learn to operate TensorFlow to manage neural networks, and interpret the results. According to payscale.com, the median salary for engineers with Deep Learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
1. Understand the concepts of TensorFlow, its main functions, operations and the execution pipeline
2. Implement Deep Learning algorithms, understand Neural Networks and traverse the layers of data abstraction which will empower you to understand data like never before
3. Master and comprehend advanced topics such as convolutional Neural Networks, Recurrent Neural Networks, training deep networks and high-level interfaces
4. Build Deep Learning models in TensorFlow and interpret the results
5. Understand the language and fundamental concepts of Artificial Neural Networks
6. Troubleshoot and improve Deep Learning models
Learn more at https://www.simplilearn.com/deep-learning-course-with-tensorflow-training
Scalable Data Science and Deep Learning with H2O
In this session, we introduce the H2O data science platform. We will explain its scalable in-memory architecture and design principles and focus on the implementation of distributed deep learning in H2O. Advanced features such as adaptive learning rates, various forms of regularization, automatic data transformations, checkpointing, grid-search, cross-validation and auto-tuning turn multi-layer neural networks of the past into powerful, easy-to-use predictive analytics tools accessible to everyone. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases.
By the end of the hands-on session, attendees will have learned to perform end-to-end data science workflows with H2O using both the easy-to-use web interface and the flexible R interface. We will cover data ingest, basic feature engineering, feature selection, hyperparameter optimization with N-fold cross-validation, multi-model scoring and taking models into production. We will train supervised and unsupervised methods on realistic datasets. With best-of-breed machine learning algorithms such as elastic net, random forest, gradient boosting and deep learning, you will be able to create your own smart applications.
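The N-fold cross-validation step in that workflow can be illustrated in plain NumPy (H2O automates all of this internally; the majority-class "model" here is just a placeholder so the splitting logic is visible):

```python
import numpy as np

def kfold_scores(X, y, n_folds, fit_score):
    """Split rows into n_folds parts; train on n-1 folds, score the held-out one."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate(
            [f for j, f in enumerate(folds) if j != i])
        scores.append(fit_score(X[train_idx], y[train_idx],
                                X[test_idx], y[test_idx]))
    return np.array(scores)

# Placeholder "model": predict the training-set majority class.
def majority_baseline(Xtr, ytr, Xte, yte):
    pred = np.bincount(ytr).argmax()
    return np.mean(yte == pred)

X = np.zeros((20, 2))
y = np.array([0] * 14 + [1] * 6)
scores = kfold_scores(X, y, 5, majority_baseline)
print(scores.shape)  # (5,)
```

Averaging the per-fold scores gives the cross-validated estimate that hyperparameter searches (grid search, auto-tuning) then optimize.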
A local installation of RStudio is recommended for this session.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
What Deep Learning Means for Artificial Intelligence (Jonathan Mugan)
Describes deep learning as applied to natural language processing, computer vision, and robot actions. Also discusses what deep learning still can't do.
Josh Patterson, Principal at Patterson Consulting: Introduction to Parallel Iterative Machine Learning Algorithms on Hadoop’s Next-Generation YARN Framework
H2O Distributed Deep Learning by Arno Candel 071614 (Sri Ambati)
Deep Learning R Vignette Documentation: https://github.com/0xdata/h2o/tree/master/docs/deeplearning/
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice in traditional business analytics.
This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of enterprise-scale problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization and optimization for class imbalance. World-record performance on the classic MNIST dataset, best-in-class accuracy for eBay text classification and others showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
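Dropout regularization, one of the features mentioned above, can be sketched in a few lines. This is the common "inverted dropout" formulation, not H2O's internal implementation; the rate and activation sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def dropout(activations, rate, training=True):
    """Inverted dropout: zero a fraction of units during training and
    rescale the survivors so the expected activation is unchanged,
    which lets inference skip dropout entirely."""
    if not training or rate == 0.0:
        return activations
    keep = (rng.random(activations.shape) >= rate).astype(float)
    return activations * keep / (1.0 - rate)

h = np.ones((4, 10))             # a batch of hidden activations
out = dropout(h, rate=0.5)
print(out.shape)                 # (4, 10)
# Surviving units are scaled to 2.0 so the mean stays near 1.0.
```

Because each forward pass samples a different mask, dropout behaves like training an ensemble of thinned networks, which is why it helps generalization.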
About the Speaker: Arno Candel
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world's largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes.
He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a proposed large-scale e+e- collider and potential successor to the LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and a Masters summa cum laude in Physics from ETH Zurich.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
The field of Artificial Intelligence (AI) has been revitalized in this decade, primarily due to the large-scale application of Deep Learning (DL) and other Machine Learning (ML) algorithms. This has been most evident in applications like computer vision, natural language processing, and game bots. However, extraordinary successes within a short period of time have also had the unintended consequence of causing a sharp difference of opinion in research and industrial communities regarding the capabilities and limitations of deep learning. A few questions you might have heard being asked (or asked yourself) include:
a. We don’t know how Deep Neural Networks make decisions, so can we trust them?
b. Can Deep Learning deal with highly non-linear continuous systems with millions of variables?
c. Can Deep Learning solve the Artificial General Intelligence problem?
The goal of this seminar is to provide a 1,000-foot view of Deep Learning and hopefully answer the questions above. The seminar will touch upon the evolution, current state of the art, and peculiarities of Deep Learning, and share thoughts on using Deep Learning as a tool for developing power system solutions.
Machine Learning: Replicating the Human Brain (Nishant Jain)
The slides show how humans make decisions and how, following the same pattern, machines are trained to learn and make decisions. They give an overview of all the steps involved in designing an efficient decision-making machine.
This talk was presented at Startup Master Class 2017 (http://aaiitkblr.org/smc/) at Christ College, Bangalore, hosted by the IIT Kanpur Alumni Association and co-presented by the IIT KGP Alumni Association, IITACB, PanIIT, and IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh, and Navin Manaswi was a contributor.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
Deep Learning is all the rage these days, but where does the reality of what Deep Learning can do end and the media hype begin? In this talk, I will dispel common myths about Deep Learning that are not necessarily true and help you decide whether you should practically use Deep Learning in your software stack.
I’ll begin with a technical overview of common neural network architectures like CNNs, RNNs, GANs and their common use cases like computer vision, language understanding or unsupervised machine learning. Then I’ll separate the hype from reality around questions like:
• When should you prefer traditional ML systems like scikit learn or Spark.ML instead of Deep Learning?
• Do you no longer need to do careful feature extraction and standardization if using Deep Learning?
• Do you really need terabytes of data when training neural networks or can you ‘steal’ pre-trained lower layers from public models by using transfer learning?
• How do you decide which activation function (like ReLU, leaky ReLU, ELU, etc) or optimizer (like Momentum, AdaGrad, RMSProp, Adam, etc) to use in your neural network?
• Should you randomly initialize the weights in your network or use more advanced strategies like Xavier or He initialization?
• How easy is it to overfit/overtrain a neural network and what are the common techniques to ovoid overfitting (like l1/l2 regularization, dropout and early stopping)?
The goal of this report is the presentation of our biometry and security course’s project: Face recognition for Labeled Faces in the Wild dataset using Convolutional Neural Network technology with Graphlab Framework.
Speaker: Pierre Richemond, Data Science Institute of Imperial College
Title: Cutting edge generative models: Applications and implications
Abstract: This talk will examine recent developments in deep learning content generation at scale. Whether it be images or text, the latest methods have now reached a level of quality making it hard to discriminate between human- and AI-generated content. We will review recent examples of such generative models, and put their significance in a broader context, in light of such powerful tools’ potential for dual use.
Bio: Pierre is currently researching his PhD in deep reinforcement learning at the Data Science Institute of Imperial College. He also teaches Deep Learning at the Graduate School, and helps to run the Deep Learning Network and organises thematic reading groups. His background is in mathematics - he has studied electrical engineering at ENST, probability theory and stochastic processes at Universite Paris VI - Ecole Polytechnique, and business management at HEC.
Deep learning algorithms have drawn the attention of researchers working in the field of computer vision, speech
recognition, malware detection, pattern recognition and natural language processing. In this paper, we present an overview of
deep learning techniques like Convolutional neural network, deep belief network, Autoencoder, Restricted Boltzmann machine
and recurrent neural network. With this, current work of deep learning algorithms on malware detection is shown with the
help of literature survey. Suggestions for future research are given with full justification. We also showed the experimental
analysis in order to show the importance of deep learning techniques.
Vertex has invested in companies across geographies addressing different industry applications leveraging AI to transform their service offerings. Read more on the trends and waves of AI developments observed.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/02/introducing-machine-learning-and-how-to-teach-machines-to-see-a-presentation-from-tryolabs/
Facundo Parodi, Research and Machine Learning Engineer at Tryolabs, presents the “Introduction to Machine Learning and How to Teach Machines to See” tutorial at the September 2020 Embedded Vision Summit.
What is machine learning? How can machines distinguish a cat from a dog in an image? What’s the magic behind convolutional neural networks? These are some of the questions Parodi answers in this introductory talk on machine learning in computer vision.
Parodi introduces machine learning and explores the different types of problems it can solve. He explains the main components of practical machine learning, from data gathering and training to deployment. He then focuses on deep learning as an important machine learning technique and provides an introduction to convolutional neural networks and how they can be used to solve image classification problems. Parodi will also touches on recent advancements in deep learning and how they have revolutionized the entire field of computer vision.
Artificial Intelligence, Machine Learning and Deep Learning with CNN
ProjectReport
1. 1
A Study of Deep Learning Models and
Bayesian Statistics
By: Pritish Yuvraj
Summer Research Fellow
Indian Statistical Institute, Kolkata
Guide: Prof. Rajat Kumar De
Machine Intelligence Unit
Indian Statistical Institute, Kolkata
3. 3
Table of Contents
1) History (slides 3-4)
2) Restricted Boltzmann Machine (slides 5-11)
3) Deep Belief Network (slides 12-18)
4) Bayesian Statistics (slide 19)
5) Conclusion (slide 20)
4. 4
1. History
● The ultimate aim of Artificial Intelligence is to reach a point where machines can accomplish tasks that are dangerous, complex, and arduous for human beings to perform. Researchers working in this direction have invented many algorithms, and with the colossal amount of money and manpower now dedicated to the field, Artificial Intelligence is advancing very quickly.
● The history of Artificial Intelligence, as chronicled by the historian Pamela McCorduck, is rooted in attempts to describe human thinking as the mechanical manipulation of symbols. Over time, algorithms grounded in statistics emerged that could "theoretically" solve real-life problems. Unfortunately, these programs could not be applied to real-life situations: they failed to account for the importance of the environment, and hence failed in practice. This paved the way for the advent of Machine Learning.
● Machine Learning, a subfield of Artificial Intelligence, evolved from studies of Pattern Recognition and Computational Learning Theory. In 1959, Arthur Samuel defined machine learning as the "field of study that gives computers the ability to learn without being explicitly programmed". Machine Learning was able to overcome the deficiency of earlier AI: unlike the previous AI applications, ML incorporated large amounts of data.
5. 5
A notion took hold that the more data you have from pragmatic, real-world sources, the better your program will run; the world learned the importance of "data". A relevant question here is: why did Machine Learning do well? Because it was trained on real datasets recorded from sensors, cameras, and recording devices. Statistical analysis of these data gave good insight into the convoluted patterns stored in them, and Machine Learning researchers were able to transfer the learned information into real-life applications. Some examples where Machine Learning is actively used are spam filtering, search engines, and computer vision.
● At the present moment, a mammoth amount of data has already been collected and the process is still continuing. Here comes the glitch: these data are mostly unlabeled. Data mining and other techniques perform the task of unsupervised learning better than classical Machine Learning. This stirred up an environment to delve into a complex subfield of Machine Learning: "Deep Learning". Deep Learning had existed since the 1940s but was never utilized because:
● 1) Data was not sufficient
● 2) The cost of computation was too high
● 3) There was no immediate necessity
● In this project, a study of Deep Learning algorithms was conducted by implementing the algorithms in C++ using an object-oriented approach, and the results are shared. The following four algorithms were implemented:
● 1) Restricted Boltzmann Machine
● 2) Deep Belief Network
● 3) Recurrent Neural Network
● 4) Convolutional Neural Network
6. 6
2.1 Restricted Boltzmann Machine
● Found prominence after efforts by Geoffrey Hinton (University of Toronto).
● Unsupervised or supervised, depending on the application.
● Applications in dimensionality reduction, classification, collaborative filtering, feature learning, and topic modelling.
7. 7
2.1 RBM: Mathematical Formulas
Energy configuration: $E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$
Probability distribution (where Z is the partition function): $P(v, h) = \frac{1}{Z} e^{-E(v, h)}$
Probability of v given h: $P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_j w_{ij} h_j\right)$
Probability of h given v: $P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_i w_{ij} v_i\right)$
Weight update: $\Delta w_{ij} = \epsilon\left(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}}\right)$
8. 8
2.2 RBM: Training
● Take a training sample (v) and compute the probabilities of the hidden units.
● Sample a hidden activation vector (h) from this probability distribution.
● Compute the outer product of (v) and (h), called the positive gradient.
● From (h), sample a reconstruction (v') of the visible units, then resample hidden activations (h') from the resulting probability distribution (Gibbs sampling).
● Compute the outer product of (v') and (h'), called the negative gradient.
● Update the weights based on the difference between the positive gradient and the negative gradient.
● The aim is to minimize the KL divergence between the data distribution p(x) and the model distribution q(x), i.e. to maximize the common area under the two functions.
● In other words, the probability distributions implied by the positive and negative gradients are made to converge: the better the convergence, the better the hidden layer predicts the probability distribution of the input.
9. 9
Experiment 1: RBM
● Example taken from Edwin Chen's blog.
● Results were obtained after implementing the RBM in C++. Code is available on GitHub under the profile "Pritish Yuvraj".
● The RBM conducts the experiment in an unsupervised way: it is not fed the comments. Based only on the inputs it must discover some sort of pattern. The only hint it has is that it needs to create two separate groups from the given inputs.
User    Harry Potter  Avatar  LOTR3  Gladiator  Titanic  Glitter  Comment
Alice        1           1      1        0         0        0     Big SF/Fantasy fan
Bob          1           0      1        0         0        0     SF/Fantasy fan, but not Avatar
Carol        1           1      1        0         0        0     Big SF/Fantasy fan
David        0           0      1        1         1        0     Big Oscar Winners fan
Eric         0           0      1        1         1        0     Oscar fan, except Titanic
Fred         0           0      1        1         1        0     Big Oscar Winners fan
10. 10
Conclusion 1: RBM

Learned weights to the two hidden units:

Movie          Hidden Unit 1   Hidden Unit 2
Harry Potter      -7.70958        6.260625
Avatar           -13.7941         3.09608
LOTR3              8.89752        4.491787
Gladiator          7.87261       -6.69001
Titanic            7.87356       -6.73726
Glitter           -8.50164       -5.07191

Result inference:
● 1) Harry Potter and Avatar form one group (Science Fiction / Fantasy movies).
● 2) Gladiator and Titanic form another group (Oscar winners).
● 3) LOTR3 and Glitter do not clearly belong to either of the two groups.

Experiment setup:
● 6 visible units (inputs) and 2 hidden units (outputs).
● Number of epochs = 50000.
● Final error after all epochs = 5.683 × 10⁻⁶.
11. 11
Experiment 2: RBM
● Applied the Restricted Boltzmann Machine algorithm to a dataset made available by Yale University, called the "Yale Face Database".
● Reference: P. Belhumeur, J. Hespanha, D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, July 1997, pp. 711-720.
● The next slide shows the conclusion of the experiment: we try to generate images based on what the RBM has learned. The number of epochs and hidden layers used are mentioned on the next slide.
13. 13
3. Deep Belief Network
History:
● Developed by Geoffrey Hinton together with his students, including Yee-Whye Teh.
● One of the first effective deep learning algorithms.
About the model:
● A generative graphical model.
● Can be used in an unsupervised or supervised way.
● A stack of RBMs composed layer by layer.
● A supervised Deep Belief Network can be used for classification.
15. 15
3.2 DBN: Algorithm
● 1) Train an RBM on the inputs (X) to obtain a weight matrix (W).
● 2) Transform (X) through the RBM to produce new data (X').
● 3) Repeat this procedure with X <- X' for the next pair of layers.
● 4) Stop before the top 2 layers.
● 5) Fine-tune the top 2 layers for supervised learning.
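The greedy layer-wise procedure above can be sketched as follows. This Python sketch is our own illustration: `train_rbm` is a stand-in for the CD training of section 2.2 (here it just returns a small random weight matrix), and `transform` propagates data through one trained layer.

```python
import math
import random

def train_rbm(data, n_hidden):
    """Placeholder for CD training (section 2.2): returns a weight matrix W."""
    random.seed(0)
    n_visible = len(data[0])
    return [[random.uniform(-0.1, 0.1) for _ in range(n_hidden)]
            for _ in range(n_visible)]

def transform(data, W):
    """Propagate data through one trained RBM: hidden-unit probabilities."""
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    return [[sig(sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(W[0]))] for v in data]

def pretrain_dbn(X, layer_sizes):
    """Steps 1-4: greedily train a stack of RBMs, feeding each layer's
    output to the next; the top layers are left for supervised fine-tuning."""
    weights = []
    for n_hidden in layer_sizes:
        W = train_rbm(X, n_hidden)   # 1) train an RBM on the current inputs
        weights.append(W)
        X = transform(X, W)          # 2-3) X <- X' for the next pair of layers
    return weights, X                # X now feeds the supervised top layers

X = [[1, 0, 1, 0], [0, 1, 0, 1]]
weights, top_input = pretrain_dbn(X, [3, 2])
```

Step 5 then trains a supervised classifier (here, the logistic classifier described below) on `top_input`.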
3.3 Fine-Tuning
Implemented using a stack of RBM layers with a logistic classifier in the fine-tuning stage. The implementation is available on GitHub under the profile "Pritish Yuvraj".
Methods for fine-tuning:
● Feed-forward network
● Support vector machine
● Logistic classifier
16. 16
3.4 SoftMax Function
● The purpose of the SoftMax function is to convert the outputs into the probability of belonging to each class (classification problems).
● E.g.
– [0.1, 0.0001, 0.00002] becomes
– [0.3559, 0.3221, 0.3220]
– or roughly [36%, 32%, 32%].
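The function itself is a few lines of code; this plain-Python sketch (ours, not the project's C++ code) reproduces the example above.

```python
import math

def softmax(xs):
    """Exponentiate each output and normalize so the results sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.1, 0.0001, 0.00002])
# probs sums to 1.0 and is approximately [0.3559, 0.3221, 0.3220]
```

Because the inputs are exponentiated, the largest output always gets the largest probability, which is what makes SoftMax suitable for the final classification layer.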
17. 17
Experiment 3: DBN
● The objective of the experiment was to determine the accuracy of a Deep Belief Network relative to other Machine Learning algorithms.
● The problem we picked was image classification.
● To conduct the experiment, two image databases were merged to create an artificial database:
– the LFW database (Labeled Faces in the Wild, Department of Computer Science, University of Massachusetts, Amherst)
– the Flowers database (Department of Computer Science, University of Oxford).
18. 18
3.5 DBN Architecture for the Experiment
Preprocessing of images:
● First the images were converted to black and white for the RBM.
● Images were resized to a uniform 320 × 243 pixels.
● Number of input neurons: 77760 (= 320 × 243).
● 2 hidden layers with 100 neurons each.
● Output layer with 2 neurons, classifying the image as containing either a flower or a human.
● Fine-tuning was achieved using a logistic regression classifier.
● The probability of an image belonging to a group was decided by the SoftMax function.
19. 19
3.6 DBN: Results

Algorithm                      Accuracy
Stochastic Gradient Descent    96.20%
Random Forest Classifier       98.26%
Support Vector Machine         98.3%
Deep Belief Network            100%

Detailed report on Deep Belief Network performance:

Epochs (entire architecture)   Accuracy
10                             74.91%
50                             95.38%
100                            100%
20. 20
4. Bayesian Statistics
● Three approaches to probability:
– Axiomatic: probability by definition and properties.
– Relative frequency: repeated trials.
– Degree of belief (subjective): a personal measure of uncertainty.
● Problems best understood as degrees of belief:
– The chance that a meteor strikes Earth is 1%.
– The probability of rain today is 30%.
– The chance of getting an A on the exam is 50%.
4.1 Bayes Theorem for Statistics
● Let θ represent the parameter(s) and X represent the data.
● Bayes' theorem: f(θ | X) = f(X | θ) f(θ) / f(X)
● The left-hand side is a function of θ; the denominator on the right-hand side does not depend on θ.
● Posterior distribution ∝ Likelihood × Prior distribution:
f(θ | X) ∝ f(X | θ) f(θ)
● Equivalently: Posterior dist'n = Constant × Likelihood × Prior dist'n.
● The equation can be understood at the level of densities.
● Goal: explore the posterior distribution of θ.
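The posterior ∝ likelihood × prior relationship can be made concrete with a small grid approximation for a coin-flip example. The data (7 heads in 10 flips), the grid, and all variable names below are our own illustration, not part of the project.

```python
# Grid approximation of the posterior for a coin's heads-probability theta,
# given 7 heads in 10 flips and a uniform prior over the grid.
thetas = [i / 100 for i in range(1, 100)]
prior = [1.0 for _ in thetas]                        # uniform prior f(theta)
heads, flips = 7, 10
likelihood = [t ** heads * (1 - t) ** (flips - heads)
              for t in thetas]                       # binomial kernel f(X | theta)
unnorm = [l * p for l, p in zip(likelihood, prior)]  # f(X | theta) f(theta)
z = sum(unnorm)                                      # plays the role of f(X)
posterior = [u / z for u in unnorm]                  # f(theta | X)
best = thetas[posterior.index(max(posterior))]       # posterior mode
```

With a uniform prior the posterior mode coincides with the maximum-likelihood estimate, 7/10 = 0.7; a non-uniform prior would pull the mode toward the prior's mass.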
21. 21
5. Conclusion
● The project focused on the practical aspects of Deep Learning and on extracting the statistical knowledge required for them.
● Deep Learning is an emerging field of Artificial Intelligence that has brought us one step closer to the real vision of AI.
● The RBM is very important because most of the data in the world is unlabeled. The same goes for the Deep Belief Network: we can use a lot of unlabeled data to train the initial stack of RBMs in a DBN and then fine-tune on a small supervised dataset.
● The DBN is more accurate than SVM and the other Machine Learning algorithms tested, as can be deduced from the results of the experiment in the DBN section.
● Unlike classical Machine Learning, where an algorithm performs a single task, Deep Learning algorithms can perform multiple tasks: the same DBN can be used for classification of images or classification of text.
● This was the original dream of Artificial Intelligence, from which we are still very far, but Deep Learning has taken us one step closer to it.