A Study On Deep Learning
Abdelrahman Hosny
Graduate Student, Master’s
Computer Science
University of Connecticut
Email: abdelrahman@engr.uconn.edu
Anthony Parziale
Undergraduate Student, Junior
Computer Science
University of Connecticut
Email: anthony.parziale@uconn.edu
Abstract—With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Thanks to Deep Learning, Artificial Intelligence is becoming genuinely capable. Deep Learning models attempt to mimic the activity of the layers of neurons in the neocortex; this layered activity is believed to be what enables a brain to "think." These models learn to recognize patterns in digital representations of data in much the same way humans do. In this survey report, we introduce the most important concepts of Deep Learning along with the state-of-the-art models that are now widely adopted in commercial products.
1. Introduction
Machine learning is the science of getting computers to act without being explicitly programmed. It is the main engine behind many modern software applications: from web searches to content filtering on social networks to recommendations on e-commerce websites and smartphone applications. Deep learning is a newer area of machine learning research, introduced with the objective of moving machine learning closer to one of its original goals: Artificial Intelligence. When exploring the field of deep learning, it is easy to be overwhelmed by the various models and, in the process, lose sight of the end objective [1]. Researchers aim to use deep learning models to make progress toward human-level AI. Many of them view deep learning as a direct extension of artificial neural networks, which are inspired by how the human brain works.
In this survey, our aim is to provide a brief explanation of neural networks together with a concise overview of the different deep learning architectures, their objectives, and how they relate. In the next section, we start with the building block of any deep learning architecture: artificial neural networks. After that, we explore deep learning models, which generally consist of either a "deep" neural network (more than 3 layers) or a stack of neural networks (where each layer in the deep architecture is in fact a neural network itself). For each model introduced, we shed light on its purpose and its architecture. Finally, we give practical tips on using each model and introduce some of the recent commercial applications that are empowered by deep learning models.
2. Artificial Neural Networks
Artificial neural networks are a family of models inspired by biological neural networks. The intuition behind them comes from observations such as the following: babies watch adults moving around and, after a few months, the accumulated knowledge begins to stimulate them to attempt mini push-ups of their own. This behavior encouraged neuroscientists to study the brain activity that allows learning without explicit teaching. By analogy, computer scientists modeled the brain with a mathematical model called the artificial neural network. The question now is: how does the brain work?
2.1. Background
The brain consists of neural cells, each of which looks like the one in Figure 1. In the body of the neuron sits the nucleus, which receives pulses of electricity from input wires (dendrites); based on these signals, the neuron does some computation and sends a message (electrical impulses) to other neurons through output wires (axons). The human brain has billions of these neurons connected together. Different neurons in the brain are responsible for different senses, such as sight, smell, and touch. It has been observed experimentally that a neuron in the brain can learn to do other jobs. For example, animal experiments show that if we disconnect the wires that connect the auditory cortex to the ears and connect it to the eyes, its neurons will learn to see, as in Figure 2a. In similar experiments, when the somatosensory cortex's connection to the hand is disconnected and rerouted to the eyes, it eventually learns to see, as in Figure 2b. Now, let us switch context and talk about mimicking this neural network in computers.

Figure 1: Human Brain Neuron

(a) Auditory cortex learns to see (b) Somatosensory cortex learns to see
Figure 2: Neurons learn to do different tasks when original wires are disconnected and reconnected to other senses
In a software environment, we create a similar model with three major components:
• A cell body containing the neuron, which is responsible for doing the computation.
• Input wires that carry signals into the neuron.
• Output wire(s) that transfer the output signal to other neurons.
Figure 3 shows a simple artificial neural network that has only one neuron (the orange circle). x1, x2, and x3 are the inputs to the neuron, and they carry numerical values. The function h is called the hypothesis function. It is computed by multiplying the input vector x by a weight vector w; the result is then passed through an activation function that produces the final scalar output.
Figure 3: Artificial neural network with one neuron
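To make this concrete, the following minimal Python/NumPy sketch computes the hypothesis h for a one-neuron network. The specific numbers, the sigmoid activation, and the bias term are illustrative choices, not values prescribed by the figure.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Three numerical inputs (x1, x2, x3), one weight per input, and a bias term.
x = np.array([0.5, -1.2, 3.0])   # input vector
w = np.array([0.8, 0.1, -0.4])   # weight vector (learned during training)
b = 0.2                          # bias

# Hypothesis: weighted sum of the inputs passed through the activation function.
h = sigmoid(np.dot(x, w) + b)
print(h)  # a single scalar output in (0, 1)
```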
Figure 4 shows a more advanced neural network. Each
vertical set of neurons is called a layer. Layer 1 contains the
neurons that represent inputs. Layer 2 is also called a hidden
layer. It does the core computation. Layer 3 is called the
output layer and does a computation on the data received
from layer 2 and then outputs one final result. Now, the
missing information in the one-neuron figure is:
1) What is the weight vector to be multiplied by the
input vector?
2) After multiplying the two vectors, what is the acti-
vation function that will output the final result?
Besides the number of layers and the number of neurons in
each layer, the answers to the above two questions are going
to define the neural network model. If one could solve, or
model, a specific mathematical problem by assigning values
to the weight vector and choosing an appropriate activation
function, the neural network model would satisfy its goal.
Figure 4: Artificial neural network with two layers
In practice, assigning weights and choosing an activation function is the most challenging part of designing a neural network. Therefore, computerized training procedures have been developed to let the software optimize the values of the weights. In the next two subsections, we discuss activation functions and the backpropagation algorithm, the fundamental technique used to train a neural network.
2.2. Activation Functions
As stated in the previous subsection, each layer is composed of a set of neurons. The purpose of each neuron is to perform a non-linear transformation on its input. Using the network in Figure 3 as an example, the input vector x is multiplied by the weight vector w. If N is the number of inputs feeding a neuron, vector x has a shape of [1, N] and vector w has a shape of [N, 1]. Multiplying these two vectors results in a scalar [1, 1] value.
x = [x_1, x_2, ..., x_n]   (1)

w = [w_1, w_2, ..., w_n]^T   (2)

x · w = Σ_i x_i w_i = x_1 w_1 + x_2 w_2 + ... + x_n w_n   (3a)

y = x_1 w_1 + x_2 w_2 + ... + x_n w_n + bias   (3b)
As equation 3b shows, y is a simple linear function of the inputs. Although interesting, this linearity offers no advantage over simple linear regression. If y were passed straight to the next layer's nodes, we would say the neuron has a linear activation function; in fact, a perceptron with a linear activation function is just that: linear regression. By passing y through a non-linear activation function, the network gains the ability to represent far richer functions. The following equations illustrate the most popular activation functions:
Identity
Figure 5: Identity: A(y) = y

Binary Step
Figure 6: Binary Step
A(y) = 0 for y < 0;  1 for y ≥ 0

From a biological standpoint, these activation functions determine whether the neuron propagates a signal forward to a receiving neuron or not.

Logistic
Figure 7: Logistic
A(y) = 1 / (1 + e^(-y))

TanH
Figure 8: TanH
A(y) = tanh(y) = 2 / (1 + e^(-2y)) - 1

Softsign
Figure 9: Softsign
A(y) = y / (1 + |y|)

Rectified Linear Unit (ReLU)
Figure 10: ReLU
A(y) = 0 for y < 0;  y for y ≥ 0
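For reference, these activation functions can be implemented directly. The following NumPy sketch is a minimal element-wise implementation of each; the function names are ours and do not refer to any particular library.

```python
import numpy as np

def identity(y):
    return y

def binary_step(y):
    # 0 for y < 0, 1 for y >= 0
    return np.where(y < 0, 0.0, 1.0)

def logistic(y):
    return 1.0 / (1.0 + np.exp(-y))

def tanh(y):
    # Equivalent to 2 / (1 + exp(-2y)) - 1
    return np.tanh(y)

def softsign(y):
    return y / (1.0 + np.abs(y))

def relu(y):
    # 0 for y < 0, y for y >= 0
    return np.maximum(0.0, y)

y = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (identity, binary_step, logistic, tanh, softsign, relu):
    print(f.__name__, f(y))
```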
2.3. Backpropagation Algorithm
A neural network is trained through a combination of two steps. The first step propagates the information forward through the activation functions; the previous section illustrated some of the most popular activation functions used for the nodes in a network. Once this first pass is completed, the model produces an output. The error of the network represents how close this output is to the expected value. The second step in the training process adjusts the weights of the network in an attempt to minimize this error. As one can imagine, in a network where every layer is fully connected to the next, the number of weights grows very quickly. An efficient way to minimize the training error is therefore crucial, and backpropagation provides it. Backpropagation can be viewed as a clever use of the chain rule [2].
Figure 11: Demonstration of the chain rule
Backpropagation propagates signals in the opposite direction. Starting at the output layer L, the error derivative is computed based on all the input connections coming from the previous layer L−1. Stemming from the simple fact that the error of the output layer is Output − Target, the error can then be "recursively" defined, enabling fast training of the network. In practice, the error is usually defined as shown in the equation below.

E_total = Σ (1/2) (target − output)^2
As figure 12 shows, the error derivative with respect to the unactivated input z of each layer is used to compute the error of the previous layer's output.

Figure 12: Back Propagation Algorithm

Because matrix operations perform many calculations in one step, neural networks can compute these error derivatives and update the weight matrices very quickly. Backpropagation was the key to finally being able to train, and therefore use, neural networks. Thanks to these matrix operations, backpropagation can be parallelized to further decrease training time, making deep neural networks possible. In fact, the emergence of the entire field of deep learning has been made possible by such advances in hardware. In short, without the advent of backpropagation, neural networks would be nearly impossible to train efficiently.
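To make the two-pass procedure concrete, here is a minimal NumPy sketch of forward and backward passes for a tiny two-layer network with sigmoid activations and the squared-error loss defined above. It is a toy illustration of the chain rule, with arbitrary sizes and learning rate, not the training loop of any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 3 features, 1 target value each.
X = rng.normal(size=(4, 3))
T = rng.normal(size=(4, 1))

# Randomly initialized weights for a 3-5-1 network.
W1 = rng.normal(scale=0.5, size=(3, 5))
W2 = rng.normal(scale=0.5, size=(5, 1))
lr = 0.1

for step in range(100):
    # Forward pass: propagate the inputs through the activations.
    z1 = X @ W1            # unactivated input to the hidden layer
    a1 = sigmoid(z1)       # hidden activations
    z2 = a1 @ W2           # unactivated input to the output layer
    y = sigmoid(z2)        # network output

    # Error: E_total = sum( 1/2 * (target - output)^2 )
    E = 0.5 * np.sum((T - y) ** 2)

    # Backward pass: chain rule, starting at the output layer.
    d_z2 = (y - T) * y * (1 - y)            # dE/dz2
    d_W2 = a1.T @ d_z2                      # dE/dW2
    d_z1 = (d_z2 @ W2.T) * a1 * (1 - a1)    # dE/dz1, pushed back one layer
    d_W1 = X.T @ d_z1                       # dE/dW1

    # Gradient-descent update of the weight matrices.
    W2 -= lr * d_W2
    W1 -= lr * d_W1
```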
2.4. Constraints of Neural Networks
Although neural networks have proven very effective in many applications, research in cognitive neuroscience has revealed many important differences between brains and computers. Here, we list some of the major differences:
• First, brains are analogue while computers are digital. Brains transmit information at a rate that is essentially a continuous variable. Therefore, it is believed that building a model truly identical to the brain would require scientists either to build analogue computers (changing the whole computation model we know) or to creatively develop a scheme for mapping continuous brain signals onto existing binary computing capabilities.
• Second, brains retrieve information by content while computers retrieve it by address. For example, thinking of the word apple automatically stimulates you to think about other related fruits. In a computer, the word apple either is stored at an address with a specific value or it is not. However, similar paradigms can be implemented in computers, mostly by building massive indices of stored data (as Google does).
• Third, while basic artificial neural networks are not capable of storing memories, processing and memory are performed by the same components in the brain. Inspired by the memory of neurons, a deep learning model called Long Short Term Memory (LSTM) has been developed that addresses this limitation by introducing a technique for storing information for a longer time in artificial neurons (see section 3.3 below).
Although the idea of artificial neural networks dates back to the 1950s, their applications have been brought back to the table by the availability of large computational and storage resources. Computer scientists are continuously improving neural network models to address different shortcomings. The evolving architectures are now called deep learning models, which are the focus of the next section.
3. Deep Learning Models
When exploring different deep learning models, it is easy to tune into the "buzzwords" that are frequently repeated and lose sight of what the actual objective of the learning procedure is. It is easiest to divide the models into the following two categories:
1) Discriminant architectures: these models characterize patterns based on the posterior distributions of classes. This corresponds to techniques such as classification and regression. The paradigm of discriminant models is: given an input, produce an output. Discriminant models can be viewed as bottom-up networks; inputs are given and they propagate up through the network to produce outputs. This is the main difference from their generative counterparts, which have no outputs. These models can be viewed as supervised deep learning. Examples include Deep Neural Networks (neural networks with more than 2 layers), Convolutional Neural Networks (section 3.1), Recurrent Neural Networks (section 3.2), and Long Short Term Memory (section 3.3).
2) Generative architectures: these models are employed to discover high-order correlations in a given input. In these models, there are no classes or values to predict for the input data, as there are in classification and regression techniques. The goal is to extract meaningful relationships between features in an effort to learn high-order features. Generative models can learn a distribution from training data and then produce samples from it. The bottom layer of these networks generates a vector x, and the goal is to train the model to give high probability to the training data. These models are called generative because they start from the top layer and aim to generate the inputs by propagating downwards through the network. The main domain of these architectures is therefore unsupervised feature learning. Examples include Restricted Boltzmann Machines (section 3.4), Deep Boltzmann Machines (section 3.6), Deep Belief Networks (section 3.5), and Auto-encoders (section 3.7).
Figure 13: Common network architectures
In each of the following subsections, we illustrate a deep
learning model and its purpose. In general, discriminant
architectures are trained with backpropagation whereas gen-
erative architectures are trained with a modified free-energy
method. Training procedures tend to vary on the generative
side. Figure 13 illustrates some of the general schemes of
network architectures.
3.1. Convolutional Neural Network (CNN)
3.1.1. Purpose. A CNN is primarily used for processing two-dimensional data, which makes it a prime candidate for data such as images and videos. In the area of image processing, a CNN (also called a ConvNet) is able to extract high-order features from an image (such as horizontal edges, vertical edges, or color contrasts), which can lead to an impressive understanding of the content. Convolutional networks have proved very effective at learning representations of data.
3.1.2. Architecture. For simplicity, we start by describing the model on one-dimensional data and then show how it expresses its effectiveness on two-dimensional data. To classify a sample x1, x2, x3, ..., xn using a basic neural network, we connect all the inputs to a fully-connected layer, where each input sample connects to each neuron in the hidden layer, as in figure 14.
Figure 14: Feeding input samples into a fully connected
layer (denoted by F) in a basic neural network
The architecture of CNNs follows a more sophisticated approach that exploits symmetry in the features it is looking for in the data. We can create a group of neurons before the hidden layer that takes only a segment of the data as input, as in figure 15. This added layer is called a convolutional layer. The output of the convolutional layer is fed into the fully connected layer that we added previously. Convolutional layer output can also be fed into another convolutional layer, hence creating layers of convolutions. The idea of a convolutional layer is to learn the appropriate feature filters as opposed to hand-engineering them.

Figure 15: Adding a convolutional layer. Each A contains a group of neurons that are fully connected to a segment from the inputs.
To get a higher-level representation of the data, a pooling layer is added after the convolutional layers. A pooling layer not only yields more abstract representations of the data, but also reduces the number of parameters that will be fed to the fully connected layer. For example, a max-pooling layer takes the maximum of the features over small blocks of the previous convolutional layer. Output from a pooling layer can also be fed into the input of another convolutional layer, as in figure 16.
Figure 16: Adding a max-pooling layer. The output is fed
into another convolutional layer B.
The same concepts apply to two-dimensional inputs such as images or videos. We can think of figure 17, read from bottom to top, as zooming out from the very specific details of the data representation toward a more general representation. A convolutional layer has A groups of neurons, and each group feeds on only a part of the two-dimensional input (e.g., a 5x5 pixel frame). Taking face detection as an example, a first convolutional layer learns representations of edges. After a first pooling layer, a second convolutional layer learns more general representations of face parts such as the eyes or the nose. After a second pooling layer, a third convolutional layer learns the most general representation needed to detect a human face. The output is then passed to a fully connected layer to produce the final classifications.

(a) 2-D input (b) A full 2-D input to a convolutional network
Figure 17: A full convolutional neural network with two-dimensional input.
In summary, a CNN is divided into two stages. The first is the convolutional layer, where a filter is applied to the input; this filter is a function representing a certain transformation of the input data. The second stage is the pooling layer, which summarizes neighborhoods in the output of the convolutional layer (for example, by taking their maximum). These two alternating stages can be applied for as many layers as needed, each with a different filter. A final fully connected layer is responsible for the classifications. This design helps the model detect high-order similarities within the data irrespective of orientation or rotation.
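As a minimal sketch of the two alternating stages on one-dimensional data, the following NumPy code applies a convolution filter and then max-pools neighborhoods of the result. The signal, filter values, and block size are illustrative only; in a real CNN the filter weights would be learned.

```python
import numpy as np

def conv1d(x, kernel):
    # "Valid" 1-D convolution: slide the filter across the signal and take
    # a weighted sum at each position (this is the convolutional layer).
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(x, size):
    # Max-pooling: keep only the maximum over small non-overlapping blocks.
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

signal = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 5.0, 2.0, 0.0, 1.0])
edge_filter = np.array([-1.0, 0.0, 1.0])   # illustrative "edge detector" weights

features = conv1d(signal, edge_filter)     # convolutional layer output
pooled = max_pool1d(features, size=2)      # pooling layer output

print(features)
print(pooled)   # a shorter, more abstract representation fed to later layers
```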
3.1.3. References. Refer to the following paper [3] and blog
post [4] for a detailed illustration and studies of CNNs.
3.2. Recurrent Neural Network (RNN)
3.2.1. Purpose. An RNN is primarily used for processing data that come in the form of a sequence, which makes it a prime candidate for speech recognition, language modeling, and translation. One limitation of ConvNets is that they accept a fixed-size vector as input and produce a fixed-size vector as output, performing this mapping using a fixed number of computational steps: the number of layers in the model and the number of units in each layer. The core difference in RNNs is that they operate over sequences of vectors in both the input and the output.
3.2.2. Architecture. Traditional neural networks (and ConvNets) are memoryless. If a traditional neural network were used to classify what the weather is like from a sequence of forecast readings, it is unclear how the model would do that: it operates on a fixed-size input and a fixed-size output, performing the computation using a pre-specified number of hidden layers and units. Recurrent neural networks address this issue by introducing memory into the network in the form of a loop, as in figure 18. One can think of an RNN as a stack of separate neural networks with some parameters of each network fed from the previous network; these parameters play the role of a memory.

Figure 18: Recurrent neural network basic component. Left: a chunk of neural network A receives some input x and outputs a value h. Right: an unrolled RNN.
Inside each repeating module of the recurrent neural network, the input x at time step t is concatenated with the output h from time step t−1, and together they are passed through an activation function to produce the output h at the current time step t. Figure 19 shows an unrolled illustration of this behavior, where the yellow box represents a single neural network layer with a tanh activation function (other activation functions can be used as well).
Figure 19: The repeating module in a RNN with tanh used
as the activation function in the neural network.
Although RNNs are simple in that they accept an input vector x and produce an output vector y, their effectiveness comes from the fact that the output vector's content is influenced not only by the current input x, but also by the entire history of inputs that have been fed to the network in the past. The RNN has some internal state that gets updated every time an input is fed into the network. In the simplest case, this state is represented as a single hidden vector h.
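A vanilla RNN step can be written in a few lines. The NumPy sketch below, with illustrative sizes and a tanh activation, shows how the hidden state h is updated at every time step and therefore carries the input history forward.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 4, 8
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new state depends on the current input AND the previous state:
    # this recurrence is the network's memory.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

sequence = rng.normal(size=(5, input_size))  # 5 time steps of input vectors
h = np.zeros(hidden_size)                    # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)                     # h is updated at every time step
print(h)  # final state, influenced by the entire input history
```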
What happens when there is a long-term dependency? For example, a word in an essay may be derived from a word in a previous paragraph. Unfortunately, as the gap grows, RNNs become unable to learn to connect dependencies in the sequence. Figure 20 shows the long-term dependency problem in RNNs. Long Short Term Memory (LSTM) models have therefore been proposed to overcome this problem; LSTMs are the subject of the next section.
3.2.3. References. Refer to the following paper [5] and blog
post [6] for a detailed illustration and studies of RNNs.
Figure 20: The output h at time t+1 depends on the input
x at times 0 and 1.
3.3. Long Short Term Memory (LSTM)
3.3.1. Purpose. Long Short Term Memory networks are an improvement to recurrent neural networks that addresses the problem of long-term dependencies. Real-world implementations mostly rely on LSTM models rather than basic RNNs.
3.3.2. Architecture. Like RNNs, LSTMs also have a chain-like structure (when unrolled). However, instead of a single neural network layer in the repeating module as in figure 19, LSTMs have four neural network layers interacting in a special harmony, as in figure 21.
Figure 21: Four interacting neural network layers inside the repeating module of an LSTM. Each line carries an entire vector from one node to another. The yellow boxes are neural networks with the indicated activation function. The pink circles represent point-wise operations like vector addition. Lines merging denote concatenation. Lines forking denote content being cloned to different locations.
The core idea behind LSTMs is the horizontal line passing through the top of the module. The line represents a cell state that carries information along from one cycle to the next. Addition and multiplication gates control whether information is stored in the cell state vector or not. Each of the four neural network layers is responsible for a specific function inside the cell. The operation of the cell occurs in three steps as follows:
• First: the first neural network layer from the left (also called the forget gate layer) decides what information is going to be thrown away from the state vector. The sigmoid layer looks at h_{t-1} and x_t, and outputs a number between 0 and 1 for each number in the cell state vector that passes through the top line. A 1 represents a "completely keep this" decision and a 0 represents a "completely remove this" decision. The output of the first layer is represented as f_t below:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
• Second: the next two layers decide what new information is going to be stored in the cell state vector. The sigmoid layer (also called the input gate layer) decides which values will be updated:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
and the tanh layer creates a vector of new candidate values, C̃_t, that could be added to the state:
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
The new cell state vector C_t is then computed as follows:
C_t = f_t ∗ C_{t-1} + i_t ∗ C̃_t
• Third: the last layer in the cell computes the actual output h_t. The output value is influenced by the last sigmoid layer as well as the new cell state vector that was just computed:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ∗ tanh(C_t)
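The three steps above translate almost line for line into code. The following NumPy sketch of a single LSTM cell step follows the equations literally; the layer sizes and random initialization are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8
concat = hidden_size + input_size   # size of [h_{t-1}, x_t]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix and bias per gate/layer, each acting on [h_{t-1}, x_t].
W_f, b_f = rng.normal(scale=0.1, size=(concat, hidden_size)), np.zeros(hidden_size)
W_i, b_i = rng.normal(scale=0.1, size=(concat, hidden_size)), np.zeros(hidden_size)
W_C, b_C = rng.normal(scale=0.1, size=(concat, hidden_size)), np.zeros(hidden_size)
W_o, b_o = rng.normal(scale=0.1, size=(concat, hidden_size)), np.zeros(hidden_size)

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(z @ W_f + b_f)             # forget gate
    i_t = sigmoid(z @ W_i + b_i)             # input gate
    C_tilde = np.tanh(z @ W_C + b_C)         # candidate values
    C_t = f_t * C_prev + i_t * C_tilde       # new cell state
    o_t = sigmoid(z @ W_o + b_o)             # output gate
    h_t = o_t * np.tanh(C_t)                 # new output
    return h_t, C_t

h = np.zeros(hidden_size)
C = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):   # a short input sequence
    h, C = lstm_step(x_t, h, C)
```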
Although what is described so far is a standard LSTM, almost every paper involving LSTMs uses a slightly different architecture. A common variation is to let the functions f_t, i_t, and o_t above also look at the cell state vector, a technique known as peephole connections. Other variations exist depending on the training task. Yet all variations depend on the idea of a cell state vector that can carry information for a long time, allowing long-term dependencies to be taken into consideration for prediction.
3.3.3. References. Refer to the following papers [7], [8]
and generous blog post [9] for a detailed illustration and
studies of LSTMs.
3.4. Restricted Boltzmann Machine (RBM)
3.4.1. Purpose. The first generative architecture we will explore is the Restricted Boltzmann Machine, which is not to be confused with the Boltzmann Machine. Figure 22 illustrates the difference, and the next section explains the subtlety. An RBM is commonly used for unsupervised learning tasks such as dimensionality reduction, feature learning, and collaborative filtering.
3.4.2. Architecture. An RBM is composed of two layers, an input layer and a hidden layer, with undirected connections between them. The restriction placed on an RBM is that no two nodes in the same layer may have a connection; this is the differentiator between a Boltzmann Machine and a Restricted Boltzmann Machine. The former has existed for many years, but it was not until this slight modification created the latter that this theoretical model became usable: without the restriction on intra-layer connections, a Boltzmann Machine is essentially untrainable. We can therefore define an RBM formally as a two-layer neural network with many inter-layer, but no intra-layer, connections. Each connection bears a weight that is trained during the learning procedure. By adjusting these weights, an RBM can fit its parameters to represent the distribution of the training data. Once the hidden layer is trained, one can generate samples that fit the distribution of the training data. This technique has been used to compensate for a scarce amount of available data in certain fields.

Figure 22: The Boltzmann Machine includes intra-layer connections, whereas the RBM is limited to inter-layer connections only.
Figure 23: The architecture of a RBM. The shaded nodes
represent the visible input layer and the white nodes
represent the hidden layer.
3.4.3. Training. The training procedure for an RBM differs in a few ways from the methods used for discriminant models. While the final step in the procedure still entails performing stochastic gradient descent to decrease the error, the means by which the error is computed differs. In an RBM, a procedure called Contrastive Divergence is used. In simplest terms, each iteration can be broken down into three phases. First, the hidden layer is created from the input layer based on probabilities that minimize the free energy of the model; this produces a hidden layer with certain activations and is called the positive phase. The next phase is the negative phase: the input layer is reconstructed based on this hidden layer, and the reconstructed layer is then propagated back to the hidden layer to create a new set of activations. The third phase is the update phase, where the hidden layer from the positive phase, together with the reconstructed input and the second hidden layer from the negative phase, are used to determine the error and update the weights so as to minimize it.
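A minimal sketch of one contrastive-divergence (CD-1) iteration for a binary RBM is shown below in NumPy. The layer sizes, learning rate, and sampling choices are illustrative; real implementations also tune the hyper-parameters discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible (input) biases
b_h = np.zeros(n_hidden)    # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0):
    """One contrastive-divergence (CD-1) step for a single binary sample v0."""
    global W, b_v, b_h
    # Positive phase: activate the hidden layer from the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct the input, then re-activate the hidden layer.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Update phase: move the weights toward the data statistics and away
    # from the reconstruction statistics.
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)
    return np.mean((v0 - p_v1) ** 2)   # reconstruction error, for monitoring

v = (rng.random(n_visible) < 0.5).astype(float)   # one binary training sample
for _ in range(100):
    err = cd1_update(v)
```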
All in all, this learning procedure requires hands-on experience to master. There are many hyper-parameters, such as the learning rate, momentum, weight-cost, sparsity target, initial values of the weights, number of hidden units, and size of each batch [10]. For each specific application, a specific set of hyper-parameters must be chosen. This is the art of training RBMs: there is no right or wrong way to set them, and only through trial and error can one determine a good set.
3.4.4. References. Refer to the following papers [11], [12],
[13] for a detailed illustration and studies of RBMs.
3.5. Deep Belief Network (DBN)
3.5.1. Purpose. Deep Belief Networks are used for learning a representation of some input data. Their purpose is very similar to that of the RBM, and in practice researchers rarely use plain RBMs anymore. The DBN can be viewed as the logical next step in the timeline of the development of the RBM; it is the next iteration and improvement of this type of model and has been widely accepted as a replacement for the RBM. Some have argued that since RBMs already have the representational power to approximate any function, what is the use of DBNs? Further research has concluded that adding an additional layer can only yield a positive information gain over a shallower model. This implies that there is no harm in adding an additional layer and, from our understanding, it enables the model to detect higher-level abstractions in the data.
3.5.2. Architecture. A Deep Belief Network is a stack of feedforward RBMs. The output of layer k, which is the hidden layer of one RBM, is the input of the next layer's RBM. The motivation behind this architecture is the idea that an efficient way to learn a complicated model is to combine a set of simpler models that are learned sequentially. We believe that adding layers to a DBN, as opposed to adding nodes to the hidden layer of an RBM, allows the model to become more flexible, more expressive, and less dependent on the number of nodes in each hidden layer. This requires less manual feature engineering and allows the neural network to "work its magic." We believe this makes the DBN a preferable model over the RBM.
3.5.3. Training. The study of training DBNs has filled many research papers and cannot be explained to the extent required within the scope of this paper.

Figure 24: This network represents the architecture of a Deep Belief Network. Each pair of layers represents an RBM. As explained, each RBM's hidden layer is fed into the input layer of the next RBM.

More generally, though, training is performed in a greedy layer-wise fashion; all of the learning involved is localized. By performing a greedy layer-wise procedure, the network can train iteratively and the complexity becomes manageable. This layer-by-layer unsupervised learning algorithm consists of learning a stack of RBMs, one RBM at a time, and is illustrated in Figure 25. The first step consists of training the first layer as an RBM that models the input. The hidden layer of this first RBM is then used as the input layer for the second RBM; this is generally done by propagating either the mean activations or samples drawn from them. The process is repeated for as many layers as desired, each time propagating forward the hidden layer of the previously trained RBM. The parameters (weights) of the resulting deep architecture are then updated with respect to the log-likelihood. In supervised training scenarios, a target output can be substituted for the log-likelihood as the training criterion.
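The greedy layer-wise idea can be sketched in a few lines of Python. The helper train_rbm below is a hypothetical stand-in for any RBM training routine (such as the contrastive-divergence sketch in section 3.4), and propagating the mean activations is one of the choices mentioned above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=10):
    # Hypothetical stand-in for an RBM training routine (e.g. contrastive
    # divergence); returns a trained weight matrix and hidden bias.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(data.shape[1], n_hidden))
    b_h = np.zeros(n_hidden)
    # ... CD updates would go here ...
    return W, b_h

def train_dbn(data, layer_sizes):
    """Greedy layer-wise training: train one RBM at a time and feed its
    hidden-layer (mean) activations to the next RBM as input."""
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        layers.append((W, b_h))
        x = sigmoid(x @ W + b_h)   # propagate mean activations upward
    return layers

rng = np.random.default_rng(1)
training_data = (rng.random((100, 20)) < 0.5).astype(float)
dbn = train_dbn(training_data, layer_sizes=[16, 8, 4])
```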
3.5.4. References. Refer to the following paper [14] for a
detailed illustration and studies of DBNs.
3.6. Deep Boltzmann Machine (DBM)
3.6.1. Purpose. Deep Boltzmann Machines can be viewed as multi-layer RBMs. In contrast to the RBM, which is limited to one hidden layer, a Deep Boltzmann Machine can have many. This allows the weights to be visible to other layers and forms a more complex version of the RBM. DBMs have the potential to learn increasingly complex internal representations of the data, which is needed in fields such as speech recognition and object assimilation. In practice, however, a DBM is rarely used and is often substituted with the more promising, and more trainable, DBN. We include this explanation of the architecture purely as a reference so readers can differentiate the terms and understand the difference in architecture between Deep Boltzmann Machines and Deep Belief Networks.

Figure 25: This figure represents the layer-wise training procedure of a Deep Belief Network. Each RBM is trained, stacked, and its hidden layer is fed to the input layer of the next RBM.
3.6.2. Architecture. Although very similar to the architecture of a DBN, the architecture of a Deep Boltzmann Machine has one striking difference: instead of having directed connections between each stacked RBM, a DBM has undirected connections between every layer. This implies that weights are shared throughout the entire model, as opposed to the more layer-wise approach of a DBN. The difference is illustrated in Figure 26: a DBN is a stack of connected RBMs, while a Deep Boltzmann Machine is an RBM with multiple hidden layers. This implies a fundamental difference in the training procedure. We will not cover the training procedure for a DBM because it is out of the scope of this literature survey, but bear in mind that it entails factoring in the weights in more than one direction, because signals can propagate through the network in both directions. When comparing the two models, a DBN can be viewed as a stack of RBMs, whereas a DBM is a hybrid version of the RBM.
3.6.3. References. Refer to the following paper [15] for a
detailed illustration and studies of DBMs.
3.7. Auto-encoders
3.7.1. Purpose. Auto-encoders are neural networks that aim to learn a compressed representation, or encoding, of the input data. The model is considered generative because it is trained to recreate the input data from its hidden layer. Auto-encoders are great for dimensionality reduction and have attracted serious interest recently.
Figure 26: Although each is built from stacked RBMs, the direction of the connections between layers in Deep Belief Networks and Deep Boltzmann Machines differs.
3.7.2. Architecture. Auto-encoders have a unique architecture. They are designed with three layers: the first is the input layer, the third is the output layer, as shown in Figure 27, and the middle hidden layer between these two is called the feature layer. The input and output layers of an auto-encoder are intended to be the same after training; the middle layer serves as an encoding of the input data. This middle layer's dimensionality can be greater than or less than that of the input layer, depending on the application. When the feature layer has a lower dimensionality than the input layer, the model is excellent at performing dimensionality reduction. The real focus of these models is the feature layer that is created during training: since the input and output layers will be the same, they are of no interest beyond training purposes, while the middle layer represents an encoding of the data. Architectures such as stacked auto-encoders link these feature layers in a stacked fashion to create higher-level abstractions of the data as well. This methodology of stacking neural networks to create a high-level understanding of the data is key to deep learning. By allowing the network more representations, more correlations can be detected automatically; this automatic encoding is why the model is called an auto-encoder.
Figure 27: The architecture of an Auto-Encoder. The first and last layers are the same; the middle layer represents the features (encoding) learned during training.

3.7.3. Training. Training can be conceptualized as the network trying to "recreate" the data. The network receives its inputs and feeds them to the feature layer. The first part of the training process is called the encoding phase: the input data from the first layer is encoded into the feature layer through adjustable weights. Each node in the feature layer then propagates a signal forward and, with the assistance of adjustable weights and biases, maps this encoded representation back to its original un-encoded state; this is referred to as the decoding phase. To summarize, data is fed into the input layer, encoded in the feature layer, then decoded into the output layer. The error is determined by comparing the output value to the input value, as they should be exactly the same.
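A minimal auto-encoder with one feature layer can be trained with the same backpropagation machinery described earlier. The NumPy sketch below, with illustrative sizes, sigmoid activations, and squared reconstruction error, runs the encode/decode cycle just described.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_feature, lr = 8, 3, 0.5   # feature layer smaller than input: dimensionality reduction
W_enc = rng.normal(scale=0.1, size=(n_input, n_feature))
W_dec = rng.normal(scale=0.1, size=(n_feature, n_input))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = (rng.random((32, n_input)) < 0.5).astype(float)   # toy training data

for step in range(500):
    # Encoding phase: input -> feature layer.
    code = sigmoid(X @ W_enc)
    # Decoding phase: feature layer -> reconstruction of the input.
    recon = sigmoid(code @ W_dec)

    # Error: the reconstruction should match the input exactly.
    err = recon - X

    # Backpropagate the reconstruction error through both phases.
    d_recon = err * recon * (1 - recon)
    d_code = (d_recon @ W_dec.T) * code * (1 - code)
    W_dec -= lr * code.T @ d_recon / len(X)
    W_enc -= lr * X.T @ d_code / len(X)

print(np.mean((recon - X) ** 2))   # mean reconstruction error after training
```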
3.7.4. References. Refer to the following paper [16] for a
detailed illustration and studies of Auto-encoders.
4. Choosing a Model
With the growing number of variations of deep learning models, it is important to choose a model that is suitable for the task at hand. Many factors contribute to choosing a model that can effectively represent a solution:
• First, study the dataset at hand.
• Second, decide whether you want to perform classification, prediction, or learn a representation of the data.
• Third, choose a model and try different variations of it until you reach the desired objective.
In Table 1, we summarize the decision factors for the models surveyed in this study. These models cover a wide variety of domains, and other models fall into one of these general architectures.
5. Applications
At the time of this survey, researchers from different labs are applying deep learning models to a myriad of real-world applications and achieving state-of-the-art performance. In this section, we shed light on some of the trending projects sponsored by large tech companies.
5.1. Facebook’s DeepFace
Uploading a picture with your friends to Facebook automatically suggests tagging them by recognizing their faces. Closing the gap to human-level performance in face verification is the main research focus of Facebook's DeepFace. They derive a face representation from a nine-layer deep neural network that involves more than 120 million parameters, using several locally connected layers without weight sharing. The model was trained on four million facial images belonging to more than 4,000 identities, and the result is the most powerful face recognition module we see in the largest social network in the world.
5.2. Google’s DeepMind
Founded in London in 2010 and acquired by Google in early 2014, DeepMind builds algorithms that are capable of learning for themselves directly from raw experience or data, and that are general in that they can perform well across a wide variety of tasks straight out of the box. Their team consists of many renowned experts in their respective fields, including but not limited to deep neural networks, reinforcement learning, and systems neuroscience-inspired models. One recent remarkable achievement is AlphaGo, the first computer program to ever beat a professional Go player. It was a tremendous milestone to see a computer brain, powered by deep learning models, beat a human brain.
5.3. Apple’s Siri
Siri (Speech Interpretation and Recognition Interface) is Apple's intelligent personal assistant that comes pre-installed on their iPhone devices. Siri's primary technical areas focus on a conversational interface, personal context awareness, and service delegation. At the core of the conversational interface resides a strong speech recognition engine, powered by deep learning models, that learns a user's accent and adapts to it to respond with better results. The power of Siri comes not only from the speech recognition engine, but also from other machine learning models that can carry a full conversation between the user and the device, relying on a set of web services.
Model: CNN. Type: Discriminative. Purpose: Classification. Suitable for: Processing two-dimensional data. Example: Images/videos.
Model: RNN. Type: Discriminative. Purpose: Prediction. Suitable for: Processing sequence data. Example: Language models and speech.
Model: LSTM. Type: Discriminative. Purpose: Prediction. Suitable for: Processing long sequence data. Example: Language models and speech.
Model: RBM. Type: Generative. Purpose: Unsupervised feature learning. Suitable for: Learning distributions of data. Example: Generating samples from learned hidden representations.
Model: DBN. Type: Generative. Purpose: Unsupervised feature learning. Suitable for: Creating a probabilistic reconstruction of data. Example: Trained layers used as feature detectors.
Model: Auto-encoder. Type: Generative. Purpose: Dimensionality reduction. Suitable for: Creating a compact representation of data. Example: PCA-like tasks.
TABLE 1: Summary of the surveyed deep learning models
5.4. Microsoft’s Cortana
Analogous to Siri, Cortana is the clever personal as-
sistant developed by Microsoft that helps you find things
on your PC, manage your calendar, track packages, find
files, chat with you, and tell jokes. Cortana learns the user
behavior through a deep learning model in the sense that
the more you use Cortana, the more personalized your ex-
perience will be. Cortana depends heavily on understanding
a user’s query and takes actions based on this request. A set
of deep learning language models enable setting reminders,
making calls, sending emails and answering questions when
requested by the user. Cortana is a massive improvement
in the field of artificial intelligence in human-computer
interaction.
6. Current Research Directions
As demonstrated, deep learning is a vast field [3], [17]. Following the theoretical claim that, with enough hidden nodes, a model can be trained to represent any function or distribution, we are seeing a re-emergence of many classical machine learning techniques, especially with the continuing improvements in computational resources.
On the discriminant side, there is an aggressive push towards memory-based models. As shown, one successful model is the LSTM, but it is by no means the only one that has been proposed: Memory Networks, Neural Turing Machines, and Hierarchical Temporal Memory are all similar memory-based deep neural networks. The advantage of this direction is that the networks are able to retain state throughout their lifetimes. The goal of these networks is to make tasks such as sequence learning and reinforcement learning representable and trainable; these tasks require memory in order to exploit previously seen inputs and their correlations in future predictions. In our opinion, reinforcement learning will be the heavy focus of deep learning in the next few years. We are seeing a paradigm shift as data scientists realize that deep neural networks can be used as function approximators in reinforcement learning algorithms. We believe this has a lot of potential, and we will be pursuing research in this area in the future.
On the generative side of models, there has been re-emerging interest in the past few years. Hinton dropped a bomb and ignited the entire field of deep learning with his influential paper [14] on a generative architecture. The focus then shifted to the shinier side of unsupervised learning. With the explosion of unlabeled data pouring from the various sources of big data, the need to improve these unsupervised deep learning models has been growing. The focus of these models varies from serving as a pre-training step whose output is fed forward to a discriminant model, to more "all-in-one" hybrid solutions. There is past research in the area of discriminant RBMs and their variations, highlighted in Bengio's paper [18], that we believe will be useful in truly harnessing the representational power of these typically generative models.
7. Conclusion
This concludes our survey of the field of deep learning. To summarize in one statement, we believe deep learning can be viewed as the art of using deep neural network structures to represent any machine learning task. Although some of the theoretical strengths of neural networks have been claimed since the 1950s, the recent advancement of computer hardware has made these hypotheses verifiable. What we are now seeing is a complete redefinition of the tasks that have been staples of the field of machine learning and the broader domain of artificial intelligence.
We view these recent advancements as the beginning of the era of truly thinking computers. Whereas older machine learning techniques such as SVMs, clustering, PCA, etc. are each based on certain statistical characteristics of the data, neural networks can be viewed as a digital muscle that can be strengthened in a certain manner to represent any of those models. In our own opinion, old ML techniques can be viewed as discrete learning methods, whereas deep learning is more of a continuous learning method. A simple example is comparing a stack of linear regressions placed on top of each other with a deep neural network. Ultimately, a stack of linear regressions is still linear no matter what: the resulting equation may have a completely different slope and bias and may represent an arbitrary linear function, but its capabilities are limited. As demonstrated, neural networks are not bound by this linearity. The use of a nonlinear activation function boosts the representational power of these models so much that there are theoretical claims that deep learning architectures can be trained to represent any distribution or function [19]. This representational power stems from the differentiation of network structures into discriminant and generative architectures.
To state that the emergence of the field of deep learning has correlated with the rise in performance of computer hardware does not illustrate the dependence strongly enough. If one thing was clear from our research, it is that the techniques of deep learning are among the most computationally intensive problems that computers have been introduced to. It is no understatement that deep learning is a field that models its methods on the world's most powerful processor: the human brain. With a foundation strongly rooted in neuroscience, we have no doubt that the models developed by deep learning researchers will aid and push the sister field. There is an innate link between the research neuroscientists are performing on the brain to understand how the human mind works and the work deep learning experts are undertaking to emulate this process. We believe that only through further integration of the fields of deep learning and neuroscience, seen in models such as Hierarchical Temporal Memory, can true general intelligence be realized. As such computationally intensive software methods are created, hardware will continue to push the boundaries of what is considered possible.
References
[1] Y. Bengio, “Learning deep architectures for AI,” Foundations and
Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009, also
published as a book. Now Publishers, 2009.
[2] ——, “Practical recommendations for gradient-based train-
ing of deep architectures,” 06 2012. [Online]. Available:
http://arxiv.org/abs/1206.5533
[3] I. Arel, D. C. Rose, and T. P. Karnowski, “Deep machine learning -
a new frontier in artificial intelligence research [research frontier],”
IEEE Computational Intelligence Magazine, vol. 5, no. 4, pp. 13–18,
Nov 2010.
[4] C. Olah, “Conv nets: A modular perspective.”
[5] I. Sutskever, “Training recurrent neural networks,” Ph.D. dissertation, University of Toronto, Toronto, Ont., Canada, 2013, AAINS22066.
[6] A. Karpathy, “The unreasonable effectiveness of recurrent neural networks.”
[7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,”
Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online].
Available: http://dx.doi.org/10.1162/neco.1997.9.8.1735
[8] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” CoRR, vol. abs/1503.04069, 2015. [Online]. Available: http://arxiv.org/abs/1503.04069
[9] C. Olah, “Understanding LSTM networks.”
[10] G. E. Hinton, Neural Networks: Tricks of the Trade: Second Edition.
Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, ch. A Practical
Guide to Training Restricted Boltzmann Machines, pp. 599–619.
[11] ——, “Deterministic boltzmann learning performs steepest descent in
weight-space,” Neural Comput., vol. 1, no. 1, pp. 143–150, Mar. 1989.
[Online]. Available: http://dx.doi.org/10.1162/neco.1989.1.1.143
[12] N. Le Roux and Y. Bengio, “Representational power of restricted
boltzmann machines and deep belief networks,” Neural Comput.,
vol. 20, no. 6, pp. 1631–1649, Jun. 2008. [Online]. Available:
http://dx.doi.org/10.1162/neco.2008.04-07-510
[13] R. Salakhutdinov and G. Hinton, “Deep Boltzmann machines,” in
Proceedings of the International Conference on Artificial Intelligence
and Statistics, vol. 5, 2009, pp. 448–455.
[14] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast
learning algorithm for deep belief nets,” Neural Comput.,
vol. 18, no. 7, pp. 1527–1554, Jul. 2006. [Online]. Available:
http://dx.doi.org/10.1162/neco.2006.18.7.1527
[15] R. Salakhutdinov and G. Hinton, “An efficient learning procedure
for deep boltzmann machines,” Neural Comput., vol. 24, no. 8, pp.
1967–2006, Aug. 2012.
[16] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature,
vol. 521, no. 7553, pp. 436–444, 05 2015. [Online]. Available:
http://dx.doi.org/10.1038/nature14539
[17] J. Schmidhuber, “Deep learning in neural networks: An overview,”
04 2014. [Online]. Available: http://arxiv.org/abs/1404.7828
[18] H. Larochelle and Y. Bengio, “Classification using discriminative
restricted Boltzmann machines,” in Proceedings of the Twenty-fifth
International Conference on Machine Learning (ICML’08), W. W.
Cohen, A. McCallum, and S. T. Roweis, Eds. ACM, 2008, pp.
536–543.
[19] Y. Bengio, A. Courville, and P. Vincent, “Representation learning:
A review and new perspectives,” 06 2012. [Online]. Available:
http://arxiv.org/abs/1206.5538
[20] W. W. Cohen, A. McCallum, and S. T. Roweis, Eds., Proceedings
of the Twenty-fifth International Conference on Machine Learning
(ICML’08). ACM, 2008.
[21] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, “Scheduled
sampling for sequence prediction with recurrent neural networks,”
06 2015. [Online]. Available: http://arxiv.org/abs/1506.03099
[22] R. Sun, “Introduction to sequence learning,” in Sequence
Learning - Paradigms, Algorithms, and Applications. London,
UK, UK: Springer-Verlag, 2001, pp. 1–10. [Online]. Available:
http://dl.acm.org/citation.cfm?id=647073.713884
[23] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian Opti-
mization of Machine Learning Algorithms,” ArXiv e-prints, Jun. 2012.
[24] S. Rifai, Y. N. Dauphin, P. Vincent, Y. Bengio, and X. Muller,
“The manifold tangent classifier,” in Advances in Neural Information
Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett,
F. Pereira, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2011,
pp. 2294–2302. [Online]. Available: http://papers.nips.cc/paper/4409-
the-manifold-tangent-classifier.pdf
[25] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and
R. Salakhutdinov, “Improving neural networks by preventing co-
adaptation of feature detectors,” CoRR, vol. abs/1207.0580, 2012.
[Online]. Available: http://arxiv.org/abs/1207.0580
[26] Y. Bengio and S. Bengio, “Modeling high-dimensional discrete data with multi-layer neural networks,” Advances in Neural Information Processing Systems 12, pp. 400–406, 2000.
[27] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based
learning applied to document recognition,” Proceedings of the IEEE,
vol. 86, no. 11, pp. 2278–2324, Nov 1998.


What's hot

Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Simplilearn
 
Ppt on artifishail intelligence
Ppt on artifishail intelligencePpt on artifishail intelligence
Ppt on artifishail intelligencesnehal_gongle
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & OpportunityiTrain
 
Neural networks of artificial intelligence
Neural networks of artificial  intelligenceNeural networks of artificial  intelligence
Neural networks of artificial intelligencealldesign
 
what is neural network....???
what is neural network....???what is neural network....???
what is neural network....???Adii Shah
 
introduction to deep Learning with full detail
introduction to deep Learning with full detailintroduction to deep Learning with full detail
introduction to deep Learning with full detailsonykhan3
 
Artificial neural networks seminar presentation using MSWord.
Artificial neural networks seminar presentation using MSWord.Artificial neural networks seminar presentation using MSWord.
Artificial neural networks seminar presentation using MSWord.Mohd Faiz
 
Artificial Neural Network (draft)
Artificial Neural Network (draft)Artificial Neural Network (draft)
Artificial Neural Network (draft)James Boulie
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications Ahmed_hashmi
 
Neural networks
Neural networksNeural networks
Neural networksBasil John
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Simplilearn
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learningleopauly
 
Artificial neural network for machine learning
Artificial neural network for machine learningArtificial neural network for machine learning
Artificial neural network for machine learninggrinu
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
 

What's hot (20)

Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
 
Neural network
Neural networkNeural network
Neural network
 
Deep learning
Deep learning Deep learning
Deep learning
 
Ppt on artifishail intelligence
Ppt on artifishail intelligencePpt on artifishail intelligence
Ppt on artifishail intelligence
 
Project presentation
Project presentationProject presentation
Project presentation
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & Opportunity
 
Neural networks of artificial intelligence
Neural networks of artificial  intelligenceNeural networks of artificial  intelligence
Neural networks of artificial intelligence
 
what is neural network....???
what is neural network....???what is neural network....???
what is neural network....???
 
introduction to deep Learning with full detail
introduction to deep Learning with full detailintroduction to deep Learning with full detail
introduction to deep Learning with full detail
 
Artificial neural networks seminar presentation using MSWord.
Artificial neural networks seminar presentation using MSWord.Artificial neural networks seminar presentation using MSWord.
Artificial neural networks seminar presentation using MSWord.
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Neural network
Neural networkNeural network
Neural network
 
Artificial Neural Network (draft)
Artificial Neural Network (draft)Artificial Neural Network (draft)
Artificial Neural Network (draft)
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
 
Neural networks
Neural networksNeural networks
Neural networks
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Artificial neural network for machine learning
Artificial neural network for machine learningArtificial neural network for machine learning
Artificial neural network for machine learning
 
Neural networks
Neural networksNeural networks
Neural networks
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 

Viewers also liked

Weather forecasting
Weather forecastingWeather forecasting
Weather forecastingsumirehan000
 
Persistent RNNs: Stashing Recurrent Weights On-Chip
Persistent RNNs: Stashing Recurrent Weights On-ChipPersistent RNNs: Stashing Recurrent Weights On-Chip
Persistent RNNs: Stashing Recurrent Weights On-ChipBaidu USA Research
 
Deep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItDeep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItHolberton School
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesTuri, Inc.
 
Deep Learning in Natural Language Processing
Deep Learning in Natural Language ProcessingDeep Learning in Natural Language Processing
Deep Learning in Natural Language ProcessingDavid Dao
 
9/1 Top 5 Deep Learning
9/1 Top 5 Deep Learning9/1 Top 5 Deep Learning
9/1 Top 5 Deep LearningNVIDIA
 
The Forecast // Artificial Intelligence
The Forecast // Artificial IntelligenceThe Forecast // Artificial Intelligence
The Forecast // Artificial IntelligenceUsbek & Rica Trends
 
How Will Links Influence SEO in the Future
How Will Links Influence SEO in the FutureHow Will Links Influence SEO in the Future
How Will Links Influence SEO in the FutureRand Fishkin
 
Weather forecasting
Weather forecastingWeather forecasting
Weather forecastingESSBY
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendationsBalázs Hidasi
 
機械学習と深層学習の数理
機械学習と深層学習の数理機械学習と深層学習の数理
機械学習と深層学習の数理Ryo Nakamura
 
Fight Back Against Back: How Search Engines & Social Networks' AI Impacts Mar...
Fight Back Against Back: How Search Engines & Social Networks' AI Impacts Mar...Fight Back Against Back: How Search Engines & Social Networks' AI Impacts Mar...
Fight Back Against Back: How Search Engines & Social Networks' AI Impacts Mar...Rand Fishkin
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural NetworksSeiya Tokui
 
Kimye North West pregnancy and baby stuff
Kimye North West pregnancy and baby stuffKimye North West pregnancy and baby stuff
Kimye North West pregnancy and baby stuffRomy Filby
 
Digital Intelligence Systems interview questions and answers
Digital Intelligence Systems interview questions and answersDigital Intelligence Systems interview questions and answers
Digital Intelligence Systems interview questions and answersbarabar906
 
L2G Facebook para Pymes - Introducción
L2G Facebook para Pymes - IntroducciónL2G Facebook para Pymes - Introducción
L2G Facebook para Pymes - Introducciónicontreras79
 
Confirmation limits
Confirmation limitsConfirmation limits
Confirmation limitsdescross
 
property in Neemrana-Ashu Group,7503367689
property in Neemrana-Ashu Group,7503367689property in Neemrana-Ashu Group,7503367689
property in Neemrana-Ashu Group,7503367689sahilkharkara5
 

Viewers also liked (20)

Weather forecasting
Weather forecastingWeather forecasting
Weather forecasting
 
Persistent RNNs: Stashing Recurrent Weights On-Chip
Persistent RNNs: Stashing Recurrent Weights On-ChipPersistent RNNs: Stashing Recurrent Weights On-Chip
Persistent RNNs: Stashing Recurrent Weights On-Chip
 
Deep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItDeep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do It
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep Features
 
Deep Learning in Natural Language Processing
Deep Learning in Natural Language ProcessingDeep Learning in Natural Language Processing
Deep Learning in Natural Language Processing
 
9/1 Top 5 Deep Learning
9/1 Top 5 Deep Learning9/1 Top 5 Deep Learning
9/1 Top 5 Deep Learning
 
Tutorial on Deep Learning
Tutorial on Deep LearningTutorial on Deep Learning
Tutorial on Deep Learning
 
The Forecast // Artificial Intelligence
The Forecast // Artificial IntelligenceThe Forecast // Artificial Intelligence
The Forecast // Artificial Intelligence
 
How Will Links Influence SEO in the Future
How Will Links Influence SEO in the FutureHow Will Links Influence SEO in the Future
How Will Links Influence SEO in the Future
 
Weather forecast ppt
Weather forecast pptWeather forecast ppt
Weather forecast ppt
 
Weather forecasting
Weather forecastingWeather forecasting
Weather forecasting
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
 
機械学習と深層学習の数理
機械学習と深層学習の数理機械学習と深層学習の数理
機械学習と深層学習の数理
 
Fight Back Against Back: How Search Engines & Social Networks' AI Impacts Mar...
Fight Back Against Back: How Search Engines & Social Networks' AI Impacts Mar...Fight Back Against Back: How Search Engines & Social Networks' AI Impacts Mar...
Fight Back Against Back: How Search Engines & Social Networks' AI Impacts Mar...
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks
 
Kimye North West pregnancy and baby stuff
Kimye North West pregnancy and baby stuffKimye North West pregnancy and baby stuff
Kimye North West pregnancy and baby stuff
 
Digital Intelligence Systems interview questions and answers
Digital Intelligence Systems interview questions and answersDigital Intelligence Systems interview questions and answers
Digital Intelligence Systems interview questions and answers
 
L2G Facebook para Pymes - Introducción
L2G Facebook para Pymes - IntroducciónL2G Facebook para Pymes - Introducción
L2G Facebook para Pymes - Introducción
 
Confirmation limits
Confirmation limitsConfirmation limits
Confirmation limits
 
property in Neemrana-Ashu Group,7503367689
property in Neemrana-Ashu Group,7503367689property in Neemrana-Ashu Group,7503367689
property in Neemrana-Ashu Group,7503367689
 

Similar to A Study On Deep Learning

Fuzzy Logic Final Report
Fuzzy Logic Final ReportFuzzy Logic Final Report
Fuzzy Logic Final ReportShikhar Agarwal
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Artificial neural network paper
Artificial neural network paperArtificial neural network paper
Artificial neural network paperAkashRanjandas1
 
Neural networks are parallel computing devices.docx.pdf
Neural networks are parallel computing devices.docx.pdfNeural networks are parallel computing devices.docx.pdf
Neural networks are parallel computing devices.docx.pdfneelamsanjeevkumar
 
Artificial Neural Networks ppt.pptx for final sem cse
Artificial Neural Networks  ppt.pptx for final sem cseArtificial Neural Networks  ppt.pptx for final sem cse
Artificial Neural Networks ppt.pptx for final sem cseNaveenBhajantri1
 
Nature Inspired Reasoning Applied in Semantic Web
Nature Inspired Reasoning Applied in Semantic WebNature Inspired Reasoning Applied in Semantic Web
Nature Inspired Reasoning Applied in Semantic Webguestecf0af
 
BASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptxBASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptxRiteshPandey184067
 
Artificial neural networks
Artificial neural networks Artificial neural networks
Artificial neural networks ShwethaShreeS
 
Artificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementArtificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementIOSR Journals
 
APPLYING NEURAL NETWORKS FOR SUPERVISED LEARNING OF MEDICAL DATA
APPLYING NEURAL NETWORKS FOR SUPERVISED LEARNING OF MEDICAL DATAAPPLYING NEURAL NETWORKS FOR SUPERVISED LEARNING OF MEDICAL DATA
APPLYING NEURAL NETWORKS FOR SUPERVISED LEARNING OF MEDICAL DATAIJDKP
 
Neural Networks.pptx
Neural Networks.pptxNeural Networks.pptx
Neural Networks.pptxshahinbme
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Akash Goel
 
Neural Network
Neural NetworkNeural Network
Neural NetworkSayyed Z
 
A Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionA Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionIJCSIS Research Publications
 
machinelearningengineeringslideshare-160909192132 (1).pdf
machinelearningengineeringslideshare-160909192132 (1).pdfmachinelearningengineeringslideshare-160909192132 (1).pdf
machinelearningengineeringslideshare-160909192132 (1).pdfShivareddyGangam
 

Similar to A Study On Deep Learning (20)

deep learning
deep learningdeep learning
deep learning
 
Fuzzy Logic Final Report
Fuzzy Logic Final ReportFuzzy Logic Final Report
Fuzzy Logic Final Report
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Artificial neural network paper
Artificial neural network paperArtificial neural network paper
Artificial neural network paper
 
Neural networks are parallel computing devices.docx.pdf
Neural networks are parallel computing devices.docx.pdfNeural networks are parallel computing devices.docx.pdf
Neural networks are parallel computing devices.docx.pdf
 
Artificial Neural Networks ppt.pptx for final sem cse
Artificial Neural Networks  ppt.pptx for final sem cseArtificial Neural Networks  ppt.pptx for final sem cse
Artificial Neural Networks ppt.pptx for final sem cse
 
Nature Inspired Reasoning Applied in Semantic Web
Nature Inspired Reasoning Applied in Semantic WebNature Inspired Reasoning Applied in Semantic Web
Nature Inspired Reasoning Applied in Semantic Web
 
B42010712
B42010712B42010712
B42010712
 
BASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptxBASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptx
 
Artificial neural networks
Artificial neural networks Artificial neural networks
Artificial neural networks
 
Artificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementArtificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In Management
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
APPLYING NEURAL NETWORKS FOR SUPERVISED LEARNING OF MEDICAL DATA
APPLYING NEURAL NETWORKS FOR SUPERVISED LEARNING OF MEDICAL DATAAPPLYING NEURAL NETWORKS FOR SUPERVISED LEARNING OF MEDICAL DATA
APPLYING NEURAL NETWORKS FOR SUPERVISED LEARNING OF MEDICAL DATA
 
Neural Networks.pptx
Neural Networks.pptxNeural Networks.pptx
Neural Networks.pptx
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders
 
Artifical Neural Network
Artifical Neural NetworkArtifical Neural Network
Artifical Neural Network
 
Neural Network
Neural NetworkNeural Network
Neural Network
 
ANN - UNIT 1.pptx
ANN - UNIT 1.pptxANN - UNIT 1.pptx
ANN - UNIT 1.pptx
 
A Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionA Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware Detection
 
machinelearningengineeringslideshare-160909192132 (1).pdf
machinelearningengineeringslideshare-160909192132 (1).pdfmachinelearningengineeringslideshare-160909192132 (1).pdf
machinelearningengineeringslideshare-160909192132 (1).pdf
 

More from Abdelrahman Hosny

Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...Abdelrahman Hosny
 
iPhone Architecture - Review
iPhone Architecture - ReviewiPhone Architecture - Review
iPhone Architecture - ReviewAbdelrahman Hosny
 
Implementing a Caching Scheme for Media Streaming in a Proxy Server
Implementing a Caching Scheme for Media Streaming in a Proxy ServerImplementing a Caching Scheme for Media Streaming in a Proxy Server
Implementing a Caching Scheme for Media Streaming in a Proxy ServerAbdelrahman Hosny
 
Microsoft SharePoint 2010 Overview
Microsoft SharePoint 2010 OverviewMicrosoft SharePoint 2010 Overview
Microsoft SharePoint 2010 OverviewAbdelrahman Hosny
 
A Comparison of .NET Framework vs. Java Virtual Machine
A Comparison of .NET Framework vs. Java Virtual MachineA Comparison of .NET Framework vs. Java Virtual Machine
A Comparison of .NET Framework vs. Java Virtual MachineAbdelrahman Hosny
 
3.0 Introduction to .NET Framework
3.0 Introduction to .NET Framework3.0 Introduction to .NET Framework
3.0 Introduction to .NET FrameworkAbdelrahman Hosny
 
1.0 Introduction to Hardware Computer Architecture
1.0 Introduction to Hardware Computer Architecture1.0 Introduction to Hardware Computer Architecture
1.0 Introduction to Hardware Computer ArchitectureAbdelrahman Hosny
 
2.0 Introduction to Computer Science and Programming
2.0 Introduction to Computer Science and Programming2.0 Introduction to Computer Science and Programming
2.0 Introduction to Computer Science and ProgrammingAbdelrahman Hosny
 

More from Abdelrahman Hosny (17)

Teaching Philosophy
Teaching PhilosophyTeaching Philosophy
Teaching Philosophy
 
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
 
My Teaching Philosophy
My Teaching PhilosophyMy Teaching Philosophy
My Teaching Philosophy
 
iPhone Architecture - Review
iPhone Architecture - ReviewiPhone Architecture - Review
iPhone Architecture - Review
 
Implementing a Caching Scheme for Media Streaming in a Proxy Server
Implementing a Caching Scheme for Media Streaming in a Proxy ServerImplementing a Caching Scheme for Media Streaming in a Proxy Server
Implementing a Caching Scheme for Media Streaming in a Proxy Server
 
A Servant Leader
A Servant LeaderA Servant Leader
A Servant Leader
 
Microsoft SharePoint 2010 Overview
Microsoft SharePoint 2010 OverviewMicrosoft SharePoint 2010 Overview
Microsoft SharePoint 2010 Overview
 
A Comparison of .NET Framework vs. Java Virtual Machine
A Comparison of .NET Framework vs. Java Virtual MachineA Comparison of .NET Framework vs. Java Virtual Machine
A Comparison of .NET Framework vs. Java Virtual Machine
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Office365
Office365Office365
Office365
 
The Silent Presentation
The Silent PresentationThe Silent Presentation
The Silent Presentation
 
Team Building
Team BuildingTeam Building
Team Building
 
Introduction to Marketing
Introduction to MarketingIntroduction to Marketing
Introduction to Marketing
 
Interviewing
InterviewingInterviewing
Interviewing
 
3.0 Introduction to .NET Framework
3.0 Introduction to .NET Framework3.0 Introduction to .NET Framework
3.0 Introduction to .NET Framework
 
1.0 Introduction to Hardware Computer Architecture
1.0 Introduction to Hardware Computer Architecture1.0 Introduction to Hardware Computer Architecture
1.0 Introduction to Hardware Computer Architecture
 
2.0 Introduction to Computer Science and Programming
2.0 Introduction to Computer Science and Programming2.0 Introduction to Computer Science and Programming
2.0 Introduction to Computer Science and Programming
 

Recently uploaded

Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchersdarmandersingh4580
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证zifhagzkk
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...ThinkInnovation
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...mikehavy0
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444saurabvyas476
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxAniqa Zai
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshareraiaryan448
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...vershagrag
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 

Recently uploaded (20)

Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 

In the body of the neuron, there is the nucleus that receives pulses of electricity from input wires (dendrites) and, based on these signals, the neuron does some computation and sends a message (electrical impulses) to other neurons through output wires (axons). The human brain has billions of these neurons connected together. Different neurons in the brain are responsible for different senses, such as sight, smell and touch. It has been scientifically observed that any neuron in the brain network can learn to do other jobs. For example, experiments on animals show that if we disconnect the wires that connect an auditory neuron to the ears and connect it to the eyes, the neuron will learn to see, as in figure 2a. Similar experiments disconnect the somatosensory neuron's connection to the hand and connect it to the eyes; it eventually learns to see, as in figure 2b.

Figure 1: Human brain neuron

Figure 2: Neurons learn to do different tasks when the original wires are disconnected and reconnected to other senses. (a) Auditory cortex learns to see. (b) Somatosensory cortex learns to see.
Now, let us switch context to talk about mimicking this neural network in computers. In a software environment, we create a similar model that has three major components:

• A cell body that contains the neuron. The neuron is responsible for doing the computations.
• Input wires that carry signals into the neuron.
• Output wire(s) that transfer the output signal to other neurons.

Figure 3 is a simple artificial (computer) neural network that has only one neuron (the orange circle). x_1, x_2, and x_3 are the inputs to the neuron and they carry numerical values. The function h is called the hypothesis function. It computes its value by multiplying the input vector x by a weight vector w, and the result is then passed through an activation function that computes the final scalar output.

Figure 3: Artificial neural network with one neuron

Figure 4 shows a more advanced neural network. Each vertical set of neurons is called a layer. Layer 1 contains the neurons that represent the inputs. Layer 2 is also called a hidden layer; it does the core computation. Layer 3 is called the output layer; it does a computation on the data received from layer 2 and then outputs one final result. Now, the missing information in the one-neuron figure is:

1) What is the weight vector to be multiplied by the input vector?
2) After multiplying the two vectors, what is the activation function that will output the final result?

Besides the number of layers and the number of neurons in each layer, the answers to these two questions define the neural network model. If one could solve, or model, a specific mathematical problem by assigning values to the weight vector and choosing an appropriate activation function, the neural network model would satisfy its goal.

Figure 4: Artificial neural network with two layers

In practice, assigning weights and choosing an activation function is the most challenging part of designing a neural network. Therefore, computerized training procedures have been developed to let the software optimize the values of the weights. In the next two subsections, we discuss activation functions and the backpropagation algorithm, the fundamental technique used to train a neural network.

2.2. Activation Functions

As stated in the previous subsection, each layer is composed of a set of neurons. The purpose of each neuron is to perform a non-linear transformation on the input. Using the network in figure 3 as an example, the input vector x is multiplied by the weight vector w. If N is the number of inputs to the neuron, vector x has a shape of [1, N] and vector w has a shape of [N, 1]; multiplying these two vectors results in a scalar [1, 1] value.

x = [x_1, x_2, \ldots, x_n]   (1)

w = [w_1, w_2, \ldots, w_n]^T   (2)

x \times w = \sum_i x_i w_i = x_1 w_1 + x_2 w_2 + \ldots + x_n w_n   (3a)

y = x_1 w_1 + x_2 w_2 + \ldots + x_n w_n + bias   (3b)

As you can see from equation (3b), y is a simple linear function of the inputs. Although interesting, this linearity offers no advantage over simple linear regression. If y were passed directly to the next layer's nodes, we would say the neuron had a linear activation function; in fact, one can view a perceptron with a linear activation function as just that - linear regression. By passing y through non-linear activation functions, the network becomes able to represent far more general functions. The most popular activation functions are the following:

• Identity: A(y) = y (figure 5)
• Binary step: A(y) = 0 for y < 0, and 1 for y \geq 0 (figure 6). From a biological standpoint, this activation determines whether the neuron propagates a signal forward to a receiving neuron or not.
• Logistic: A(y) = \frac{1}{1 + e^{-y}} (figure 7)
• TanH: A(y) = \tanh(y) = \frac{2}{1 + e^{-2y}} - 1 (figure 8)
• Softsign: A(y) = \frac{y}{1 + |y|} (figure 9)
• Rectified Linear Unit (ReLU): A(y) = 0 for y < 0, and y for y \geq 0 (figure 10)
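To make the forward computation concrete, the following minimal NumPy sketch implements a single artificial neuron: the input vector is multiplied by the weight vector, a bias is added (equation 3b), and the result is passed through one of the activation functions listed above. The inputs, weights, and bias are arbitrary illustrative values, not taken from this survey.

```python
import numpy as np

# Activation functions from section 2.2
def identity(y):     return y
def binary_step(y):  return np.where(y < 0, 0.0, 1.0)
def logistic(y):     return 1.0 / (1.0 + np.exp(-y))
def tanh_act(y):     return np.tanh(y)
def softsign(y):     return y / (1.0 + np.abs(y))
def relu(y):         return np.maximum(0.0, y)

def neuron(x, w, bias, activation):
    """Single neuron: y = x . w + bias, followed by a non-linear activation."""
    y = np.dot(x, w) + bias          # equation (3b): scalar pre-activation
    return activation(y)

# Illustrative values only
x = np.array([0.5, -1.2, 3.0])       # inputs x1, x2, x3
w = np.array([0.4, 0.1, -0.6])       # weights w1, w2, w3
bias = 0.2

for name, f in [("identity", identity), ("binary step", binary_step),
                ("logistic", logistic), ("tanh", tanh_act),
                ("softsign", softsign), ("relu", relu)]:
    print(name, neuron(x, w, bias, f))
```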
2.3. Backpropagation Algorithm

A neural network is trained with a combination of two steps. The first step involves propagating the information forward through the activation functions. The previous section illustrated some of the most popular activation functions used for the nodes in a network. Once this first pass is completed, the model produces an output, and the error of the network represents how close this output is to the expected value. The second step in the training process involves adjusting the weights of the network in an attempt to minimize this error. As one can imagine, in a network where every layer is fully connected to the next, the number of weights grows very quickly. Therefore, an efficient way of minimizing the training error is a crucial need, and backpropagation provides it. Backpropagation can be viewed as a clever use of the chain rule [2].

Figure 11: Demonstration of the chain rule

Backpropagation propagates signals in the opposite direction. Starting at the output layer L, the error derivative is computed based on all the input connections coming from the previous layer L-1. Stemming from the simple fact that the error of the output layer is the output minus the target, the error can then be "recursively" defined, enabling fast training of the network. In practice, the error is usually defined as:

E_{total} = \sum \frac{1}{2} (target - output)^2

As you can see in figure 12, the error derivative with respect to the unactivated input z of each layer is used to compute the error of the previous layer's output. Because matrix operations perform many calculations in one step, neural networks are able to compute these error derivatives and update the weight matrices very quickly. Backpropagation was the key to finally being able to train, and therefore utilize, neural networks. Thanks to these matrix operations, backpropagation can be parallelized to further decrease training time, making deep neural networks possible; in fact, the emergence of the entire field of Deep Learning has been enabled by such advances in hardware. Bottom line: without the advent of backpropagation, neural networks would be practically impossible to train efficiently.

Figure 12: Back Propagation Algorithm
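As a concrete illustration of the two passes described above, the following NumPy sketch trains a tiny fully connected network on a toy XOR dataset: the forward pass computes activations layer by layer, and the backward pass applies the chain rule to the squared-error term E = \sum \frac{1}{2}(target - output)^2 and updates the weights with gradient descent. The network size, learning rate, and dataset are illustrative choices, not values prescribed in this survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset (XOR), purely illustrative
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)        # targets

# One hidden layer with 4 units, one output unit
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # Forward pass
    z1 = X @ W1 + b1
    h1 = sigmoid(z1)
    z2 = h1 @ W2 + b2
    out = sigmoid(z2)

    # Backward pass: chain rule, starting from the output layer
    d_out = (out - T) * out * (1 - out)                # dE/dz2
    d_h1  = (d_out @ W2.T) * h1 * (1 - h1)             # dE/dz1, propagated backwards

    # Gradient-descent weight updates
    W2 -= lr * h1.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h1
    b1 -= lr * d_h1.sum(axis=0)

print("predictions after training:", out.round(3).ravel())
```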
2.4. Constraints of Neural Networks

Although neural networks have proven to be very efficient in many applications, research in cognitive neuroscience has revealed many important differences between brains and computers. Here, we list some of the major differences:

• First, brains are analogue while computers are digital. Brains transmit information at a rate that is essentially a continuous variable. Therefore, it is believed that building a model that is truly identical to the brain requires scientists either to build analogue computers (changing the whole computation model we know) or to creatively develop a scheme for mapping continuous brain signals onto existing binary computing capabilities.
• Second, brains retrieve information by content while computers retrieve it by address. For example, thinking of the word apple automatically stimulates you to think of other related fruits. In a computer, the word apple either is stored at an address with a specific value or it is not. However, similar paradigms can be implemented in computers, mostly by building massive indices of stored data (as Google does).
• Third, while artificial neural networks are not capable of storing information in memory, processing and memory are performed by the same components in brains. Inspired by neuron memory, a deep learning model called Long Short Term Memory (LSTM) has been developed that addresses this inability by introducing a technique to store information for a longer time in artificial neurons (see section 3.3 below).

Although the idea of artificial neural networks dates back to the 1950s, their applications are now brought back to the table with the availability of large computational and storage power. Computer scientists are continuously improving the models of neural networks to address different insufficiencies. The evolving architectures are now called Deep Learning models, which are the focus of the next section.

3. Deep Learning Models

When exploring differing deep learning models, it is easy to tune into the "buzzwords" that are frequently repeated and lose sight of what the actual objective of the learning procedure is. It is easiest to divide the models into the following two categories:

1) Discriminant Architectures: these models characterize patterns based on posterior distributions of classes. This can be assimilated to techniques such as classification/regression. The paradigm of discriminant models is that for an input, they produce an output. Discriminant models can be viewed as bottom-up networks: inputs are given and they propagate up through the network to produce outputs. This is the main difference from their generative counterparts, which have no outputs. These models can be viewed as Supervised Deep Learning. Examples of these models include Deep Neural Networks (neural networks with more than 2 layers), Convolutional Neural Networks (section 3.1), Recurrent Neural Networks (section 3.2), and Long Short Term Memory (section 3.3).

2) Generative Architectures: these models are employed to discover high-order correlations in a given input. In these models, there are no classes or values to predict for the input data as seen in classification/regression techniques. The goal is to extract meaningful relationships between features in an effort to learn high-order features. Generative models can learn a distribution from training data and then produce samples from it. The bottom layer of these networks generates a vector x, and the goal is to train the model to give high probability to the training data. The reason these models are called Generative is that they start from the top layer and aim to generate the inputs by propagating downwards through the network. The main domain of these architectures is therefore Unsupervised Feature Learning. Examples of these models include Restricted Boltzmann Machines (section 3.4), Deep Boltzmann Machines (section 3.6), Deep Belief Networks (section 3.5), and Auto-encoders (section 3.7).

Figure 13: Common network architectures

In each of the following subsections, we illustrate a deep learning model and its purpose. In general, discriminant architectures are trained with backpropagation whereas generative architectures are trained with a modified free-energy method; training procedures tend to vary on the generative side. Figure 13 illustrates some of the general schemes of network architectures.

3.1. Convolutional Neural Network (CNN)

3.1.1. Purpose. A CNN is primarily used for processing two-dimensional data. Therefore, it is a prime candidate for data such as images and videos.
In the area of image processing, a CNN (also called a ConvNet) is able to extract high-order features from an image (such as horizontal edges, vertical edges, or color contrasts), which can lead to an impressive understanding of the content. Convolutional networks have proven to be very efficient for learning representations of data.

3.1.2. Architecture. For simplicity, we start by describing the model on one-dimensional data and then move forward to see how the model expresses its effectiveness on two-dimensional data. To classify a sample x_1, x_2, x_3, ..., x_n using a basic neural network, we connect all the inputs to a fully connected layer, where each input sample connects to each neuron in the hidden layer, as in figure 14.

Figure 14: Feeding input samples into a fully connected layer (denoted by F) in a basic neural network

The architecture of CNNs follows a more sophisticated approach that notices a symmetry in the features it is looking for in the data.
Therefore, we can create a group of neurons before the hidden layer that takes a segment of the data, as in figure 15. This added layer is called a Convolutional Layer. The output from the convolutional layer is fed into the fully connected layer we previously added. Convolutional layer output can also be fed into other convolutional layers, hence creating layers of convolutions. The idea of a convolutional layer is to learn the appropriate feature filters as opposed to hand-engineering them.

Figure 15: Adding a convolutional layer. Each A contains a group of neurons that are fully connected to a segment of the inputs.

To get a higher-level representation of the data, a Pooling Layer is added after the convolutional layers. A pooling layer not only produces more abstract representations of the data, but also reduces the number of parameters that will be fed to the fully connected layer. For example, a max-pooling layer takes the maximum of the features over small blocks of the previous convolutional layer. Output from a pooling layer can also be fed into the input of another convolutional layer, as in figure 16.

Figure 16: Adding a max-pooling layer. The output is fed into another convolutional layer B.

The same concepts apply to two-dimensional inputs such as images or videos. We can think of figure 17, from bottom to top, as zooming out from the very specific details of the data representation toward the more general representation. A convolutional layer has A groups of neurons; each group feeds on only a part of the two-dimensional input (e.g., a 5x5 pixel frame). As an example of face detection, a first convolutional layer learns representations of edges. After a first pooling layer, a second convolutional layer learns more general representations of face parts such as the eye or the nose. After a second pooling layer, a third convolutional layer learns the most general representation needed to detect a human face. The output is then passed to a fully connected layer to produce the final classifications.

Figure 17: A full convolutional neural network with two-dimensional input. (a) 2-D input. (b) A full 2-D input to a convolutional network.

In summary, a CNN is divided into two stages. The first is the convolution layer: at this layer, a filter is applied to each input, where the filter is a function representing a certain transformation of the input data. The second stage is the pooling layer: this process consists of summarizing neighborhoods in the output of the convolutional layer. These two alternating stages can be applied for as many layers as needed, each having a different filter, and a final fully connected layer is responsible for the classifications. This ensures that the model is able to detect high-order similarities within the data irrespective of orientation/rotation.

3.1.3. References. Refer to the following paper [3] and blog post [4] for a detailed illustration and studies of CNNs.
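The alternating convolution/pooling pattern described above, followed by a fully connected classifier, can be written down in a few lines. The sketch below uses the Keras API as one possible realization (this survey does not prescribe a framework); the layer sizes and the 32x32 RGB input shape are illustrative assumptions.

```python
from tensorflow.keras import layers, models

# Conv -> pool -> conv -> pool -> fully connected, as in figure 17.
model = models.Sequential([
    layers.Conv2D(8, (5, 5), activation='relu',
                  input_shape=(32, 32, 3)),           # assumed 32x32 RGB input
    layers.MaxPooling2D((2, 2)),                      # pooling: summarize 2x2 neighborhoods
    layers.Conv2D(16, (5, 5), activation='relu'),     # higher-level feature filters
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),           # final fully connected classifier
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```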
3.2. Recurrent Neural Network (RNN)

3.2.1. Purpose. An RNN is primarily used for processing data that comes in the form of a sequence. Therefore, it is a prime candidate for speech recognition, language modeling and translation. One limitation of ConvNets is that they accept a fixed-size vector as input and produce a fixed-size vector as output, performing this mapping using a fixed amount of computation determined by the number of layers in the model and the number of units in each layer. The core difference in RNNs is that they operate over sequences of vectors in the input as well as the output.

3.2.2. Architecture. Traditional neural networks (and ConvNets) are memoryless. If a traditional neural network were used to classify what the weather is like from forecast readings, it is unclear how the model would do that: it operates on a fixed-size input and a fixed-size output, performing the computation using a pre-specified number of hidden layers and units.
Recurrent neural networks address this issue by introducing memory into the network in the form of a loop, as in figure 18. You can think of an RNN as a stack of separate neural networks with some parameters of each network fed from the previous network; these parameters play the role of a memory.

Figure 18: Recurrent neural network basic component. Left: a chunk of neural network A receives some input x and outputs a value h. Right: an unrolled RNN.

Inside each repeating module of the recurrent neural network, the input x at time-step t is concatenated with the output h at time-step t-1, and together they are passed through an activation function to produce the output h at the current time-step t. Figure 19 shows an unrolled illustration of this behavior, where the yellow box represents a single neural network layer with a tanh activation function (other activation functions can be used as well).

Figure 19: The repeating module in an RNN with tanh used as the activation function in the neural network.

Although RNNs are simple in the way that they accept an input vector x and produce an output vector y, their effectiveness comes from the fact that the output vector's content is influenced not only by the input x, but also by the entire history of inputs that have been fed to the network in the past. The RNN has some internal state that gets updated every time an input is fed into the network. In the simplest case, this state is represented as a single hidden vector h.

What happens when there is a long-term dependency? For example, a word in an essay may derive its meaning from a word in the previous paragraph. Unfortunately, as the gap grows, RNNs become unable to learn to connect dependencies in the sequence. Figure 20 shows the long-term dependency problem in RNNs. Therefore, Long Short Term Memory (LSTM) models have been proposed to overcome this problem; LSTMs are the subject of the next section.

Figure 20: The output h at time t+1 depends on the input x at times 0 and 1.

3.2.3. References. Refer to the following paper [5] and blog post [6] for a detailed illustration and studies of RNNs.
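The repeating module just described (concatenate the previous output h with the current input x and pass the result through a tanh layer) can be transcribed directly into NumPy. The dimensions, random weights, and toy sequence below are illustrative assumptions; no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 3, 5
# One weight matrix applied to the concatenation [h_{t-1}, x_t], plus a bias.
W = rng.normal(0, 0.1, (hidden_size, hidden_size + input_size))
b = np.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    """h_t = tanh(W . [h_{t-1}, x_t] + b) -- the repeating module of figure 19."""
    concat = np.concatenate([h_prev, x_t])
    return np.tanh(W @ concat + b)

# Unroll the network over a toy sequence of 4 input vectors.
sequence = [rng.normal(size=input_size) for _ in range(4)]
h = np.zeros(hidden_size)            # initial state
for x_t in sequence:
    h = rnn_step(h, x_t)             # the state carries the history forward
print(h)
```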
3.3. Long Short Term Memory (LSTM)

3.3.1. Purpose. Long Short Term Memory networks are an improvement to recurrent neural networks that solves the problem of long-term dependency. Real-world implementations mostly depend on LSTM models rather than basic RNNs.

3.3.2. Architecture. Like RNNs, LSTMs have a chain-like structure (when unrolled). However, instead of a single neural network layer in the repeating module as in figure 19, LSTMs have four neural network layers interacting in a special harmony, as in figure 21.

Figure 21: Four interacting neural network layers inside the repeating module of an LSTM. Each line carries an entire vector from one node to another. The yellow boxes are neural networks with the indicated activation function. The pink circles represent point-wise operations like vector addition. Lines merging denote concatenation; lines forking denote content being copied to different locations.

The core idea behind LSTMs is the horizontal line passing through the top of the module. The line represents a cell state that carries information along from one cycle to the next. Addition and multiplication gates control the information being stored (or not) in the cell state vector. Each of the four neural network layers is responsible for a specific piece of functionality carried out in the cell. The operation of the cell occurs in three steps, as follows:

• First: the first neural network layer from the left (also called the forget gate layer) decides what information is going to be thrown away from the cell state vector. This sigmoid layer looks at h_{t-1} and x_t, and outputs a number between 0 and 1 for each number in the cell state vector that passes through the top line. A 1 represents a "completely keep this" decision and a 0 represents a "completely remove this" decision. The output of this first layer is:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

• Second: the next two layers decide what new information we are going to store in the cell state vector. The sigmoid layer (also called the input gate layer) decides which values will be updated:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)

and the tanh layer creates a vector of new candidate values that could be added to the state:

\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)

The new cell state vector C_t is then computed as:

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t

• Third: the last layer in the cell computes the actual output h_t. The output value is influenced by the last sigmoid layer as well as the new cell state vector that was just computed:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)

h_t = o_t * \tanh(C_t)

Although what is described so far is the standard LSTM, almost every paper involving LSTMs uses a slightly different architecture. A common variation is to let the gate functions f_t, i_t and o_t also look at the cell state vector, a technique known as peephole connections. Other variations exist depending on the training task. Yet all variations depend on the idea of a cell state vector that can carry information for a long time, allowing long-term dependencies to be taken into consideration for prediction.

3.3.3. References. Refer to the following papers [7], [8] and the generous blog post [9] for a detailed illustration and studies of LSTMs.
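A direct NumPy transcription of the cell equations above is given below. The weight shapes and the toy sequence are illustrative assumptions, and no training is performed; the sketch only runs the three steps of the cell forward in time.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 3, 4
concat = hidden_size + input_size

# One weight matrix and bias per gate/layer, acting on [h_{t-1}, x_t].
Wf, bf = rng.normal(0, 0.1, (hidden_size, concat)), np.zeros(hidden_size)
Wi, bi = rng.normal(0, 0.1, (hidden_size, concat)), np.zeros(hidden_size)
Wc, bc = rng.normal(0, 0.1, (hidden_size, concat)), np.zeros(hidden_size)
Wo, bo = rng.normal(0, 0.1, (hidden_size, concat)), np.zeros(hidden_size)

def lstm_step(h_prev, c_prev, x_t):
    """One LSTM cell update, following the three steps of section 3.3.2."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ z + bf)              # forget gate: what to drop from the cell state
    i_t = sigmoid(Wi @ z + bi)              # input gate: which values to update
    c_tilde = np.tanh(Wc @ z + bc)          # candidate values for the cell state
    c_t = f_t * c_prev + i_t * c_tilde      # new cell state
    o_t = sigmoid(Wo @ z + bo)              # output gate
    h_t = o_t * np.tanh(c_t)                # new output
    return h_t, c_t

# Run the cell over a toy sequence; the cell state carries long-term information.
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in [rng.normal(size=input_size) for _ in range(5)]:
    h, c = lstm_step(h, c, x_t)
print(h)
```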
3.4. Restricted Boltzmann Machine (RBM)

3.4.1. Purpose. The first generative architecture we explore is the Restricted Boltzmann Machine. It is not to be confused with the Boltzmann Machine; figure 22 illustrates the difference, and the next subsection explains the subtlety. An RBM is commonly utilized in unsupervised learning tasks such as dimensionality reduction, feature learning, and collaborative filtering.

3.4.2. Architecture. An RBM is composed of two layers, an input layer and a hidden layer. These layers have undirected connections between them. The restriction placed on an RBM is that no two nodes in the same layer can have a connection. This is the differentiator between a Boltzmann Machine and a Restricted Boltzmann Machine: the former has existed for many years, but it was not until this slight modification created the latter that the theoretical model became usable. Without the restriction forbidding intra-layer connections, a general Boltzmann Machine is practically untrainable.

Figure 22: The Boltzmann Machine includes intra-layer connections, whereas the RBM is limited to having only inter-layer connections.

We can therefore define an RBM formally as a two-layer neural network with many inter-layer, but no intra-layer, connections. Each connection bears a weight that is trained during the learning procedure. By adjusting these weights, an RBM can fit its parameters (the hidden layer nodes) to represent the distribution of the training data. Once the hidden layer is trained, one can generate samples that fit the distribution of the training data. This technique has been used to compensate for a scarce amount of available data in certain fields.

Figure 23: The architecture of an RBM. The shaded nodes represent the visible input layer and the white nodes represent the hidden layer.
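Because of the no-intra-layer-connection restriction, the hidden units are conditionally independent given the visible units (and vice versa), so both conditionals factorize into simple sigmoids. The NumPy sketch below shows this inference step for a binary RBM; the layer sizes, random weights, and toy input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))   # inter-layer weights only
b_v = np.zeros(n_visible)                       # visible biases
b_h = np.zeros(n_hidden)                        # hidden biases

def sample_hidden(v):
    """p(h_j = 1 | v) = sigmoid(v . W + b_h); sample each hidden unit independently."""
    p_h = sigmoid(v @ W + b_h)
    return p_h, (rng.random(n_hidden) < p_h).astype(float)

def sample_visible(h):
    """p(v_i = 1 | h) = sigmoid(h . W^T + b_v); sample each visible unit independently."""
    p_v = sigmoid(h @ W.T + b_v)
    return p_v, (rng.random(n_visible) < p_v).astype(float)

v = np.array([1., 0., 1., 1., 0., 0.])          # a toy binary input
p_h, h = sample_hidden(v)
p_v, v_reconstructed = sample_visible(h)
print(p_h, p_v)
```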
3.4.3. Training. The training procedure for an RBM differs in a few ways from the methods used for discriminant models. While the final step in the procedure still entails performing stochastic gradient descent to decrease the error, the means by which the error is computed differs. In an RBM, a procedure called Contrastive Divergence is used. In simplest terms, each iteration can be broken down into three phases. First, the hidden layer is created from the input layer based on probabilities that minimize the free energy of the model; this creates a hidden layer with certain activations and is called the Positive Phase. The next phase is the Negative Phase: the input layer is reconstructed based on this hidden layer, and the newly constructed layer is then propagated back to the hidden layer to create a new set of activations. The third phase is the Update Phase, where the hidden layer from the Positive Phase, together with the reconstructed input and the second hidden layer from the Negative Phase, are used to determine the error and update the weights to minimize it.

All in all, this learning procedure requires hands-on experience to master. There are many hyper-parameters, such as the learning rate, momentum, weight-cost, sparsity target, initial values of the weights, number of hidden units, and size of each batch [10]. For each specific application, a specific set of hyper-parameters must be chosen. This is the art of training RBMs: there is no right or wrong way to set them, and only through trial and error can one determine the correct set.

3.4.4. References. Refer to the following papers [11], [12], [13] for a detailed illustration and studies of RBMs.
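One full Contrastive Divergence (CD-1) update, with the positive, negative, and update phases described above, might look like the following NumPy sketch. The learning rate, layer sizes, and binary toy batch are illustrative assumptions, and many of the practical refinements discussed in [10] (momentum, weight decay, careful mini-batching) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(0, 0.1, (n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def cd1_update(v0):
    global W, b_v, b_h
    # Positive phase: hidden activations driven by the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct the input, then recompute hidden activations.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Update phase: difference between data-driven and reconstruction-driven statistics.
    W   += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return np.mean((v0 - p_v1) ** 2)          # reconstruction error as a rough monitor

batch = rng.integers(0, 2, size=(8, n_visible)).astype(float)   # toy binary batch
for epoch in range(100):
    err = cd1_update(batch)
print("reconstruction error:", err)
```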
3.5. Deep Belief Network (DBN)

3.5.1. Purpose. Deep Belief Networks are utilized for learning a representation of some input data. Their purpose is very similar to that of the RBM, and in practice researchers rarely use RBMs on their own anymore. The DBN can be viewed as the logical next step in the timeline of the development of the RBM: it is the next iteration and improvement of this type of model and has been widely accepted as a replacement for the RBM. Some have argued that since RBMs already have considerable representational power for function approximation, what is the use of DBNs? Further research has concluded that adding an additional layer yields a non-negative information gain over a shallower model. This implies that there is no harm in adding an additional layer and, from our understanding, it enables the model to detect higher-level abstractions in the data.

3.5.2. Architecture. A Deep Belief Network is a stack of feedforward RBMs. The output of layer k, which is the hidden layer of an RBM, is the input of the next layer's RBM. The motivation for this architecture is the idea that an efficient way to learn a complicated model is to combine a set of simpler models that are learned sequentially. We believe that adding layers to the DBN, as opposed to adding nodes to the hidden layer of an RBM, allows the model to become more flexible, more expressive, and less dependent on the number of nodes in each hidden layer. This requires less manual feature engineering and allows the neural network to "work its magic." We believe this makes the DBN a preferable model over the RBM.

Figure 24: The architecture of a Deep Belief Network. Each pair of layers represents an RBM; each RBM's hidden layer is fed into the input layer of the next RBM.

3.5.3. Training. The study of training DBNs has filled many research papers and cannot be covered to the extent required within the scope of this paper. More generally, training is performed in a greedy layer-wise fashion, and all of the learning involved is localized. By performing a greedy layer-wise procedure, the network can be trained iteratively and the complexity becomes manageable.
3.4.4. References. Refer to the following papers [11], [12], [13] for a detailed illustration and studies of RBMs.

3.5. Deep Belief Network (DBN)

3.5.1. Purpose. Deep Belief Networks are utilized for learning a representation of some input data. Their purpose is very similar to that of the RBM, and in practice researchers rarely use standalone RBMs anymore. The DBN can be viewed as the logical next step in the development of the RBM: it is the next iteration and improvement of this type of model and has been widely accepted as its replacement. Some have argued that since RBMs already have the representational power to approximate any function, what is the use of DBNs? Further research has concluded that the information gain from adding an additional layer over a shallower model must be positive. This implies that there is no harm in adding an additional layer, and from our understanding, the deeper model is able to detect higher-level abstractions in the data.

3.5.2. Architecture. A Deep Belief Network is a stack of feedforward RBMs: the output of layer k, which is the hidden layer of an RBM, is the input of the next layer's RBM. The motivation for this architecture is the idea that an efficient way to learn a complicated model is to combine a set of simpler models that are learned sequentially. We believe that adding layers to a DBN, as opposed to adding nodes to the hidden layer of a single RBM, makes the model more flexible, more expressive, and less dependent on the number of nodes in each hidden layer. This requires less manual feature engineering and allows the neural net to "work its magic." We believe this makes the DBN preferable to the RBM.

Figure 24: This network represents the architecture of a Deep Belief Net. Each pair of layers represents an RBM. As explained, each RBM's hidden layer is fed into the input layer of the next RBM.

3.5.3. Training. The study of training DBNs has filled many research papers and cannot be explained to the extent required within the scope of this paper. More generally, training is performed in a greedy layer-wise fashion: all of the learning involved is localized, so the network can be trained iteratively and the complexity stays manageable. This layer-by-layer unsupervised learning algorithm consists of learning a stack of RBMs, one RBM at a time, and is illustrated in Figure 25. The first step consists of training the first layer as an RBM that models the raw input. The hidden layer of this first RBM is then used as the input layer for the second RBM; this input is generally obtained either by taking the mean activations or by sampling. The process is repeated for as many layers as desired, each time propagating forward the hidden layer of the previously trained RBM. The parameters (weights) of the resulting deep architecture are then fine-tuned with respect to the log-likelihood. In supervised training scenarios, a target output can be substituted for the log-likelihood as the error term.

Figure 25: This figure represents the layer-wise training procedure of a Deep Belief Network. Each RBM is trained, stacked, and its hidden layer is fed to the input layer of the next RBM.
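The sketch below illustrates this greedy layer-wise procedure, reusing the illustrative `sigmoid` and `cd1_update` helpers from the RBM sketch above; the layer sizes, epoch counts, and the choice of feeding mean activations to the next layer are assumptions made for the example, not a faithful reproduction of [14].

```python
import numpy as np

def train_rbm(data, n_hidden, n_epochs=10, batch_size=100, rng=np.random):
    """Train a single RBM on `data` with CD-1 and return its parameters."""
    n_visible = data.shape[1]
    W = 0.01 * rng.randn(n_visible, n_hidden)
    b_visible = np.zeros(n_visible)
    b_hidden = np.zeros(n_hidden)
    for _ in range(n_epochs):
        for start in range(0, data.shape[0], batch_size):
            batch = data[start:start + batch_size]
            cd1_update(batch, W, b_visible, b_hidden)  # defined in the RBM sketch
    return W, b_hidden

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pre-training: each RBM models the hidden layer below it."""
    layers = []
    layer_input = data
    for n_hidden in layer_sizes:
        W, b_hidden = train_rbm(layer_input, n_hidden)
        layers.append((W, b_hidden))
        # Feed the mean activations of this RBM's hidden layer to the next RBM.
        layer_input = sigmoid(layer_input @ W + b_hidden)
    return layers

# Example: a 784-500-250-30 stack pre-trained on flattened 28x28 images.
# dbn_layers = pretrain_dbn(train_images, layer_sizes=[500, 250, 30])
```

After this unsupervised stage, the stacked weights would typically initialize a deep network that is fine-tuned with respect to the log-likelihood or, in the supervised case, a target output.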
3.5.4. References. Refer to the following paper [14] for a detailed illustration and studies of DBNs.

3.6. Deep Boltzmann Machine (DBM)

3.6.1. Purpose. Deep Boltzmann Machines can be viewed as multi-layer RBMs. In contrast to the RBM, which is limited to one hidden layer, a Deep Boltzmann Machine can have many. This allows each layer's weights to interact with the layers above and below, forming a more complex version of the RBM. DBMs have the potential to learn increasingly complex internal representations of the data, which is needed in fields such as speech recognition and object recognition. In practice, however, a DBM is rarely used and is often substituted with the more promising, and more easily trainable, DBN. We include this explanation of the architecture purely as a reference, so readers can differentiate the terms and understand the difference in architecture between Deep Boltzmann Machines and Deep Belief Networks.

3.6.2. Architecture. Although very similar to the architecture of a DBN, the architecture of a Deep Boltzmann Machine has one striking difference: instead of directed connections between each stacked RBM, a DBM has undirected connections between each layer. This implies that weights are shared throughout the entire model, as opposed to the more layer-wise approach of a DBN. The difference is illustrated in Figure 26: a DBN is a stack of connected RBMs, whereas a Deep Boltzmann Machine is an RBM with multiple hidden layers. This implies a fundamental difference in the training procedure. We will not cover the training procedure for a DBM because it is out of the scope of this survey, but bear in mind that it must account for signals propagating in both directions of the network rather than from the inputs alone. When comparing the two models, a DBN can be viewed as a stack of RBMs, whereas a DBM is a hybrid version of the RBM.

Figure 26: Although each layer is a stacked RBM, the direction of the connections between layers in Deep Belief Networks and Deep Boltzmann Machines differs.

3.6.3. References. Refer to the following paper [15] for a detailed illustration and studies of DBMs.

3.7. Auto-encoders

3.7.1. Purpose. Auto-encoders are neural networks that aim to learn a compressed representation, or encoding, of the input data. The model is considered generative because it is trained to recreate the input data from its hidden layer. Auto-encoders are well suited to dimensionality reduction and have attracted serious interest recently.

3.7.2. Architecture. Auto-encoders have a unique architecture. They are designed to have three layers: the first is the input layer, the third is the output layer, and the hidden layer between them is called the feature layer. This is shown in Figure 27. The input and output layers of an auto-encoder are intended to be identical after training; the middle layer serves as an encoder of the input data. The dimensionality of this middle layer can be greater or smaller than that of the input layer, depending on the application. When the feature layer has a lower dimensionality than the input layer, the model is excellent at performing dimensionality reduction. The real focus of these models is the feature layer created during training: since the input and output layers will be the same, they are of no interest beyond training purposes, while the middle layer represents an encoding of the data. Architectures such as stacked auto-encoders link these feature layers in a stacked fashion to create higher-level abstractions of the data as well. This methodology of stacking neural networks to create a high-level understanding of the data is the key to deep learning: by allowing the network more representations, more correlations can be detected automatically.
This is why the model is called an auto-encoder.

3.7.3. Training. Training can be conceptualized as the network trying to "recreate" the data. The network receives its inputs and feeds them to the feature layer. The first part of the training process is called the encoding phase: the input data from the first layer is encoded into the feature layer through adjustable weights. Each node in the feature layer then propagates a signal forward and, with the assistance of adjustable weights and biases, maps this encoded representation back to its original un-encoded state. This is referred to as the decoding phase. To summarize, data is fed into the input layer, encoded in the feature layer, and then decoded into the output layer. The error is determined by comparing the output value to the input value, as they should be exactly the same.
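As an illustration of the encode/decode round trip described above, here is a minimal single-hidden-layer auto-encoder in NumPy trained with a squared-error reconstruction loss; the layer sizes, the untied encoder and decoder weights, and the plain gradient-descent loop are simplifying assumptions of ours, not a prescription from the surveyed papers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyAutoencoder:
    """Input -> feature (encoding) layer -> output, trained to reconstruct the input."""

    def __init__(self, n_input, n_feature, rng=np.random):
        self.W_enc = 0.01 * rng.randn(n_input, n_feature)   # encoding weights
        self.W_dec = 0.01 * rng.randn(n_feature, n_input)   # decoding weights
        self.b_enc = np.zeros(n_feature)
        self.b_dec = np.zeros(n_input)

    def encode(self, x):
        return sigmoid(x @ self.W_enc + self.b_enc)

    def decode(self, h):
        return sigmoid(h @ self.W_dec + self.b_dec)

    def train_step(self, x, learning_rate=0.1):
        # Encoding phase followed by decoding phase.
        h = self.encode(x)
        x_hat = self.decode(h)

        # Backpropagate the reconstruction error: the output should match the input.
        delta_out = (x_hat - x) * x_hat * (1.0 - x_hat)
        delta_hidden = (delta_out @ self.W_dec.T) * h * (1.0 - h)

        n = x.shape[0]
        self.W_dec -= learning_rate * (h.T @ delta_out) / n
        self.b_dec -= learning_rate * delta_out.mean(axis=0)
        self.W_enc -= learning_rate * (x.T @ delta_hidden) / n
        self.b_enc -= learning_rate * delta_hidden.mean(axis=0)
        return np.mean((x_hat - x) ** 2)

# Usage: ae = TinyAutoencoder(n_input=784, n_feature=30)
#        for batch in batches: ae.train_step(batch)
#        codes = ae.encode(data)   # the learned low-dimensional representation
```

After training, only the output of `encode` is kept; it plays the role of the feature layer discussed above.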
Figure 27: The architecture of an auto-encoder. As shown, the first and last layers are the same; the middle layer represents the features (encoding) learned during training.

3.7.4. References. Refer to the following paper [16] for a detailed illustration and studies of Auto-encoders.

4. Choosing a Model

With the growing number of variations of deep learning models, it is important to choose a model that is suitable for the task at hand. Many factors contribute to choosing a model that can effectively represent a solution to that task.
• First, study the dataset at hand.
• Second, decide whether you want to perform classification, prediction, or learn a representation of the data.
• Third, choose a model and try out different variations of it until you reach the desired objective.
In Table 1, we summarize decision factors for the models surveyed in this study. These models cover a wide variety of domains, and other models generally fall under one of these architectures.

5. Applications

At the time of writing this survey of deep learning models, researchers from different labs are applying these approaches to a myriad of real-world applications and achieving state-of-the-art performance. In this section, we shed light on some of the trending projects sponsored by large tech companies.

5.1. Facebook's DeepFace

Uploading a picture with your friends to Facebook automatically suggests tagging your friends in the picture by recognizing their faces. Closing the gap to human-level performance in face verification is the main research focus of Facebook's DeepFace. The face representation is derived from a nine-layer deep neural network that involves more than 120 million parameters, using several locally connected layers without weight sharing. The model was trained on four million facial images belonging to more than 4,000 identities, and the result is the most powerful face recognition module deployed in the largest social network in the world.

5.2. Google's DeepMind

Founded in London in 2010 and acquired by Google in early 2014, DeepMind builds algorithms that are capable of learning for themselves directly from raw experience or data, and that are general in that they can perform well across a wide variety of tasks straight out of the box. Their team consists of many renowned experts in their respective fields, including but not limited to deep neural networks, reinforcement learning, and systems neuroscience-inspired models. One recent remarkable achievement is AlphaGo, the first computer program to ever beat a professional player of Go. It was a tremendous milestone to see a computer brain, powered by deep learning models, beat a human brain.

5.3. Apple's Siri

Siri (Speech Interpretation and Recognition Interface) is Apple's intelligent personal assistant that comes pre-installed on their iPhone devices. Siri's primary technical areas focus on a conversational interface, personal context awareness, and service delegation. At the core of the conversational interface resides a strong speech recognition engine, powered by deep learning models, that learns a user's accent and adapts to it to respond with better results. The power of Siri comes not only from the speech recognition engine, but also from other machine learning models that can carry a full conversation between the user and the device, relying on a set of web services.
Model        | Type           | Purpose                       | Suitable for                                    | Example
CNN          | Discriminative | Classification                | Processing two-dimensional data                 | Images/Videos
RNN          | Discriminative | Prediction                    | Processing sequence data                        | Language models and speech
LSTM         | Discriminative | Prediction                    | Processing long sequence data                   | Language models and speech
RBM          | Generative     | Unsupervised feature learning | Learning distributions of data                  | Generating samples from learned hidden representations
DBN          | Generative     | Unsupervised feature learning | Creating a probabilistic reconstruction of data | Trained layers used as feature detectors
Auto-encoder | Generative     | Dimensionality reduction      | Creating a compact representation of data       | PCA-like tasks

TABLE 1: Summary of the surveyed deep learning models

5.4. Microsoft's Cortana

Analogous to Siri, Cortana is the clever personal assistant developed by Microsoft that helps you find things on your PC, manage your calendar, track packages, find files, chat with you, and tell jokes. Cortana learns user behavior through deep learning models in the sense that the more you use Cortana, the more personalized your experience becomes. Cortana depends heavily on understanding a user's query and takes actions based on that request. A set of deep learning language models enables setting reminders, making calls, sending emails, and answering questions when requested by the user. Cortana is a significant advance in the application of artificial intelligence to human-computer interaction.

6. Current Research Directions

As demonstrated, deep learning is a vast field [3], [17]. Following the theoretical claim that, with enough hidden nodes, a model can be trained to represent any function or distribution, we are seeing a re-emergence of many classical machine learning techniques, especially with the increasing improvements in computational resources.

On the discriminative side of models, there is an aggressive push towards memory-based models. As shown, one successful model is the LSTM, but it is by no means the only one that has been proposed. Memory Networks, Neural Turing Machines, and Hierarchical Temporal Memory are all similar memory-based deep neural networks. The advantage of this direction is that the networks are able to retain state throughout their lifetimes. The goal of these networks is to make tasks such as sequence learning and reinforcement learning representable and trainable; these tasks require memory in order to utilize previously seen inputs and correlations in future models.

In our opinion, reinforcement learning will be the heavy focus of deep learning in the next few years. We are seeing a paradigm shift as data scientists realize that deep neural networks can be used as function approximators in reinforcement learning algorithms, as sketched below. We believe this has a lot of potential and will be pursuing research in this area in the future.
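As a concrete illustration of that shift, the sketch below replaces the classical Q-table with a small neural network that approximates Q(state, action); the two-layer architecture, the ReLU hidden layer, and every hyper-parameter value are illustrative assumptions on our part, not a description of any production system.

```python
import numpy as np

class TinyQNetwork:
    """A small two-layer network approximating Q(state, action) for discrete actions."""

    def __init__(self, n_state, n_action, n_hidden=32, rng=np.random):
        self.W1 = 0.1 * rng.randn(n_state, n_hidden)
        self.b1 = np.zeros(n_hidden)
        self.W2 = 0.1 * rng.randn(n_hidden, n_action)
        self.b2 = np.zeros(n_action)

    def forward(self, state):
        h = np.maximum(0.0, state @ self.W1 + self.b1)   # ReLU hidden layer
        return h, h @ self.W2 + self.b2                  # one Q-value per action

    def q_learning_step(self, state, action, reward, next_state, done,
                        gamma=0.99, learning_rate=0.01):
        """Semi-gradient Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
        h, q = self.forward(state)
        _, q_next = self.forward(next_state)
        target = reward if done else reward + gamma * np.max(q_next)

        # Gradient of 0.5 * (Q(s, a) - target)^2 with respect to the parameters.
        dq = np.zeros_like(q)
        dq[action] = q[action] - target
        dh = (dq @ self.W2.T) * (h > 0)

        self.W2 -= learning_rate * np.outer(h, dq)
        self.b2 -= learning_rate * dq
        self.W1 -= learning_rate * np.outer(state, dh)
        self.b1 -= learning_rate * dh
```

Full systems layer experience replay, target networks, and convolutional front-ends on top of this basic update, but the core idea of a network acting as a trainable function approximator is the same.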
On the generative side of models, there has been re-emerging interest in the past few years. Hinton dropped a bomb that ignited the entire field of deep learning with his influential paper on a generative architecture [14]. The focus then shifted to the shinier side of unsupervised learning. With the explosion of unlabeled data pouring from the various sources of big data, the need to improve these unsupervised deep learning models has been growing. The focus of these models varies from serving as a pre-training step whose output is fed forward to a discriminative model, to more "all-in-one" hybrid solutions. There is also some past research on discriminative RBMs and their variations, highlighted in Larochelle and Bengio's paper [18], that we believe will be useful in truly harnessing the representational power of these typically generative models.

7. Conclusion

This concludes our survey of the field of deep learning. To summarize in one statement, we believe deep learning can be viewed as the art of utilizing deep neural network structures to represent any machine learning task. Although some of the theoretical strengths of neural networks have been claimed since the 1950s, recent advances in computer hardware have made these hypotheses verifiable. What we are now seeing is a complete redefinition of the tasks that have been staples of the field of machine learning and of the broader domain of artificial intelligence. We view these recent advancements as the beginning of the era of truly thinking computers. Whereas older machine learning techniques such as SVMs, clustering, PCA, etc. are each based on certain statistical characteristics of the data, neural networks can be viewed as a digital muscle that can be strengthened in a certain manner to represent any of those models. In our own opinion, older ML techniques can be viewed as discrete learning methods, whereas deep learning is more of a continuous learning method. A simple example is comparing a stack of linear regressions layered on top of each other with a deep neural network. Ultimately, a stack of linear regressions is still linear no matter what: the resulting equation may have a completely different slope and bias, but it cannot represent an arbitrary function, so its capabilities are limited. As demonstrated, neural networks are not bound by this linearity. The use of a nonlinear activation function boosts the representational power of these models so much that there are theoretical claims that deep learning architectures can learn to represent any distribution or function [19].
This representational power stems from the differentiation of network structures into discriminative and generative architectures.

To state that the emergence of the field of deep learning has correlated with the rise in performance of computer hardware does not illustrate the dependence strongly enough. If one thing was clear from our research, it is that deep learning techniques are among the most computationally intensive problems that computers have been asked to solve. It is no understatement that deep learning is a field that models its methods on the world's most powerful processor: the human brain. With a foundation strongly rooted in neuroscience, we have no doubt that the models developed by deep learning researchers will aid and push forward their sister field. There is an innate link between the research neuroscientists are performing to understand how the human mind works and the work deep learning experts are undertaking to emulate this process. We believe that only through the further integration of deep learning and neuroscience, seen in models such as Hierarchical Temporal Memory, can true general intelligence be realized. As such computationally intensive software methods are created, hardware will continue to push the boundaries of what is considered possible.

References

[1] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009. Also published as a book, Now Publishers, 2009.

[2] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," Jun. 2012. [Online]. Available: http://arxiv.org/abs/1206.5533

[3] I. Arel, D. C. Rose, and T. P. Karnowski, "Deep machine learning - a new frontier in artificial intelligence research [research frontier]," IEEE Computational Intelligence Magazine, vol. 5, no. 4, pp. 13–18, Nov. 2010.

[4] C. Olah, "Conv nets: A modular perspective."

[5] I. Sutskever, "Training recurrent neural networks," Ph.D. dissertation, University of Toronto, Toronto, Ont., Canada, 2013.

[6] A. Karpathy, "The unreasonable effectiveness of recurrent neural networks."

[7] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available: http://dx.doi.org/10.1162/neco.1997.9.8.1735

[8] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A search space odyssey," CoRR, vol. abs/1503.04069, 2015. [Online]. Available: http://arxiv.org/abs/1503.04069

[9] C. Olah, "Understanding LSTM networks."

[10] G. E. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, 2nd ed. Berlin, Heidelberg: Springer, 2012, pp. 599–619.

[11] G. E. Hinton, "Deterministic Boltzmann learning performs steepest descent in weight-space," Neural Comput., vol. 1, no. 1, pp. 143–150, Mar. 1989. [Online]. Available: http://dx.doi.org/10.1162/neco.1989.1.1.143

[12] N. Le Roux and Y. Bengio, "Representational power of restricted Boltzmann machines and deep belief networks," Neural Comput., vol. 20, no. 6, pp. 1631–1649, Jun. 2008. [Online]. Available: http://dx.doi.org/10.1162/neco.2008.04-07-510

[13] R. Salakhutdinov and G. Hinton, "Deep Boltzmann machines," in Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 5, 2009, pp. 448–455.
[14] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527–1554, Jul. 2006. [Online]. Available: http://dx.doi.org/10.1162/neco.2006.18.7.1527

[15] R. Salakhutdinov and G. Hinton, "An efficient learning procedure for deep Boltzmann machines," Neural Comput., vol. 24, no. 8, pp. 1967–2006, Aug. 2012.

[16] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015. [Online]. Available: http://dx.doi.org/10.1038/nature14539

[17] J. Schmidhuber, "Deep learning in neural networks: An overview," Apr. 2014. [Online]. Available: http://arxiv.org/abs/1404.7828

[18] H. Larochelle and Y. Bengio, "Classification using discriminative restricted Boltzmann machines," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), W. W. Cohen, A. McCallum, and S. T. Roweis, Eds. ACM, 2008, pp. 536–543.

[19] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," Jun. 2012. [Online]. Available: http://arxiv.org/abs/1206.5538

[20] W. W. Cohen, A. McCallum, and S. T. Roweis, Eds., Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08). ACM, 2008.

[21] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, "Scheduled sampling for sequence prediction with recurrent neural networks," Jun. 2015. [Online]. Available: http://arxiv.org/abs/1506.03099

[22] R. Sun, "Introduction to sequence learning," in Sequence Learning - Paradigms, Algorithms, and Applications. London, UK: Springer-Verlag, 2001, pp. 1–10. [Online]. Available: http://dl.acm.org/citation.cfm?id=647073.713884

[23] J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian optimization of machine learning algorithms," ArXiv e-prints, Jun. 2012.

[24] S. Rifai, Y. N. Dauphin, P. Vincent, Y. Bengio, and X. Muller, "The manifold tangent classifier," in Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2011, pp. 2294–2302. [Online]. Available: http://papers.nips.cc/paper/4409-the-manifold-tangent-classifier.pdf

[25] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," CoRR, vol. abs/1207.0580, 2012. [Online]. Available: http://arxiv.org/abs/1207.0580

[26] Y. Bengio and S. Bengio, "Modeling high-dimensional discrete data with multi-layer neural networks," in Advances in Neural Information Processing Systems 12, 2000, pp. 400–406.

[27] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.