An introduction to Deep Learning
Who am I?
• David Rostcheck
• I am a data science consultant
• Follow my articles on LinkedIn
DEEP LEARNING
in some tests, Deep Learning has already shown abilities at the same level as humans
These include:
• computers that understand natural language
• autonomous vehicles
• programs that can identify what is occurring in a video
It's notable that these solutions to diverse problems in very different domains all use the same powerful technology
NEURAL NET
a neural net is a simulation of the brain, a mathematical abstraction
in the real brain, the neurons send signals as frequencies (firing rates), not discrete values
tools exist that try to simulate the brain in a way that's more faithful to the real brain
Example: Numenta NuPIC, a type of Hierarchical
Temporal Memory (HTM)
but the techniques of neural nets are sufficient to deliver results similar to or better than humans in specific cognitive tests
therefore:
Deep Learning
what is it?
common point of view:
a neural net with distinct layers
this is correct, but…
there is another point of view,
maybe more useful,
that we are going to present here
it comes from Vincent Vanhoucke, Principal
Research Scientist at Google.
the following comes from his course on Deep Learning, on Udacity
He thinks about Deep Learning as a framework for calculating linear and nearly-linear functions in an efficient way
to develop this framework, we are going to construct a classifier: the simplest (and worst) one possible
but wait a minute…
why a classifier?
Because classification (or, more generally, prediction) is a central technique in Machine Learning
with this, we can achieve ranking, regression, detection, reinforcement learning, and more…
we start with a linear equation, in vector form: y = WX + b
Think about constructing a simple classifier to predict, for each instance of X, which class it belongs to
to do this, we must learn the values of W and b
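as a rough sketch (not from the slides), this is what that linear scoring step might look like in NumPy; the sizes and names (num_features, num_classes) are illustrative assumptions:

```python
import numpy as np

# hypothetical sizes, just for illustration
num_features = 4   # length of each input vector X
num_classes = 3    # number of classes we want to predict

# W and b are the parameters we must learn
W = np.random.randn(num_classes, num_features)
b = np.zeros(num_classes)

def linear_scores(x):
    """Compute the raw scores y = W x + b for one input vector x."""
    return W @ x + b

x = np.random.randn(num_features)
print(linear_scores(x))  # raw, unbounded scores, one per class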
Does it work well?
No.
It’s the worst.
Why?
there are two problems…
No. 1:
it gives values,
and what we want
are probabilities
we can fix it with the "softmax" function:
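a minimal sketch of softmax (my own illustration, not from the slides): it turns raw scores into probabilities that are positive and sum to 1:

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities.

    Subtracting the max first is a standard numerical-stability
    trick; it does not change the result.
    """
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.66, 0.24, 0.10]
```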
we express the correct values as a vector with 1 (the correct class) and 0 (the others).
we call this "one-hot encoding"
to evaluate errors, we compare the probabilities with the correct values using what we call "cross-entropy"
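a small sketch (my illustration) of one-hot encoding and the cross-entropy comparison between predicted probabilities and the correct label:

```python
import numpy as np

def one_hot(label, num_classes):
    """Vector with 1 at the correct class and 0 elsewhere."""
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

def cross_entropy(probs, target_one_hot):
    """-sum(target * log(probs)): small when the probability of the
    correct class is high, large when it is low."""
    return -np.sum(target_one_hot * np.log(probs + 1e-12))

probs = np.array([0.7, 0.2, 0.1])    # output of softmax
target = one_hot(0, 3)               # correct class is 0
print(cross_entropy(probs, target))  # ~0.36, a small error
```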
better, but…
there remains the second problem:
our equation is linear and doesn't represent non-linear relationships well
this problem killed the perceptron (a single-layer neural net)
it doesn't help to just add layers to the network, because any combination of linear operations can be represented as another linear operation: we can reduce the new network to another WX + b with the same problem
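we can check this numerically; the sketch below (my own, with made-up sizes) shows that two stacked linear layers are exactly equivalent to a single WX + b:

```python
import numpy as np

x = np.random.randn(4)

# two stacked linear "layers"
W1, b1 = np.random.randn(5, 4), np.random.randn(5)
W2, b2 = np.random.randn(3, 5), np.random.randn(3)
two_layers = W2 @ (W1 @ x + b1) + b2

# collapse them into one equivalent linear layer
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True: nothing was gained
```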
What do we do?
without another option,
we have to introduce non-linear
functions
one option is the logistic (sigmoid) function
but it's expensive to calculate, so we use a much simpler non-linearity called the "Rectified Linear Unit", or ReLU
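as a quick illustration (mine, not from the slides), the logistic function involves an exponential, while ReLU is just a comparison with zero:

```python
import numpy as np

def logistic(x):
    """The logistic (sigmoid) function: smooth but needs an exponential."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(logistic(x))
print(relu(x))
```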
now we can construct our neural net in a way that's efficient to calculate
we can express this in a modular way, as a series of linear or nearly-linear matrix operations... that allows us to use the power of a GPU
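putting the pieces together, a minimal sketch (illustrative sizes and names are my assumptions) of a two-layer network as a chain of matrix operations:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))
    return e / e.sum()

def relu(x):
    return np.maximum(0.0, x)

# illustrative sizes: 4 inputs, 8 hidden units, 3 classes
W1, b1 = np.random.randn(8, 4), np.zeros(8)
W2, b2 = np.random.randn(3, 8), np.zeros(3)

def forward(x):
    """Linear -> ReLU -> linear -> softmax: each step is a matrix op."""
    h = relu(W1 @ x + b1)        # first layer
    return softmax(W2 @ h + b2)  # second layer, probabilities out

print(forward(np.random.randn(4)))
```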
this is good, but we are still lacking
something…
to improve our estimates, we must minimize the error, and this requires us to calculate the derivative of the error function
think about the chain rule of calculus:
d/dx f(u(x)) = (df/du) · (du/dx)
that can convert a derivative into a
product (of other derivatives):
that fits in our modular framework 
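a tiny sketch (my illustration) of how the chain rule composes gradients module by module, using one scalar path f(x) = (w·x + b)²:

```python
# f(x) = g(h(x)) with h(x) = w*x + b and g(u) = u**2
w, b, x = 3.0, 1.0, 2.0

u = w * x + b            # forward through the first module
f = u ** 2               # forward through the second module

df_du = 2 * u            # local derivative of the second module
du_dx = w                # local derivative of the first module
df_dx = df_du * du_dx    # chain rule: product of local derivatives

# compare with a numerical estimate
eps = 1e-6
f_eps = (w * (x + eps) + b) ** 2
print(df_dx, (f_eps - f) / eps)  # both ~42
```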
now we have it! a general, modular
framework that incorporates
everything we need!
and we can construct deep neural nets, adding more layers as we need them
…but wait a minute:
why do we like deep networks?
the most interesting problems,
like language and vision,
have very complex rules
we need a lot of parameters to represent
them
yes, but why don’t we use wider
networks?
why is it better to have deep ones?
deep networks are more efficient and better capture the structure inherent in many problems
CONVNETS
the convolutional network, or convnet, transforms the input so that translations of the input do not matter
we use it for visual recognition
Let’s start with a photo:
We use a small region (the kernel) of the photo as the input to another small neural net, with K outputs
we slide the window across the photo
this transforms the photo into a new one, with K channels and different dimensions
this operation is called
a convolution
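a bare-bones sketch (mine, not from the slides) of sliding one kernel across a grayscale image; a real convnet learns the kernel weights and uses K of them to produce K output channels:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.random.rand(6, 6)            # toy "photo"
kernel = np.random.randn(3, 3)          # one learned filter (1 of K)
print(convolve2d(image, kernel).shape)  # (4, 4): a smaller output map
```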
if the region (the "kernel") has the same size as the original photo, what did we obtain?
in this case, the convolution is just an ordinary fully-connected layer over the whole image
Questions?
Contact: drostcheck@leopardllc.com, twitter: @davidrostcheck
Articles: http://linkedin.com/in/davidrostcheck
