An introduction to Deep Learning

Who am I?
• David Rostcheck
• I am a data science
consultant
• Follow my articles on
LinkedIn

in some tests, Deep Learning has already
shown abilities at the same level as humans

These include:
• computers that understand natural
language
• autonomous vehicles
• programs that can identify what is
occurring in a video

It’s notable that
these solutions to diverse problems
in very different fields
use the same powerful technology

a neural net is a
simulation
of the brain,
a mathematical abstraction

in the real brain,
the neurons send signals with
frequencies,
not discrete signals

tools exist that try to simulate the brain in a
way that’s
more accurate
to the real brain

Example: Numenta NuPIC, an implementation of Hierarchical
Temporal Memory (HTM)

but the techniques of neural nets
are sufficient
to deliver results
similar to or better than those of humans
in specific cognitive tests

therefore:
Deep Learning
what is it?

common point of view:
a neural net with
distinct levels
is correct, but…

there is another point of view,
maybe more useful,
that we are going to present here

it comes from Vincent Vanhoucke, Principal
Research Scientist at Google.
the following
comes from
his course on
Deep
Learning, on
Udacity

He thinks about Deep Learning as
a framework for calculating
linear and almost linear
equations in an efficient way

to develop this framework,
we are going to construct
the simplest (and worst)
possible classifier

but wait a minute…
why
a classifier?

Because classification (or more
generally prediction) is a central
technique in Machine Learning
with this, we can achieve ranking,
regression, detection, reinforcement
learning, and more…

we start with a linear equation, in vector
form…

Think about constructing a simple classifier to
predict, for each occurrence of X, which is:

to do this, we must learn the values of W and b
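
A minimal sketch of this linear classifier in plain Python, assuming the usual form y = WX + b that the slides refer to later. The function name `linear_scores`, the shapes, and all numbers are illustrative assumptions, not taken from the course.

```python
def linear_scores(W, x, b):
    """Compute y = W x + b for one input vector x.

    W is a list of rows (one row per class); x and b are lists of floats.
    """
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


# Hypothetical weights for 3 classes and 2 input features.
W = [[1.0, 2.0],
     [0.0, -1.0],
     [0.5, 0.5]]
b = [0.5, 0.0, -1.0]
x = [2.0, 1.0]

scores = linear_scores(W, x, b)  # one raw score per class
```

Learning means adjusting the numbers in W and b until the highest score lands on the correct class for as many examples as possible.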

Problem no. 1:
it gives raw scores,
and what we want
are probabilities

we can fix it with the “softmax” function:
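
A small sketch of softmax, assuming the standard definition softmax(y)_i = exp(y_i) / sum_j exp(y_j). Subtracting the largest score first is a common numerical-stability trick and does not change the result; the input scores are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive numbers that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([4.5, -1.0, 0.5])  # hypothetical raw scores
```

The largest score still gets the largest probability, but now the outputs can be compared directly with the correct answer.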

we express the
correct values in a
vector that holds 1 for the
correct class and 0 for
the others.
we call this “one-hot
encoding”
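
As an illustration, one-hot encoding takes only a few lines. The helper name `one_hot` and the convention of numbering classes from 0 are assumptions made for this sketch.

```python
def one_hot(label, num_classes):
    """Return a vector with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


target = one_hot(0, 3)  # the correct class is class 0 out of 3
```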

to evaluate errors, we
compare the probabilities
with the correct values

using what we call “cross-entropy”
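
A sketch of cross-entropy, assuming the standard definition D(S, L) = -sum_i L_i log(S_i), where S holds the predicted probabilities and L is the one-hot vector of correct values. The small `eps` is an added safeguard against log(0), and the example probabilities are invented.

```python
import math


def cross_entropy(probs, target, eps=1e-12):
    """Distance between predicted probabilities and a one-hot target."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, target))


target = [1.0, 0.0, 0.0]
good = cross_entropy([0.9, 0.05, 0.05], target)  # confident and correct
bad = cross_entropy([0.1, 0.6, 0.3], target)     # mostly wrong
```

A confident correct prediction gives a value near zero, and the value grows as probability moves away from the correct class.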

better, but…
there remains the second problem:
our equation is linear
and doesn’t represent non-linear
equations well

this problem killed the perceptron (single
level neural net)

it doesn’t help to just add levels to the network,
because we can represent any combination
of linear operations as another linear
operation: we can reduce the new network to
another WX + b with the same problem
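
The collapse can be checked numerically. This sketch stacks two linear levels and compares them with the single level W = W2·W1, b = W2·b1 + b2; the helper names and all values are illustrative assumptions.

```python
def matvec(W, x):
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Matrix-matrix product."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]


def add(u, v):
    """Element-wise vector sum."""
    return [a + b for a, b in zip(u, v)]


# Two hypothetical linear levels.
W1, b1 = [[1.0, 2.0], [3.0, 4.0]], [1.0, -1.0]
W2, b2 = [[0.5, -1.0], [2.0, 0.0]], [0.0, 3.0]
x = [1.0, 2.0]

# Level 2 applied to the output of level 1.
two_levels = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same computation folded into one level.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_level = add(matvec(W, x), b)
```

Both paths give the same output for every x, so the extra level adds no expressive power.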

without another option,
we have to introduce non-linear
functions,
for example the logistic
function

but it’s expensive to calculate, so we can use a
much simpler non-linear function called a “Rectified
Linear Unit”, or ReLU
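
For comparison, here are both functions side by side, assuming the standard definitions logistic(z) = 1 / (1 + exp(-z)) and relu(z) = max(0, z).

```python
import math


def logistic(z):
    """Smooth S-shaped curve between 0 and 1; needs an exponential."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Zero for negative inputs, the identity otherwise; one comparison."""
    return max(0.0, z)


samples = [-2.0, 0.0, 3.5]
relu_values = [relu(z) for z in samples]
logistic_values = [logistic(z) for z in samples]
```

ReLU is linear on each side of zero, which is why the slides speak of "almost linear" operations.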

now we can construct our neural net, in a way
that’s efficient to calculate

we can express this in a modular way, as a
series of linear or almost linear matrix
operations, which allows us to use the power of a
GPU
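
A rough sketch of the modular idea: each step is a small function, and the network is just a list of steps applied in order. It runs in plain Python on lists for readability; real frameworks run the same steps as matrix operations on a GPU. The weights and layer sizes are hypothetical.

```python
import math


def linear(W, b):
    """Build a module that computes W x + b."""
    def apply(x):
        return [sum(w * v for w, v in zip(row, x)) + b_i
                for row, b_i in zip(W, b)]
    return apply


def relu(x):
    """Element-wise ReLU module."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Module that turns scores into probabilities."""
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def run(modules, x):
    """Feed x through each module in turn."""
    for module in modules:
        x = module(x)
    return x


# Hypothetical network: 2 inputs -> 3 hidden units -> 2 classes.
network = [
    linear([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),
    relu,
    linear([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], [0.0, 0.0]),
    softmax,
]

probs = run(network, [1.0, 2.0])
```

Adding a level means adding two more entries to the list, which is what makes the framework convenient.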

this is good, but we are still lacking
something…
to improve our estimation, we must
minimize the error,
and this requires us to calculate the
derivative of the function
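
To see why the derivative matters, here is a toy gradient descent on a single parameter. The error function, the starting point, and the learning rate are all invented for illustration; a real network has many parameters but follows the same update rule.

```python
def error(w):
    """Hypothetical error curve with its minimum at w = 3."""
    return (w - 3.0) ** 2


def d_error(w):
    """Derivative of the error with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0
learning_rate = 0.1
for _ in range(100):
    # Step against the slope: downhill on the error curve.
    w -= learning_rate * d_error(w)
```

The derivative tells us which direction is downhill, so each step reduces the error a little.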

think about the chain rule of calculus:
$$\frac{d}{dx} f(u(x)) = \frac{df}{du} \cdot \frac{du}{dx}$$

that can convert a derivative into a
product (of other derivatives):

that fits in our modular framework
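
One way to picture this: every module supplies its own local derivative, and the derivative of the whole chain is the product of those pieces. The two modules chosen here (a square followed by a logistic function) are arbitrary examples, and the result is checked against a numerical estimate.

```python
import math


def square(x):
    return x * x


def d_square(x):
    return 2.0 * x


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def d_logistic(u):
    s = logistic(u)
    return s * (1.0 - s)


def f(x):
    """f(x) = logistic(u) with u = x squared."""
    return logistic(square(x))


def df_dx(x):
    """Chain rule: df/dx = df/du * du/dx."""
    u = square(x)
    return d_logistic(u) * d_square(x)


x = 0.7
h = 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)  # central-difference estimate
analytic = df_dx(x)
```

The two values agree closely, so each module only needs to know its own derivative.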

now we have it! a general, modular
framework that incorporates
everything we need!

and we can construct deep neural nets,
adding more levels as we need them
…but wait a minute:
why do we like deep networks?

the most interesting problems,
like language and vision,
have very complex rules
we need a lot of parameters to represent
them

yes, but why don’t we use wider
networks?
why is it better to have deep ones?

deep networks are more efficient and better capture the structure
inherent in many problems

the convolutional network, or convnet,
transforms the input
so that the translation
of the input does not matter
we use it for visual recognition

We use a region (kernel) of a photo as the input to
another small neural net, with K outputs

we slide the window across the photo

this transforms the photo into another new one,
with K color channels, and different dimensions

this operation is called
a convolution
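
A simplified sketch of the sliding window, assuming a single-channel image, a single output channel (K = 1), and no padding. A real convnet layer applies K such kernels and follows them with a non-linearity. The function name `convolve2d` and the numbers are illustrative. Like most deep learning libraries, this version does not flip the kernel.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride
            # Weighted sum of the region currently under the window.
            row.append(sum(kernel[i][j] * image[top + i][left + j]
                           for i in range(kh) for j in range(kw)))
        output.append(row)
    return output


image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0]]
kernel = [[1, 0],
          [0, 1]]  # hypothetical 2x2 weights

feature_map = convolve2d(image, kernel)
```

The same weights are reused at every position, so a pattern is detected wherever it appears in the photo, and the output has different dimensions from the input (2 x 3 here instead of 3 x 4).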

if the region (the “kernel”) has
the same size as the original,
what do we obtain?

in this case,
the window covers the whole photo at once,
so we are back to an ordinary
fully connected level with K outputs

Questions?
Contact: drostcheck@leopardllc.com, twitter: @davidrostcheck
Articles: http://linkedin.com/in/davidrostcheck
