© SuperDataScience / Deep Learning A-Z
What we will learn in this section:
• The Neuron
• The Activation Function
• How do Neural Networks work? (example)
• How do Neural Networks learn?
• Gradient Descent
• Stochastic Gradient Descent
• Backpropagation
[Figure: a biological neuron, with dendrites (signal receivers) and an axon (signal transmitter); in an artificial neural network the neuron is abstracted as a node. Image sources: www.austincc.edu, Wikipedia]
[Diagram: a neuron receives input signals 1 through m and produces an output signal.]
[Diagram: input values X1, X2, …, Xm reach the neuron through synapses; the neuron produces the output value y.]
The input values X1, X2, …, Xm are the independent variables of one observation, and they should be standardized (rescaled to mean 0 and variance 1) before being fed into the network.
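As a quick sketch of what "standardize" means here (plain Python; the example values are illustrative), each independent variable is rescaled to mean 0 and standard deviation 1:

```python
import math

def standardize(values):
    """Rescale a list of values to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    # Population standard deviation; assumes the values are not all identical
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

areas = [850.0, 1200.0, 2300.0, 1750.0]  # hypothetical "Area" column
scaled = standardize(areas)
```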
Additional Reading: "Efficient BackProp", Yann LeCun et al. (1998). Link: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
The output value y can be:
• Continuous (e.g. a price)
• Binary (e.g. will the customer exit: yes/no)
• Categorical
[Diagram: when the output is categorical, the network has several output values y1, y2, …, yp, one per category.]
Important: the input values X1, X2, …, Xm and the output value y all refer to a single observation, one row of the dataset.
[Diagram: each synapse is assigned a weight: w1 for X1, w2 for X2, …, wm for Xm.] The weights are what the network adjusts during training: learning in a neural network means finding the right weights.
What happens inside the neuron?

1st step: take the weighted sum of the input values:
∑ wixi = w1x1 + w2x2 + … + wmxm

2nd step: apply the activation function φ to the weighted sum:
φ(∑ wixi)

3rd step: the neuron passes the resulting signal on as the output value y.
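The three steps can be sketched in a few lines of Python (the sigmoid activation is used here purely for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, activation=sigmoid):
    # 1st step: weighted sum of the input values
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # 2nd step: apply the activation function
    y = activation(weighted_sum)
    # 3rd step: pass the signal on as the output value
    return y

output = neuron([1.0, 0.5, -0.5], [0.4, -0.2, 0.1])
```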
Four common activation functions φ (each plotted as y against the weighted sum):
• Threshold function: φ(x) = 1 if x ≥ 0, otherwise 0 (a hard step from 0 to 1)
• Sigmoid: φ(x) = 1 / (1 + e^(-x)), a smooth curve from 0 to 1
• Rectifier (ReLU): φ(x) = max(x, 0), one of the most used in practice
• Hyperbolic tangent (tanh): φ(x) = (1 − e^(-2x)) / (1 + e^(-2x)), ranging from −1 to 1
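The four functions above are one-liners; a minimal sketch in plain Python:

```python
import math

def threshold(x):
    """Step function: 1 if x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Smooth curve from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def rectifier(x):
    """ReLU: max(x, 0)."""
    return max(x, 0.0)

def tanh(x):
    """Hyperbolic tangent: from -1 to 1."""
    return math.tanh(x)
```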
Additional Reading: "Deep sparse rectifier neural networks", Xavier Glorot et al. (2011). Link: http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf
Example: assume the dependent variable (DV) is binary (y = 0 or 1).
• With the threshold activation function, the neuron outputs y = 1 when the weighted sum reaches the threshold and y = 0 otherwise.
• With the sigmoid activation function, the neuron outputs ŷ, the probability that y = 1.
[Diagram: a full network: the input layer (X1, X2, …, Xm), a hidden layer of neurons, and an output layer producing the output value y.]
Worked example: predicting the price of a property from four independent variables:
• X1: Area (feet²)
• X2: Bedrooms
• X3: Distance to city (miles)
• X4: Age

With no hidden layer, the network is just a weighted sum:
Price = w1*x1 + w2*x2 + w3*x3 + w4*x4

Adding a hidden layer is what gives the neural network its extra power. Every hidden neuron receives all four inputs, but training drives many synapse weights to zero, so each hidden neuron ends up specializing in a particular combination of features. The output layer then combines these hidden-layer signals into the predicted price y.
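A forward pass through this 4-input, one-hidden-layer network can be sketched as follows (the ReLU hidden activation, the 5-neuron hidden layer, and the input row are illustrative assumptions, and the weights are random, so the "price" is meaningless until the network is trained):

```python
import random

def relu(x):
    return max(x, 0.0)

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: input layer -> hidden layer (ReLU) -> output layer."""
    hidden = [relu(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))

random.seed(0)
# 4 inputs -> 5 hidden neurons -> 1 output: 4*5 + 5*1 = 25 weights in total
hidden_weights = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(5)]
output_weights = [random.uniform(-0.1, 0.1) for _ in range(5)]

row = [2000.0, 3.0, 10.0, 25.0]  # Area, Bedrooms, Distance, Age (hypothetical row)
price = forward(row, hidden_weights, output_weights)
```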
How do neural networks learn? The network is given an observation, produces a predicted value ŷ (the output value), and this prediction is compared against the actual value y. The difference between the two is measured by the cost function, here the squared error:

C = ½(ŷ − y)²

The lower the cost, the closer ŷ is to y; learning means adjusting the weights so as to minimize C.
[Diagram: after C is computed, the information is fed back through the network and the weights w1, w2, …, wm are updated; the same observation produces a new ŷ, a new C, and the loop repeats until the cost is minimized.]
With a full training set, every row is fed through the same network, with the same weights, producing one ŷ per row. The total cost sums the per-row errors:

C = ∑ ½(ŷ − y)²

After each full pass, the weights w1, w2, …, wm are adjusted, and all the rows are fed through again.
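The summed cost is straightforward to compute; a minimal sketch:

```python
def cost(predictions, actuals):
    """Total cost: sum of the squared-error terms 0.5 * (yhat - y)^2 over all rows."""
    return sum(0.5 * (yhat - y) ** 2 for yhat, y in zip(predictions, actuals))

c = cost([0.2, 0.9, 0.4], [0.0, 1.0, 1.0])
```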
Additional Reading: "A list of cost functions used in neural networks, alongside applications", CrossValidated (2015). Link: http://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
Take the simplest possible network: a single input value X1, one weight w1, and an output value ŷ, with cost C = ½(ŷ − y)². Plotting C against ŷ gives a parabola; the lowest point ("Best!") is the prediction that minimizes the cost. One naive way to find it is brute force: try many candidate weights and keep the one with the smallest cost.
Brute force breaks down as soon as the network grows: the curse of dimensionality. The property-price network above (4 inputs, a 5-neuron hidden layer, 1 output) already has 4 × 5 + 5 × 1 = 25 weights. Testing just 1,000 candidate values per weight gives

1,000 × 1,000 × … × 1,000 = 1,000^25 = 10^75 combinations.

Sunway TaihuLight, the world's fastest supercomputer at the time, runs at 93 PFLOPS (93 × 10^15 floating-point operations per second). Even at one combination per operation, checking them all would take

10^75 / (93 × 10^15) ≈ 1.08 × 10^58 seconds ≈ 3.42 × 10^50 years.
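The arithmetic on this slide can be checked directly:

```python
combinations = 1000 ** 25                 # 1,000 candidate values for each of 25 weights
flops_per_second = 93 * 10 ** 15          # Sunway TaihuLight: 93 PFLOPS
# Optimistically assume one combination can be evaluated per floating-point operation
seconds = combinations / flops_per_second
years = seconds / (365.25 * 24 * 3600)
```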
Gradient descent is the efficient alternative. Start somewhere on the cost curve C = ½(ŷ − y)² and compute the slope of the curve at that point: if the slope is negative, move right (downhill); if it is positive, move left. Repeating this walks down the curve to the minimum ("Best!") without ever enumerating candidate weights. [Image source: neuralnetworksanddeeplearning.com]
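Gradient descent on the one-weight network is a short loop (the learning rate and step count here are illustrative choices):

```python
def descend(y_actual, w_start, x=1.0, learning_rate=0.1, steps=200):
    """Minimize C = 0.5 * (w*x - y)^2 by repeatedly stepping against the slope."""
    w = w_start
    for _ in range(steps):
        y_hat = w * x                    # prediction with the current weight
        slope = (y_hat - y_actual) * x   # dC/dw at the current point
        w -= learning_rate * slope       # slope < 0: move right; slope > 0: move left
    return w

best_w = descend(y_actual=3.0, w_start=-5.0)
```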
Batch gradient descent: all of the observations are fed through the network, the total cost C = ∑ ½(ŷ − y)² is computed over the whole training set, and only then are the weights w1, w2, …, wm adjusted, once per full pass. This works well when the cost surface is convex, but it can get stuck in a local minimum otherwise.
Stochastic gradient descent: the observations are fed through one at a time, and the weights are updated after every single row. The extra randomness helps the algorithm avoid local minima, and it typically converges faster. (Mini-batch gradient descent, which updates the weights after each small batch of rows, sits between the two.)
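The difference between the two update schedules is easiest to see on the one-weight network (toy data and learning rate are illustrative):

```python
def predict(w, x):
    return w * x

def batch_update(w, rows, lr=0.1):
    """Batch GD: one weight update per full pass, using the summed gradient."""
    grad = sum((predict(w, x) - y) * x for x, y in rows)
    return w - lr * grad

def stochastic_update(w, rows, lr=0.1):
    """SGD: update the weight after every single row."""
    for x, y in rows:
        grad = (predict(w, x) - y) * x
        w -= lr * grad
    return w

rows = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]  # y = 2x, so the best weight is 2
w_batch = batch_update(0.0, rows)
w_sgd = stochastic_update(0.0, rows)
```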
Additional Reading: "A Neural Network in 13 lines of Python (Part 2 - Gradient Descent)", Andrew Trask (2015). Link: https://iamtrask.github.io/2015/07/27/python-network-part2/
Additional Reading: "Neural Networks and Deep Learning", Michael Nielsen (2015). Link: http://neuralnetworksanddeeplearning.com/chap2.html
Backpropagation: forward propagation carries the input signals from left to right to produce ŷ; backpropagation then carries the error from right to left, adjusting all of the weights at the same time, because the algorithm knows how much each weight contributed to the error. [Image source: neuralnetworksanddeeplearning.com]
STEP 1: Randomly initialise the weights to small numbers close to 0 (but not 0).
STEP 2: Input the first observation of your dataset in the input layer, one feature per input node.
STEP 3: Forward propagation: from left to right, the neurons are activated in such a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until you get the predicted result ŷ.
STEP 4: Compare the predicted result to the actual result and measure the generated error.
STEP 5: Backpropagation: from right to left, the error is propagated back. Update the weights according to how much they are responsible for the error. The learning rate decides by how much the weights are updated.
STEP 6: Repeat Steps 2 to 5 and update the weights after each observation (stochastic gradient descent), or repeat Steps 2 to 5 but update the weights only after a batch of observations (batch gradient descent).
STEP 7: When the whole training set has passed through the ANN, that makes an epoch. Run more epochs.
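The seven steps map almost line-for-line onto code. The sketch below uses a single sigmoid neuron (no hidden layer) on a toy AND dataset, so "backpropagation" in STEP 5 reduces to a single gradient step; the epoch count, learning rate, and dataset are illustrative:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(rows, epochs=5000, lr=1.0, seed=42):
    random.seed(seed)
    n = len(rows[0][0])
    # STEP 1: initialise the weights (and a bias) to small numbers close to 0
    weights = [random.uniform(-0.05, 0.05) for _ in range(n)]
    bias = random.uniform(-0.05, 0.05)
    for _ in range(epochs):                  # STEP 7: run many epochs
        for x, y in rows:                    # STEP 2: one observation at a time
            # STEP 3: forward propagation
            y_hat = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            # STEP 4: measure the generated error
            error = y_hat - y
            # STEP 5: propagate the error back and update the weights
            grad = error * y_hat * (1.0 - y_hat)
            weights = [w - lr * grad * v for w, v in zip(weights, x)]
            bias -= lr * grad
            # STEP 6: weights updated after each observation (stochastic)
    return weights, bias

# Toy dataset: logical AND
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(data)
```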

Deep Learning A-Z™: Artificial Neural Networks (ANN) - Module 1