© SuperDataScience / Deep Learning A-Z
What we will learn in this section:
• The Neuron
• The Activation Function
• How do Neural Networks work? (example)
• How do Neural Networks learn?
• Gradient Descent
• Stochastic Gradient Descent
• Backpropagation
[Figure: a biological neuron, with dendrites (signal receivers) and an axon (signal transmitter); in an artificial neural network the neuron is abstracted as a node. Image sources: www.austincc.edu, Wikipedia]
[Diagram: a neuron receives input signals 1 through m and produces an output signal.]
[Diagram: input values X1, X2, …, Xm reach the neuron through synapses; the neuron produces the output value y.]
The input values X1, X2, …, Xm are the independent variables of one observation, and they should be standardized (rescaled to mean 0 and variance 1) before being fed into the network.
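As a quick sketch of what "standardize" means here (plain Python; the example values are illustrative), each independent variable is rescaled to mean 0 and standard deviation 1:

```python
import math

def standardize(values):
    """Rescale a list of values to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    # Population standard deviation; assumes the values are not all identical
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

areas = [850.0, 1200.0, 2300.0, 1750.0]  # hypothetical "Area" column
scaled = standardize(areas)
```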
Additional Reading: "Efficient BackProp", Yann LeCun et al. (1998). Link: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
The output value y can be:
• Continuous (e.g. a price)
• Binary (e.g. will the customer exit: yes/no)
• Categorical
[Diagram: when the output is categorical, the network has several output values y1, y2, …, yp, one per category.]
Important: the input values X1, X2, …, Xm and the output value y all refer to a single observation, one row of the dataset.
[Diagram: each synapse is assigned a weight: w1 for X1, w2 for X2, …, wm for Xm.] The weights are what the network adjusts during training: learning in a neural network means finding the right weights.
What happens inside the neuron?

1st step: take the weighted sum of the input values:
∑ wixi = w1x1 + w2x2 + … + wmxm

2nd step: apply the activation function φ to the weighted sum:
φ(∑ wixi)

3rd step: the neuron passes the resulting signal on as the output value y.
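The three steps can be sketched in a few lines of Python (the sigmoid activation is used here purely for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, activation=sigmoid):
    # 1st step: weighted sum of the input values
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # 2nd step: apply the activation function
    y = activation(weighted_sum)
    # 3rd step: pass the signal on as the output value
    return y

output = neuron([1.0, 0.5, -0.5], [0.4, -0.2, 0.1])
```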
Four common activation functions φ (each plotted as y against the weighted sum):
• Threshold function: φ(x) = 1 if x ≥ 0, otherwise 0 (a hard step from 0 to 1)
• Sigmoid: φ(x) = 1 / (1 + e^(-x)), a smooth curve from 0 to 1
• Rectifier (ReLU): φ(x) = max(x, 0), one of the most used in practice
• Hyperbolic tangent (tanh): φ(x) = (1 − e^(-2x)) / (1 + e^(-2x)), ranging from −1 to 1
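The four functions above are one-liners; a minimal sketch in plain Python:

```python
import math

def threshold(x):
    """Step function: 1 if x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Smooth curve from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def rectifier(x):
    """ReLU: max(x, 0)."""
    return max(x, 0.0)

def tanh(x):
    """Hyperbolic tangent: from -1 to 1."""
    return math.tanh(x)
```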
Additional Reading: "Deep sparse rectifier neural networks", Xavier Glorot et al. (2011). Link: http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf
Example: assume the dependent variable (DV) is binary (y = 0 or 1).
• With the threshold activation function, the neuron outputs y = 1 when the weighted sum reaches the threshold and y = 0 otherwise.
• With the sigmoid activation function, the neuron outputs ŷ, the probability that y = 1.
[Diagram: a full network: the input layer (X1, X2, …, Xm), a hidden layer of neurons, and an output layer producing the output value y.]
Worked example: predicting the price of a property from four independent variables:
• X1: Area (feet²)
• X2: Bedrooms
• X3: Distance to city (miles)
• X4: Age

With no hidden layer, the network is just a weighted sum:
Price = w1*x1 + w2*x2 + w3*x3 + w4*x4

Adding a hidden layer is what gives the neural network its extra power. Every hidden neuron receives all four inputs, but training drives many synapse weights to zero, so each hidden neuron ends up specializing in a particular combination of features. The output layer then combines these hidden-layer signals into the predicted price y.
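A forward pass through this 4-input, one-hidden-layer network can be sketched as follows (the ReLU hidden activation, the 5-neuron hidden layer, and the input row are illustrative assumptions, and the weights are random, so the "price" is meaningless until the network is trained):

```python
import random

def relu(x):
    return max(x, 0.0)

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: input layer -> hidden layer (ReLU) -> output layer."""
    hidden = [relu(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))

random.seed(0)
# 4 inputs -> 5 hidden neurons -> 1 output: 4*5 + 5*1 = 25 weights in total
hidden_weights = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(5)]
output_weights = [random.uniform(-0.1, 0.1) for _ in range(5)]

row = [2000.0, 3.0, 10.0, 25.0]  # Area, Bedrooms, Distance, Age (hypothetical row)
price = forward(row, hidden_weights, output_weights)
```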
How do neural networks learn? The network is given an observation, produces a predicted value ŷ (the output value), and this prediction is compared against the actual value y. The difference between the two is measured by the cost function, here the squared error:

C = ½(ŷ − y)²

The lower the cost, the closer ŷ is to y; learning means adjusting the weights so as to minimize C.
[Diagram: after C is computed, the information is fed back through the network and the weights w1, w2, …, wm are updated; the same observation produces a new ŷ, a new C, and the loop repeats until the cost is minimized.]
With a full training set, every row is fed through the same network, with the same weights, producing one ŷ per row. The total cost sums the per-row errors:

C = ∑ ½(ŷ − y)²

After each full pass, the weights w1, w2, …, wm are adjusted, and all the rows are fed through again.
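The summed cost is straightforward to compute; a minimal sketch:

```python
def cost(predictions, actuals):
    """Total cost: sum of the squared-error terms 0.5 * (yhat - y)^2 over all rows."""
    return sum(0.5 * (yhat - y) ** 2 for yhat, y in zip(predictions, actuals))

c = cost([0.2, 0.9, 0.4], [0.0, 1.0, 1.0])
```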
Additional Reading: "A list of cost functions used in neural networks, alongside applications", CrossValidated (2015). Link: http://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
Take the simplest possible network: a single input value X1, one weight w1, and an output value ŷ, with cost C = ½(ŷ − y)². Plotting C against ŷ gives a parabola; the lowest point ("Best!") is the prediction that minimizes the cost. One naive way to find it is brute force: try many candidate weights and keep the one with the smallest cost.
Brute force breaks down as soon as the network grows: the curse of dimensionality. The property-price network above (4 inputs, a 5-neuron hidden layer, 1 output) already has 4 × 5 + 5 × 1 = 25 weights. Testing just 1,000 candidate values per weight gives

1,000 × 1,000 × … × 1,000 = 1,000^25 = 10^75 combinations.

Sunway TaihuLight, the world's fastest supercomputer at the time, runs at 93 PFLOPS (93 × 10^15 floating-point operations per second). Even at one combination per operation, checking them all would take

10^75 / (93 × 10^15) ≈ 1.08 × 10^58 seconds ≈ 3.42 × 10^50 years.
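The arithmetic on this slide can be checked directly:

```python
combinations = 1000 ** 25                 # 1,000 candidate values for each of 25 weights
flops_per_second = 93 * 10 ** 15          # Sunway TaihuLight: 93 PFLOPS
# Optimistically assume one combination can be evaluated per floating-point operation
seconds = combinations / flops_per_second
years = seconds / (365.25 * 24 * 3600)
```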
Gradient descent is the efficient alternative. Start somewhere on the cost curve C = ½(ŷ − y)² and compute the slope of the curve at that point: if the slope is negative, move right (downhill); if it is positive, move left. Repeating this walks down the curve to the minimum ("Best!") without ever enumerating candidate weights. [Image source: neuralnetworksanddeeplearning.com]
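Gradient descent on the one-weight network is a short loop (the learning rate and step count here are illustrative choices):

```python
def descend(y_actual, w_start, x=1.0, learning_rate=0.1, steps=200):
    """Minimize C = 0.5 * (w*x - y)^2 by repeatedly stepping against the slope."""
    w = w_start
    for _ in range(steps):
        y_hat = w * x                    # prediction with the current weight
        slope = (y_hat - y_actual) * x   # dC/dw at the current point
        w -= learning_rate * slope       # slope < 0: move right; slope > 0: move left
    return w

best_w = descend(y_actual=3.0, w_start=-5.0)
```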
Batch gradient descent: all of the observations are fed through the network, the total cost C = ∑ ½(ŷ − y)² is computed over the whole training set, and only then are the weights w1, w2, …, wm adjusted, once per full pass. This works well when the cost surface is convex, but it can get stuck in a local minimum otherwise.
Stochastic gradient descent: the observations are fed through one at a time, and the weights are updated after every single row. The extra randomness helps the algorithm avoid local minima, and it typically converges faster. (Mini-batch gradient descent, which updates the weights after each small batch of rows, sits between the two.)
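The difference between the two update schedules is easiest to see on the one-weight network (toy data and learning rate are illustrative):

```python
def predict(w, x):
    return w * x

def batch_update(w, rows, lr=0.1):
    """Batch GD: one weight update per full pass, using the summed gradient."""
    grad = sum((predict(w, x) - y) * x for x, y in rows)
    return w - lr * grad

def stochastic_update(w, rows, lr=0.1):
    """SGD: update the weight after every single row."""
    for x, y in rows:
        grad = (predict(w, x) - y) * x
        w -= lr * grad
    return w

rows = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]  # y = 2x, so the best weight is 2
w_batch = batch_update(0.0, rows)
w_sgd = stochastic_update(0.0, rows)
```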
Additional Reading: "A Neural Network in 13 lines of Python (Part 2 - Gradient Descent)", Andrew Trask (2015). Link: https://iamtrask.github.io/2015/07/27/python-network-part2/
Additional Reading: "Neural Networks and Deep Learning", Michael Nielsen (2015). Link: http://neuralnetworksanddeeplearning.com/chap2.html
Backpropagation: forward propagation carries the input signals from left to right to produce ŷ; backpropagation then carries the error from right to left, adjusting all of the weights at the same time, because the algorithm knows how much each weight contributed to the error. [Image source: neuralnetworksanddeeplearning.com]
STEP 1: Randomly initialise the weights to small numbers close to 0 (but not 0).
STEP 2: Input the first observation of your dataset in the input layer, one feature per input node.
STEP 3: Forward propagation: from left to right, the neurons are activated in such a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until you get the predicted result ŷ.
STEP 4: Compare the predicted result to the actual result and measure the generated error.
STEP 5: Backpropagation: from right to left, the error is propagated back. Update the weights according to how much they are responsible for the error. The learning rate decides by how much the weights are updated.
STEP 6: Repeat Steps 2 to 5 and update the weights after each observation (stochastic gradient descent), or repeat Steps 2 to 5 but update the weights only after a batch of observations (batch gradient descent).
STEP 7: When the whole training set has passed through the ANN, that makes an epoch. Run more epochs.
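The seven steps map almost line-for-line onto code. The sketch below uses a single sigmoid neuron (no hidden layer) on a toy AND dataset, so "backpropagation" in STEP 5 reduces to a single gradient step; the epoch count, learning rate, and dataset are illustrative:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(rows, epochs=5000, lr=1.0, seed=42):
    random.seed(seed)
    n = len(rows[0][0])
    # STEP 1: initialise the weights (and a bias) to small numbers close to 0
    weights = [random.uniform(-0.05, 0.05) for _ in range(n)]
    bias = random.uniform(-0.05, 0.05)
    for _ in range(epochs):                  # STEP 7: run many epochs
        for x, y in rows:                    # STEP 2: one observation at a time
            # STEP 3: forward propagation
            y_hat = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            # STEP 4: measure the generated error
            error = y_hat - y
            # STEP 5: propagate the error back and update the weights
            grad = error * y_hat * (1.0 - y_hat)
            weights = [w - lr * grad * v for w, v in zip(weights, x)]
            bias -= lr * grad
            # STEP 6: weights updated after each observation (stochastic)
    return weights, bias

# Toy dataset: logical AND
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(data)
```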

Deep Learning A-Z™: Artificial Neural Networks (ANN) - Module 1