  1. 1. Neural Networks in R Venkat Reddy
  2. 2. Statinfer.com Data Science Training and R&D statinfer.com 2 Corporate Training Classroom Training Online Training Contact us info@statinfer.com venkat@statinfer.com
  3. 3. Note •This presentation is just my class notes. The course notes for data science training are written by me, as an aid for myself. •The best way to treat this is as a high-level summary; the actual session went more in depth and contained detailed information and examples •Most of this material was written as informal notes, not intended for publication •Please send questions/comments/corrections to info@statinfer.com •Please check our website statinfer.com for the latest version of this document -Venkata Reddy Konasani (Cofounder statinfer.com) statinfer.com 3
  4. 4. Contents
  5. 5. Contents •Neural network Intuition •Neural network and vocabulary •Neural network algorithm •Math behind neural network algorithm •Building the neural networks •Validating the neural network model •Neural network applications •Image recognition using neural networks 5 statinfer.com
  6. 6. Recap of Logistic Regression
  7. 7. Recap of Logistic Regression •Categorical output YES/NO type •Using the predictor variables to predict the categorical output 7 statinfer.com
  8. 8. Decision Boundary
  9. 9. Decision Boundary – Logistic Regression •The line or margin that separates the classes •Classification algorithms are all about finding the decision boundaries •It need not always be a straight line •The final function of our decision boundary looks like • Y=1 if w^T x + w0 > 0; else Y=0 9 [Figure: two classes in the (x1, x2) plane separated by a boundary] statinfer.com
  10. 10. Decision Boundary – Logistic Regression •In logistic regression, the decision boundary can be derived from the logistic regression coefficients and the threshold. • Imagine the logistic regression line p(y) = e^(b0+b1x1+b2x2) / (1 + e^(b0+b1x1+b2x2)) • Suppose p(y) > 0.5 means class-1, or else class-0 • log(y/(1-y)) = b0+b1x1+b2x2 • log(0.5/0.5) = b0+b1x1+b2x2 • 0 = b0+b1x1+b2x2 • b0+b1x1+b2x2 = 0 is the line 10 statinfer.com
  11. 11. Decision Boundary – Logistic Regression •Rewriting it in mx+c form • X2 = (-b1/b2)X1 + (-b0/b2) •Anything above this line is class-1, below this line is class-0 • X2 > (-b1/b2)X1 + (-b0/b2) is class-1 • X2 < (-b1/b2)X1 + (-b0/b2) is class-0 • X2 = (-b1/b2)X1 + (-b0/b2) ties at a probability of 0.5 •We can change the decision boundary by changing the threshold value (here 0.5) 11 statinfer.com
  12. 12. LAB: Logistic Regression and Decision Boundary
  13. 13. LAB: Logistic Regression •Dataset: Emp_Productivity/Emp_Productivity.csv •Filter the data and take a subset from the above dataset; the filter condition is Sample_Set < 3 •Draw a scatter plot that shows Age on the X axis and Experience on the Y axis. Try to distinguish the two classes with colors or shapes (visualizing the classes) •Build a logistic regression model to predict Productivity using age and experience •Create the confusion matrix •Calculate the accuracy and error rates 13 statinfer.com
  14. 14. LAB: Decision Boundary •Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes) •Build a logistic regression model to predict Productivity using age and experience •Finally draw the decision boundary for this logistic regression model 14 statinfer.com
  15. 15. Code: Logistic Regression 15 statinfer.com
  16. 16. Code: Logistic Regression 16 statinfer.com
  17. 17. Code: Logistic Regression 17 statinfer.com
  18. 18. Code: Logistic Regression 18 statinfer.com
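The four code slides above were screenshots in the original deck. Below is a minimal R sketch of the steps they cover, assuming the column names Age, Experience, Productivity and Sample_Set from the lab description; treat it as a reconstruction, not the deck's exact code.

# Assumed file path and column names, taken from the lab slide
Emp_Productivity_raw <- read.csv("Emp_Productivity/Emp_Productivity.csv")
Emp_Productivity1 <- Emp_Productivity_raw[Emp_Productivity_raw$Sample_Set < 3, ]

# Visualize the two classes
plot(Emp_Productivity1$Age, Emp_Productivity1$Experience,
     col = ifelse(Emp_Productivity1$Productivity == 1, "blue", "red"),
     xlab = "Age", ylab = "Experience")

# Logistic regression model
emp_model <- glm(Productivity ~ Age + Experience,
                 data = Emp_Productivity1, family = binomial())
summary(emp_model)

# Confusion matrix, accuracy and error rate at a 0.5 threshold
pred <- ifelse(predict(emp_model, type = "response") > 0.5, 1, 0)
conf_mat <- table(Actual = Emp_Productivity1$Productivity, Predicted = pred)
conf_mat
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy          # accuracy
1 - accuracy      # error rate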
  19. 19. Code: Decision Boundary 19 statinfer.com
  20. 20. Code: Decision Boundary 20 statinfer.com
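The decision boundary slides were also images; a sketch using the rearranged line from slide 11 (x2 = (-b1/b2)x1 + (-b0/b2)) and the emp_model fitted above:

# Decision boundary from the fitted coefficients
b <- coef(emp_model)    # b[1] = intercept, b[2] = Age, b[3] = Experience
plot(Emp_Productivity1$Age, Emp_Productivity1$Experience,
     col = ifelse(Emp_Productivity1$Productivity == 1, "blue", "red"),
     xlab = "Age", ylab = "Experience")
abline(a = -b[1] / b[3], b = -b[2] / b[3], lwd = 2)   # the class boundary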
  21. 21. New representation for logistic regression
  22. 22. New representation for logistic regression 22 y = e^(β0+β1x1+β2x2) / (1 + e^(β0+β1x1+β2x2)) = 1 / (1 + e^-(β0+β1x1+β2x2)) [Diagram: inputs x1, x2 and a bias feed a single node w0+w1x1+w2x2, whose sigmoid output is y] y = g(Σ wk xk) = g(w0 + w1x1 + w2x2), where g(x) = 1/(1+e^-x) statinfer.com
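This single-neuron view translates directly into R; a tiny illustrative sketch (the function and weight names are ours, not the deck's):

# Logistic regression as one "neuron": y = g(w0 + w1*x1 + w2*x2)
g <- function(u) 1 / (1 + exp(-u))           # sigmoid
neuron_out <- function(x, w) g(w[1] + sum(w[-1] * x))
neuron_out(c(2, 3), w = c(-1, 0.5, 0.25))    # example inputs and weights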
  23. 23. Finding the weights in logistic regression 23 [Diagram: x1, x2 with weights w0, w1, w2 feeding the node w0+w1x1+w2x2, output y] out(x) = g(Σ wk xk). We find w to minimize Σ_{i=1..n} [yi − g(Σ wk xki)]². The above output is a non-linear function of a linear combination of inputs – a typical multiple logistic regression line statinfer.com
  24. 24. LAB: Non-Linear Decision Boundaries
  25. 25. LAB: Non-Linear Decision Boundaries •Dataset: “Emp_Productivity/ Emp_Productivity_All_Sites.csv” •Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes) •Build a logistic regression model to predict Productivity using age and experience •Finally draw the decision boundary for this logistic regression model •Create the confusion matrix •Calculate the accuracy and error rates 25 statinfer.com
  26. 26. Code: Non-Linear Decision Boundaries 26 statinfer.com
  27. 27. Code: Non-Linear Decision Boundaries 27 statinfer.com
  28. 28. Code: Non-Linear Decision Boundaries 28 statinfer.com
  29. 29. Code: Non-Linear Decision Boundaries 29 statinfer.com
  30. 30. Code: Non-Linear Decision Boundaries 30 statinfer.com
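These code slides were images as well; a hedged sketch of the same steps on the full dataset, reusing the assumed column names:

# Full dataset with a non-linear class pattern
Emp_Productivity_all <- read.csv("Emp_Productivity/Emp_Productivity_All_Sites.csv")
plot(Emp_Productivity_all$Age, Emp_Productivity_all$Experience,
     col = ifelse(Emp_Productivity_all$Productivity == 1, "blue", "red"),
     xlab = "Age", ylab = "Experience")

model_all <- glm(Productivity ~ Age + Experience,
                 data = Emp_Productivity_all, family = binomial())
b <- coef(model_all)
abline(a = -b[1] / b[3], b = -b[2] / b[3], lwd = 2)  # a single line fits poorly here

pred_all <- ifelse(predict(model_all, type = "response") > 0.5, 1, 0)
conf_all <- table(Actual = Emp_Productivity_all$Productivity, Predicted = pred_all)
conf_all
sum(diag(conf_all)) / sum(conf_all)   # accuracy suffers on non-linear data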
  31. 31. Non-Linear Decision Boundaries-Issue
  32. 32. Non-Linear Decision Boundaries 32 [Scatter plot in the (x1, x2) plane: the two classes cannot be separated by one straight line] statinfer.com
  33. 33. Non-Linear Decision Boundaries-issues •A logistic regression line doesn't seem to be a good option when we have non-linear decision boundaries 33 statinfer.com
  34. 34. Non-Linear Decision Boundaries-Solution
  35. 35. Intermediate outputs 35 [Diagram: the (x1, x2) space split by two lines, Model 1 and Model 2] Intermediate output 1: out(x) = g(Σ wk xk), say h1 (Model 1). Intermediate output 2: out(x) = g(Σ wk xk), say h2 (Model 2) statinfer.com
  36. 36. The Intermediate output •Directly predicting y from the x's is challenging. •We can instead predict h, the intermediate output, which will in turn predict Y 36 [Diagram: x1, x2 feed h1 and h2 through weights w11, w12, w21, w22; h1 and h2 feed y through W1, W2] statinfer.com
  37. 37. Finding the weights for intermediate outputs 37 Intermediate output 1: h1 = out(x) = g(Σ w1k xk); we find w1 to minimize Σ_{i=1..n} [h1i − g(Σ w1k xki)]². Intermediate output 2: h2 = out(x) = g(Σ w2k xk); we find w2 to minimize Σ_{i=1..n} [h2i − g(Σ w2k xki)]². Final output: y = out(h) = g(Σ Wj hj); we find W to minimize Σ_{i=1..n} [yi − g(Σ Wj hji)]² statinfer.com
  38. 38. LAB: Intermediate output
  39. 39. LAB: Intermediate output •Dataset: Emp_Productivity/ Emp_Productivity_All_Sites.csv •Filter the data and take the first 74 observations from the above dataset. •Build a logistic regression model to predict Productivity using age and experience •Calculate the prediction probabilities for all the inputs. Store the probabilities in the inter1 variable •Filter the data and take observations from row 34 onwards. •Build a logistic regression model to predict Productivity using age and experience •Calculate the prediction probabilities for all the inputs. Store the probabilities in the inter2 variable •Build a consolidated model to predict productivity using the inter1 and inter2 variables •Create the confusion matrix and find the accuracy and error rates for the consolidated model 39 statinfer.com
  40. 40. Code: Intermediate output 40 statinfer.com
  41. 41. Code: Intermediate output 41 statinfer.com
  42. 42. Code: Intermediate output 42 statinfer.com
  43. 43. Code: Intermediate output 43 statinfer.com
  44. 44. Code: Intermediate output 44 statinfer.com
  45. 45. Code: Intermediate output 45 statinfer.com
  46. 46. Code: Intermediate output 46 statinfer.com
  47. 47. Code: Intermediate output 47 statinfer.com
  48. 48. Code: Intermediate output 48 statinfer.com
  49. 49. Code: Intermediate output 49 statinfer.com
  50. 50. Code: Intermediate output 50 statinfer.com
  51. 51. Code: Intermediate output 51 statinfer.com
  52. 52. Code: Intermediate output 52 statinfer.com
  53. 53. Code: Intermediate output 53 statinfer.com
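The fourteen code slides above were screenshots; a compact reconstruction of the whole lab under the same assumptions (the row ranges come from the lab slide):

emp_all <- read.csv("Emp_Productivity/Emp_Productivity_All_Sites.csv")

# Model 1 on the first 74 rows; score all rows to get inter1
part1 <- emp_all[1:74, ]
model1 <- glm(Productivity ~ Age + Experience, data = part1, family = binomial())
emp_all$inter1 <- predict(model1, newdata = emp_all, type = "response")

# Model 2 on rows 34 onwards; score all rows to get inter2
part2 <- emp_all[34:nrow(emp_all), ]
model2 <- glm(Productivity ~ Age + Experience, data = part2, family = binomial())
emp_all$inter2 <- predict(model2, newdata = emp_all, type = "response")

# Consolidated model on the two intermediate outputs
model_comb <- glm(Productivity ~ inter1 + inter2, data = emp_all, family = binomial())
pred_comb <- ifelse(predict(model_comb, type = "response") > 0.5, 1, 0)
conf_comb <- table(Actual = emp_all$Productivity, Predicted = pred_comb)
conf_comb
sum(diag(conf_comb)) / sum(conf_comb)   # accuracy of the consolidated model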
  54. 54. Neural Network intuition
  55. 55. Neural Network intuition 55 Final output: y = out(h) = g(Σ Wj hj), where hj = out(x) = g(Σ wjk xk), so y = g(Σ Wj g(Σ wjk xk)) • So h is a non-linear function of a linear combination of inputs – a multiple logistic regression line • Y is a non-linear function of a linear combination of the outputs of logistic regressions • Y is a non-linear function of a linear combination of non-linear functions of linear combinations of inputs • We find W to minimize Σ_{i=1..n} [yi − g(Σ Wj hji)]² • We find {Wj} and {wjk} to minimize Σ_{i=1..n} [yi − g(Σ Wj g(Σ wjk xki))]² Neural networks are all about finding the sets of weights {Wj} and {wjk} using the gradient descent method statinfer.com
  56. 56. Neural Network intuition 56 Intermediate output 1: h1 = out(x) = g(Σ w1k xk); we find w1 to minimize Σ_{i=1..n} [h1i − g(Σ w1k xki)]². Intermediate output 2: h2 = out(x) = g(Σ w2k xk); we find w2 to minimize Σ_{i=1..n} [h2i − g(Σ w2k xki)]². Final output: y = out(h) = g(Σ Wj hj); we find W to minimize Σ_{i=1..n} [yi − g(Σ Wj hji)]² statinfer.com
  57. 57. The Neural Networks •The neural networks methodology is similar to the intermediate output method explained above. •But we will not manually subset the data to create the different models. •The neural network technique automatically takes care of all the intermediate outputs using hidden layers •It works very well for data with non-linear decision boundaries •The intermediate output layer in the network is known as a hidden layer •In simple terms, neural networks are multi-layer non-linear regression models. •If we have a sufficient number of hidden layers, we can estimate any complex non-linear function 57 statinfer.com
  58. 58. Neural network and vocabulary 58 [Diagram: bias 1 and inputs x1, x2 feed hidden nodes h1, h2, which feed output y – Input, Hidden Layer, Output] h1 = 1/(1+e^-(w11 + w12 x1 + w22 x2)), h2 = 1/(1+e^-(w21 + w13 x1 + w23 x2)), y = 1/(1+e^-(W0 + W1 h1 + W2 h2)) • X1, X2  inputs • 1  bias term • W's are weights • 1/(1+e^-u) is the sigmoid function • Y is output statinfer.com
  59. 59. Why are they called hidden layers? •A hidden layer “hides” the desired output. •Instead of predicting the actual output using a single model, build multiple models to predict intermediate output •There is no standard way of deciding the number of hidden layers. 59 statinfer.com
  60. 60. The Neural network Algorithm
  61. 61. Algorithm for Finding weights 61 x1 x2 h1 𝑦 h2 w11 w21 w12 w13 w22 w23 W2 W1 W3 • Algorithm is all about finding the weights/coefficients • We randomly initialize some weights; Calculate the output by supplying training input; If there is an error the weights are adjusted to reduce this error. statinfer.com
  62. 62. The Neural Network Algorithm •Step 1: Initialization of weights: Randomly select some weights •Step 2: Training & Activation: Input the training values and perform the calculations forward. •Step 3: Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer •Step 4: Weight training: Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples. •Step 5: Stopping criteria: Stop the training and weight-updating process when the minimum error criterion is met 62 statinfer.com
  63. 63. Randomly initialize weights 63 x1 x2 h1 𝑦 h2 w11 w21 w12 w13 w22 w23 W2 W1 W3 Step 1: Initialization of weights: Randomly select some weights statinfer.com
  64. 64. Training & Activation 64 [Diagram: x1, x2 feed h1, h2 with weights w11..w23; h1, h2 feed y with weights W0, W1, W2] h1 = 1/(1+e^-(w11 + w12 x1 + w22 x2)), h2 = 1/(1+e^-(w21 + w13 x1 + w23 x2)), y = 1/(1+e^-(W0 + W1 h1 + W2 h2)) Training input & calculations – Feed Forward. Step 2: Input the training values and perform the calculations forward statinfer.com
  65. 65. Error Calculation at Output 65 Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer: Err = Σ_{i=1..n} [yi − g(Σ_{k=1..m} wk hki)]² statinfer.com
  66. 66. Error Calculation at hidden layers 66 Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer. Err = Σ_{i=1..n} [yi − g(Σ_{k=1..m} wk hki)]² Back Propagation – calculate error signals backwards: δk = y(1−y) · W · Err statinfer.com
  67. 67. Calculate weight corrections 67 Step 4: Update the weights to reduce the error, recalculate and repeat the process: compute corrections Δw11, Δw12, Δw13, Δw21, Δw22, Δw23 and ΔW0, ΔW1, ΔW2 from Err = Σ_{i=1..n} [yi − g(Σ_{k=1..m} wk hki)]² and δk = y(1−y) · W · Err statinfer.com
  68. 68. Update Weights 68 Step 4: Update the weights to reduce the error, recalculate and repeat the process: w11 := w11 + Δw11, w12 := w12 + Δw12, w13 := w13 + Δw13, w21 := w21 + Δw21, w22 := w22 + Δw22, w23 := w23 + Δw23; W0 := W0 + ΔW0, W1 := W1 + ΔW1, W2 := W2 + ΔW2. Err = Σ_{i=1..n} [yi − g(Σ_{k=1..m} wk hki)]², δk = y(1−y) · W · Err statinfer.com
  69. 69. Stopping Criteria 69 Step 5: Stop the training and weight-updating process when the minimum error criterion is met, judged on the error Σ_{i=1..n} [yi − g(Σ_{k=1..m} wk hki)]² statinfer.com
  70. 70. Once Again ….Neural network Algorithm •Step 1: Initialization of weights: Randomly select some weights •Step 2: Training & Activation: Input the training values and perform the calculations forward. •Step 3: Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer •Step 4: Weight training: Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples. •Step 5: Stopping criteria: Stop the training and weight-updating process when the minimum error criterion is met 70 statinfer.com
  71. 71. Neural network Algorithm- Demo
  72. 72. Neural network Algorithm-Demo Looks like a dataset that can't be separated by using a single linear decision boundary/perceptron 72 [Scatter plot: interleaved regions of class-0 and class-1 points that no single straight line can separate] statinfer.com
  73. 73. Neural network Algorithm-Demo •Let's consider a similar but simple classification example •XOR Gate Dataset 73
Input1(x1)  Input2(x2)  Output(y)
1           1           0
1           0           1
0           1           1
0           0           0
statinfer.com
  74. 74. Randomly initialize weights 74 x1 x2 h1 𝑦 h2 0.5 Step 1: Initialization of weights: Randomly select some weights statinfer.com
  75. 75. Activation 75 [Forward pass with x1 = 1, x2 = 1: h1 = 0.818, h2 = 0.731, y = 0.7137126; one shown first-layer weight is 0.5] h1 = 1/(1+e^-(w11 + w12 x1 + w22 x2)), h2 = 1/(1+e^-(w21 + w13 x1 + w23 x2)), y = 1/(1+e^-(W0 + W1 h1 + W2 h2)) input1 = 1, input2 = 1, output = 0 • In this step we input 1 and 1 as input & expect 0 as output. • With these weights we got an error of -0.714 at the output layer • We need to adjust the weights statinfer.com
  76. 76. Back-Propagate Errors 76 At the output: Ytarget − Yobs = −0.7137126; Y(1−Y) = 0.20432693; δ at Y = −0.1458307. At h2: h2(1−h2) = 0.1966119, W = 1, so δ at h2 = −0.0286721. At h1: h1(1−h1) = 0.1491465, W = −1, so δ at h1 = 0.0217501. Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk; η is the learning parameter; δk = yk(1−yk) · Err (for hidden layers δk = yk(1−yk) · wj · Err); Err = Expected output − Actual output statinfer.com
  77. 77. Calculate Weight Corrections 77 With η = 0.1 and an input of 1: ΔWjk = η · yj · δk. For a weight feeding h1: ΔW = 0.1 × 1 × 0.0217501 = 0.00217501. For a weight feeding h2: ΔW = 0.1 × 1 × (−0.02867) = −0.002867. (δ at Y = −0.1458307, δ at h1 = 0.0217501, δ at h2 = −0.02867; one shown weight is 0.5.) Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk; η is the learning parameter; δk = yk(1−yk) · Err (for hidden layers δk = yk(1−yk) · wj · Err); Err = Expected output − Actual output statinfer.com
  78. 78. Updated Weights 78 Wjk := Wjk + ΔWjk, i.e. W(new) = W(old) + correction: for the h1-side weight, W(new) = 0.5 + 0.002175 = 0.502175; for the h2-side weight, W(new) = −1.002867. (δ at Y = −0.1458307, δ at h1 = 0.0217501, δ at h2 = −0.02867.) ΔWjk = η · yj · δk; η is the learning parameter; δk = yk(1−yk) · Err (for hidden layers δk = yk(1−yk) · wj · Err); Err = Expected output − Actual output statinfer.com
  79. 79. Updated Weights..contd 79 x1 x2 h1 𝑦 h2 0.50218 statinfer.com
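The numbers in this demo can be reproduced in a few lines of R. The deck shows only some of the initial weights, so the full set below is an assumption; it is, however, consistent with every value on slides 75-78:

g <- function(u) 1 / (1 + exp(-u))   # sigmoid

x <- c(1, 1); y_target <- 0; eta <- 0.1   # training example (1,1,0)
w_h1  <- c(0.5, 0.5, 0.5)   # assumed weights: bias, x1, x2 -> h1
w_h2  <- c(-1, 1, 1)        # assumed weights: bias, x1, x2 -> h2
W_out <- c(1, -1, 1)        # assumed weights: bias, h1, h2 -> y

# Forward pass (activation)
h1 <- g(sum(w_h1 * c(1, x)))         # 0.818
h2 <- g(sum(w_h2 * c(1, x)))         # 0.731
y  <- g(sum(W_out * c(1, h1, h2)))   # 0.7137126

# Back-propagate the error signals
err      <- y_target - y                        # -0.7137126
delta_y  <- y * (1 - y) * err                   # -0.1458307
delta_h1 <- h1 * (1 - h1) * W_out[2] * delta_y  # 0.0217501
delta_h2 <- h2 * (1 - h2) * W_out[3] * delta_y  # -0.0286721

# One delta-rule update for the two hidden-node bias weights
w_h1[1] <- w_h1[1] + eta * 1 * delta_h1   # 0.502175
w_h2[1] <- w_h2[1] + eta * 1 * delta_h2   # -1.002867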
  80. 80. Iterations and Stopping Criteria •This iteration is just for one training example (1,1,0). This is just the first epoch. •We repeat the same process of training and updating of weights for all the data points •We continue and update the weights until we see there is no significant change in the error or when the maximum permissible error criterion is met. •By updating the weights in this method, we reduce the error slightly. When the error reaches the minimum point the iterations will be stopped and the weights will be considered as optimum for this training set 80 statinfer.com
  81. 81. XOR Gate final NN Model 81 statinfer.com
  82. 82. Building the Neural network
  83. 83. The good news is.. •We don't need to write the code for weights calculation and updating •There are ready-made codes, libraries and packages available in R •The gradient descent method is not very easy to understand for non-mathematics students •Neural network tools don't expect the user to write the code for the full-length back propagation algorithm 83 statinfer.com
  84. 84. Building the neural network in R •We have a couple of packages available in R • nnet • neuralnet •We need to mention the dataset, input, output & number of hidden layers as input. •Neural network calculations are very complex. The algorithm may take some time to produce the results •One needs to be careful while setting the parameters. The runtime changes based on the input parameter values 84 statinfer.com
  85. 85. LAB: Building the neural network in R
  86. 86. LAB: Building the neural network in R •Build a neural network for XOR data 86 statinfer.com
  87. 87. Code: Building the neural network in R 87 statinfer.com
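This code slide was an image; a hedged sketch of an XOR network with the neuralnet package, building the data frame from the table on slide 73 (the column names are ours):

library(neuralnet)

xor_data <- data.frame(input1 = c(1, 1, 0, 0),
                       input2 = c(1, 0, 1, 0),
                       output = c(0, 1, 1, 0))

set.seed(42)   # results vary with the random start weights
xor_model <- neuralnet(output ~ input1 + input2, data = xor_data,
                       hidden = 2, threshold = 0.0001,
                       linear.output = FALSE)

plot(xor_model)                        # network diagram with the final weights
round(xor_model$net.result[[1]], 3)    # fitted outputs, close to 0, 1, 1, 0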
  88. 88. R Code Options • neuralnet(Productivity~Age+Experience, data=Emp_Productivity_raw, hidden=2, stepmax = 1e+07, threshold=0.00001, linear.output = FALSE) •hidden: the number of hidden layers in the neural network. It is actually the number of nodes; we can input a vector to add more hidden layers •stepmax: • The maximum number of steps while executing the algorithm. • Sometimes we may need more than 100,000 steps for the algorithm to converge. • Sometimes we may get an error that the algorithm didn't converge within the default stepmax; we need to increase the stepmax parameter value in such cases. • Additional info • One epoch = one complete run through the training data. If epochs = 500 then the algorithm sees the entire dataset 500 times • One iteration is one pass of a "batch" of data through the algorithm (steps). If the batch size equals the full training data then iterations equal epochs 88 statinfer.com
  89. 89. R Code Options •threshold • Connected to weights optimization on the error function • It is used as a stopping criterion: when the partial derivatives of the error function become smaller than this threshold (0.01 by default), the algorithm stops. • A lower threshold value forces the algorithm through more iterations, for more accuracy. •The output is expected to be linear by default. We need to specifically mention linear.output = FALSE for classification problems 89 statinfer.com
  90. 90. Code: Building the neural network in R 90 Execute a couple of times to get zero error statinfer.com
  91. 91. Code: Building the neural network in R 91 statinfer.com
  92. 92. Code: Building the neural network in R 92 statinfer.com
  93. 93. Code: Building the neural network in R 93 statinfer.com
  94. 94. Lab: Building Neural network on Employee productivity data •Dataset: Emp_Productivity/Emp_Productivity.csv •Draw a 2D graph between age, experience and productivity •Build neural network algorithm to predict the productivity based on age and experience •Plot the neural network with final weights •Increase the hidden layers and see the change in accuracy 94 statinfer.com
  95. 95. Code: Neural network on Employee productivity data 95 statinfer.com
  96. 96. Code: Neural network on Employee productivity data 96 statinfer.com
  97. 97. Code: Neural network on Employee productivity data 97 statinfer.com
  98. 98. Code: Neural network on Employee productivity data 98 statinfer.com
  99. 99. Code: Neural network on Employee productivity data 99 statinfer.com
  100. 100. Code: Neural network on Employee productivity data 100 statinfer.com
  101. 101. Code: Neural network on Employee productivity data 101 statinfer.com
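The seven code slides above were screenshots. A reconstruction mirroring the call shown on the R Code Options slide, with an assumed evaluation step:

library(neuralnet)

Emp_Productivity_raw <- read.csv("Emp_Productivity/Emp_Productivity.csv")
emp_net <- neuralnet(Productivity ~ Age + Experience,
                     data = Emp_Productivity_raw, hidden = 2,
                     stepmax = 1e+07, threshold = 0.00001,
                     linear.output = FALSE)
plot(emp_net)   # the network with its final weights

# Accuracy at a 0.5 cut-off
pred <- compute(emp_net, Emp_Productivity_raw[, c("Age", "Experience")])
pred_class <- ifelse(pred$net.result > 0.5, 1, 0)
conf <- table(Actual = Emp_Productivity_raw$Productivity, Predicted = pred_class)
conf
sum(diag(conf)) / sum(conf)

# Increase the hidden nodes/layers, e.g. hidden = c(3, 2), and compare accuracy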
  102. 102. There can be many solutions 102 Set-1 statinfer.com
  103. 103. There can be many solutions 103 Set-2 statinfer.com
  104. 104. There can be many solutions 104 Set-3 statinfer.com
  105. 105. Local vs. Global Minimum
  106. 106. Local vs. Global Minimum • The neural network might give different results with different start weights. • The algorithm tries to find a local minimum rather than the global minimum. • There can be many local minima, which means there can be many solutions to a neural network problem • We need to perform the validation checks before choosing the final model. 106 [Figure: error curve marking a local minimum and the global minimum] statinfer.com
  107. 107. Hidden layers and their role
  108. 108. Multi Layer Neural Network 108 [Network diagram: bias 1 and inputs x1, x2; first hidden layer H11, H12; second hidden layer H21, H22; output Y – Input, Hidden Layers, Output] statinfer.com
  109. 109. The role of hidden layers 109 • The First hidden layer • The first layer is nothing but the linear decision boundaries • The simple logistic regression line outputs • We can see them as multiple lines on the decision space [Scatter plot of the interleaved classes, overlaid with several straight lines] statinfer.com
  110. 110. The role of hidden layers 110 • The Second hidden layer • The second layer combines these lines and forms simple decision boundary shapes • The third hidden layer forms even more complex shapes within the boundaries generated by the second layer • You can imagine all these layers together dividing the whole objective space into multiple decision boundary shapes; the cases within a shape are class-1, outside the shape are class-2 [Scatter plot of the interleaved classes, overlaid with closed boundary shapes] statinfer.com
  111. 111. The Number of hidden layers
  112. 112. The Number of hidden layers •There is no concrete rule to choose the right number. We need to choose by trial-and-error validation •Too few hidden layers might result in imperfect models. The error rate will be high •A high number of hidden layers might lead to over-fitting, but it can be identified by using some validation techniques •The final number is based on the number of predictor variables, training data size and the complexity in the target. •When we are in doubt, it's better to go with many hidden nodes than few. It will ensure higher accuracy; the training process will be slower though •Cross validation and testing error can help us in determining the model with optimal hidden layers 112 statinfer.com
  113. 113. LAB: Digit Recognizer
  114. 114. LAB: Digit Recognizer • Take an image of a handwritten single digit, and determine what that digit is. • Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size-normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990). • The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values. • Build a neural network model that can be used as the digit recognizer • Use the test dataset to validate the true classification power of the model • What is the final accuracy of the model? 114 statinfer.com
  115. 115. Code: Digit Recognizer
#Importing test and training data - USPS Data
#(set the path to your local copy of the USPS zip.train / zip.test files)
digits_train <- read.table("D:Google DriveTrainingDatasetsDigit RecognizerUSPSzip.train.txt",
                           quote="\"", comment.char="")
digits_test <- read.table("D:Google DriveTrainingDatasetsDigit RecognizerUSPSzip.test.txt",
                          quote="\"", comment.char="")
dim(digits_train)
col_names <- names(digits_train[,-1])
label_levels <- names(table(digits_train$V1))

#Lets see some images.
for(i in 1:10) {
  data_row <- digits_train[i,-1]
  pixels <- matrix(as.numeric(data_row), 16, 16, byrow=TRUE)
  image(pixels, axes = FALSE)
  title(main = paste("Label is", digits_train[i,1]), font.main = 4)
}
115 statinfer.com
  116. 116. Code: Digit Recognizer 116 statinfer.com
  117. 117. Code: Digit Recognizer
#####Creating multiple columns for multiple outputs
#####We need these variables while building the model
digit_labels <- data.frame(label=digits_train[,1])
for (i in 1:10) {
  digit_labels <- cbind(digit_labels, digit_labels$label==i-1)
  names(digit_labels)[i+1] <- paste("l", i-1, sep="")
}
label_names <- names(digit_labels[,-1])

#Update the training dataset
digits_train1 <- cbind(digits_train, digit_labels)
names(digits_train1)

#formula y~. doesn't work in neuralnet function
model_form <- as.formula(paste(paste(label_names, collapse = " + "), "~",
                               paste(col_names, collapse = " + ")))
117 statinfer.com
  118. 118. Code: Digit Recognizer 118 statinfer.com
  119. 119. Code: Digit Recognizer 119 statinfer.com
  120. 120. Code: Digit Recognizer 120 statinfer.com
  121. 121. Code: Digit Recognizer 121 statinfer.com
  122. 122. Code: Digit Recognizer 122 statinfer.com
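The remaining digit recognizer slides were screenshots. A hedged sketch of the model building and test evaluation, reusing model_form and digits_train1 from slide 117; the hidden-layer size and the evaluation code are assumptions:

library(neuralnet)
digit_model <- neuralnet(model_form, data = digits_train1,
                         hidden = 10, linear.output = FALSE)

# Score the test data and pick the class with the highest output
test_pred <- compute(digit_model, digits_test[, -1])
pred_label <- max.col(test_pred$net.result) - 1   # columns l0..l9 map to digits 0..9

conf <- table(Actual = digits_test$V1, Predicted = pred_label)
sum(diag(conf)) / sum(conf)   # overall test accuracy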
  123. 123. Real-world applications
  124. 124. Real-world applications •Self-driving cars, taking video as input •Speech recognition •Face recognition •Cancer cell analysis •Heart attack predictions •Currency and stock price predictions •Credit card default and loan predictions •Marketing and advertising, by predicting the response probability •Weather forecasting and rainfall prediction 124 statinfer.com
  125. 125. Real-world applications •Face recognition : • https://www.youtube.com/watch?v=57VkfXqJ1LU • https://www.youtube.com/watch?v=xVQLBbXdVUY •Autonomous car software • https://www.youtube.com/watch?v=gG72-SjwxAM 125 statinfer.com
  126. 126. Drawbacks of Neural Networks
  127. 127. Drawbacks of Neural Networks •No real theory that explains how to choose the number of hidden layers •Takes a lot of time when the input data is large; needs powerful computing machines •Difficult to interpret the results. Very hard to interpret and measure the impact of individual predictors •It's not easy to choose the right training sample size and learning rate. •The local minimum issue: the gradient descent algorithm produces the optimal weights for a local minimum; the global minimum of the error function is not guaranteed 127 statinfer.com
  128. 128. Why the name neural network?
  129. 129. Why the name neural network? •The neural network algorithm for solving complex learning problems is inspired by the human brain •Our brains are a huge network of processing elements, containing a network of billions of neurons. •In our brain, a neuron receives input from other neurons. Inputs are combined and sent to the next neuron •The artificial neural network algorithm is built on the same logic. 129 statinfer.com
  130. 130. Why the name neural network? 130 Dendrites  Input (X); Cell body  Processor (Σwx); Axon  Output (Y) statinfer.com
  131. 131. Conclusion
  132. 132. Conclusion •Neural networks are a vast subject. Many data scientists focus solely on neural network techniques •In this session we practiced the introductory concepts only. Neural networks have many more advanced techniques, and there are many algorithms other than back propagation. •Neural networks work particularly well on some classes of problems, like image recognition. •The neural network algorithms are very calculation-intensive. They require highly efficient computing machines. Large datasets take a significant amount of runtime in R; we need to try different types of options and packages. •Currently there is a lot of exciting research going on around neural networks. •After gaining sufficient knowledge in this basic session, you may want to explore reinforcement learning, deep learning, etc. 132 statinfer.com
  133. 133. Appendix
  134. 134. Math- How to update the weights?
  135. 135. Math- How to update the weights? •We update the weights backwards by iteratively calculating the error •The weight updating is done using the gradient descent method, or the delta rule, also known as the Widrow-Hoff rule •First we calculate the weight corrections for the output layer, then we take care of the hidden layers 135 statinfer.com
  136. 136. Math- How to update the weights? • Wjk := Wjk + ΔWjk • where ΔWjk = η · yj · δk • η is the learning parameter • δk = yk(1 − yk) · Err (for hidden layers δk = yk(1 − yk) · wj · Err) • Err = Expected output − Actual output •The weight corrections are calculated based on the error function •The new weights are chosen in such a way that the final error in the network is minimized 136 statinfer.com
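The update rule above translates to one line of R; a minimal sketch using the numbers from the demo on slides 77-78 (the variable names are ours):

# Wjk := Wjk + eta * y_j * delta_k
delta_rule_update <- function(W_jk, eta, y_j, delta_k) W_jk + eta * y_j * delta_k
delta_rule_update(W_jk = 0.5, eta = 0.1, y_j = 1, delta_k = 0.0217501)   # 0.502175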
  137. 137. Math-How does the delta rule work?
  138. 138. How does the delta rule work? • Let's consider a simple example to understand the weight updating using the delta rule. 138 • Suppose we are building a simple logistic regression line and would like to find the weights using the weight update rule • Y = 1/(1+e^-wx) is the equation • We are searching for the optimal w for our data • Let w be 1 • Y = 1/(1+e^-x) is the initial equation • The error in our initial step is 3.59 • To reduce the error we add a delta to w and make it 1.5 • Now w is 1.5 (blue line) • Y = 1/(1+e^-1.5x) is the updated equation • With the updated weight, the error is 1.57 • We can further reduce the error by increasing w by delta statinfer.com
  139. 139. How does the delta rule work? 139 • If we repeat the same process of adding delta and updating weights, we finally end up with the minimum error • The weight at that final step is the optimal weight • In this example the weight is 8, and the error is 0 • Y = 1/(1+e^-8x) is the final equation • In this example, we manually changed the weights to reduce the error. This is just for intuition; manual updating is not feasible for complex optimization problems. • Gradient descent is a systematic optimization method: we update the weights by calculating the gradient of the function. statinfer.com
  140. 140. Math-How does gradient descent work?
  141. 141. How does gradient descent work? •Gradient descent is one of the best-known ways to find a local minimum •By changing the weights we move towards the minimum value of the error function. The weights are changed by taking steps in the negative direction of the function gradient (derivative). 141 [Figure: error plotted against weight, with steps descending the curve] statinfer.com
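A toy R sketch of this idea, echoing the delta-rule example above; the quadratic error function and all values are illustrative assumptions:

# Minimize E(w) = (w - 8)^2 by stepping against the gradient
grad <- function(w) 2 * (w - 8)   # dE/dw
w <- 1                            # start weight
eta <- 0.1                        # step size (learning rate)
for (i in 1:50) w <- w - eta * grad(w)
w   # converges near 8, the minimum of E(w)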
  142. 142. Demo-How does gradient descent work?
  143. 143. Does this method really work? • We changed the weights; did it reduce the overall error? • Let's calculate the error with the new weights and see the change 143 With the updated weights and inputs (1, 1): h1 = 0.818545647, h2 = 0.729364041, y = 0.706552799 (one updated weight shown is 0.50218) statinfer.com
  144. 144. Gradient Descent method validation •With our initial set of weights the overall error was 0.7137: Y actual is 0, Y predicted is 0.7137, so error = 0.7137 •The new weights give us a predicted value of 0.70655 •In one iteration, we reduced the error from 0.7137 to 0.70655 •The error is reduced by 1%. Repeating the same process over multiple epochs and training examples, we can reduce the error further. 144
                 input1  input2  Output(Y-Actual)  Y Predicted  Error
Old Weights      1       1       0                 0.71371259   0.71371259
Updated Weights  1       1       0                 0.706552799  0.706552799
statinfer.com
  145. 145. Thank you
  146. 146. Statinfer.com statinfer.com 146 Download the course videos and handouts from the below link https://statinfer.com/course/machine-learning-with-r-2/curriculum/?c=b433a9be3189
