Advanced Topics in Systems
Machine Learning
Neural Networks
Inas A. Yassine
Systems and Biomedical Engineering Department,
Faculty of Engineering - Cairo University
iyassine@eng.cu.edu.eg
Neurons and Brain
§ Neural networks mimic the brain: they hear, visualize, find geometric relations, and more, all through a single learning algorithm.
§ They do this by simulating a network of neurons.
Perceptron
The Perceptron
§ A number of McCulloch-Pitts neurons can be connected together in any way.
§ An arrangement of one input layer of McCulloch-Pitts neurons feeding forward to one output layer of McCulloch-Pitts neurons is known as a Perceptron.
§ It is a powerful computational device.
Logic Gate Implementation
§ McCulloch-Pitts neurons can be used to implement the basic logic gates.
§ Find the appropriate connection weights and neuron thresholds to produce the right output for each set of inputs.
§ Construct simple networks that perform NOT, AND, and OR.
§ From these three operations we can construct any logical function, although the result may have a much more complex architecture.
§ Try to avoid decomposing complex problems into simple logic gates; instead, find weights and thresholds that work directly in a Perceptron architecture.
Logic Gates Implementation
§ We need to determine the weights and
thresholds.
Decision Boundaries for Logic Circuits
Solve analytically for the AND gate's weights
§ There are two weights w1 and w2 and the threshold θ, and for each training pattern we need the output to be correct.
§ The training data lead to four inequalities (a reconstruction is shown below).
§ It is easy to see that there are an infinite number of solutions. Similarly, there are an infinite number of solutions for the NOT and OR networks.
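The inequalities on this slide did not survive extraction; a standard reconstruction, for a unit computing step(w1·x1 + w2·x2 − θ) on binary inputs, is:

x1 = 0, x2 = 0, target 0:  0·w1 + 0·w2 < θ   so  θ > 0
x1 = 0, x2 = 1, target 0:  0·w1 + 1·w2 < θ   so  w2 < θ
x1 = 1, x2 = 0, target 0:  1·w1 + 0·w2 < θ   so  w1 < θ
x1 = 1, x2 = 1, target 1:  1·w1 + 1·w2 ≥ θ   so  w1 + w2 ≥ θ

One of the infinitely many solutions is w1 = w2 = 1, θ = 1.5.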
Decision Boundaries for XOR
Limitations of a single Perceptron
§ For the XOR network:
§ the second and third inequalities are incompatible with the fourth,
§ so there is no solution.
§ A more complex network is needed, e.g. one that combines together many simple networks, or uses different activation/thresholding/transfer functions.
§ It then becomes much more difficult to determine all the weights and thresholds by hand.
Activation /Transfer Functions
§ Step Function
§ Sigmoid Function
§ Hyperbolic Tangent
§ Piecewise Linear
Threshold as a weight component
§ To simplify the mathematical description
§ Assume w0j = −θj and out0 = 1
§ The perceptron equation then becomes (see the reconstruction below)
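The equation on this slide was an image; a standard reconstruction, with the threshold folded into the weights as described above, is:

outj = f( Σi=1..n wij ini − θj ) = f( Σi=0..n wij ini ),   with in0 = 1 and w0j = −θj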
Perceptron Learning
§ If the network weights at time t are wij(t), then the shifting process corresponds to moving them by an amount Δwij(t), so that at time t+1 we have the weights
wij(t+1) = wij(t) + Δwij(t)
Δwij(t) = η (tj − oj) xi
§ It is convenient to treat the thresholds as
weights, as discussed previously, so we don’t
need separate equations for them.
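As an illustration of this learning rule, here is a minimal Python sketch, assuming a step activation and the threshold absorbed as w0 with a constant input of 1; the function and variable names are illustrative, not from the slides.

import numpy as np

def perceptron_train(X, targets, eta=0.1, epochs=100):
    # Train a single-layer perceptron with the rule w_ij(t+1) = w_ij(t) + eta * (t_j - o_j) * x_i
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # in_0 = 1 absorbs the threshold
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            o = 1.0 if np.dot(w, x) > 0 else 0.0   # step activation
            w += eta * (t - o) * x                 # update only changes w on misclassified patterns
    return w

# Example: learn the AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
print(perceptron_train(X, y))   # weights satisfying the AND inequalities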
Convergence of Perceptron Learning
§ The weight changes Δwij need to be applied repeatedly – for each weight wij in the network, and for each training pattern in the training set. One pass through all the weights for the whole training set is called one epoch of training.
§ Eventually, usually after many epochs, when all the network outputs match the targets for all the training patterns, all the Δwij will be zero and the training process will stop. We then say that the training process has converged to a solution.
§ The weights can be found in a finite number of iterations provided that:
§ the problem is linearly separable,
§ the problem is correctly defined,
§ the step size is sufficiently small.
General Decision Boundaries
Learning by Error Minimization
§ We want to minimize the difference between the actual outputs outj and the desired outputs targj.
§ An Error Function quantifies this difference; here we use the Sum Squared Error:
E(wij) = Σp Σj (targj − outj)²
§ A systematic procedure for doing this requires knowledge of how the error E(wij) varies as we change the weights wij, i.e. the gradient of E with respect to wij.
Computing Gradient and Derivatives
§ The gradient, or rate of change, of f(x) at a particular value of x can be approximated by Δy/Δx as we change x; in the limit of small changes this is the derivative of f(x) with respect to x, written ∂f/∂x.
Gradient Descent Minimization
§ Suppose we have a function f(x) and we want to change the value of x to minimize f(x).
§ What we need to do depends on the gradient ∂f/∂x of f(x). There are three cases to consider:
§ If ∂f/∂x > 0 then f(x) increases as x increases, so we should decrease x
§ If ∂f/∂x < 0 then f(x) decreases as x increases, so we should increase x
§ If ∂f/∂x = 0 then f(x) is at a maximum or minimum, so we should not change x
§ In summary, we can decrease f(x) by changing x by the amount Δx = −η ∂f/∂x,
§ where η is a small positive constant specifying how much we change x by, and the derivative ∂f/∂x tells us which direction to go in. If we repeatedly use this equation, f(x) will (assuming η is sufficiently small) keep descending towards its minimum, and hence this procedure is known as gradient descent minimization.
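A minimal Python sketch of this one-dimensional procedure; the example function, learning rate, and step count are illustrative.

def gradient_descent_1d(df, x0, eta=0.1, steps=100):
    # Repeatedly apply the update x <- x - eta * df/dx
    x = x0
    for _ in range(steps):
        x -= eta * df(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose derivative is 2*(x - 3)
x_min = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)   # approaches 3 for a sufficiently small eta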
Gradients in more than one Direction
Gradient Descent Error Minimization
§ Remember that we want to train our neural networks by adjusting their weights wij in order to minimize the error function E(wij) defined above.
§ We now see it makes sense to do this by a series of gradient descent weight updates: Δwij = −η ∂E/∂wij.
§ If the transfer function for the output neurons is f(x), and the activations of the previous layer of neurons are ini, then the outputs are outj = f(Σi wij ini).
§ Dealing with equations like this is easy if we use the chain rule for derivatives.
Weights Derivatives Calculation
§ Chain Rule
§ Calculating the derivative of the error E with respect to the weights wij
Weights Derivative Calculation
Kronecker Delta symbol δij, defined such that δij = 1 when i = j and δij = 0 when i ≠ j.
Delta Rule
§ The basic gradient descent learning algorithm for single-layer networks (a reconstruction of the update rule is given below).
§ It involves the derivative of the transfer function f(x).
§ This is problematic for the simple Perceptron that uses the step function sgn(x) as its threshold function, because the step function has zero derivative everywhere except at x = 0, where it is infinite.
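The update equation on this slide did not survive extraction; the standard delta rule obtained by differentiating the sum squared error for a single-layer network is, as a reconstruction:

Δwij = η Σp (targj − outj) f′(netj) ini,   where netj = Σi wij ini

For a sigmoid transfer function, f′(netj) = outj (1 − outj), which is exactly the factor that appears in the backpropagation example slides below.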
Multi Layer Neural Network
ai(j) = activation of unit i in layer j
w(j) = matrix of weights controlling the function mapping from layer j to layer j+1
a1(2) = sig( w10(1) x0 + w11(1) x1 + w12(1) x2 + w13(1) x3 )
a2(2) = sig( w20(1) x0 + w21(1) x1 + w22(1) x2 + w23(1) x3 )
a3(2) = sig( w30(1) x0 + w31(1) x1 + w32(1) x2 + w33(1) x3 )
hw(x) = sig( w10(2) a0(2) + w11(2) a1(2) + w12(2) a2(2) + w13(2) a3(2) )
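As a concrete illustration of these equations, here is a minimal NumPy sketch of the forward pass for one hidden layer of three sigmoid units feeding a single sigmoid output; the weight values and function names are illustrative assumptions, not taken from the slides.

import numpy as np

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    # x: inputs x1..x3; W1: 3x4 weights mapping layer 1 -> 2; W2: 1x4 weights mapping layer 2 -> 3
    x = np.concatenate(([1.0], x))      # x0 = 1 is the bias unit
    a2 = sig(W1 @ x)                    # a1(2), a2(2), a3(2)
    a2 = np.concatenate(([1.0], a2))    # a0(2) = 1 is the bias unit of layer 2
    return sig(W2 @ a2)                 # hw(x)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(1, 4))
print(forward(np.array([0.5, -1.0, 2.0]), W1, W2))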
XNOR Using a Multi-Layer ANN
Multi Output Multi Layer
hw(x) ∈ ℝ⁴
hw(x) ≈ [1 0 0 0]ᵀ , hw(x) ≈ [0 1 0 0]ᵀ , hw(x) ≈ [0 0 1 0]ᵀ , etc.
MultiLayer Neural Networks
Backpropagation Algorithm
Derivation of the Backpropagation algorithm
For output units
So:
Source: http://www.speech.sri.com/people/anand/771/html/node37.html
[Figure: three-layer network with input, hidden, and output layers]
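The derivation itself was an image on this slide; the standard result it leads to (for sigmoid output units and the sum squared error, as in the cited source) is, as a reconstruction:

δj = oj (1 − oj) (tj − oj)   for each output unit j,   and   ∂E/∂wji = −δj oi

so the gradient descent update for a weight into output unit j is Δwji = η δj oi.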
Derivation of the Backpropagation algorithm
For Hidden units
Also:
So:
Source: http://www.speech.sri.com/people/anand/771/html/node37.html
[Figure: three-layer network with input, hidden, and output layers; hidden unit j feeds into output units k]
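The equations here were also images; the standard hidden-unit result, consistent with the formula used on the example slides below, is, as a reconstruction:

δj = oj (1 − oj) Σk δk wkj   for each hidden unit j, and the update for a weight into hidden unit j is Δwji = η δj xi.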
Backpropagation - example
§ First calculate error of output units and use this
to change the output layer of weights.
Current output: oj=0.2
Correct output: tj=1.0
Error δj = oj(1–oj)(tj–oj)
0.2(1–0.2)(1–0.2)=0.128
[Figure: three-layer network with input, hidden, and output layers]
Update weights into j: Δwji = η δj oi
Source: Raymond J. Mooney, University of Texas at Austin, CS 391L: Machine Learning Neural Networks
Backpropagation - example
§ Next calculate error for hidden units based on
errors on the output units it feeds into.
δj = oj (1 − oj) Σk δk wkj
[Figure: three-layer network with input, hidden, and output layers]
Source: Raymond J. Mooney, University of Texas at Austin, CS 391L: Machine Learning Neural Networks
Backpropagation - example
§ Finally update bottom layer of weights based on
errors calculated for hidden units.
δj = oj (1 − oj) Σk δk wkj
[Figure: three-layer network with input, hidden, and output layers]
Update weights into j: Δwji = η δj xi
Source: Raymond J. Mooney, University of Texas at Austin, CS 391L: Machine Learning Neural Networks
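Putting the example steps together, here is a minimal NumPy sketch of one backpropagation update for a network with one sigmoid hidden layer and sigmoid outputs; the shapes and learning rate are illustrative, and bias terms are omitted for brevity.

import numpy as np

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.5):
    # One gradient step using delta_j = o_j(1-o_j)(t_j-o_j) at the output
    # and delta_j = o_j(1-o_j) * sum_k delta_k w_kj at the hidden layer
    h = sig(W1 @ x)                               # hidden activations
    o = sig(W2 @ h)                               # output activations
    delta_out = o * (1 - o) * (t - o)             # output-unit errors
    delta_hid = h * (1 - h) * (W2.T @ delta_out)  # hidden-unit errors
    W2 += eta * np.outer(delta_out, h)            # Delta w_ji = eta * delta_j * o_i
    W1 += eta * np.outer(delta_hid, x)            # Delta w_ji = eta * delta_j * x_i
    return W1, W2

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(2, 3))   # 3 inputs -> 2 hidden units
W2 = rng.normal(scale=0.1, size=(1, 2))   # 2 hidden units -> 1 output
x, t = np.array([1.0, 0.0, 1.0]), np.array([1.0])
W1, W2 = backprop_step(x, t, W1, W2)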
Notes on the Backpropagation Algorithm
§ Gradient descent over the entire network weight vector
§ Easily generalized to arbitrary directed graphs
§ Will find a local, not necessarily global, error minimum
§ In practice it often works well (it can be run multiple times from different starting points)
§ Often includes a momentum term in the weight update
§ Minimizes error over the training examples
§ Will it generalize well to subsequent examples?
Sample Learned XOR Network
[Figure: learned network for XOR with inputs X and Y, hidden units A and B, and output O, annotated with learned weights (3.11, 6.96, −7.38, −5.24, −3.6, −3.58, −5.57, −5.74, −2.03).]
Hidden Unit A represents: ¬(X ∧ Y)
Hidden Unit B represents: ¬(X ∨ Y)
Output O represents: A ∧ ¬B = ¬(X ∧ Y) ∧ (X ∨ Y) = X ⊕ Y
Hidden Unit Representations
§ Trained hidden units can be seen as newly constructed features that make the target concept linearly separable in the transformed space.
§ On some problems they can be interpreted as representing meaningful features such as vowel detectors or edge detectors.
§ On other problems they become a distributed representation of the input in which each individual unit is not easily interpretable as a meaningful feature.
Learning Hidden Layer Representations
Convergence of Backpropagation
§ Gradient descent converges to some local minimum
§ Perhaps not the global minimum
§ Add a momentum term
§ Use stochastic gradient descent
§ Train multiple networks from different initial weights
§ Nature of convergence
§ Initialize the weights near zero
§ The initial network is then nearly linear; increasingly nonlinear functions become possible as training progresses
Expressive Capabilities of ANN
§ Boolean functions
§ Every Boolean function can be expressed by a network with a single hidden layer.
§ How many hidden units are needed?
§ Continuous functions
§ Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer.
§ Any function can be approximated to arbitrary accuracy by a network with two hidden layers.
Overfitting in ANN
§ If we have too many
features, the learned
hypothesis may fit the
training set very well,
but fail to generalize to
new examples…
How to address overfitting
§ Plot the hypothesis
§ Lots of features? Lots of classes?
§ Reduce the number of features:
§ Manually select which features to keep
§ Use a model selection algorithm (feature reduction; this throws away some information)
§ Regularization
§ Keep all the features but reduce the magnitude/values of the parameters
§ Works well when there are lots of features, each contributing a bit to predicting y
Determining the Best
Number of Hidden Units
§ Too few hidden units prevents the network from
adequately fitting the data.
§ Too many hidden units can result in over-fitting.
§ Use internal cross-validation to empirically determine an optimal number of hidden units, as sketched below.
[Plot: error on the training data and on the test data versus the number of hidden units.]
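A minimal sketch of that internal cross-validation loop, assuming hypothetical helper functions train_mlp(n_hidden, X, y) and error(model, X, y) that are not part of the slides; the candidate sizes are illustrative.

import numpy as np

def choose_hidden_units(X, y, candidates=(2, 4, 8, 16, 32), n_folds=5):
    # Pick the hidden-layer size with the lowest average validation error
    folds = np.array_split(np.random.permutation(len(X)), n_folds)
    best_n, best_err = None, float("inf")
    for n_hidden in candidates:
        errs = []
        for i in range(n_folds):
            val_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            model = train_mlp(n_hidden, X[train_idx], y[train_idx])   # hypothetical trainer
            errs.append(error(model, X[val_idx], y[val_idx]))         # hypothetical error measure
        if np.mean(errs) < best_err:
            best_n, best_err = n_hidden, np.mean(errs)
    return best_n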
Penalize …
§ minw Σi=1..m ( hw(x(i)) − y(i) )²
§ minw Σi=1..m ( hw(x(i)) − y(i) )² + 100 w3² + 100 w4²
Regularization
§ Small values for the parameters w give:
§ a simpler hypothesis
§ one that is less prone to overfitting
§ We do not know in advance which parameters to shrink, so we add a regularization term that shrinks every single parameter:
J(w) = (1/2m) [ Σi=1..m ( hw(x(i)) − y(i) )² + λ Σj=1..n wj² ]
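A minimal NumPy sketch of this regularized cost, assuming for illustration a linear hypothesis hw(x) = w·x (the slides leave the hypothesis unspecified); note that w0 is conventionally not regularized.

import numpy as np

def regularized_cost(w, X, y, lam):
    # J(w) = 1/(2m) * [ sum_i (h_w(x_i) - y_i)^2 + lambda * sum_{j>=1} w_j^2 ]
    m = len(y)
    residuals = X @ w - y                 # h_w(x) = w . x  (illustrative choice)
    penalty = lam * np.sum(w[1:] ** 2)    # do not penalize the bias parameter w0
    return (np.sum(residuals ** 2) + penalty) / (2 * m)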
Machine Learning Fall 2018 Inas A.Yassine 47
Regularization
§ λ controls the trade-off between fitting the training data and keeping the parameters small to reduce the overfitting problem:
§ a larger λ gives a smoother, simpler curve
§ How should we choose λ?
§ If λ is too high, every parameter is shrunk towards zero and the hypothesis reduces to roughly hw(x) = w0, which underfits.
Regularized Gradient Descent
§ Gradient descent
§ Repeat {
§ w0 := w0 − η (1/m) Σi=1..m ( hw(x(i)) − y(i) ) x0(i)
§ wj := wj − η [ (1/m) Σi=1..m ( hw(x(i)) − y(i) ) xj(i) + (λ/m) wj ]
§ }
§ Equivalently: wj := wj (1 − η λ/m) − η (1/m) Σi=1..m ( hw(x(i)) − y(i) ) xj(i), where (1 − η λ/m) < 1.
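A minimal sketch of these update rules, again assuming a linear hypothesis hw(x) = w·x for illustration; the (1 − η λ/m) factor appears as a weight-decay shrinkage applied on every step.

import numpy as np

def regularized_gradient_descent(X, y, lam, eta=0.01, epochs=1000):
    m, n = X.shape                      # X[:, 0] is assumed to be the constant feature x0 = 1
    w = np.zeros(n)
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / m    # (1/m) * sum_i (h_w(x_i) - y_i) * x_j(i) for each j
        w[0] -= eta * grad[0]                                   # w0: no regularization term
        w[1:] = w[1:] * (1 - eta * lam / m) - eta * grad[1:]    # shrink, then gradient step
    return w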