1. Artificial Intelligence
AI 1
Artificial Neural Networks
Prof. Ahmed Sultan Al-Hegami
Professor of Artificial Intelligence and Intelligent Information Systems
Sana'a University
2. AI 2
Artificial Neural Networks
Prof. Ahmed Sultan Al-Hegami
3. AI 3
Concept Learning
Learning systems differ in how they represent concepts
[Diagram: the same training examples fed to different learners (Backpropagation for neural networks, C4.5/CART for decision trees, FOIL/ILP for rules such as X ∧ Y → Z), each yielding a different concept representation]
Prof. Ahmed Sultan Al-Hegami
4. AI 4
Neural Networks
Networks of processing units (neurons) with
connections (synapses) between them
Large number of neurons: ~10^12
Large connectivity: ~10^5
Parallel processing
Distributed computation/memory
Robust to noise, failures
Prof. Ahmed Sultan Al-Hegami
5. AI 5
A new sort of computer
What are (everyday) computer systems good at...
and not so good at?
Good at:
Rule-based systems: doing what the programmer wants them to do
Not so good at:
Dealing with noisy data
Dealing with unknown environment data
Massive parallelism
Fault tolerance
Adapting to circumstances
Prof. Ahmed Sultan Al-Hegami
6. AI 6
Neural networks to the rescue
Neural network: information processing
paradigm inspired by biological nervous
systems, such as our brain
Structure: large number of highly
interconnected processing elements
(neurons) working together
Like people, they learn from experience (by
example)
Prof. Ahmed Sultan Al-Hegami
7. AI 7
Neural networks to the rescue
Neural networks are configured for a specific
application, such as pattern recognition or
data classification, through a learning
process
In a biological system, learning involves
adjustments to the synaptic connections
between neurons
The same holds for artificial neural networks (ANNs)
Prof. Ahmed Sultan Al-Hegami
8. AI 8
History of Neural Networks
1943: McCulloch and Pitts proposed a model of a neuron -->
Perceptron (read [Mitchell, section 4.4 ])
1960s: Widrow and Hoff explored Perceptron networks (which
they called "Adalines") and the delta rule.
1962: Rosenblatt proved the convergence of the perceptron
training rule.
1969: Minsky and Papert showed that the Perceptron cannot
deal with nonlinearly-separable data sets---even those that
represent simple functions such as XOR.
1970-1985: Very little research on Neural Nets
1986: Invention of Backpropagation [Rumelhart and
McClelland, but also Parker and earlier on: Werbos] which can
learn from nonlinearly-separable data sets.
Since 1985: A lot of research in Neural Nets!
Prof. Ahmed Sultan Al-Hegami
9. AI 9
Where can neural network systems help?
when we can't formulate an algorithmic
solution.
when we can get lots of examples of the
behavior we require.
‘learning from experience’
when we need to pick out the structure from
existing data.
Prof. Ahmed Sultan Al-Hegami
10. AI 10
Inspiration from Neurobiology
A neuron: many-inputs /
one-output unit
output can be excited or not
excited
incoming signals from other
neurons determine if the
neuron shall excite ("fire")
Output is subject to
attenuation at the synapses,
which are the junctions
between neurons
Prof. Ahmed Sultan Al-Hegami
11. AI 11
Real vs Artificial Neurons
[Figure: a biological neuron (dendrites, cell body, axon, synapse) alongside an artificial neuron with inputs x0 ... xn, weights w0 ... wn, and output o]
Threshold unit:
o = 1 if Σ(i=0..n) wi xi > 0, and 0 otherwise
Prof. Ahmed Sultan Al-Hegami
12. AI 12
Perceptrons
Basic unit of many neural networks
Basic operation
Input: a vector of real values
Calculates a linear combination of inputs
Output
1 if result is greater than some threshold
0 otherwise
Prof. Ahmed Sultan Al-Hegami
13. AI 13
Perceptron (cont.)
Input values -> Linear weighted sum -> Threshold
Given real-valued inputs x1 through xn, the output o(x1,…,xn) computed by the
perceptron is
o(x1, …, xn) = 1 if w0 + w1x1 + … + wnxn > 0
-1 otherwise
where wi is a real-valued constant, or weight
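A minimal sketch of this computation in Python (the function name and list-based representation are our own, not from the slides):

```python
def perceptron_output(weights, inputs):
    """Threshold unit: weights[0] is the bias weight w0,
    weights[1:] pair with the inputs x1..xn.
    Returns 1 if w0 + w1*x1 + ... + wn*xn > 0, and -1 otherwise."""
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s > 0 else -1

# Hypothetical weights implementing OR (with +1/-1 outputs)
print(perceptron_output([-0.5, 1.0, 1.0], [0, 1]))  # -> 1
print(perceptron_output([-0.5, 1.0, 1.0], [0, 0]))  # -> -1
```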
Prof. Ahmed Sultan Al-Hegami
14. AI 14
Learning
From experience: examples / training data
Strength of connection between the neurons
is stored as a weight-value for the specific
connection
Learning the solution to a problem =
changing the connection weights
Prof. Ahmed Sultan Al-Hegami
15. AI 15
Perceptron Learning Rule
It’s a single-unit network
Change the weight by an
amount proportional to the
difference between the
desired output and the
actual output.
Wi new = Wi old + α (O desired – O) Xi
where α is the learning rate, O desired is the desired output, O is the actual output, and Xi is the input.
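A one-line sketch of this rule in Python (names are ours; the 0/1 output convention matches the OR example on the following slides):

```python
def update_weight(w_old, lr, desired, actual, x):
    """Perceptron learning rule: w_new = w_old + lr * (desired - actual) * x."""
    return w_old + lr * (desired - actual) * x

# e.g. second row of the OR training table: w2 = 0, desired = 1, actual = 0, x2 = 1
print(update_weight(0.0, 1.0, 1, 0, 1))  # -> 1.0
```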
Prof. Ahmed Sultan Al-Hegami
19. AI 19
Implementing OR
Assume Boolean (0/1) input values…
X1 X2 O desired
0 0 0
0 1 1
1 0 1
1 1 1
Truth Table of OR
Prof. Ahmed Sultan Al-Hegami
20. AI 20
Training Steps in Perceptron
X1 X2 W1 old W2 old O desired O Error W1 new W2 new
0 0 0 0 0 0 0 0 0
0 1 0 0 1 0 1 0 1
1 0 0 1 1 0 1 1 1
1 1 1 1 1 1 0 1 1
0 0 1 1 0 0 0 1 1
0 1 1 1 1 1 0 1 1
[Plot: the OR training examples in the (x1, x2) plane, with the learned linear boundary separating the + examples from the - example]
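The table above can be reproduced with a short sketch (assuming, as the table does, a bias-free threshold unit, 0/1 outputs, and a learning rate of 1):

```python
def train_or():
    w1, w2 = 0.0, 0.0
    # (x1, x2, desired) presented in the same order as the table rows
    samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 1)]
    for x1, x2, desired in samples:
        o = 1 if w1 * x1 + w2 * x2 > 0 else 0  # threshold unit, no bias
        error = desired - o
        w1 += error * x1  # learning rate = 1
        w2 += error * x2
        print(x1, x2, desired, o, error, w1, w2)

train_or()  # the final weights w1 = w2 = 1 match the table's last rows
```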
Prof. Ahmed Sultan Al-Hegami
21. AI 21
Activation Functions
Each neuron in the network
receives one or more input(s).
An activation function is
applied to the inputs, which
determines the output of the
neuron – the activation level.
Examples:
Sigmoid: f(x) = 1/(1 + e^-x), where e ≈ 2.718...
Linear: f(x) = x
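A quick sketch of these two activation functions in Python (function names are ours):

```python
import math

def sigmoid(x):
    """Logistic activation: squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Identity activation: the output equals the input."""
    return x

print(sigmoid(0.0))  # 0.5
print(linear(2.0))   # 2.0
```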
Prof. Ahmed Sultan Al-Hegami
22. AI 22
Problems
Perceptrons can only perform
accurately with linearly separable
classes
ANN research was put on hold for ~20 years.
Solution: additional (hidden) layers of
neurons, MLP architecture
Able to solve non-linear classification
problems such as XOR
[Plots: a linearly separable problem and the XOR problem in the (x1, x2) plane]
Prof. Ahmed Sultan Al-Hegami
23. AI 23
Feed-Forward Neural Networks
Solution: use the Multi-layer Perceptron.
Feed-forward networks (as opposed to feed-back networks) of this kind are also known as:
The Multi-layer Perceptron
or
The Back-Propagation Neural Network
Prof. Ahmed Sultan Al-Hegami
24. AI 24
Multi-layer Perceptrons
Each input layer neuron connects to all neurons in
the hidden layer.
The neurons in the hidden layer connect to all
neurons in the output layer.
[Figure: input layer (Node 1, Node 2, Node 3, with example inputs 1.0, 0.7, 0.4), hidden layer (Node i, Node j, reached via weights W1i, W2i, W3i and W1j, W2j, W3j), output layer (Node k, reached via weights Wik, Wjk)]
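A sketch of the forward pass through such a fully connected network, using the figure's example inputs 1.0, 0.7, 0.4 (the weight values below are arbitrary placeholders, not taken from the slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_out):
    """One forward pass: input layer -> hidden layer -> output neuron."""
    # each hidden neuron applies the activation to its weighted input sum
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_hidden]
    # the output neuron does the same over the hidden outputs
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

inputs = [1.0, 0.7, 0.4]        # example inputs from the figure
w_hidden = [[0.2, -0.1, 0.4],   # W1i, W2i, W3i (placeholder values)
            [0.5, 0.3, -0.2]]   # W1j, W2j, W3j (placeholder values)
w_out = [0.1, -0.3]             # Wik, Wjk (placeholder values)
print(forward(inputs, w_hidden, w_out))
```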
Prof. Ahmed Sultan Al-Hegami
25. AI 25
Neural Nets
Pro: More general than perceptrons
Not restricted to linear discriminants
Multiple outputs: one classification each
Con: No simple, guaranteed training
procedure
Use greedy, hill-climbing procedure to train
“Gradient descent”, “Backpropagation”
Prof. Ahmed Sultan Al-Hegami
26. AI 26
Neural Net Training
Goal:
Determine how to change weights to get correct
output
Large change in weight to produce large reduction in
error
Approach:
Compute actual output: o
Compare to desired output: d
Determine effect of each weight w on error = d-o
Adjust weights
Prof. Ahmed Sultan Al-Hegami
27. AI 27
Backpropagation
Multilayer neural networks learn in the same way
as perceptrons.
However, there are many more weights, and it is
important to assign credit (or blame) correctly
when changing weights.
Backpropagation networks use the sigmoid
activation function, as it is easy to differentiate:
f(x) = 1/(1 + e^-x), whose derivative is f'(x) = f(x)(1 – f(x))
Prof. Ahmed Sultan Al-Hegami
28. AI 28
Backpropagation
Greedy, Hill-climbing procedure
Weights are parameters to change
Slow
Back propagation: Computes current output,
works backward to correct error
Prof. Ahmed Sultan Al-Hegami
29. AI 29
Back propagation
Desired output of the training examples
Error = difference between actual & desired
output
Change weight relative to error size
Calculate the output layer error, then propagate
it back to the previous layer
Improved performance, very common!
Prof. Ahmed Sultan Al-Hegami
31. AI 31
Notations
We use the following notations:
T (target): the desired (target) output
O (output): the output of every neuron at any layer
f (activation function)
η : learning rate
W: weight
δ : Error signal
Prof. Ahmed Sultan Al-Hegami
32. AI 32
Training Method
Step 1: start at the output layer.
Calculate the summation of the signals entering each output neuron (N):
Nk = ∑j (Wjk Oj) ------------------------ (1)
This value passes through the neuron's activation function, and hence the output of every output neuron is:
Ok = 1/(1 + e^-Nk) = f(Nk) ---------------------(2)
(This value represents the actual output obtained by the network,
which has to be compared to the desired output to obtain the error.)
Step 2: Compute the error value (δ) as follows:
δk = (tk – Ok) f'(Nk)
= (tk – Ok) Ok (1 – Ok) ---------------------(3)
Update the weights between the output and hidden layers (weights
change based on their contribution to this error) as follows:
Wjk ← Wjk + η δk Oj ----------------------(4)
Prof. Ahmed Sultan Al-Hegami
33. AI 33
Step 3: at the hidden layer neurons,
repeat the above process as follows:
Compute the error in this layer:
δj = Oj (1 – Oj) ∑k Wjk δk ---------------------(5)
Update the weights between the input and hidden layers (weights
change based on their contribution to this error) as follows:
Wij ← Wij + η δj Oi --------------------(6)
These 3 steps are repeated for all inputs, many times over, until
the error of the network reaches a minimum; at that point the
training process STOPS and the network is trained.
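A minimal sketch of these three steps in Python, specialized to the 2-2-1 network of the detailed example that follows (names are ours; like the worked example, it uses the already-updated output weights in equation (5)):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, t, W, Wout, lr=1.0):
    """One pass of steps 1-3 for a 2-input, 2-hidden, 1-output network.
    W[i][j] is the weight from input i+1 to hidden neuron j+1;
    Wout[j] is the weight from hidden neuron j+1 to the output neuron."""
    # Step 1: forward pass (equations 1 and 2)
    hi = [W[0][j] * x[0] + W[1][j] * x[1] for j in range(2)]
    ho = [sigmoid(v) for v in hi]
    O = sigmoid(Wout[0] * ho[0] + Wout[1] * ho[1])
    # Step 2: output error and hidden-to-output update (equations 3 and 4)
    delta_o = (t - O) * O * (1 - O)
    Wout = [Wout[j] + lr * delta_o * ho[j] for j in range(2)]
    # Step 3: hidden error and input-to-hidden update (equations 5 and 6)
    delta_h = [ho[j] * (1 - ho[j]) * Wout[j] * delta_o for j in range(2)]
    for i in range(2):
        for j in range(2):
            W[i][j] += lr * delta_h[j] * x[i]
    return W, Wout, O
```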
Prof. Ahmed Sultan Al-Hegami
34. AI 34
A Detailed Example
[Figure: the network to be trained. Input layer (i): x1, x2. Hidden layer (h): h1, h2, reached via weights W11, W12, W21, W22. Output layer (O): a single output neuron, reached via weights W10, W20]
Prof. Ahmed Sultan Al-Hegami
35. AI 35
A Detailed Example
•The input/output used for training:
X1 X2 Target (t)
0 0 0
0 1 1
1 0 1
1 1 1
We select η=1 as learning rate for simplicity
Prof. Ahmed Sultan Al-Hegami
36. AI 36
•We assume initial weights (shown below) and use the
first row of the I/O table
x1 x2 t W11 W12 W21 W22 W10 W20
0 0 0 1 0 0 1 1 1
We also use the following notations:
hi1: total input to the 1st neuron in the hidden layer
hi2: total input to the 2nd neuron in the hidden layer
ho1: output of the 1st neuron in the hidden layer
ho2: output of the 2nd neuron in the hidden layer
N: total input to the neuron of the output layer
O: the actual output of the network
Prof. Ahmed Sultan Al-Hegami
37. AI 37
•We obtain the following:
hi1= W11x1+W21x2
= (1)(0)+(0)(0) = 0
hi2= W12x1+W22x2
= (0)(0)+(1)(0) = 0
hO1 = 1/(1 + e^-hi1) ------------(1)
= 1/(1 + e^0) = 0.5
hO2 = 1/(1 + e^-hi2) ------------(2)
= 1/(1 + e^0) = 0.5
By using the first step of the algorithm, we get the total input
entering the output neuron:
N = W10 hO1 + W20 hO2 ------------------(3)
= (1)(0.5) + (1)(0.5) = 1
Therefore the actual output of the network is:
O = 1/(1 + e^-N)
= 1/(1 + e^-1) = 0.73106 (which is far from the desired (target) output).
Prof. Ahmed Sultan Al-Hegami
38. AI 38
As the actual output is far from the target, we have to modify
the weights to bring the output closer to the target. To determine
the error in the result, we use step 2 of the algorithm as follows:
δO = (t – O) O (1 – O)
= (0 – 0.73106)(0.73106)(1 – 0.73106)
= –0.14373
With this error value, we can update the weights between the hidden
and output layers using equation (4) of step 2 of the
algorithm, as follows:
W10 ← W10 + η δO hO1
= 1 + (1)(–0.14373)(0.5) = 0.92813
W20 ← W20 + η δO hO2
= 1 + (1)(–0.14373)(0.5) = 0.92813
(At this point we back-propagate from the output layer to the hidden
layer and, in the same fashion, on to the input layer.)
Prof. Ahmed Sultan Al-Hegami
39. AI 39
We determine the error contributed by the hidden layer using equation (5) of step 3 of the algorithm as
follows:
δh1 = hO1(1 – hO1) W10 δO
= (0.5)(1 – 0.5)(0.92813)(–0.14373)
= –0.03335
δh2 = hO2(1 – hO2) W20 δO
= (0.5)(1 – 0.5)(0.92813)(–0.14373)
= –0.03335
With these error values, we can update the weights between the input and hidden layers using equation (6) of
step 3 of the algorithm, as follows:
W11 ← W11 + η δh1 x1
= 1 + (1)(–0.03335)(0) = 1
W12 ← W12 + η δh2 x1
= 0 + (1)(–0.03335)(0) = 0
W21 ← W21 + η δh1 x2
= 0 + (1)(–0.03335)(0) = 0
W22 ← W22 + η δh2 x2
= 1 + (1)(–0.03335)(0) = 1
Notice that the weights have not changed; this is expected, since
both inputs are ZERO.
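This first iteration can be checked numerically with a short self-contained script (variable names are ours):

```python
import math

sig = lambda v: 1.0 / (1.0 + math.exp(-v))

x1 = x2 = t = 0.0
W11, W12, W21, W22, W10, W20 = 1.0, 0.0, 0.0, 1.0, 1.0, 1.0

hO1 = sig(W11 * x1 + W21 * x2)       # 0.5
hO2 = sig(W12 * x1 + W22 * x2)       # 0.5
O = sig(W10 * hO1 + W20 * hO2)       # 0.73106
d_o = (t - O) * O * (1 - O)          # -0.14373
W10 += d_o * hO1                     # 0.92813
W20 += d_o * hO2                     # 0.92813
d_h1 = hO1 * (1 - hO1) * W10 * d_o   # -0.03335 (uses the updated W10)
d_h2 = hO2 * (1 - hO2) * W20 * d_o   # -0.03335
W11 += d_h1 * x1; W21 += d_h1 * x2   # unchanged: inputs are zero
W12 += d_h2 * x1; W22 += d_h2 * x2   # unchanged
print(round(O, 5), round(W10, 5), round(d_h1, 5))  # 0.73106 0.92813 -0.03335
```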
Prof. Ahmed Sultan Al-Hegami
40. AI 40
The following table shows the results after
training the network only once:
x1 x2 t W11 W12 W21 W22 W10 W20
0 0 0 1 0 0 1 0.92813 0.92813
Prof. Ahmed Sultan Al-Hegami
41. AI 41
Now, we consider the second ROW of the target table and continue the
training process of the network using the same steps,
with the following data in the training:
x1 = 0, x2 = 1, t = 1
Also using the weights obtained in the previous stage of training, we obtain:
hi1= W11x1+W21x2
= (1)(0)+(0)(1) = 0
hi2= W12x1+W22x2
= (0)(0)+(1)(1) = 1
hO1 = 1/(1 + e^-hi1)
= 1/(1 + e^0) = 0.5
hO2 = 1/(1 + e^-hi2)
= 1/(1 + e^-1) = 0.73106
By using the first step of the algorithm, we get the total input entering the output neuron:
N = W10 hO1 + W20 hO2 ------------------(3)
= (0.92813)(0.5) + (0.92813)(0.73106) = 1.1426
Therefore the actual output of the network is:
O = 1/(1 + e^-N)
= 1/(1 + e^-1.1426) = 0.7582 (which is far from the desired (target) output).
Prof. Ahmed Sultan Al-Hegami
42. AI 42
As the actual output is far from the target, we have to modify
the weights to bring the output closer to the target. To determine
the error in the result, we use step 2 of the algorithm as follows:
δO = (t – O) O (1 – O)
= (1 – 0.7582)(0.7582)(1 – 0.7582)
= 0.04435
With this error value, we can update the weights between the hidden
and output layers using equation (4) of step 2 of the
algorithm, as follows:
W10 ← W10 + η δO hO1
= 0.92813 + (1)(0.04435)(0.5) = 0.95030
W20 ← W20 + η δO hO2
= 0.92813 + (1)(0.04435)(0.73106) = 0.96056
(At this point we back-propagate from the output layer to the hidden
layer and, in the same fashion, on to the input layer.)
Prof. Ahmed Sultan Al-Hegami
43. AI 43
We determine the error contributed by the hidden layer using equation (5) of step 3 of the
algorithm as follows:
δh1 = hO1(1 – hO1) W10 δO
= (0.5)(1 – 0.5)(0.9503)(0.04435)
= 0.01054
δh2 = hO2(1 – hO2) W20 δO
= (0.73106)(1 – 0.73106)(0.96056)(0.04435)
= 0.00838
With these error values, we can update the weights between the input and hidden layers using
equation (6) of step 3 of the algorithm, as follows:
W11 ← W11 + η δh1 x1
= 1 + (1)(0.01054)(0) = 1
W12 ← W12 + η δh2 x1
= 0 + (1)(0.00838)(0) = 0
W21 ← W21 + η δh1 x2
= 0 + (1)(0.01054)(1) = 0.01054
W22 ← W22 + η δh2 x2
= 1 + (1)(0.00838)(1) = 1.00838
Prof. Ahmed Sultan Al-Hegami
44. AI 44
The following table shows the results after
training the network the second time:
x1 x2 t W11 W12 W21 W22 W10 W20
0 1 1 1 0 0.01054 1.00838 0.9503 0.96056
Prof. Ahmed Sultan Al-Hegami
45. AI 45
The training process has to be repeated many times until we obtain the
MINIMUM error. The following table shows the weights after training the
network approximately 1000 times.
As you can see from the comparison table below, the actual outputs
are then very near the desired (target) outputs.
W11 W12 W21 W22 W10 W20
-3.5402 4.0244 -3.5248 4.5814 -11.9103 4.6940
Prof. Ahmed Sultan Al-Hegami
46. AI 46
The comparison of the actual and desired
(target) outputs is shown in the table
below:
X1 X2 Target (t) Output (O)
0 0 0 0.0264
0 1 1 0.9867
1 0 1 0.9863
1 1 1 0.9908
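The whole walk-through can be reproduced with a self-contained sketch that cycles through the four training rows for about 1000 epochs (assumptions: η = 1, the initial weights of slide 36, and per-pattern updates, as in the slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

W = [[1.0, 0.0], [0.0, 1.0]]  # W11, W12 / W21, W22 (initial values, slide 36)
Wout = [1.0, 1.0]             # W10, W20
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

for epoch in range(1000):
    for x, t in data:
        ho = [sigmoid(W[0][j] * x[0] + W[1][j] * x[1]) for j in range(2)]
        O = sigmoid(Wout[0] * ho[0] + Wout[1] * ho[1])
        d_o = (t - O) * O * (1 - O)
        Wout = [Wout[j] + d_o * ho[j] for j in range(2)]
        d_h = [ho[j] * (1 - ho[j]) * Wout[j] * d_o for j in range(2)]
        for i in range(2):
            for j in range(2):
                W[i][j] += d_h[j] * x[i]

for x, t in data:
    ho = [sigmoid(W[0][j] * x[0] + W[1][j] * x[1]) for j in range(2)]
    O = sigmoid(Wout[0] * ho[0] + Wout[1] * ho[1])
    print(x, t, round(O, 4))  # each output should be close to its target
```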
Prof. Ahmed Sultan Al-Hegami
47. AI 47
Evolving networks
Continuous process of:
Evaluate output
Adapt weights
Take new inputs
As the ANN evolves, the weights settle into a stable
state, but the neurons continue working:
the network has 'learned' to deal with the
problem
"Learning"
Prof. Ahmed Sultan Al-Hegami
48. AI 48
Where are NN used?
Recognizing and matching complicated,
vague, or incomplete patterns
Data is unreliable
Problems with noisy data
Prediction
Classification
Data association
Filtering
Planning
Prof. Ahmed Sultan Al-Hegami
49. AI 49
Applications
Prediction: learning from past experience
pick the best stocks in the market
predict weather
identify people with cancer risk
Classification
Image processing
Predict bankruptcy for credit card companies
Risk assessment
Prof. Ahmed Sultan Al-Hegami
50. AI 50
Applications
Recognition
Pattern recognition: SNOOPE (bomb detector in
U.S. airports)
Character recognition
Handwriting: processing checks
Data association
Not only identify the characters that were scanned
but identify when the scanner is not working
properly
Prof. Ahmed Sultan Al-Hegami
51. AI 51
Applications
Data Filtering
e.g. take the noise out of a telephone signal, signal
smoothing
Planning
Unknown environments
Sensor data is noisy
Fairly new approach to planning
Prof. Ahmed Sultan Al-Hegami
52. AI 52
Strengths of a Neural Network
Power: Model complex functions, nonlinearity built
into the network
Ease of use:
Learn by example
Very little user domain-specific expertise needed
Intuitively appealing: based on a model of biology;
will it lead to genuinely intelligent computers/robots?
Neural networks cannot do anything that cannot be
done using traditional computing techniques, BUT
they can do some things which would otherwise be
very difficult.
Prof. Ahmed Sultan Al-Hegami
53. AI 53
General Advantages
Advantages
Adapt to unknown situations
Robustness: fault tolerance due to network
redundancy
Autonomous learning and generalization
Disadvantages
Not exact
Large complexity of the network structure
Prof. Ahmed Sultan Al-Hegami
54. AI 54
Status of Neural Networks
Most of the reported applications are
still at the research stage
No formal proofs, but they seem to
have useful applications that work
Prof. Ahmed Sultan Al-Hegami
55. AI 55
Conclusions
Simulation based on neurons in brain
Perceptrons (single neuron)
Guaranteed to find a linear discriminant
IF one exists (hence the problem with XOR)
Neural nets (Multi-layer perceptrons)
Very general
Backpropagation training procedure
Prof. Ahmed Sultan Al-Hegami