1. Artificial Intelligence
AI 1
Artificial Neural Networks
Prof. Ahmed Sultan Al-Hegami
Professor of Artificial Intelligence and Intelligent Information Systems
Sana'a University
2. AI 2
Artificial Neural Networks
Prof. Ahmed Sultan Al-Hegami
3. AI 3
Concept Learning
Learning systems differ in how they represent concepts
[Diagram: the same training examples fed to different learners (Backpropagation for neural networks, C4.5/CART for decision trees, FOIL/ILP for rules such as X ∧ Y → Z), each yielding a different concept representation]
Prof. Ahmed Sultan Al-Hegami
4. AI 4
Neural Networks
Networks of processing units (neurons) with
connections (synapses) between them
Large number of neurons: ~10^12
Large connectivity: ~10^5
Parallel processing
Distributed computation/memory
Robust to noise, failures
Prof. Ahmed Sultan Al-Hegami
5. AI 5
A new sort of computer
What are (everyday) computer systems good at...
and not so good at?
Good at:
Rule-based systems: doing what the programmer wants them to do
Not so good at:
Dealing with noisy data
Dealing with unknown environment data
Massive parallelism
Fault tolerance
Adapting to circumstances
Prof. Ahmed Sultan Al-Hegami
6. AI 6
Neural networks to the rescue
Neural network: information processing
paradigm inspired by biological nervous
systems, such as our brain
Structure: large number of highly
interconnected processing elements
(neurons) working together
Like people, they learn from experience (by
example)
Prof. Ahmed Sultan Al-Hegami
7. AI 7
Neural networks to the rescue
Neural networks are configured for a specific
application, such as pattern recognition or
data classification, through a learning
process
In a biological system, learning involves
adjustments to the synaptic connections
between neurons
The same holds for artificial neural networks (ANNs)
Prof. Ahmed Sultan Al-Hegami
8. AI 8
History of Neural Networks
1943: McCulloch and Pitts proposed a model of a neuron -->
Perceptron (read [Mitchell, section 4.4 ])
1960s: Widrow and Hoff explored Perceptron networks (which
they called "Adalines") and the delta rule.
1962: Rosenblatt proved the convergence of the perceptron
training rule.
1969: Minsky and Papert showed that the Perceptron cannot
deal with nonlinearly-separable data sets---even those that
represent simple functions such as XOR.
1970-1985: Very little research on Neural Nets
1986: Invention of Backpropagation [Rumelhart and
McClelland, but also Parker and earlier on: Werbos] which can
learn from nonlinearly-separable data sets.
Since 1985: A lot of research in Neural Nets!
Prof. Ahmed Sultan Al-Hegami
9. AI 9
Where can neural network systems help?
when we can't formulate an algorithmic
solution.
when we can get lots of examples of the
behavior we require.
‘learning from experience’
when we need to pick out the structure from
existing data.
Prof. Ahmed Sultan Al-Hegami
10. AI 10
Inspiration from Neurobiology
A neuron: many-inputs /
one-output unit
output can be excited or not
excited
incoming signals from other
neurons determine if the
neuron shall excite ("fire")
Output is subject to
attenuation at the synapses,
which are the junctions
between neurons
Prof. Ahmed Sultan Al-Hegami
11. AI 11
Real vs Artificial Neurons
[Figure: a biological neuron (dendrites, cell body, axon, synapse) alongside an artificial neuron with inputs x0 ... xn, weights w0 ... wn, and output o]
Threshold unit:
o = 1 if Σ(i=0..n) wi xi > 0, and 0 otherwise
Prof. Ahmed Sultan Al-Hegami
12. AI 12
Perceptrons
Basic unit of many neural networks
Basic operation
Input: a vector of real values
Calculates a linear combination of inputs
Output
1 if result is greater than some threshold
0 otherwise
Prof. Ahmed Sultan Al-Hegami
13. AI 13
Perceptron (cont.)
Input values -> Linear weighted sum -> Threshold
Given real-valued inputs x1 through xn, the output o(x1,…,xn) computed by the
perceptron is
o(x1, …, xn) = 1 if w0 + w1x1 + … + wnxn > 0
-1 otherwise
where wi is a real-valued constant, or weight
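A minimal sketch of this computation in Python (the function name and list-based representation are our own, not from the slides):

```python
def perceptron_output(weights, inputs):
    """Threshold unit: weights[0] is the bias weight w0,
    weights[1:] pair with the inputs x1..xn.
    Returns 1 if w0 + w1*x1 + ... + wn*xn > 0, and -1 otherwise."""
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s > 0 else -1

# Hypothetical weights implementing OR (with +1/-1 outputs)
print(perceptron_output([-0.5, 1.0, 1.0], [0, 1]))  # -> 1
print(perceptron_output([-0.5, 1.0, 1.0], [0, 0]))  # -> -1
```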
Prof. Ahmed Sultan Al-Hegami
14. AI 14
Learning
From experience: examples / training data
Strength of connection between the neurons
is stored as a weight-value for the specific
connection
Learning the solution to a problem =
changing the connection weights
Prof. Ahmed Sultan Al-Hegami
15. AI 15
Perceptron Learning Rule
It’s a single-unit network
Change the weight by an
amount proportional to the
difference between the
desired output and the
actual output.
Wi new = Wi old + α (O desired – O) Xi
where α is the learning rate, O desired is the desired output, O is the actual output, and Xi is the input.
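A one-line sketch of this rule in Python (names are ours; the 0/1 output convention matches the OR example on the following slides):

```python
def update_weight(w_old, lr, desired, actual, x):
    """Perceptron learning rule: w_new = w_old + lr * (desired - actual) * x."""
    return w_old + lr * (desired - actual) * x

# e.g. second row of the OR training table: w2 = 0, desired = 1, actual = 0, x2 = 1
print(update_weight(0.0, 1.0, 1, 0, 1))  # -> 1.0
```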
Prof. Ahmed Sultan Al-Hegami
19. AI 19
Implementing OR
Assume Boolean (0/1) input values…
X1 X2 O desired
0 0 0
0 1 1
1 0 1
1 1 1
Truth Table of OR
Prof. Ahmed Sultan Al-Hegami
20. AI 20
Training Steps in Perceptron
X1 X2 W1 old W2 old O desired O Error W1 new W2 new
0 0 0 0 0 0 0 0 0
0 1 0 0 1 0 1 0 1
1 0 0 1 1 0 1 1 1
1 1 1 1 1 1 0 1 1
0 0 1 1 0 0 0 1 1
0 1 1 1 1 1 0 1 1
[Plot: the OR training examples in the (x1, x2) plane, with the learned linear boundary separating the + examples from the - example]
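The table above can be reproduced with a short sketch (assuming, as the table does, a bias-free threshold unit, 0/1 outputs, and a learning rate of 1):

```python
def train_or():
    w1, w2 = 0.0, 0.0
    # (x1, x2, desired) presented in the same order as the table rows
    samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 1)]
    for x1, x2, desired in samples:
        o = 1 if w1 * x1 + w2 * x2 > 0 else 0  # threshold unit, no bias
        error = desired - o
        w1 += error * x1  # learning rate = 1
        w2 += error * x2
        print(x1, x2, desired, o, error, w1, w2)

train_or()  # the final weights w1 = w2 = 1 match the table's last rows
```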
Prof. Ahmed Sultan Al-Hegami
21. AI 21
Activation Functions
Each neuron in the network
receives one or more input(s).
An activation function is
applied to the inputs, which
determines the output of the
neuron – the activation level.
Examples:
Sigmoid: f(x) = 1/(1 + e^-x), where e ≈ 2.718...
Linear: f(x) = x
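A quick sketch of these two activation functions in Python (function names are ours):

```python
import math

def sigmoid(x):
    """Logistic activation: squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Identity activation: the output equals the input."""
    return x

print(sigmoid(0.0))  # 0.5
print(linear(2.0))   # 2.0
```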
Prof. Ahmed Sultan Al-Hegami
22. AI 22
Problems
Perceptrons can only perform
accurately with linearly separable
classes
ANN research was put on hold for ~20 years.
Solution: additional (hidden) layers of
neurons, MLP architecture
Able to solve non-linear classification
problems such as XOR
[Plots: a linearly separable problem and the XOR problem in the (x1, x2) plane]
Prof. Ahmed Sultan Al-Hegami
23. AI 23
Feed-Forward Neural Networks
Solution: use the Multi-layer Perceptron.
Feed-forward networks (as opposed to feed-back networks) of this kind are also known as:
The Multi-layer Perceptron
or
The Back-Propagation Neural Network
Prof. Ahmed Sultan Al-Hegami
24. AI 24
Multi-layer Perceptrons
Each input layer neuron connects to all neurons in
the hidden layer.
The neurons in the hidden layer connect to all
neurons in the output layer.
[Figure: input layer (Node 1, Node 2, Node 3, with example inputs 1.0, 0.7, 0.4), hidden layer (Node i, Node j, reached via weights W1i, W2i, W3i and W1j, W2j, W3j), output layer (Node k, reached via weights Wik, Wjk)]
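A sketch of the forward pass through such a fully connected network, using the figure's example inputs 1.0, 0.7, 0.4 (the weight values below are arbitrary placeholders, not taken from the slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_out):
    """One forward pass: input layer -> hidden layer -> output neuron."""
    # each hidden neuron applies the activation to its weighted input sum
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_hidden]
    # the output neuron does the same over the hidden outputs
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

inputs = [1.0, 0.7, 0.4]        # example inputs from the figure
w_hidden = [[0.2, -0.1, 0.4],   # W1i, W2i, W3i (placeholder values)
            [0.5, 0.3, -0.2]]   # W1j, W2j, W3j (placeholder values)
w_out = [0.1, -0.3]             # Wik, Wjk (placeholder values)
print(forward(inputs, w_hidden, w_out))
```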
Prof. Ahmed Sultan Al-Hegami
25. AI 25
Neural Nets
Pro: More general than perceptrons
Not restricted to linear discriminants
Multiple outputs: one classification each
Con: No simple, guaranteed training
procedure
Use greedy, hill-climbing procedure to train
“Gradient descent”, “Backpropagation”
Prof. Ahmed Sultan Al-Hegami
26. AI 26
Neural Net Training
Goal:
Determine how to change weights to get correct
output
Large change in weight to produce large reduction in
error
Approach:
Compute actual output: o
Compare to desired output: d
Determine effect of each weight w on error = d-o
Adjust weights
Prof. Ahmed Sultan Al-Hegami
27. AI 27
Backpropagation
Multilayer neural networks learn in the same way
as perceptrons.
However, there are many more weights, and it is
important to assign credit (or blame) correctly
when changing weights.
Backpropagation networks use the sigmoid
activation function, as it is easy to differentiate:
f(x) = 1/(1 + e^-x), whose derivative is f'(x) = f(x)(1 – f(x))
Prof. Ahmed Sultan Al-Hegami
28. AI 28
Backpropagation
Greedy, Hill-climbing procedure
Weights are parameters to change
Slow
Back propagation: Computes current output,
works backward to correct error
Prof. Ahmed Sultan Al-Hegami
29. AI 29
Back propagation
Desired output of the training examples
Error = difference between actual & desired
output
Change weight relative to error size
Calculate the output layer error, then propagate
it back to the previous layer
Improved performance, very common!
Prof. Ahmed Sultan Al-Hegami
31. AI 31
Notations
We use the following notations:
T (target): the desired (target) output
O (output): the output of every neuron at any layer
f (activation function)
η : learning rate
W: weight
δ : Error signal
Prof. Ahmed Sultan Al-Hegami
32. AI 32
Training Method
Step 1: start at the output layer.
Calculate the summation of the signals entering each output neuron (N):
Nk = ∑j (Wjk Oj) ------------------------ (1)
This value passes through the neuron's activation function, and hence the output of every output neuron is:
Ok = 1/(1 + e^-Nk) = f(Nk) ---------------------(2)
(This value represents the actual output obtained by the network,
which has to be compared to the desired output to obtain the error.)
Step 2: Compute the error value (δ) as follows:
δk = (tk – Ok) f'(Nk)
= (tk – Ok) Ok (1 – Ok) ---------------------(3)
Update the weights between the output and hidden layers (weights
change based on their contribution to this error) as follows:
Wjk ← Wjk + η δk Oj ----------------------(4)
Prof. Ahmed Sultan Al-Hegami
33. AI 33
Step 3: at the hidden layer neurons,
repeat the above process as follows:
Compute the error in this layer:
δj = Oj (1 – Oj) ∑k Wjk δk ---------------------(5)
Update the weights between the input and hidden layers (weights
change based on their contribution to this error) as follows:
Wij ← Wij + η δj Oi --------------------(6)
These 3 steps are repeated for all inputs, many times over, until
the error of the network reaches a minimum; at that point the
training process STOPS and the network is trained.
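A minimal sketch of these three steps in Python, specialized to the 2-2-1 network of the detailed example that follows (names are ours; like the worked example, it uses the already-updated output weights in equation (5)):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, t, W, Wout, lr=1.0):
    """One pass of steps 1-3 for a 2-input, 2-hidden, 1-output network.
    W[i][j] is the weight from input i+1 to hidden neuron j+1;
    Wout[j] is the weight from hidden neuron j+1 to the output neuron."""
    # Step 1: forward pass (equations 1 and 2)
    hi = [W[0][j] * x[0] + W[1][j] * x[1] for j in range(2)]
    ho = [sigmoid(v) for v in hi]
    O = sigmoid(Wout[0] * ho[0] + Wout[1] * ho[1])
    # Step 2: output error and hidden-to-output update (equations 3 and 4)
    delta_o = (t - O) * O * (1 - O)
    Wout = [Wout[j] + lr * delta_o * ho[j] for j in range(2)]
    # Step 3: hidden error and input-to-hidden update (equations 5 and 6)
    delta_h = [ho[j] * (1 - ho[j]) * Wout[j] * delta_o for j in range(2)]
    for i in range(2):
        for j in range(2):
            W[i][j] += lr * delta_h[j] * x[i]
    return W, Wout, O
```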
Prof. Ahmed Sultan Al-Hegami
34. AI 34
A Detailed Example
[Figure: the network to be trained. Input layer (i): x1, x2. Hidden layer (h): h1, h2, reached via weights W11, W12, W21, W22. Output layer (O): a single output neuron, reached via weights W10, W20]
Prof. Ahmed Sultan Al-Hegami
35. AI 35
A Detailed Example
•The input/output used for training:
X1 X2 Target (t)
0 0 0
0 1 1
1 0 1
1 1 1
We select η=1 as learning rate for simplicity
Prof. Ahmed Sultan Al-Hegami
36. AI 36
•We assume initial weights (shown below) and use the
first row of the I/O table
x1 x2 t W11 W12 W21 W22 W10 W20
0 0 0 1 0 0 1 1 1
We also use the following notations:
hi1: total input to the 1st neuron in the hidden layer
hi2: total input to the 2nd neuron in the hidden layer
ho1: output of the 1st neuron in the hidden layer
ho2: output of the 2nd neuron in the hidden layer
N: total input to the neuron of the output layer
O: the actual output of the network
Prof. Ahmed Sultan Al-Hegami
37. AI 37
•We obtain the following:
hi1= W11x1+W21x2
= (1)(0)+(0)(0) = 0
hi2= W12x1+W22x2
= (0)(0)+(1)(0) = 0
hO1 = 1/(1 + e^-hi1) ------------(1)
= 1/(1 + e^0) = 0.5
hO2 = 1/(1 + e^-hi2) ------------(2)
= 1/(1 + e^0) = 0.5
By using the first step of the algorithm, we get the total input
entering the output neuron:
N = W10 hO1 + W20 hO2 ------------------(3)
= (1)(0.5) + (1)(0.5) = 1
Therefore the actual output of the network is:
O = 1/(1 + e^-N)
= 1/(1 + e^-1) = 0.73106 (which is far from the desired (target) output).
Prof. Ahmed Sultan Al-Hegami
38. AI 38
As the actual output is far from the target, we have to modify
the weights to bring the output closer to the target. To determine
the error in the result, we use step 2 of the algorithm as follows:
δO = (t – O) O (1 – O)
= (0 – 0.73106)(0.73106)(1 – 0.73106)
= –0.14373
With this error value, we can update the weights between the hidden
and output layers using equation (4) of step 2 of the
algorithm, as follows:
W10 ← W10 + η δO hO1
= 1 + (1)(–0.14373)(0.5) = 0.92813
W20 ← W20 + η δO hO2
= 1 + (1)(–0.14373)(0.5) = 0.92813
(At this point we back-propagate from the output layer to the hidden
layer and, in the same fashion, on to the input layer.)
Prof. Ahmed Sultan Al-Hegami
39. AI 39
We determine the error contributed by the hidden layer using equation (5) of step 3 of the algorithm as
follows:
δh1 = hO1(1 – hO1) W10 δO
= (0.5)(1 – 0.5)(0.92813)(–0.14373)
= –0.03335
δh2 = hO2(1 – hO2) W20 δO
= (0.5)(1 – 0.5)(0.92813)(–0.14373)
= –0.03335
With these error values, we can update the weights between the input and hidden layers using equation (6) of
step 3 of the algorithm, as follows:
W11 ← W11 + η δh1 x1
= 1 + (1)(–0.03335)(0) = 1
W12 ← W12 + η δh2 x1
= 0 + (1)(–0.03335)(0) = 0
W21 ← W21 + η δh1 x2
= 0 + (1)(–0.03335)(0) = 0
W22 ← W22 + η δh2 x2
= 1 + (1)(–0.03335)(0) = 1
Notice that the weights have not changed; this is expected, since
both inputs are ZERO.
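This first iteration can be checked numerically with a short self-contained script (variable names are ours):

```python
import math

sig = lambda v: 1.0 / (1.0 + math.exp(-v))

x1 = x2 = t = 0.0
W11, W12, W21, W22, W10, W20 = 1.0, 0.0, 0.0, 1.0, 1.0, 1.0

hO1 = sig(W11 * x1 + W21 * x2)       # 0.5
hO2 = sig(W12 * x1 + W22 * x2)       # 0.5
O = sig(W10 * hO1 + W20 * hO2)       # 0.73106
d_o = (t - O) * O * (1 - O)          # -0.14373
W10 += d_o * hO1                     # 0.92813
W20 += d_o * hO2                     # 0.92813
d_h1 = hO1 * (1 - hO1) * W10 * d_o   # -0.03335 (uses the updated W10)
d_h2 = hO2 * (1 - hO2) * W20 * d_o   # -0.03335
W11 += d_h1 * x1; W21 += d_h1 * x2   # unchanged: inputs are zero
W12 += d_h2 * x1; W22 += d_h2 * x2   # unchanged
print(round(O, 5), round(W10, 5), round(d_h1, 5))  # 0.73106 0.92813 -0.03335
```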
Prof. Ahmed Sultan Al-Hegami
40. AI 40
The following table shows the results after
training the network only once:
x1 x2 t W11 W12 W21 W22 W10 W20
0 0 0 1 0 0 1 0.92813 0.92813
Prof. Ahmed Sultan Al-Hegami
41. AI 41
Now, we consider the second ROW of the target table and continue the
training process of the network using the same steps,
with the following data in the training:
x1 = 0, x2 = 1, t = 1
Also using the weights obtained in the previous stage of training, we obtain:
hi1= W11x1+W21x2
= (1)(0)+(0)(1) = 0
hi2= W12x1+W22x2
= (0)(0)+(1)(1) = 1
hO1 = 1/(1 + e^-hi1)
= 1/(1 + e^0) = 0.5
hO2 = 1/(1 + e^-hi2)
= 1/(1 + e^-1) = 0.73106
By using the first step of the algorithm, we get the total input entering the output neuron:
N = W10 hO1 + W20 hO2 ------------------(3)
= (0.92813)(0.5) + (0.92813)(0.73106) = 1.1426
Therefore the actual output of the network is:
O = 1/(1 + e^-N)
= 1/(1 + e^-1.1426) = 0.7582 (which is far from the desired (target) output).
Prof. Ahmed Sultan Al-Hegami
42. AI 42
As the actual output is far from the target, we have to modify
the weights to bring the output closer to the target. To determine
the error in the result, we use step 2 of the algorithm as follows:
δO = (t – O) O (1 – O)
= (1 – 0.7582)(0.7582)(1 – 0.7582)
= 0.04435
With this error value, we can update the weights between the hidden
and output layers using equation (4) of step 2 of the
algorithm, as follows:
W10 ← W10 + η δO hO1
= 0.92813 + (1)(0.04435)(0.5) = 0.95030
W20 ← W20 + η δO hO2
= 0.92813 + (1)(0.04435)(0.73106) = 0.96056
(At this point we back-propagate from the output layer to the hidden
layer and, in the same fashion, on to the input layer.)
Prof. Ahmed Sultan Al-Hegami
43. AI 43
We determine the error contributed by the hidden layer using equation (5) of step 3 of the
algorithm as follows:
δh1 = hO1(1 – hO1) W10 δO
= (0.5)(1 – 0.5)(0.9503)(0.04435)
= 0.01054
δh2 = hO2(1 – hO2) W20 δO
= (0.73106)(1 – 0.73106)(0.96056)(0.04435)
= 0.00838
With these error values, we can update the weights between the input and hidden layers using
equation (6) of step 3 of the algorithm, as follows:
W11 ← W11 + η δh1 x1
= 1 + (1)(0.01054)(0) = 1
W12 ← W12 + η δh2 x1
= 0 + (1)(0.00838)(0) = 0
W21 ← W21 + η δh1 x2
= 0 + (1)(0.01054)(1) = 0.01054
W22 ← W22 + η δh2 x2
= 1 + (1)(0.00838)(1) = 1.00838
Prof. Ahmed Sultan Al-Hegami
44. AI 44
The following table shows the results after
training the network the second time:
x1 x2 t W11 W12 W21 W22 W10 W20
0 1 1 1 0 0.01054 1.00838 0.9503 0.96056
Prof. Ahmed Sultan Al-Hegami
45. AI 45
The training process has to be repeated many times until we obtain the
MINIMUM error. The following table shows the weights after training the
network approximately 1000 times.
As you can see from the comparison table below, the actual outputs
are then very near the desired (target) outputs.
W11 W12 W21 W22 W10 W20
-3.5402 4.0244 -3.5248 4.5814 -11.9103 4.6940
Prof. Ahmed Sultan Al-Hegami
46. AI 46
The comparison of the actual and desired
(target) outputs is shown in the table
below:
X1 X2 Target (t) Output (O)
0 0 0 0.0264
0 1 1 0.9867
1 0 1 0.9863
1 1 1 0.9908
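The whole walk-through can be reproduced with a self-contained sketch that cycles through the four training rows for about 1000 epochs (assumptions: η = 1, the initial weights of slide 36, and per-pattern updates, as in the slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

W = [[1.0, 0.0], [0.0, 1.0]]  # W11, W12 / W21, W22 (initial values, slide 36)
Wout = [1.0, 1.0]             # W10, W20
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

for epoch in range(1000):
    for x, t in data:
        ho = [sigmoid(W[0][j] * x[0] + W[1][j] * x[1]) for j in range(2)]
        O = sigmoid(Wout[0] * ho[0] + Wout[1] * ho[1])
        d_o = (t - O) * O * (1 - O)
        Wout = [Wout[j] + d_o * ho[j] for j in range(2)]
        d_h = [ho[j] * (1 - ho[j]) * Wout[j] * d_o for j in range(2)]
        for i in range(2):
            for j in range(2):
                W[i][j] += d_h[j] * x[i]

for x, t in data:
    ho = [sigmoid(W[0][j] * x[0] + W[1][j] * x[1]) for j in range(2)]
    O = sigmoid(Wout[0] * ho[0] + Wout[1] * ho[1])
    print(x, t, round(O, 4))  # each output should be close to its target
```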
Prof. Ahmed Sultan Al-Hegami
47. AI 47
Evolving networks
Continuous process of:
Evaluate output
Adapt weights
Take new inputs
As the ANN evolves, the weights settle into a stable
state, but the neurons continue working:
the network has 'learned' to deal with the
problem
"Learning"
Prof. Ahmed Sultan Al-Hegami
48. AI 48
Where are NN used?
Recognizing and matching complicated,
vague, or incomplete patterns
Data is unreliable
Problems with noisy data
Prediction
Classification
Data association
Filtering
Planning
Prof. Ahmed Sultan Al-Hegami
49. AI 49
Applications
Prediction: learning from past experience
pick the best stocks in the market
predict weather
identify people with cancer risk
Classification
Image processing
Predict bankruptcy for credit card companies
Risk assessment
Prof. Ahmed Sultan Al-Hegami
50. AI 50
Applications
Recognition
Pattern recognition: SNOOPE (bomb detector in
U.S. airports)
Character recognition
Handwriting: processing checks
Data association
Not only identify the characters that were scanned
but identify when the scanner is not working
properly
Prof. Ahmed Sultan Al-Hegami
51. AI 51
Applications
Data Filtering
e.g. take the noise out of a telephone signal, signal
smoothing
Planning
Unknown environments
Sensor data is noisy
Fairly new approach to planning
Prof. Ahmed Sultan Al-Hegami
52. AI 52
Strengths of a Neural Network
Power: Model complex functions, nonlinearity built
into the network
Ease of use:
Learn by example
Very little user domain-specific expertise needed
Intuitively appealing: based on a model of biology;
will it lead to genuinely intelligent computers/robots?
Neural networks cannot do anything that cannot be
done using traditional computing techniques, BUT
they can do some things which would otherwise be
very difficult.
Prof. Ahmed Sultan Al-Hegami
53. AI 53
General Advantages
Advantages
Adapt to unknown situations
Robustness: fault tolerance due to network
redundancy
Autonomous learning and generalization
Disadvantages
Not exact
Large complexity of the network structure
Prof. Ahmed Sultan Al-Hegami
54. AI 54
Status of Neural Networks
Most of the reported applications are
still at the research stage
No formal proofs, but they seem to
have useful applications that work
Prof. Ahmed Sultan Al-Hegami
55. AI 55
Conclusions
Simulation based on neurons in brain
Perceptrons (single neuron)
Guaranteed to find a linear discriminant
IF one exists (hence the problem with XOR)
Neural nets (Multi-layer perceptrons)
Very general
Backpropagation training procedure
Prof. Ahmed Sultan Al-Hegami