ANN

- 1. Artificial Neural Network Compiled by Dr. Vaishali Wangikar
- 2. What is Artificial Neural Network? • The term "Artificial Neural Network" is derived from the biological neural networks that make up the structure of the human brain. Just as the human brain has neurons interconnected with one another, artificial neural networks have neurons that are interconnected with one another in the various layers of the network. These neurons are known as nodes. https://www.javatpoint.com/artificial-neural-network
- 3. Neural Network • NNs are constructed and implemented to model the human brain. • They perform various tasks such as pattern matching, classification, optimization, function approximation, vector quantization and data clustering. • These tasks are difficult for traditional computers.
- 4. ANN • ANNs possess a large number of processing elements called nodes/neurons, which operate in parallel. • Neurons are connected to one another by connection links. • Each link is associated with a weight, which contains information about the input signal. • Each neuron has an internal state of its own, called its activation level, which is a function of the inputs the neuron receives. • In short, an artificial neural network consists of a pool of simple processing units which communicate by sending signals to each other over a large number of weighted connections.
- 5. Artificial Neural Network The major aspects of a parallel distributed model include: • a set of processing units (cells) • a state of activation for every unit, which is equivalent to the output of the unit • connections between the units; generally each connection is defined by a weight • a propagation rule, which determines the effective input of a unit from its external inputs • an activation function, which determines the new level of activation based on the effective input and the current activation • an external input for each unit • a method for information gathering (the learning rule) • an environment within which the system must operate, providing input signals and, if necessary, error signals
- 6. Artificial Neural Networks • The “building blocks” of neural networks are the neurons. • In technical systems, we also refer to them as units or nodes. • Basically, each neuron receives input from many other neurons. changes its internal state (activation) based on the current input. sends one output signal to many other neurons, possibly including its input neurons (recurrent network).
- 7. Artificial Neural Networks • Information is transmitted as a series of electric impulses, so-called spikes. • The frequency and phase of these spikes encodes the information. • In biological systems, one neuron can be connected to as many as 10,000 other neurons. • Usually, a neuron receives its information from other neurons in a confined area, its so-called receptive field.
- 8. How do ANNs work? An artificial neural network (ANN) is either a hardware implementation or a computer program which strives to simulate the information processing capabilities of its biological exemplar. ANNs are typically composed of a great number of interconnected artificial neurons. The artificial neurons are simplified models of their biological counterparts. ANN is a technique for solving problems by constructing software that works like our brains.
- 9. How do our brains work? The brain is a massively parallel information processing system. Our brains are a huge network of processing elements. A typical brain contains a network of tens of billions of neurons (about 86 billion by recent estimates).
- 10. How do our brains work? A processing element • Dendrites: Input • Cell body: Processor • Synapse: Link • Axon: Output
- 11. From Biological to Artificial Neurons The Neuron - A Biological Information Processor • dendrites - the receivers • soma - neuron cell body (sums input signals) • axon - the transmitter • synapse - point of transmission • neuron activates after a certain threshold is met Learning occurs via electro-chemical changes in the effectiveness of the synaptic junction.
- 12. From Biological to Artificial Neurons An Artificial Neuron - The Perceptron • simulated on hardware or by software • input connections - the receivers • node, unit, or PE simulates neuron body • output connection - the transmitter • activation function employs a threshold or bias • connection weights act as synaptic junctions Learning occurs via changes in value of the connection weights.
- 14. How do our brains work? A processing element A neuron is connected to other neurons through about 10,000 synapses
- 15. How do our brains work? A processing element A neuron receives input from other neurons. Inputs are combined.
- 16. How do our brains work? A processing element Once input exceeds a critical level, the neuron discharges a spike ‐ an electrical pulse that travels from the body, down the axon, to the next neuron(s)
- 17. How do our brains work? A processing element The axon endings almost touch the dendrites or cell body of the next neuron.
- 18. How do our brains work? A processing element Transmission of an electrical signal from one neuron to the next is effected by neurotransmitters.
- 19. How do our brains work? A processing element Neurotransmitters are chemicals which are released from the first neuron and which bind to the Second.
- 20. How do our brains work? A processing element This link is called a synapse. The strength of the signal that reaches the next neuron depends on factors such as the amount of neurotransmitter available.
- 21. How do ANNs work? An artificial neuron is an imitation of a human neuron
- 22. How do ANNs work? • Now, let us have a look at the model of an artificial neuron.
- 23. How do ANNs work? [Figure: inputs x1, x2, …, xm enter a processing unit ∑ that produces the output y.] $y = x_1 + x_2 + \dots + x_m$
- 24. How do ANNs work? Not all inputs are equal. [Figure: each input xi is multiplied by its weight wi before the summation ∑ produces the output y.] $y = x_1 w_1 + x_2 w_2 + \dots + x_m w_m$
- 25. How do ANNs work? The signal is not passed down to the next neuron verbatim. [Figure: the weighted sum $v_k$ of the inputs x1, x2, …, xm (weights w1, w2, …, wm) passes through a transfer function (activation function) $f(v_k)$ to give the output y.]
- 26. The output is a function of the input, that is affected by the weights, and the transfer functions
- 27. Going step by step: The Artificial Neuron (Perceptron) [Figure: inputs a0, a1, a2, …, an, including a constant +1 input, are weighted by wj0, wj1, wj2, …, wjn and summed to Sj; the output is Xj = f(Sj).]
- 28. A Simple Model of a Neuron (Perceptron) • Each neuron has a threshold value • Each neuron has weighted inputs from other neurons • The input signals form a weighted sum • If the activation level exceeds the threshold, the neuron “fires” [Figure: inputs y1, y2, y3, …, yi with weights w1j, w2j, w3j, …, wij feed a neuron with output O.]
- 29. An Artificial Neuron • Each hidden or output neuron has weighted input connections from each of the units in the preceding layer. • The unit performs a weighted sum of its inputs, and subtracts its threshold value, to give its activation level. • The activation level is passed through a sigmoid activation function to determine the output. [Figure: inputs y1, y2, y3, …, yi with weights w1j, w2j, w3j, …, wij feed a neuron that applies f(x) to produce the output O.]
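The neuron described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `neuron_output` and the example inputs, weights and threshold are assumptions chosen only to show the weighted sum, the threshold subtraction and the logistic sigmoid.

```python
import math

def neuron_output(inputs, weights, threshold):
    """Weighted sum of the inputs minus the threshold, passed through a logistic sigmoid."""
    activation = sum(w * y for w, y in zip(weights, inputs)) - threshold
    return 1.0 / (1.0 + math.exp(-activation))

# Hypothetical values, chosen only for illustration.
out = neuron_output(inputs=[1.0, 0.0, 1.0], weights=[0.5, -0.3, 0.8], threshold=0.2)
print(out)
```

The output always lies strictly between 0 and 1, which is the "squashing" behaviour expected of a sigmoid unit.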
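- 30. Supervised Learning • Training and test data sets • Training set: input & target
- 31. Perceptron Training • A linear threshold is used. • W: weight value • t: threshold value • $\text{Output} = \begin{cases} 1 & \text{if } \sum_{i=0} w_i x_i > t \\ 0 & \text{otherwise} \end{cases}$
- 32. Simple network: AND with a biased input • $\text{Output} = \begin{cases} 1 & \text{if } \sum_{i=0} w_i x_i > t \\ 0 & \text{otherwise} \end{cases}$ [Figure: a bias input of −1 with W1 = 1.5 and inputs X and Y with W2 = 1 and W3 = 1 feed a unit with threshold t = 0.0.]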
- 33. Learning algorithm • While an epoch produces an error: present the network with the next inputs from the epoch; compute Error = T − O; if Error ≠ 0 then update every weight with Wj = Wj + LR × Ij × Error.
- 34. Learning algorithm • Epoch: Presentation of the entire training set to the neural network. In the case of the AND function an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]). • Error: The error value is the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = −1.
- 35. Learning algorithm • Target value, T: When we are training a network we not only present it with the input but also with a value that we require the network to produce. For example, if we present the network with [1,1] for the AND function the target value will be 1. • Output, O: The output value from the neuron. • Ij: Inputs being presented to the neuron. • Wj: Weight from input neuron (Ij) to the output neuron. • LR: The learning rate. This dictates how quickly the network converges. It is set by experimentation and is typically 0.1.
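The learning algorithm above can be sketched as runnable Python for the AND function. This is a sketch under stated assumptions: the bias is modelled as a constant input of −1, the threshold is 0.0, the starting weights are arbitrary illustrative values, and the names `step`, `train_perceptron` and `max_epochs` are illustrative rather than taken from the slides.

```python
def step(total, threshold=0.0):
    """Linear threshold unit: fire (1) only if the weighted sum exceeds the threshold."""
    return 1 if total > threshold else 0

def train_perceptron(samples, weights, lr=0.1, max_epochs=100):
    """Repeat epochs until one full pass over the training set produces no error."""
    for epoch in range(max_epochs):
        had_error = False
        for inputs, target in samples:
            output = step(sum(w * i for w, i in zip(weights, inputs)))
            error = target - output              # Error = T - O
            if error != 0:
                had_error = True
                # Wj = Wj + LR * Ij * Error
                weights = [w + lr * i * error for w, i in zip(weights, inputs)]
        if not had_error:
            return weights, epoch + 1
    return weights, max_epochs

# AND truth table; the first component of each input is the constant bias input -1.
and_samples = [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]

# Assumed starting weights; any small random values would do.
trained, epochs = train_perceptron(and_samples, [0.3, 0.5, -0.4])
print(trained, epochs)
```

Because AND is linearly separable, the loop ends after a finite number of epochs; `max_epochs` is only a safety limit for problems that are not.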
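- 36. Training Perceptrons • Network: threshold t = 0.0, with weights W1 = ?, W2 = ?, W3 = ? on the bias input −1 and the inputs x and y. • For AND:

  | A | B | Output |
  |---|---|--------|
  | 0 | 0 | 0 |
  | 0 | 1 | 0 |
  | 1 | 0 | 0 |
  | 1 | 1 | 1 |

  • What are the weight values? • Initialize with random weight values
- 37. Training Perceptrons • Network: threshold t = 0.0, W1 = 0.3, W2 = 0.5, W3 = −0.4. • For AND, the target outputs for (A, B) = (0,0), (0,1), (1,0), (1,1) are 0, 0, 0, 1.

  | I1 | I2 | I3 | Summation | Output |
  |----|----|----|-----------|--------|
  | −1 | 0 | 0 | (−1 × 0.3) + (0 × 0.5) + (0 × −0.4) = −0.3 | 0 |
  | −1 | 0 | 1 | (−1 × 0.3) + (0 × 0.5) + (1 × −0.4) = −0.7 | 0 |
  | −1 | 1 | 0 | (−1 × 0.3) + (1 × 0.5) + (0 × −0.4) = 0.2 | 1 |
  | −1 | 1 | 1 | (−1 × 0.3) + (1 × 0.5) + (1 × −0.4) = −0.2 | 0 |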
- 38. Learning in Neural Networks • Learn values of weights from I/O pairs • Start with random weights • Load training example’s input • Observe computed output • Modify weights to reduce difference • Iterate over all training examples • Terminate when weights stop changing OR when error is very small
- 39. Artificial Neural Networks An ANN can: 1. compute any computable function, by the appropriate selection of the network topology and weights values. 2. learn from experience! Specifically, by trial‐and‐error
- 40. Learning by trial‐and‐error Continuous process of: Trial: Processing an input to produce an output (In terms of ANN: Compute the output function of a given input) Evaluate: Evaluating this output by comparing the actual output with the expected output. Adjust: Adjust the weights.
- 41. How it works? Set initial values of the weights randomly. Input: truth table of the XOR Do Read input (e.g. 0, and 0) Compute an output (e.g. 0.60543) Compare it to the expected output. (Diff= 0.60543) Modify the weights accordingly. Loop until a condition is met Condition: certain number of iterations Condition: error threshold
- 42. Design Issues Initial weights (small random values ∈[‐1,1]) Transfer function (How the inputs and the weights are combined to produce output?) Error estimation Weights adjusting Number of neurons Data representation Size of training set
- 43. Transfer Functions Linear: The output is proportional to the total weighted input. Threshold: The output is set at one of two values, depending on whether the total weighted input is greater than or less than some threshold value. Non‐linear: The output varies continuously but not linearly as the input changes.
- 44. Error Estimation The root mean square error (RMSE) is a frequently- used measure of the differences between values predicted by a model or an estimator and the values actually observed from the thing being modeled or estimated
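For n predictions $\hat{y}_i$ and observed values $y_i$, the RMSE is $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$. A minimal Python sketch follows; the function name `rmse` and the sample values are illustrative assumptions.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical network outputs versus targets.
print(rmse([0.9, 0.2, 0.8], [1.0, 0.0, 1.0]))
```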
- 45. Weights Adjusting After each iteration, weights should be adjusted to minimize the error. – All possible weights – Back propagation
- 46. Topologies of Neural Networks completely connected feedforward (directed, a-cyclic) recurrent (feedback connections)
- 47. Basic models of ANN • Interconnections • Learning rules • Activation function
- 48. Classification based on interconnections Interconnections Feed forward Single layer Multilayer Feed Back Recurrent Single layer Multilayer
- 49. Single layer Feedforward Network
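- 50. Feedforward Network • Its output and input vectors are, respectively, $o = [o_1, o_2, \dots, o_m]^T$ and $x = [x_1, x_2, \dots, x_n]^T$. • Weight $w_{ij}$ connects the i-th neuron with the j-th input. • The activation rule of the i-th neuron is $o_i = f\left(\sum_{j=1}^{n} w_{ij} x_j\right)$ for $i = 1, 2, \dots, m$.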
- 51. Multilayer feed forward network Can be used to solve complicated problems
- 52. Feedback network When outputs are directed back as inputs to same or preceding layer nodes it results in the formation of feedback networks
- 53. Lateral feedback If the feedback of the output of the processing elements is directed back as input to the processing elements in the same layer then it is called lateral feedback
- 54. Recurrent networks • Single node with own feedback • Competitive nets • Single-layer recurrent networks • Multilayer recurrent networks • Feedback networks with closed loops are called recurrent networks. • The response at the (k+1)-th instant depends on the entire history of the network starting at k = 0. • Automaton: A system with discrete-time inputs and a discrete data representation is called an automaton.
- 55. Basic models of ANN • Interconnections • Learning rules • Activation function
- 56. Learning • It’s a process by which a NN adapts itself to a stimulus by making proper parameter adjustments, resulting in the production of desired response • Two kinds of learning • Parameter learning:- connection weights are updated • Structure Learning:- change in network structure
- 57. Training • The process of modifying the weights in the connections between network layers with the objective of achieving the expected output is called training a network. • This is achieved through • Supervised learning • Unsupervised learning • Reinforcement learning
- 58. Classification of learning • Supervised learning • Unsupervised learning • Reinforcement learning
- 59. Supervised Learning • A child learns from a teacher • Each input vector requires a corresponding target vector. • Training pair = [input vector, target vector] [Figure: the input X enters the neural network (weights W), which produces the actual output Y; an error signal generator compares Y with the desired output D and feeds the error signals (D − Y) back to the network.]
- 60. Supervised learning contd. Supervised learning does minimization of error
- 61. Unsupervised Learning • How a fish or tadpole learns • All similar input patterns are grouped together as clusters. • If a matching input pattern is not found a new cluster is formed
- 63. Self-organizing • In unsupervised learning there is no feedback • The network must discover patterns, regularities and features in the input data, and relations between the input data and the output • While doing so, the network's parameters may change • This process is called self-organizing
- 64. Reinforcement Learning [Figure: the input X enters the NN (weights W), which produces the actual output Y; an error signal generator receives Y and the reinforcement signal R and feeds error signals back to the network.]
- 65. When is reinforcement learning used? • When less information is available about the target output values (critic information) • Learning based on this critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal • Feedback in this case is only evaluative, not instructive
- 66. Basic models of ANN • Interconnections • Learning rules • Activation function
- 67. Activation Function 1. Identity function: $f(x) = x$ for all $x$ 2. Binary step function: $f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$ 3. Bipolar step function: $f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ -1 & \text{if } x < \theta \end{cases}$ 4. Sigmoidal functions: continuous functions 5. Ramp function: $f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } 0 \le x \le 1 \\ 0 & \text{if } x < 0 \end{cases}$ Here $\theta$ is the threshold value.
- 68. Activation functions • Transform a neuron’s input into its output. • Features of activation functions: • A squashing effect is required • It prevents accelerating growth of activation levels through the network. • Simple and easy to calculate
- 69. Standard activation functions • The hard-limiting threshold function – Corresponds to the biological paradigm • either fires or not • Sigmoid functions ('S'-shaped curves) – The logistic function $f(x) = \frac{1}{1 + e^{-ax}}$ – The hyperbolic tangent (symmetrical) – Both functions have a simple derivative – Only the shape is important
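The activation functions listed on these slides can be sketched in Python as follows. The function names, the default threshold `theta=0.0` and the default steepness `a=1.0` are assumptions made for illustration; the derivative helpers show why the logistic function and the hyperbolic tangent are said to have simple derivatives.

```python
import math

def identity(x):
    return x

def binary_step(x, theta=0.0):
    """1 if x reaches the threshold theta, otherwise 0."""
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    """1 if x reaches the threshold theta, otherwise -1."""
    return 1 if x >= theta else -1

def ramp(x):
    """0 below 0, x between 0 and 1, 1 above 1."""
    return max(0.0, min(1.0, x))

def logistic(x, a=1.0):
    """Logistic sigmoid with steepness a; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * x))

def logistic_derivative(x, a=1.0):
    """The derivative can be written using the function value itself: a * f * (1 - f)."""
    f = logistic(x, a)
    return a * f * (1.0 - f)

def tanh_derivative(x):
    """Derivative of the hyperbolic tangent: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

for x in (-2.0, 0.0, 0.5, 2.0):
    print(x, binary_step(x), bipolar_step(x), ramp(x), logistic(x), math.tanh(x))
```

For inputs of very large magnitude `math.exp` can overflow, so a production version would clamp the argument; that is left out here to keep the sketch short.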
- 70. Some learning algorithms we will learn are • Supervised: • Adaline, Madaline • Perceptron • Back Propagation • Multilayer perceptrons • Radial Basis Function Networks • Unsupervised: • Competitive Learning • Kohonen self-organizing map • Learning vector quantization • Hebbian learning
- 71. Neural processing • Recall:- processing phase for a NN and its objective is to retrieve the information. The process of computing o for a given x • Basic forms of neural information processing • Auto association • Hetero association • Classification
- 72. Neural processing-Autoassociation • Set of patterns can be stored in the network • If a pattern similar to a member of the stored set is presented, an association with the input of closest stored pattern is made
- 73. Neural Processing- Heteroassociation • Associations between pairs of patterns are stored • Distorted input pattern may cause correct heteroassociation at the output
- 74. Neural processing-Classification • Set of input patterns is divided into a number of classes or categories • In response to an input pattern from the set, the classifier is supposed to recall the information regarding class membership of the input pattern.
- 75. Important terminologies of ANNs • Weights • Bias • Threshold • Learning rate • Momentum factor • Vigilance parameter • Notations used in ANN
- 76. Weights • Each neuron is connected to every other neuron by means of directed links • Links are associated with weights • Weights contain information about the input signal and are represented as a matrix • The weight matrix is also called the connection matrix
- 77. Weight matrix $W = \begin{bmatrix} w_1^T \\ w_2^T \\ w_3^T \\ \vdots \\ w_n^T \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} & \dots & w_{1m} \\ w_{21} & w_{22} & w_{23} & \dots & w_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \dots & w_{nm} \end{bmatrix}$
- 78. Weights contd… • $w_{ij}$ is the weight from processing element “i” (source node) to processing element “j” (destination node) [Figure: inputs X1, …, Xi, …, Xn with weights w1j, …, wij, …, wnj and a constant input 1 with bias bj feed the output unit Yj.] $y_{inj} = \sum_{i=0}^{n} x_i w_{ij} = x_0 w_{0j} + x_1 w_{1j} + x_2 w_{2j} + \dots + x_n w_{nj} = w_{0j} + \sum_{i=1}^{n} x_i w_{ij}$, so that $y_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij}$
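The net input to unit j can be computed either with an explicit bias term or by treating the bias as the weight on a constant input x0 = 1. The sketch below shows that both forms agree; the function names and the numeric values are illustrative assumptions.

```python
def net_input(x, w_j, b_j):
    """y_inj = b_j + sum_i x_i * w_ij, with the bias kept as a separate term."""
    return b_j + sum(x_i * w_ij for x_i, w_ij in zip(x, w_j))

def net_input_augmented(x, w_j, b_j):
    """Same quantity, with the bias folded in as weight w_0j on a constant input x_0 = 1."""
    x_aug = [1.0] + list(x)
    w_aug = [b_j] + list(w_j)
    return sum(x_i * w_ij for x_i, w_ij in zip(x_aug, w_aug))

# Hypothetical inputs, weights and bias.
x, w_j, b_j = [0.2, 0.6], [0.3, 0.7], 0.45
print(net_input(x, w_j, b_j), net_input_augmented(x, w_j, b_j))
```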
- 79. Activation Functions • Used to calculate the output response of a neuron. • Sum of the weighted input signal is applied with an activation to obtain the response. • Activation functions can be linear or non linear • Already dealt • Identity function • Single/binary step function • Discrete/continuous sigmoidal function.
- 80. Bias • Bias is like another weight. It is included by adding a component x0 = 1 to the input vector X. • X = (1, X1, X2, …, Xi, …, Xn) • Bias is of two types • Positive bias: increases the net input • Negative bias: decreases the net input
- 81. Why is bias required? • The relationship between input and output is given by the equation of a straight line, y = mx + c. [Figure: the input X is mapped to the output Y by y = mx + C, where the intercept C is the bias.]
- 82. Threshold • A set value based upon which the final output of the network may be calculated • Used in the activation function • The activation function using a threshold $\theta$ can be defined as $f(net) = \begin{cases} 1 & \text{if } net \ge \theta \\ -1 & \text{if } net < \theta \end{cases}$
- 83. Learning rate • Denoted by α. • Used to control the amount of weight adjustment at each step of training • Learning rate ranging from 0 to 1 determines the rate of learning in each time step
- 84. Learning rate • The learning rate defines the size of the corrective steps that the model takes to adjust for errors in each observation. • A high learning rate shortens the training time, but with lower ultimate accuracy, while a lower learning rate takes longer, but with the potential for greater accuracy. • Optimizations such as Quickprop are primarily aimed at speeding up error minimization, while other improvements mainly try to increase reliability. • In order to avoid oscillation inside the network such as alternating connection weights, and to improve the rate of convergence, refinements use an adaptive learning rate that increases or decreases as appropriate. • (From Wikipedia)
- 85. Learning Rate • Neural networks are often trained by gradient descent on the weights. This means at each iteration we use backpropagation to calculate the derivative of the loss function with respect to each weight and subtract it from that weight. However, if you actually try that, the weights will change far too much each iteration, which will make them “overcorrect”, and the loss will actually increase or diverge. So in practice, people usually multiply each derivative by a small value called the “learning rate” before they subtract it from its corresponding weight. • w1new = w1 − (learning rate) × (derivative of cost function wrt w1)
- 86. Learning Rate • Stochastic gradient descent is an optimization algorithm that estimates the error gradient for the current state of the model using examples from the training dataset, then updates the weights of the model using the back- propagation of errors algorithm, referred to as simply backpropagation. • The amount that the weights are updated during training is referred to as the step size or the “learning rate.” • Specifically, the learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between 0.0 and 1.0. • The learning rate is one of the most important hyper-parameters to tune for training deep neural networks.
- 87. Learning Rate • If the learning rate is low, then training is more reliable, but optimization will take a lot of time because steps towards the minimum of the loss function are tiny. • If the learning rate is high, then training may not converge or even diverge. Weight changes can be so big that the optimizer overshoots the minimum and makes the loss worse.
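The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on the one-dimensional loss L(w) = w², whose gradient is 2w; the loss function, the starting weight and the three learning rates are assumptions picked only to show slow convergence, fast convergence and divergence.

```python
def gradient_descent(lr, w=1.0, steps=20):
    """Minimise L(w) = w**2 with a fixed learning rate and return the final loss."""
    for _ in range(steps):
        grad = 2.0 * w          # dL/dw
        w = w - lr * grad       # subtract the scaled derivative from the weight
    return w * w

for lr in (0.01, 0.4, 1.1):
    print(lr, gradient_descent(lr))
```

With the smallest rate the loss falls only slightly in 20 steps, with the middle rate it falls to almost zero, and with the largest rate each step overshoots the minimum by more than the previous error, so the loss grows.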
- 88. Learning Rate – Variable learning rate • The learning rate need not be constant across all the layers of a neural network. Using different rates for different layers can help mitigate the vanishing gradient problem, in which the weights of the early layers stop changing as the error is backpropagated towards the first layer. • Backpropagation multiplies many derivatives together; when these derivatives are smaller than 1, their product is smaller still, so learning in the early layers stalls and saturates prematurely. • For this reason a separate learning rate can be assigned to each layer.
- 89. A systematic approach towards finding the optimal learning rate 1. Start with a high learning rate and steadily decrease it. Changes in the weight vector must be small in order to reduce oscillations or divergence. 2. A simple suggestion is to increase the learning rate when performance improves and to decrease it when performance worsens. 3. Another method is to double the learning rate until the error value worsens.
- 90. A systematic approach towards finding the optimal learning rate • Ultimately, we'd like a learning rate which results in a steep decrease in the network's loss. • We can observe this by performing a simple experiment where we gradually increase the learning rate after each mini-batch, recording the loss at each increment. • This gradual increase can be on either a linear or an exponential scale.
- 91. • For learning rates which are too low, the loss may decrease, but at a very shallow rate. • When entering the optimal learning rate zone, you'll observe a quick drop in the loss function. Increasing the learning rate further will cause an increase in the loss as the parameter updates cause the loss to "bounce around" and even diverge from the minimum. • Remember, the best learning rate is associated with the steepest drop in loss, so we're mainly interested in analyzing the slope of the plot.
- 93. Two types of learning • 1. Sequential or per-pattern method • 2. Batch or per-epoch method • In sequential learning a given input pattern is propagated forward, the error is determined and backpropagated, and the weights are updated. • In batch learning the weights are updated only after the entire set of training patterns has been presented to the network. Thus the weight update is performed only once per epoch. • If P is the number of patterns in one epoch, then $\Delta w = \frac{1}{P} \sum_{p=1}^{P} \Delta w_p$ • This method has a smoothing effect.
- 94. When to stop backpropagation? • Continue as long as the error on the validation set decreases. • When the validation error begins to increase, the net is starting to memorize the training patterns, and training is terminated.
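This stopping rule is commonly called early stopping. The sketch below applies it to a list of per-epoch validation errors; the function name, the `patience` parameter (how many non-improving epochs to tolerate) and the error values are illustrative assumptions, and a real training loop would compute each validation error after training for one more epoch.

```python
def early_stopping_epoch(validation_errors, patience=1):
    """Return the index of the epoch with the lowest validation error seen before stopping."""
    best_error = float("inf")
    best_epoch = -1
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # validation error has stopped improving: terminate training
    return best_epoch

# Hypothetical validation errors, one per epoch.
errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47]
print(early_stopping_epoch(errors))
```

A patience greater than 1 tolerates small fluctuations in the validation error before giving up.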
- 95. How to choose hidden neurons • There are many rule-of-thumb methods for determining an acceptable number of neurons to use in the hidden layers, such as the following: 1. The number of hidden neurons should be between the size of the input layer and the size of the output layer. 2. The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer. 3. The number of hidden neurons should be less than twice the size of the input layer.
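The three rules of thumb are simple arithmetic on the layer sizes, as the following sketch shows. The function name, the dictionary keys and the example sizes are illustrative assumptions, and the results are starting points for experimentation rather than guarantees.

```python
def hidden_size_heuristics(n_in, n_out):
    """Evaluate the three rules of thumb for the number of hidden neurons."""
    return {
        # Rule 1: somewhere between the output layer size and the input layer size.
        "between_in_and_out": (min(n_in, n_out), max(n_in, n_out)),
        # Rule 2: two thirds of the input layer size plus the output layer size.
        "two_thirds_in_plus_out": round(2 * n_in / 3 + n_out),
        # Rule 3: strictly less than twice the input layer size (largest allowed value).
        "below_twice_input": 2 * n_in - 1,
    }

print(hidden_size_heuristics(n_in=10, n_out=2))
```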
- 96. How to choose number of hidden layers

  | Number of hidden layers | Result |
  |-------------------------|--------|
  | none | Only capable of representing linearly separable functions or decisions. |
  | 1 | Can approximate any function that contains a continuous mapping from one finite space to another. |
  | 2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy. |
  | >2 | Additional layers can learn complex representations (a sort of automatic feature engineering) for later layers. |
- 97. Other terminologies • Momentum factor: • Adding a momentum factor to the weight update process helps convergence. • Vigilance parameter: • Denoted by ρ • Used to control the degree of similarity required for patterns to be assigned to the same cluster
- 98. Neural Network Learning rules c – learning constant
- 99. Hebbian Learning Rule • The learning signal is equal to the neuron’s output FEED FORWARD UNSUPERVISED LEARNING
- 100. Features of Hebbian Learning • Feedforward unsupervised learning • “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.” • If $o_i x_j$ is positive, the weight increases; otherwise it decreases
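A minimal sketch of one Hebbian update step follows, assuming the common form Δw_j = c · o · x_j with the neuron output o = f(w · x) and a bipolar sign activation. The function names, the learning constant c = 1 and the example vectors are illustrative assumptions.

```python
def sign(v):
    """Bipolar activation: +1 for non-negative net input, otherwise -1."""
    return 1 if v >= 0 else -1

def hebbian_update(weights, x, c=1.0, activation=sign):
    """One Hebbian step: the learning signal is the neuron's own output."""
    o = activation(sum(w * x_j for w, x_j in zip(weights, x)))
    return [w + c * o * x_j for w, x_j in zip(weights, x)]

# Hypothetical initial weights and input pattern.
new_weights = hebbian_update([1.0, -1.0, 0.0, 0.5], [1.0, -2.0, 1.5, 0.0])
print(new_weights)
```

No target value appears anywhere in the update, which is what makes the rule unsupervised.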
- 101. Perceptron Learning rule • Learning signal is the difference between the desired and actual neuron’s response • Learning is supervised
- 102. Delta Learning Rule • Only valid for continuous activation function • Used in supervised training mode • Learning signal for this rule is called delta • The aim of the delta rule is to minimize the error over all training patterns
- 103. Delta Learning Rule Contd. Learning rule is derived from the condition of least squared error. Calculating the gradient vector with respect to wi Minimization of error requires the weight changes to be in the negative gradient direction
- 104. Widrow-Hoff learning Rule • Also called the least mean square (LMS) learning rule • Introduced by Widrow (1962), used in supervised learning • Independent of the activation function • Special case of the delta learning rule in which the activation function is the identity function, i.e. f(net) = net • Minimizes the squared error between the desired output value $d_i$ and $net_i$
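A minimal sketch of the Widrow-Hoff update, assuming the form Δw = c · (d − net) · x with net = w · x. The function name, the learning constant c = 0.1 and the toy data (targets generated by d = 2x + 1, with a constant second input of 1 acting as the bias) are illustrative assumptions.

```python
def lms_update(weights, x, d, c=0.1):
    """One Widrow-Hoff step: move the weights along x in proportion to (d - net)."""
    net = sum(w * x_i for w, x_i in zip(weights, x))
    return [w + c * (d - net) * x_i for w, x_i in zip(weights, x)]

# Each sample is ((x, 1.0), d) with d = 2 * x + 1; the constant 1.0 is the bias input.
samples = [((0.0, 1.0), 1.0), ((0.5, 1.0), 2.0), ((1.0, 1.0), 3.0)]

weights = [0.0, 0.0]
for _ in range(500):            # repeated passes over the training set
    for x, d in samples:
        weights = lms_update(weights, x, d)

print(weights)                  # approaches slope 2 and intercept 1
```

The update uses the net input directly rather than the output of an activation function, which is why the rule is described as independent of the activation function.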
- 106. Winner-Take-All Learning rule Contd… • Can be explained for a layer of neurons • Example of competitive learning and used for unsupervised network training • Learning is based on the premise that one of the neurons in the layer has a maximum response due to the input x • This neuron is declared the winner with a weight
- 108. Summary of learning rules
- 109. Neural Network – Weaknesses and Strengths • Weaknesses • Long training time • Require a number of parameters typically best determined empirically, e.g., the network topology or “structure” • Poor interpretability: it is difficult to interpret the symbolic meaning behind the learned weights and of “hidden units” in the network • Strengths • High tolerance to noisy data • Ability to classify untrained patterns • Well-suited for continuous-valued inputs and outputs • Successful on an array of real-world data, e.g., hand-written letters • Algorithms are inherently parallel • Techniques have recently been developed for the extraction of rules from trained neural networks
- 110. Summary: A Multi-Layer Feed-Forward Neural Network [Figure: the input vector X enters the input layer, passes through weighted connections wij to the hidden layer, and the output layer emits the output vector.] $w_j^{(k+1)} = w_j^{(k)} + \lambda \left(y_i - \hat{y}_i^{(k)}\right) x_{ij}$, with learning rate $\lambda$
- 111. Summary: How A Multi-Layer Neural Network Works • The inputs to the network correspond to the attributes measured for each training tuple • Inputs are fed simultaneously into the units making up the input layer • They are then weighted and fed simultaneously to a hidden layer • The number of hidden layers is arbitrary, although usually only one • The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction • The network is feed-forward: None of the weights cycles back to an input unit or to an output unit of a previous layer • From a statistical point of view, networks perform nonlinear regression: Given enough hidden units and enough training samples, they can closely approximate any function
- 112. Summary: Defining a Network Topology • Decide the network topology: Specify # of units in the input layer, # of hidden layers (if > 1), # of units in each hidden layer, and # of units in the output layer • Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0] • One input unit per domain value, each initialized to 0 • Output, if for classification and more than two classes, one output unit per class is used • Once a network has been trained and its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights
- 113. Summary: Backpropagation • Iteratively process a set of training tuples & compare the network's prediction with the actual known target value • For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value • Modifications are made in the “backwards” direction: from the output layer, through each hidden layer down to the first hidden layer, hence “backpropagation” • Steps • Initialize weights to small random numbers, associated with biases • Propagate the inputs forward (by applying activation function) • Backpropagate the error (by updating weights and biases) • Terminating condition (when error is very small, etc.)
- 114. Summary: Neuron: A Hidden/Output Layer Unit • An n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping • The inputs to the unit are outputs from the previous layer. They are multiplied by their corresponding weights to form a weighted sum, which is added to the bias associated with the unit. Then a nonlinear activation function is applied to it. [Figure: the input vector x = (x0, x1, …, xn) and the weight vector w = (w0, w1, …, wn) form a weighted sum, to which the bias $\mu_k$ is added; the activation function f gives the output y.] For example, $y = \text{sign}\left(\sum_{i=0}^{n} w_i x_i + \mu_k\right)$
- 115. Summary: Efficiency and Interpretability • Efficiency of backpropagation: Each epoch (one iteration through the training set) takes O(|D| * w), with |D| tuples and w weights, but # of epochs can be exponential in n, the number of inputs, in the worst case • For easier comprehension: Rule extraction by network pruning • Simplify the network structure by removing weighted links that have the least effect on the trained network • Then perform link, unit, or activation value clustering • The set of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers • Sensitivity analysis: assess the impact that a given input variable has on a network output. The knowledge gained from this analysis can be represented in rules