BACK PROPAGATION PRESENTED BY Karthika.T Nithya.G Revathy.R
Back propagation was described by Arthur E. Bryson and Yu-Chi Ho in 1969, but it was not until 1986, through the work of David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, that it gained recognition and led to a "renaissance" in the field of artificial neural network research.
The term is an abbreviation for "backwards propagation of errors".
As the algorithm's name implies, the errors (and therefore the learning) propagate backwards from the output nodes to the inner nodes.
So technically speaking, backpropagation is used to calculate the gradient of the error of the network with respect to the network's modifiable weights.
This gradient is almost always then used in a simple stochastic gradient descent algorithm to find weights that minimize the error.
"Backpropagation" is often used in a more general sense, to refer to the entire procedure encompassing both the calculation of the gradient and its use in stochastic gradient descent.
Backpropagation usually allows quick convergence on satisfactory local minima for error in the kind of networks to which it is suited.
FEED FORWARD NETWORK
Network activation flows in one direction only: from the input layer to the output layer, passing through the hidden layer. Each unit in a layer is connected in the forward direction to every unit in the next layer.
A back propagation network is a multilayer feed forward network with one layer of hidden units (Z).
The output units (Y) have bias b(i) and the hidden units (Z) have bias b(h), so both the output and hidden units have a bias. A bias acts like a weight on a connection from a unit whose output is always 1.
The input layer is connected to the hidden layer, and the hidden layer is connected to the output layer, by means of interconnection weights.
The architecture of back propagation resembles a multi-layered feed forward network.
Increasing the number of hidden layers increases the computational complexity of the network.
As a result, the time taken to converge and to minimize the error may be very high.
The bias is provided for both the hidden and the output layers, and it is added to the net input that each unit calculates.
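The statement that a bias acts like a weight on a connection from a unit whose output is always 1 can be checked with a small sketch. The function names and the numeric values here are hypothetical and chosen only for illustration.

```python
def net_input(inputs, weights, bias):
    """Net input with an explicit bias term: bias + sum of x_i * w_i."""
    return bias + sum(x * w for x, w in zip(inputs, weights))


def net_input_bias_as_weight(inputs, weights, bias):
    """Same net input, treating the bias as the weight on an extra unit
    whose output is always 1."""
    extended_inputs = [1.0] + list(inputs)
    extended_weights = [bias] + list(weights)
    return sum(x * w for x, w in zip(extended_inputs, extended_weights))


x = [0.5, -1.0, 2.0]       # hypothetical input signals
v = [0.25, 0.5, -0.75]     # hypothetical weights
b = 0.125                  # hypothetical bias

explicit = net_input(x, v, b)
as_weight = net_input_bias_as_weight(x, v, b)
```

Both functions return the same value, which is why many implementations store the bias as row 0 of the weight matrix.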
The training algorithm of back propagation involves four stages.
Initialization of weights- some small random values are assigned.
Feed forward- each input unit (X_i) receives an input signal and transmits this signal to each of the hidden units Z_1, Z_2, …, Z_p. Each hidden unit then calculates its activation and sends its signal z_j to each output unit. Each output unit calculates its activation to form the response of the network to the given input pattern.
Back propagation of errors- each output unit compares its activation y_k with its target value t_k to determine the associated error for that unit. Based on the error, the factor δ_k (k=1,…,m) is computed and is used to distribute the error at output unit Y_k back to all units in the previous layer. Similarly, the factor δ_j (j=1,…,p) is computed for each hidden unit Z_j.
Updating of the weights and biases.
INITIALIZATION OF WEIGHTS
STEP 1: Initialize the weights to small random values.
STEP 2: While stopping condition is false, do steps 3-10.
STEP 3: For each training pair do steps 4-9.
STEP 4: Each input unit receives the input signal x_i and transmits this signal to all units in the layer above, i.e. the hidden layer.
STEP 5: Each hidden unit (Z_j, j=1,…,p) sums its weighted input signals,
$z_{in,j} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$
applies its activation function,
$z_j = f(z_{in,j})$
and sends this signal to all units in the layer above, i.e. the output units.
STEP 6: Each output unit (Y_k, k=1,…,m) sums its weighted input signals,
$y_{in,k} = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$
and applies its activation function to calculate the output signal.
$y_k = f(y_{in,k})$
BACK PROPAGATION OF ERRORS
STEP 7: Each output unit (Y_k, k=1,…,m) receives a target pattern corresponding to the input pattern, and its error information term is calculated as
$\delta_k = (t_k - y_k) f'(y_{in,k})$
STEP 8: Each hidden unit (Z_j, j=1,…,p) sums its delta inputs from the units in the layer above,
$\delta_{in,j} = \sum_{k=1}^{m} \delta_k w_{jk}$
and its error information term is calculated as
$\delta_j = \delta_{in,j} f'(z_{in,j})$
UPDATION OF WEIGHTS AND BIASES
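STEP 9: Each output unit (Y_k, k=1,…,m) updates its bias and weights (j=0,…,p).
The weight correction term is given by
$\Delta w_{jk} = \alpha \delta_k z_j$
and the bias correction term is given by
$\Delta w_{0k} = \alpha \delta_k$
where α is the learning rate. Therefore,
$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}$
$w_{0k}(\text{new}) = w_{0k}(\text{old}) + \Delta w_{0k}$
Each hidden unit (Z_j, j=1,…,p) updates its bias and weights (i=0,…,n).
The weight correction term is
$\Delta v_{ij} = \alpha \delta_j x_i$
The bias correction term is
$\Delta v_{0j} = \alpha \delta_j$
Therefore,
$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$
$v_{0j}(\text{new}) = v_{0j}(\text{old}) + \Delta v_{0j}$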
STEP 10: Test the stopping condition.
The stopping condition may be minimization of the error, a fixed number of epochs, etc.
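Steps 1 to 10 can be sketched in plain Python as follows. This is a minimal illustration rather than a definitive implementation: the sigmoid activation, the weight range of ±0.5, the learning rate, the error tolerance, the OR training set, and every function name are assumptions that the slides do not specify. Row 0 of each weight matrix holds the biases (v_0j and w_0k).

```python
import math
import random


def sigmoid(x):
    """Assumed activation function f."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    """Derivative f' of the assumed activation."""
    s = sigmoid(x)
    return s * (1.0 - s)


def init_weights(n, p, m, rng):
    """Step 1: small random weights; row 0 of each matrix holds the biases."""
    v = [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(n + 1)]
    w = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(p + 1)]
    return v, w


def train_pair(x, t, v, w, alpha):
    """Steps 4-9 for one training pair. Updates v and w in place and
    returns the squared error measured before the update."""
    n, p, m = len(x), len(v[0]), len(w[0])
    # Steps 4-5: hidden net inputs and activations.
    z_in = [v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n)) for j in range(p)]
    z = [sigmoid(a) for a in z_in]
    # Step 6: output net inputs and activations.
    y_in = [w[0][k] + sum(z[j] * w[j + 1][k] for j in range(p)) for k in range(m)]
    y = [sigmoid(a) for a in y_in]
    # Step 7: error information terms of the output units.
    delta_k = [(t[k] - y[k]) * sigmoid_prime(y_in[k]) for k in range(m)]
    # Step 8: error information terms of the hidden units (old weights w).
    delta_in = [sum(delta_k[k] * w[j + 1][k] for k in range(m)) for j in range(p)]
    delta_j = [delta_in[j] * sigmoid_prime(z_in[j]) for j in range(p)]
    # Step 9: update biases and weights of both layers.
    for k in range(m):
        w[0][k] += alpha * delta_k[k]
        for j in range(p):
            w[j + 1][k] += alpha * delta_k[k] * z[j]
    for j in range(p):
        v[0][j] += alpha * delta_j[j]
        for i in range(n):
            v[i + 1][j] += alpha * delta_j[j] * x[i]
    return sum((t[k] - y[k]) ** 2 for k in range(m))


def train(pairs, n, p, m, alpha=0.5, max_epochs=2000, tolerance=0.01, seed=0):
    """Steps 1-10: repeat over all pairs until the stopping condition holds."""
    rng = random.Random(seed)
    v, w = init_weights(n, p, m, rng)
    history = []
    for _ in range(max_epochs):                      # Step 2
        total = sum(train_pair(x, t, v, w, alpha) for x, t in pairs)  # Step 3
        history.append(total)
        if total < tolerance:                        # Step 10
            break
    return v, w, history


# Hypothetical training set: the logical OR function.
or_pairs = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
v, w, history = train(or_pairs, n=2, p=2, m=1)
```

The list `history` holds the summed squared error per epoch, so it can be inspected to see how quickly the chosen stopping condition is reached.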
The application algorithm for BPN is shown below:
STEP 1: Initialize weights (from training algorithm).
STEP 2: For each input vector do steps 3-5.
STEP 3: For i=1,…,n: set the activation of input unit X_i.
STEP 4: For j=1,…,p:
$z_{in,j} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$
$z_j = f(z_{in,j})$
STEP 5: For k=1,…,m:
$y_{in,k} = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$
$y_k = f(y_{in,k})$
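The application algorithm is only the feed forward phase, run with weights that were already trained. A minimal sketch follows, assuming a sigmoid activation and the same layout as the equations (row 0 of `v` and `w` holds the biases). The weight values are hypothetical and merely stand in for weights saved after training.

```python
import math


def sigmoid(x):
    """Assumed activation function f."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_network(x, v, w):
    """Steps 3-5: compute the outputs y_k for one input vector x."""
    n, p, m = len(x), len(v[0]), len(w[0])
    # Step 4: hidden unit signals z_j = f(z_in,j).
    z = [sigmoid(v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n)))
         for j in range(p)]
    # Step 5: output unit signals y_k = f(y_in,k).
    return [sigmoid(w[0][k] + sum(z[j] * w[j + 1][k] for j in range(p)))
            for k in range(m)]


# Hypothetical weights for n=2 inputs, p=2 hidden units, m=1 output.
v = [[0.0, 0.0],    # hidden biases v_0j
     [0.0, 0.0],    # weights from input 1
     [0.0, 0.0]]    # weights from input 2
w = [[0.0],         # output bias w_0k
     [1.0],         # weight from hidden unit 1
     [-1.0]]        # weight from hidden unit 2
y = apply_network([1.0, 0.0], v, w)
```

No error terms or weight updates appear here, which is the difference from the training algorithm.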
MERITS OF BACK PROPAGATION
Relatively simple implementation.
Standard method and generally works well.
The mathematical formulas used in the algorithm can be applied to any network.
It does not require any special mention of the features of the function to be learnt.
Computing time is reduced if the weights chosen are small at the beginning.
Batch update of weights exists, which provides a smoothing effect on the weight correction terms.
DEMERITS OF BACK PROPAGATION
Slow and inefficient.
Can get stuck in local minima, resulting in sub-optimal solutions.
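A back propagation network is a suitable choice when the problem has the following characteristics:
A large amount of input/output data is available, but you're not sure how to relate it to the output.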
The problem appears to have overwhelming complexity, but there is clearly a solution.
It is easy to create a number of examples of the correct behaviour.
The solution to the problem may change over time, within the bounds of the given input and output parameters (i.e., today 2+2=4, but in the future we may find that 2+2=3.8).
Outputs can be "fuzzy", or non-numeric.
The Lego Mindstorms Robotics Invention System (RIS) is a kit for building and programming Lego robots.
It consists of 718 Lego bricks including two motors, two touch sensors, one light sensor, an infrared tower, and a robot brain called the RCX.
The RCX is a large brick that contains a microcontroller and an infrared port. You can attach the kit's two motors (as well as a third motor) and three sensors by snapping wire bricks on the RCX.
The infrared port allows the RCX to communicate with your desktop computer through the infrared tower.
A Roverbot, constructed as described in the Lego Mindstorms Constructopedia, has been used as the guide for constructing robots.
This Roverbot, as shown in Figure 1, has been configured to use all three sensors and two motors included in Lego Mindstorms RIS 2.0.
BACK PROPAGATION NETWORK
A backpropagation network has to be modeled for our Roverbot.
The robot has three inputs (two touch sensors and one light sensor) and two outputs (the two motors).
So a three-layer backpropagation network has been used, where the unit index is changed to begin with 0.
Moving forward: If Sensor 1 is off, and Sensor 2 is over a white floor, and Sensor 3 is off, then Motor A and Motor C go forward (Roverbot goes forward)
Moving right: If Sensor 1 is on, then Motor A goes forward, and Motor C goes backward (Roverbot turns right)
Moving left: If Sensor 3 is on, then Motor A goes backward, and Motor C goes forward (Roverbot turns left)
Moving backward: If Sensor 2 is over a black floor, then Motor A and Motor C go backward (Roverbot goes backward)
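The four movement rules above can be written as training pairs for a network with three inputs and two outputs. The numeric encoding in this sketch is an assumption, not something the slides state: touch sensors are 1 when on and 0 when off, the light sensor is 1 over a black floor and 0 over a white floor, and motors are 1 for forward and 0 for backward. Where a rule names only one sensor, the other sensors are assumed to be at their rest values.

```python
# Assumed encoding (hypothetical, for illustration only).
FORWARD, BACKWARD = 1, 0
WHITE, BLACK = 0, 1

# (sensor 1, sensor 2, sensor 3) -> (motor A, motor C)
TRAINING_PAIRS = [
    ((0, WHITE, 0), (FORWARD, FORWARD)),    # moving forward
    ((1, WHITE, 0), (FORWARD, BACKWARD)),   # moving right
    ((0, WHITE, 1), (BACKWARD, FORWARD)),   # moving left
    ((0, BLACK, 0), (BACKWARD, BACKWARD)),  # moving backward
]

MOVES = {
    (FORWARD, FORWARD): "forward",
    (FORWARD, BACKWARD): "right",
    (BACKWARD, FORWARD): "left",
    (BACKWARD, BACKWARD): "backward",
}


def describe(motors):
    """Translate a (motor A, motor C) pair into the Roverbot movement."""
    return MOVES[motors]


movements = [describe(target) for _, target in TRAINING_PAIRS]
```

Each pair supplies one input pattern and its target pattern, which is the form the training algorithm expects.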
Lego Mindstorms robots are cool toys used by hobbyists all around the world. They prove suitable for building mobile robots and programming them with artificial intelligence.
It is a fully self-contained planar laboratory prototype of an autonomous free-flying space robot, complete with on-board gas, thrusters, electrical power, multi-processor computer system, camera, wireless Ethernet data/communications link, and two cooperating manipulators. It exhibits nearly frictionless motion as it floats above a granite surface plate on a 50 micron thick cushion of air. Accelerometers and an angular-rate sensor sense base motions.
The difficult aspects of a neural-network control application are the decisions about how to structure the control system and which components are to be neural-network-based.
Training a neural network to produce a thruster mapping based upon a model of the robot can be thought of as learning the inverse model of the robot-thruster system.
This is a common approach and would be relatively straightforward if not for the discrete-valued functions that represent the on-off thrusters.
Some modification to the learning algorithm is required to allow gradient-based optimization to be used with these non-differentiable functions.
The method relies on approximation of the discrete-valued functions with "noisy sigmoids" during training. This is a broadly applicable algorithm that applies to any gradient-based optimization involving discrete-valued functions.
The robot thruster layout in nominal and failed configurations. The magnitude and direction of each thruster is shown. Nominally, each thruster produces 1 Newton of force, directed as shown. The failures were produced by mechanically changing the thrusters. Failures include: half strength, plugged completely, angled at 45 degrees, and angled at 90 degrees. The 90 degree failure mode places high demands on the control reconfiguration system, since it destabilizes the robot.
Very rapid learning is possible due first to the FCA, and due second to the growing of the network. With few hidden neurons, quick learning takes place since fewer computations are required, and fewer training patterns are required (to avoid overfitting). The network begins with 3 inputs, 8 hidden neurons, and 8 outputs, and gradually grows to 30 or more hidden neurons as training progresses. New hidden neurons are added when performance begins to plateau.
To prevent overfitting, the training-set size is grown proportionally with the number of hidden neurons. With this arrangement, a mapping with about 30% error above optimal results in 30 seconds, 20% above optimal within 60 seconds, and 10% above optimal1
NUMBER RECOGNITION SYSTEM
In handwriting recognition, we propose a recognition system for isolated handwritten digits. This system is divided into three phases:
Acquisition and preprocessing.
Feature extraction.
Recognition.
ACQUISITION AND PREPROCESSING
The acquisition is done with a digital scanner at a resolution of 300 dpi with 8 bits per pixel. The samples cover all possible classes of handwritten digits (0,1,2,3,4,5,6,7,8,9), with variable sizes and variable thickness, and with 100 samples for every class.
Images in our database are formed of only two gray levels: black for the object and white for the background. Figure 2 shows some samples from the database used.
The preprocessing attempts to eliminate variability that is related to the writing process and is not significant from the point of view of recognition, such as variability due to the writing environment, the writing style, and the acquisition and digitizing of the image.
FEATURE EXTRACTION & RECOGNITION
The feature extraction method limits or dictates the nature and output of the preprocessing step. The decision to use gray-scale versus binary images, filled representations versus contours, or thinned skeletons versus full-stroke images depends on the nature of the features to be extracted.
For training the neural network, back propagation with momentum training method is followed. This method was selected because of its simplicity and because it has been previously used on a number of pattern recognition problems.
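The slides do not state which form of momentum was used. The sketch below assumes the commonly used form, in which each weight change is the usual correction term plus a fraction of the previous change: Δw(t) = α·δ·z + μ·Δw(t−1). The function name, the momentum coefficient `mu`, and the numbers are hypothetical.

```python
def momentum_step(weight, gradient_term, previous_delta, alpha=0.25, mu=0.5):
    """One weight update with momentum.

    gradient_term is the product delta * z from plain back propagation,
    previous_delta is the weight change applied in the previous step.
    """
    delta = alpha * gradient_term + mu * previous_delta
    return weight + delta, delta


# Two consecutive updates with the same gradient term: the second change
# is larger because it carries part of the first one.
w, prev = 1.0, 0.0
w, prev = momentum_step(w, 2.0, prev)
first_change = prev
w, prev = momentum_step(w, 2.0, prev)
second_change = prev
```

When successive corrections point in the same direction the steps grow, and when they alternate in sign the steps partly cancel, which damps oscillation.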
The system applies not only to the recognition of isolated handwritten digits, but also to the recognition of handwritten numeric strings made up of a variable number of digits.
Every digit obtained by segmentation is presented separately to the input of the system built in the first part (the recognition system for isolated handwritten digits) and undergoes the following processing: normalization, feature extraction, and finally recognition.
Finally, after all segments have been presented, the system displays the result of the recognition of the presented handwritten numeric string.
The recognition rate obtained for the database used is 91.30%.
FACE RECOGNITION SYSTEM
A special advantage of this technique is that it involves no extra learning process: saving the face information of the person and appending the person's name to the learned database completes the learning process.
The Face Recognition System (FRS) can be subdivided into two main parts:
1. Image processing: scanning, image enhancement, image clipping, filtering, edge detection and feature extraction.
2. Recognition techniques: Genetic Algorithm and Back Propagation Neural Network.
A three-layer neural network, trained with the error back-propagation learning technique with an error tolerance of 0.001, has been used as the recognition machine of the system.
A. Face Image Acquisition
To collect the face images, a scanner has been used. After scanning, the image can be saved in various formats such as Bitmap, JPEG, GIF and TIFF. This FRS can process face images of any format.
B. Filtering and Clipping
The input face image may contain noise and garbage data that must be removed. A median filtering technique has been used for this purpose. After filtering, the image is clipped to remove the unnecessary background surrounding the face and keep only the necessary data.
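Median filtering replaces each pixel with the median of its neighbourhood, which removes isolated noise pixels while keeping edges fairly sharp. The sketch below is a plain-Python illustration, not the FRS implementation: the 3 x 3 window and the border handling (coordinates clamped to the image) are assumptions, and the function name is hypothetical.

```python
def median_filter(image, size=3):
    """Return a copy of a gray-level image (list of rows) in which every
    pixel is replaced by the median of its size x size neighbourhood.
    Coordinates outside the image are clamped to the nearest border pixel."""
    height, width = len(image), len(image[0])
    radius = size // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    window.append(image[yy][xx])
            window.sort()
            result[y][x] = window[len(window) // 2]
    return result


# A flat gray patch with one bright noise pixel in the centre.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
clean = median_filter(noisy)
```

The single bright pixel is outvoted by its neighbours in every window, so it disappears from the filtered patch.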
C. Edge detection
Then the edges are detected using a high-pass filter, a high-boost filter and a median filter. The procedure for determining the edges of an image is similar in every case; the only difference is the mask used. The quick mask is applied in only one direction for an image, whereas the other masks are applied in eight directions. So the quick mask is eight times faster than the other masks.
E. Features Extraction
To extract the features of a face, the image is first converted into a binary image. From this binary image the centroid (X,Y) of the face image is calculated using equations 1 and 2, where x, y are the co-ordinate values and m=f(x,y)=0 or 1. Then, starting from the centroid, only the face is cropped and converted to gray level, and the features are collected.
F. Recognition
The extracted features of the face images are fed into the Genetic Algorithm and the Back-propagation Neural Network for recognition. The unknown input face image is recognized by the Genetic Algorithm and the Back-propagation Neural Network in the recognition phase.
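Equations 1 and 2 for the centroid are not reproduced in the slides. The sketch below assumes the standard definition for a binary image, X = Σ x·m / Σ m and Y = Σ y·m / Σ m with m = f(x,y) equal to 0 or 1. The function name and the sample image are hypothetical.

```python
def centroid(binary_image):
    """Centroid (X, Y) of the pixels with value 1 in a binary image,
    given as a list of rows. x is the column index, y the row index."""
    total = sum_x = sum_y = 0
    for y, row in enumerate(binary_image):
        for x, m in enumerate(row):
            total += m
            sum_x += x * m
            sum_y += y * m
    if total == 0:
        raise ValueError("image contains no object pixels")
    return sum_x / total, sum_y / total


# A 2 x 2 block of object pixels in the middle of a 4 x 4 image.
face = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cx, cy = centroid(face)
```

The centroid gives a reference point from which a fixed-size face region can be cropped.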
Therefore the efficiency of the Face Recognition System using the Back-propagation Algorithm is 91.30%.
Face recognition has the benefit of being a passive, non-intrusive system for verifying personal identity. The techniques used in the best face recognition systems may depend on the application of the system. We can identify at least two broad categories of face recognition systems:
1. We want to find a person within a large database of faces (e.g. in a police database). These systems typically return a list of the most likely people in the database. Often only one image is available per person. It is usually not necessary for recognition to be done in real-time.
2. We want to identify particular people in real-time (e.g. in a security monitoring system, location tracking system, etc.), or we want to allow access to a group of people and deny access to all others (e.g. access to a building, computer, etc.). Multiple images per person are often available for training, and real-time recognition is required.
Many people have explored geometrical feature based methods for face recognition. One study presented an automatic feature extraction method based on ratios of distances and reported a recognition rate of between 45-75% with a database of 20 people. Another computed a set of geometrical features such as nose width and length, mouth position, and chin shape, and reported a 90% recognition rate on a database of 47 people. However, the same authors showed that a simple template matching scheme provides 100% recognition.
High-level recognition tasks are typically modeled with many stages of processing, as in the Marr paradigm of progressing from images to surfaces to three-dimensional models to matched models. However, it is likely that there is also a recognition process based on low-level, two-dimensional image processing.
A hierarchical neural network which is grown automatically and not trained with gradient descent was used for face recognition. Good results were reported for discrimination of ten distinctive subjects.
FINGERPRINT RECOGNITION SYSTEM
HARDWARE & SOFTWARE
LICENSE PLATE RECOGNITION
Persian number plates use a set of characters and words in the Farsi and Latin alphabets. Therefore several optical character recognition (OCR) modules are needed to identify numbers, letters and words in both Farsi and Latin.
This can be easily done through back propagation network.
The license plate candidates determined in the locating stage are examined in the license character identification step. There are two major tasks involved in the identification step:
1. Character separation, accomplished by connected components and blob coloring.
2. Character recognition, implemented with an artificial neural network with the back propagation learning algorithm.
RECOGNITION WITH NEURAL NETWORK
A multilayer neural network is trained with the back propagation algorithm on several sample license plate images.
When training of the neural network is complete, it can be used to recognize the location of the license plate.
In this method the image is scanned at intervals of N rows. On each scanned row, the algorithm counts the edges that are located within a specified distance of each other.
If the number of edges is greater than a threshold, that row is the location of the license plate in the image.
If the first scan does not find the plate, the algorithm is repeated on the rows between the previously scanned lines, with a reduced threshold for counting edges. This is repeated until the location of the license plate is found.
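The row-scanning search can be sketched as follows. This is one possible reading of the description, with several assumptions that the slides do not specify: the image is binary, an edge is a change of value between neighbouring pixels, "between the previous lines" is implemented by halving the row spacing, the threshold drops by 1 on each retry, and the search gives up at a minimum threshold so that it terminates. All names are hypothetical.

```python
def count_close_edges(row, max_gap):
    """Count edges in a binary row that lie within max_gap pixels of the
    previous edge. An edge is a change of value between neighbours."""
    edges = [x for x in range(1, len(row)) if row[x] != row[x - 1]]
    return sum(1 for a, b in zip(edges, edges[1:]) if b - a <= max_gap)


def locate_plate_row(image, step, threshold, max_gap, min_threshold=1):
    """Return the index of the first scanned row whose edge count exceeds
    the threshold, or None if no row qualifies."""
    while threshold >= min_threshold:
        for y in range(0, len(image), step):
            if count_close_edges(image[y], max_gap) > threshold:
                return y
        # Not found: scan more densely with a reduced threshold.
        step = max(step // 2, 1)
        threshold -= 1
    return None


# Hypothetical 6 x 12 image: only row 3 has the dense edge pattern that
# characters on a plate would produce.
blank = [0] * 12
plate = [0, 1] * 6
image = [blank, blank, blank, plate, blank, blank]
found = locate_plate_row(image, step=4, threshold=5, max_gap=2)
missing = locate_plate_row([blank] * 4, step=4, threshold=2, max_gap=2)
```

With a spacing of 4 the first pass skips row 3, so the plate row is only found after the spacing has been reduced.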
CITY WORD RECOGNITION
Farsi script word recognition presents challenges because all orthography is cursive and letter shape is context sensitive.
The holistic paradigm in word recognition treats the word as a single, indivisible entity and attempts to recognize words from their overall shape, as opposed to their character contents. This paper presents a new method for word recognition with a holistic approach.
In the analytical approaches a word is decomposed into a sequence of smaller subunits or characters. The problems of these approaches are:
1) segmentation ambiguity in deciding where to segment the Farsi word image, and
2) variability of the shape of the word segments.
On the plates, the number of words is limited to the number of important cities in the country, so it is better to recognize these city names with the holistic method.
For holistic word recognition in the LPR system, we use a neural network trained with the back propagation algorithm, in which the number of input units equals the number of features extracted from the city name word and the number of output units equals the number of cities in Iran.
LEAF RECOGNITION SYSTEM
"Leaves Recognition using Back Propagation Neural Networks" aims to develop a Java program that recognizes images of leaves by using a previously trained neural network. The outer frame (edge) of the leaf and a back propagation neural network are enough to give a reasonable statement about the species it belongs to. The system is user friendly. The user can scan the leaf and click the recognition button to get the solution.
Another main part of this work is the integration of a feed-forward back propagation neural network. The inputs for this neural network are the individual tokens of a leaf image, and as a token normally consists of the cosine and sine of an angle, the number of input units for this network is the number of tokens multiplied by two. The image on the left should give you an idea of the neural network used in the Leaves Recognition application.
Green line: The shape of the leaf image after successful edge detection and thinning.
Red square: This square represents a point on the shape of the leaf image from which we are going to draw a line to the next square.
Blue line: The line joining the centers of two squares, from which we are going to calculate the cosine and sine of the angle. Such a blue line is a representation of a leaf token.
If you now take a closer look at the small zoomed triangle in this image, you should recognize that it is a right-angled triangle. This triangle, together with all the other triangles of a leaf image, represents the tokens of a leaf, from which we can start the neural network calculations.
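The token calculation can be sketched as follows. Each pair of neighbouring points on the thinned leaf outline is joined by a line (the hypotenuse of a right-angled triangle), and the token is the cosine and sine of the angle that this line makes with the horizontal axis. Measuring the angle against the horizontal axis is an assumption, and the function names and sample points are hypothetical.

```python
import math


def leaf_tokens(points):
    """Return one (cosine, sine) token per pair of consecutive outline
    points. Pairs of identical points are skipped."""
    tokens = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        dx, dy = x2 - x1, y2 - y1
        hypotenuse = math.hypot(dx, dy)
        if hypotenuse == 0:
            continue
        tokens.append((dx / hypotenuse, dy / hypotenuse))
    return tokens


def network_inputs(tokens):
    """Flatten the tokens: two network inputs (cosine, sine) per token."""
    return [value for token in tokens for value in token]


# Three hypothetical outline points give two tokens, i.e. four inputs.
points = [(0, 0), (3, 4), (3, 9)]
tokens = leaf_tokens(points)
inputs = network_inputs(tokens)
```

Because cosine and sine depend only on direction, the tokens do not change when the leaf image is shifted within the frame.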
In order to understand the algorithm consider the figure and details shown below.
This trains the network on the full training set by calling the back propagation algorithm EPOCH number of times. Initially it assigns random weights to the arrays wt1 and wt2 and initializes inp(0) and out(0) to 1. After opening the appropriate file, it calls the back propagation procedure on this file. This is repeated EPOCH number of times. The final adjusted weights are stored in the output file for use during the recognition phase.
This reads the input leaf to be recognized by calling the image processing unit. For each value of the matrix (I, J) the corresponding weights are read from that file. With the help of the weights, out1 and out2 are calculated.
Back propagation ():
This procedure is used to train the network on the training set. It takes the training patterns from the input data and calculates the corresponding node output values. It measures the error between the actual value and the desired value, and then uses those errors to adjust the weights, so that the network is trained.
RECOGNITION
1. Screen to display the selected leaf.
2. Screen to display the edge and tokens of the selected leaf.
3. Screen to display the leaf image.
4. Screen to display the results of the Recognition module.
5. Screen to display the leaf image for finding pest recognition.
6. Screen to display the pest percentage of the given leaf and also the damaged part.
NAVIGATION OF CAR
The network takes inputs from a 34 X 36 video image and a 7 X 36 range finder. Output units represent "drive straight", "turn left" or "turn right". After training about 40 times on 1200 road images, the car drove around the CMU campus at 5 km/h (using a small workstation on the car). This was almost twice the speed of any other non-NN algorithm at the time.