2. Handwriting recognition
• In an optical character recognition (OCR)
problem, a computer program translates an
image containing text into machine-readable text.
3. OCR
• OCR
– Image processing such as down sampling,
skeletonization, and feature extraction techniques
– AI: Neural networks such as the Self-Organizing
Map and the MLP
4. Algorithm
• One of the common ways of OCR processing is
by using a multi-layer perceptron (MLP) with
back-propagation learning.
5. Image processing: Down sampling
• Down sampling allows an image that exceeds the
network’s input size to be resized.
7. Input Size
• The number of pixels in the image determines the
number of input neurons.
• Ex. If given a 7 by 5 image, then the number of input
neurons is 7 x 5 = 35.
8. Using an MLP
• The number of characters to be recognized is
the number of output neurons.
• Ex. Since there are 26 capital letters in the English
alphabet, there are 26 output neurons.
9. Hidden Layers
• The number of hidden neurons should be
– between the size of the input layer and the size of the output
layer,
– 2/3 the size of the input layer plus the size of the output layer, or
– less than twice the size of the input layer.
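As a sketch, the three rules of thumb above can be computed for the 35-input, 26-output network used in this deck (the function name and dictionary keys are illustrative, not from the slides):

```python
def hidden_size_rules(n_in, n_out):
    """Return the three rule-of-thumb bounds for hidden layer size."""
    return {
        # rule 1: between the input and output layer sizes
        "between_in_and_out": (min(n_in, n_out), max(n_in, n_out)),
        # rule 2: 2/3 the input layer size plus the output layer size
        "two_thirds_in_plus_out": (2 * n_in) // 3 + n_out,
        # rule 3: stay below twice the input layer size
        "upper_bound": 2 * n_in,
    }

print(hidden_size_rules(35, 26))
```

For the 7 by 5 image example this suggests a hidden layer somewhere around 26 to 49 neurons, well under the bound of 70.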
11. Training the MLP
• First, calculate the error of the output layer:
• Then, calculate the error of the other layers:
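The equations this slide refers to did not survive the export; a reconstruction from the editor's notes, with assumed symbols (t ideal output, y actual output, W the next layer's weight matrix, f' the activation derivative, ⊙ element-wise product):

```latex
\delta^{(\mathrm{out})} = t - y
\qquad
\delta^{(l)} = \left( W^{(l+1)\top} \, \delta^{(l+1)} \right) \odot f'\!\left( o^{(l)} \right)
```

Note that many textbook formulations also scale the output error by f'; the editor's notes state it as a plain difference.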
12. Change Weights
• Once the errors are calculated, update weights
and biases:
• When the weights are changed, the network has
gone through an iteration of training.
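The update equations this slide refers to were likewise lost; a reconstruction from the editor's notes, with a learning rate η added as an assumption (the notes do not mention one):

```latex
\Delta W^{(l)} = \eta \, \delta^{(l)} \left( o^{(l-1)} \right)^{\top}
\qquad
\Delta b^{(l)} = \eta \, \delta^{(l)}
```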
Editor's Notes
The handwriting recognition problem is also an optical character recognition problem: the computer program is given an image containing text and tries to recognize the characters.
Basically, there are two parts when a program tries to recognize a character. First, the program does image processing such as downsampling and skeletonization. After processing, the image is input into a neural network for recognition; the network can vary in structure.
One of the common ways is to process the image, convert it into a matrix, and input that matrix into an MLP. The MLP is then trained and tested against anticipated outputs.
There are many ways to process the image. One way is to downsample it; this is only needed when the image exceeds the network's input size. In this example, the original image is divided into a 5 by 7 grid. If a pixel is colored within a grid cell, that cell is colored in the new image.
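The grid downsampling described here can be sketched as follows; the image is assumed to be a binary 2-D list, and the 7-row by 5-column target comes from the slides (the function name is illustrative):

```python
def downsample(image, rows=7, cols=5):
    """Downsample a binary image: a target cell is colored (1) if any
    source pixel inside its grid region is colored."""
    h, w = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # source pixel region covered by grid cell (r, c)
            r0, r1 = r * h // rows, max((r + 1) * h // rows, r * h // rows + 1)
            c0, c1 = c * w // cols, max((c + 1) * w // cols, c * w // cols + 1)
            if any(image[i][j] for i in range(r0, r1) for j in range(c0, c1)):
                out[r][c] = 1
    return out
```

A 14 by 10 source image, for instance, would map a 2 by 2 pixel region onto each of the 35 grid cells.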
Another way to process the image is to skeletonize it. In this picture, the broad strokes are reduced to thin lines.
Once the image is processed, it goes into the neural network. The input size of the network depends on the number of pixels in the image. For example, a 35-pixel image requires 35 input neurons.
The number of output neurons is determined by the number of characters you want to recognize. For example, if you know the image will contain only the capital letters of the English alphabet, then the number of output neurons is 26. So if an image contains the letter A and the network recognizes it, the output neuron that represents the letter A fires a 1 and the rest fire a 0.
The number of hidden layers and neurons is somewhat arbitrary; there are at most 2 hidden layers. There are also rules of thumb to determine a suitable number of neurons in the hidden layer: the number of hidden neurons should be between the size of the input layer and the size of the output layer; or it should be 2/3 the size of the input layer plus the size of the output layer; otherwise, it should be less than twice the size of the input layer.
Once the structure of the network is decided, it has to be trained. Before training begins, the outputs of the network must be computed. Each layer's output is computed by taking the dot product of the weights and inputs, plus the threshold (bias), and passing the result through the activation function.
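The forward computation in this note can be sketched with NumPy; the sigmoid activation is an assumption, since the slides never name one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_output(weights, bias, inputs):
    """One layer's output: dot product of weights and inputs,
    plus the threshold (bias), through the activation."""
    return sigmoid(weights @ inputs + bias)

# shape sketch for the 35-input, 26-output example in the slides
out = layer_output(np.zeros((26, 35)), np.zeros(26), np.ones(35))
```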
Once the network's outputs are computed, the error of the output layer is computed by subtracting the actual output from the ideal output. The remaining layers' errors are calculated by multiplying the transposed weight matrix of the next layer with the error matrix of the next layer; this matrix is then multiplied by the derivative of the activation function.
Once the errors are calculated, the change in weights is calculated by multiplying the error matrix with the output of the previous layer. The change in the bias matrix is simply the error matrix.
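The error and update rules in the two notes above can be sketched together as one training iteration for a two-layer MLP. The sigmoid activation and the learning rate eta are assumptions not stated in the slides:

```python
import numpy as np

def train_step(W1, b1, W2, b2, x, ideal, eta=0.25):
    """One iteration: forward pass, errors, then weight/bias changes."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(W1 @ x + b1)       # hidden layer output
    actual = sigmoid(W2 @ hidden + b2)  # network output

    # error of the output layer: ideal output minus actual output
    err_out = ideal - actual
    # error of the earlier layer: transposed next-layer weights times
    # the next layer's error, times the activation derivative
    err_hid = (W2.T @ err_out) * hidden * (1.0 - hidden)

    # change in weights: error matrix times output of the previous layer;
    # change in bias: the error matrix itself (both scaled by eta here)
    W2 += eta * np.outer(err_out, hidden)
    b2 += eta * err_out
    W1 += eta * np.outer(err_hid, x)
    b1 += eta * err_hid
    return actual
```

Repeating this step over labeled images drives the output neurons toward the 1/0 targets described earlier.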
So far, most handwriting recognizers have a success rate of 80% to 90%, but not yet 100%.