2. INTRODUCTION
• OCR is the transformation of images of text into machine-encoded text.
• A simple API to an OCR library might provide a function that takes an image as input and outputs a string.
• In this project we apply a deep neural network to solve optical character recognition.
• We make use of TensorFlow and a convolutional neural network (CNN).
3. MOTIVATION
• Optical character recognition is needed when information must be readable both to humans and to a machine, and alternative inputs cannot be predefined.
• The basic OCR system was invented to convert data available on paper into computer-processable documents, so that the documents can be edited and reused.
• Traditional OCR techniques are typically multi-stage processes. For example, first the image may be divided into smaller regions that contain the individual characters; second, the individual characters are recognized; and finally the results are pieced back together. A difficulty with this approach is obtaining a good segmentation of the original image.
4. Sample Architecture for CNN
What are Convolutional Neural Networks?
• Step 1 – Convolution Operation
• Step 1(b) – ReLU Layer (Rectified Linear Unit)
• Step 2 – Pooling
• Step 3 – Flattening
• Step 4 – Full Connection
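As a minimal sketch, the steps above can be written out in plain NumPy. The kernel, pooling window, and layer sizes below are illustrative assumptions, not the project's actual values:

```python
import numpy as np

def conv2d(img, kernel):
    """Step 1 - a valid 2-D convolution (cross-correlation, as used in CNNs)."""
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Step 1(b) - rectified linear unit: clamp negatives to zero."""
    return np.maximum(x, 0)

def max_pool(x, s=2):
    """Step 2 - max pooling with window and stride s."""
    h, w = (x.shape[0] // s) * s, (x.shape[1] // s) * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

img = np.random.rand(8, 8)                                   # toy grayscale input
feat = max_pool(relu(conv2d(img, np.array([[1.0, -1.0], [-1.0, 1.0]]))))
vec = feat.flatten()                                         # Step 3 - Flattening
W = np.random.rand(10, vec.size)                             # Step 4 - Full Connection
b = np.zeros(10)
logits = W @ vec + b                                         # 10 class scores
```

In a real network the kernels and dense weights are learned by backpropagation; here they are random only to show the shapes flowing through the four steps.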
5. Fully Connected Layer of CNN model
Source : Created by Kirill Eremenko, Hadelin de Ponteves, SuperDataScience Team
6. OCRGen.py
[Figure: OCRGen.py pipeline – sample strings (e.g. "STRINGPOWER", "AI") are generated with the Python Imaging Library (PIL), with labels recorded in a CSV file; a fully convolutional network plus a fully connected layer transforms the input volume into a sequence of character predictions (the predicted output).]
7. Deep OCR Architecture
• A fully convolutional network is presented that transforms the input volume into a sequence of character predictions. These character predictions can then be transformed into a string. The architecture of the network is shown in the figure below.
8. • Here N is the number of possible characters. In this example there are 63 possible characters: uppercase and lowercase letters, digits, and a blank character. The parenthesized values in the convolutional layers are the filter sizes and stride values, from top to bottom respectively. The values in the reshape layer are the reshaped dimensions.
• The input volume is a rectangular RGB image. The height and width of this volume are reduced across the convolutional layers using striding. The third dimension of this volume increases from 3 channels (RGB) to one channel per possible character (N channels). Thus, the volume is transformed from an RGB image into a sequence of vectors. Applying argmax across the channel dimension gives a sequence of 1-hot encoded vectors, which can be transformed into a string.
SOURCE
https://github.com/nicholastoddsmith/pythonml/blob/master/Dee
pOCR/TFModel/_classes.txt
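As a hedged sketch, the argmax decoding step described above can be written as follows. The alphabet ordering here is an assumption for illustration; the real symbol mapping lives in the linked `_classes.txt`:

```python
import string
import numpy as np

# 63 symbols assumed: 26 lowercase + 26 uppercase + 10 digits + 1 blank
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + " "

def decode(volume):
    """volume: (T, 63) array of per-position character scores.
    Argmax across the channel dimension picks one symbol per position;
    trailing blanks are stripped to recover the string."""
    indices = volume.argmax(axis=-1)
    return "".join(ALPHABET[i] for i in indices).rstrip()

# Example: a one-hot volume spelling "Hi5" followed by a blank
onehot = np.zeros((4, len(ALPHABET)))
for pos, ch in enumerate("Hi5 "):
    onehot[pos, ALPHABET.index(ch)] = 1.0
```

Running `decode(onehot)` recovers the string `"Hi5"`.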
9. Result
• To facilitate training this network, a dataset is generated using the Python Imaging Library (PIL). Random strings consisting of alphanumeric characters are generated, and PIL is used to render an image for each random string. A CSV file is also generated that contains each file name and its associated random string. Some examples from the generated dataset are shown in the figure below.
[Figure: "Generating Data" / "Training Data" – example images from the generated dataset]
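A minimal sketch of that generation step, assuming Pillow's default bitmap font and hypothetical image dimensions and file names (the project's actual OCRGen.py may differ):

```python
import csv
import os
import random
import string
import tempfile
from PIL import Image, ImageDraw

def generate_dataset(out_dir, n_samples=4, max_len=10):
    """Render random alphanumeric strings to PNGs and record them in a CSV."""
    rows = []
    for i in range(n_samples):
        text = "".join(random.choices(string.ascii_letters + string.digits,
                                      k=random.randint(1, max_len)))
        img = Image.new("RGB", (256, 32), "white")          # assumed canvas size
        ImageDraw.Draw(img).text((4, 8), text, fill="black")  # default font
        fname = f"sample_{i}.png"                            # hypothetical naming
        img.save(os.path.join(out_dir, fname))
        rows.append((fname, text))
    # CSV maps each file name to its ground-truth string
    with open(os.path.join(out_dir, "labels.csv"), "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows

out_dir = tempfile.mkdtemp()
rows = generate_dataset(out_dir)
```

Each run leaves a directory of labeled images plus a `labels.csv` that the training step can parse.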
12. Training the Network
• To train the network, the CSV file is parsed and the images are loaded into memory. Each target value for the training data is a sequence of 1-hot vectors; thus, the target matrix is a 3D matrix whose three dimensions correspond to sample, character, and 1-hot encoding, respectively.
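The 3D target matrix described above can be built as follows. The sequence length and alphabet ordering are illustrative assumptions:

```python
import string
import numpy as np

# 63 symbols assumed: letters, digits, and a blank used for padding
CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits + " "

def one_hot_targets(labels, seq_len):
    """Return a (sample, character, 1-hot) target matrix, blank-padded."""
    Y = np.zeros((len(labels), seq_len, len(CHARS)))
    for s, text in enumerate(labels):
        for c, ch in enumerate(text.ljust(seq_len)):   # pad short strings with blanks
            Y[s, c, CHARS.index(ch)] = 1.0
    return Y

Y = one_hot_targets(["AB1", "hello"], seq_len=8)
```

Here `Y` has shape `(2, 8, 63)`, and exactly one entry per character position is 1.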
• Next, the neural network is constructed using the artificial neural network classifier (ANNC) class from TFANN. The architecture described above is represented in a few lines of code using ANNC.
• Softmax cross-entropy is used as the loss function, computed over the third dimension of the output.
• Fitting the network and performing predictions is simple using the ANNC class. The prediction is split up using array_split from NumPy to prevent out-of-memory errors.
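The array_split trick in the last bullet looks roughly like this; `predict_fn` is a placeholder standing in for the network's predict call, not TFANN's actual API:

```python
import numpy as np

def predict_in_chunks(predict_fn, X, n_chunks=4):
    """Run predictions chunk by chunk to keep peak memory low,
    then stitch the partial results back together."""
    parts = [predict_fn(chunk) for chunk in np.array_split(X, n_chunks)]
    return np.concatenate(parts)

# Toy stand-in for a model's predict method
double = lambda batch: batch * 2
X = np.arange(10).reshape(10, 1)
out = predict_in_chunks(double, X)
```

`np.array_split` (unlike `np.split`) tolerates chunk counts that do not divide the array evenly, which is why it is the right tool here.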
13. System Details
• Distributed Deep Learning (DDL) environment on a POWER8 system with IBM PowerAI ML/DL frameworks.
• 40-thread POWER8, 256 GB RAM, 1 × NVIDIA K80 GPU.
• PushToCompute for compiling POWER8 applications and deploying directly to the Nimbix Cloud.