OPTICAL CHARACTER RECOGNITION FOR BANGLA HANDWRITTEN TEXT
INTRODUCTION
 Optical character recognition (OCR) is the mechanical
  or electronic translation of scanned images of
  handwritten, typewritten or printed text into
  machine-encoded text.
 It is widely used to convert books and documents
  into electronic files, to computerize a record-keeping
  system in an office, or to publish the text on a
  website.
 OCR makes it possible to edit the text, search for a
  word or phrase, and store it more compactly.
Today OCR systems are readily available for the
English language. OCR systems for printed Bengali
also exist, but those for handwritten Bengali are
very rare, and the ones that are available do not
achieve good recognition accuracy.
We aim to build an OCR system that achieves good
recognition accuracy for handwritten Bengali.
PROBLEM
 We are building an OCR system for handwritten
Bengali text. The main difficulty comes from the
text being handwritten: the set of possible
samples is effectively unbounded, and different
samples have different characteristics. The
handwriting samples are collected from different
people, so they are very unlikely to follow a
similar pattern.
OUR APPROACH
     We follow a bottom-up method: we start with
a specific sample and work towards the general
solution. We take a particular sample, apply our
methodology to it and examine the results. We
then repeat the computation on a second sample
set and, depending on how the methodology
performs on that set, keep improving the process
until it converges towards a general solution.
We are presently in the first step of the method:
SEGMENTATION.
     The methods we have used so far are:
Thinning and Run length Reduction
Projection Along Column Scan lines
Thinning and Run length Reduction
Thinning reduces the density of the
  characters ……….
But we faced some difficulties with this approach:
The method became too dependent on the
 individual handwriting, which is not desirable.
Separating the ‘matras’ from the characters
 left gaps in the characters themselves, which
 were not easy to fill in.
The segmentation obtained was not optimal.
PROJECTION
   In this method we project the intensity of
each character (the ‘matra’ of the character is
included along with the character). Every
straight line produces a peak value in the
projection, through which we can identify the
presence of the ‘matra’.
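
As a minimal sketch of the peak idea, assume the image has already been binarized (1 = ink, 0 = background) and that the ‘matra’ is the horizontal headline, so the projection is taken row by row. The function names, the toy image and the `min_fraction` threshold are illustrative assumptions, not part of our implementation.

```python
def row_profile(image):
    """Count the ink pixels (1s) in each row of a binary image."""
    return [sum(row) for row in image]


def find_matra_row(image, min_fraction=0.8):
    """Return the index of the row with the most ink, provided that row
    covers at least min_fraction of the image width; otherwise None.
    min_fraction is an assumed, hand-picked threshold."""
    profile = row_profile(image)
    width = len(image[0])
    peak_row = max(range(len(profile)), key=profile.__getitem__)
    if profile[peak_row] >= min_fraction * width:
        return peak_row
    return None


# Toy 6x8 "word": row 1 is a straight headline spanning the full width.
word = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(row_profile(word))     # [0, 8, 3, 4, 4, 0]
print(find_matra_row(word))  # 1
```

The headline row stands out because all of its ink falls on a single scan line, while the character strokes below it contribute only a few pixels per row.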
This method has its own disadvantage:
As we are working with handwritten Bengali
 text, the characters are not guaranteed to
 contain straight lines. For example, if someone
 writes in a slanted (italic) hand, the lines are
 bent, and the process does not identify them as
 the straight lines it should. The method
 therefore fails in such cases.
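
The failure can be reproduced on toy data. The sketch below assumes binary images (1 = ink) and a row-wise projection; `has_headline_peak` and its `min_fraction` threshold are hypothetical names chosen for illustration. A slanted stroke spreads the same amount of ink over several rows, so no single row reaches the threshold.

```python
def row_profile(image):
    """Count the ink pixels (1s) in each row of a binary image."""
    return [sum(row) for row in image]


def has_headline_peak(image, min_fraction=0.8):
    """True if some row holds ink across at least min_fraction of the width."""
    width = len(image[0])
    return max(row_profile(image)) >= min_fraction * width


# The same 8 ink pixels, written once straight and once slanted.
straight = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
slanted = [
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
]

print(row_profile(straight), has_headline_peak(straight))  # [8, 0, 0, 0] True
print(row_profile(slanted), has_headline_peak(slanted))    # [2, 2, 2, 2] False
```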
WORK IN THE UPCOMING DAYS
    Because of the drawbacks of the methods
discussed above, we are now considering a new
method. First, the characters are to be
standardized. This will give us a standard sample
set, which will probably overcome the
disadvantages mentioned previously.
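
The standardization step is not yet defined. One possible reading, shown here purely as an assumption, is size normalization: crop each binary character image to its bounding box and resample it onto a fixed grid with nearest-neighbour sampling. The function names and the grid size are hypothetical.

```python
def bounding_box(image):
    """Return (top, bottom, left, right) of the ink in a binary image.
    Assumes the image contains at least one ink pixel."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], rows[-1], cols[0], cols[-1]


def normalize(image, size=4):
    """Crop to the bounding box and resample to size x size
    using nearest-neighbour sampling (an assumed choice)."""
    top, bottom, left, right = bounding_box(image)
    height = bottom - top + 1
    width = right - left + 1
    return [
        [image[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]


# A small glyph surrounded by empty margin.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

for row in normalize(glyph, size=4):
    print(row)
```

This only removes differences in size and position. Slant and stroke width would need separate treatment.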


    THANK YOU !!!!!
