ISSN: 2277 – 9043
International Journal of Advanced Research in Computer Science and Electronics Engineering
Volume 1, Issue 6, August 2012
All Rights Reserved © 2012 IJARCSEE

Handwritten Script Recognition using Soft Computing

Akhilesh Pandey1, Sunita Singh2, Rajiv Kumar3, Amod Tiwari4

Abstract- Handwritten script recognition is a challenging problem in computer science. It is important to know which script a document is written in, and script recognition has many important applications, such as automatic transcription of multilingual documents, searching document images, and script sorting. The proposed work emphasizes the "block level technique", in which the system recognizes the script of a given document within a mixture of documents in various scripts. Computational fields such as artificial intelligence and expert systems play an important role here. Feature extraction is an important step in script recognition. In this project, we use a combined approach of the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) for feature extraction, and a feed-forward back-propagation neural network classifier for classification and recognition. Because the human mind can easily recognize handwritten script, we use an artificial intelligence technique, a neural network classifier, to perform the same task. The proposed system has been tested on three handwritten scripts: Hindi, English and Urdu. Our database contains 961 handwritten samples written in these three scripts. Every script (Hindi, English and Urdu) contains 320 samples (160 samples written in a small font and another 160 in a large font).

Keywords: Multi-script documents, handwritten script, Discrete Cosine Transform, Wavelets, neural network classifier.

I. INTRODUCTION

Much research has been done on multi-script recognition, but exchanging data between human beings and computing machines remains a challenging problem. Many algorithms have been proposed so that multi-script text (Hindi, English, Urdu) can be recognized easily, but the efficiency of these algorithms is not yet satisfactory. A multi-script document is a document that contains text in more than one script. Handwritten script recognition is a complex task for the following reasons: complexity in preprocessing, complexity in feature extraction, complexity in classification, sensitivity of the scheme to variations in handwritten text such as font size, font style and document skew, and the performance of the scheme. Many researchers have worked on the handwritten multi-script recognition problem in related areas such as image processing, pattern recognition, artificial intelligence and cognitive science, and further research is being done to improve accuracy and efficiency.

Recognition of offline handwritten multi-script text is a goal of many research efforts in the pattern recognition field, and a survey of offline cursive script word recognition is presented in [1]. The survey is organized into three sections. The first introduces automatic handwriting recognition and the official regional scripts of India; nine regional scripts are covered and categorized into four subgroups based on their similarity and evolution. OCR work on Indian scripts is reported in [2], which provides a benchmark database. Many techniques have been applied to the recognition of handwritten multi-script text, but their efficiency and accuracy are still limited. Artificial intelligence concepts such as neural networks are used to perform the work as the human mind does: they explore how humans recognize text in general and are used to develop machines that simulate this process. Developing such intelligent machines for recognizing multiple scripts is not an easy task, because a script can be written in many different ways. Handwriting also shows many imperfections and variations, such as alignment, noise and angles, which make handwritten multi-script recognition difficult to implement on a machine.

Existing script identification relies on different feature extraction techniques, such as the DCT and DWT presented in [3]. An OCR technique is applied to the Devanagari script in [4]. The work in [5] provides metadata describing the text at the paragraph, page and line levels; tools to extract paragraphs from pages and to segment paragraphs into lines have also been developed. Two approaches to Amharic word recognition in unconstrained handwritten text using HMMs are described in [6]: the first builds word models from the combined features of the constituent characters, and in the second the HMMs of the constituent characters are concatenated to form a word model. In [7], an offline Arabic/Farsi handwritten recognition algorithm is proposed and evaluated on a subset of Farsi names; it uses an RBF neural network together with a combination of a genetic algorithm (GA) and K-means clustering. The work in [8] addresses street name recognition in Indian languages. Because some street names contain two or more words, the words are concatenated and treated as a single word.

In this paper, we present a multiple-feature approach that combines Discrete Cosine Transform (DCT) and wavelet-based frequency content for three scripts: English, Hindi and Urdu. Classification is done using a feed-forward back-propagation neural network classifier. The experiments are carried out on the database at block level.

II. BACKGROUND INFORMATION

Multi-script recognition is a process that associates a word identity with the image of a script object (a word) drawn on an image. A multi-script recognition machine takes raw data as input and then applies the preprocessing stage common to any recognition system.

On the basis of the data acquisition process, script recognition can be categorized into two types:
1. Online Script Recognition
2. Offline Script Recognition

Off-line handwriting recognition refers to the process of recognizing words that have been scanned from paper and stored digitally in grey-scale format. After storage, it is conventional to perform further processing to make recognition possible. In online handwritten script recognition, the handwriting is captured and stored in digital form by different means. Usually a special pen is used in conjunction with an electronic surface. As the pen moves across the surface, the two-dimensional coordinates of successive points are represented as a function of time and stored in order [1]. It is generally accepted that on-line recognition of handwritten text has achieved better results than its off-line counterpart. This may be attributed to the fact that more information can be captured in the on-line case, such as the direction, speed and order of the strokes of the handwriting.

A. Handwritten Multi Script Recognition

Handwritten Multi Script Recognition (HMSR) is an area of pattern recognition that has been the subject of considerable research over the last few decades. Many Indian offices, such as banks, sales-tax offices and railways, use the English, Hindi and Urdu languages. Many forms and applications are filled in in these languages, and sometimes those forms have to be scanned directly. Without a standard HMSR system, the form is captured only as an image and there is no option to edit the document. Handwritten script recognition (HSR) is the automatic computer recognition of scripts in optically scanned and digitized pages of text. The main objective of an HMSR system is to recognize multiple scripts, given as digital images, without any human intervention. This is done by searching for a match between the features extracted from the given script image and a library of image models.

B. Pre-processing

In HMSR, typical preprocessing operations include:
1. Binarization
2. Noise reduction
3. Skew detection

In our preprocessing stage we perform two operations:

• Binarization: transforms a colored image into a black-and-white image.
    img = im2double(rgb2gray(imread('coins.png')));  % grayscale in [0,1]; rgb2gray expects an RGB input image
    img = im2bw(img, graythresh(img));               % Otsu threshold to black and white

• Thinning: a morphological operation on binary images that removes selected foreground pixels.
    img = bwmorph(img, 'thin');                      % one thinning pass; a third argument of Inf repeats until stable

After the pre-processing phase, a cleaned image is available that goes to the segmentation phase. The raw data, depending on the data acquisition type, is subjected to a number of preliminary processing steps to make it usable in the descriptive stages of script analysis. Preprocessing aims to produce data on which the HMSR system can operate easily and accurately.
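The two preprocessing operations described above, binarization and thinning, can be sketched without any toolbox. The version below is only an illustration in plain Python, not the authors' MATLAB code: the function names `otsu_threshold`, `binarize` and `thin` are hypothetical, Otsu's method is assumed for the threshold, and Zhang-Suen thinning is swapped in for illustration (MATLAB's `bwmorph` uses a different thinning algorithm). The image is assumed to be a list of rows of 8-bit gray levels with dark ink on light paper and a background border at least one pixel wide.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold (0..255) of a 2-D list of 8-bit gray levels."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalised)
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray):
    """Assumes dark ink on light paper: ink becomes 1, background becomes 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def thin(img):
    """Zhang-Suen thinning of a 0/1 image; border pixels are left untouched."""
    img = [row[:] for row in img]
    h, w = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise, starting from the pixel directly above
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] == 0:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of foreground neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((y, x))
            # pixels are cleared only after the whole pass (parallel update)
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img


if __name__ == "__main__":
    # A thick dark stroke on light paper (made-up gray levels, not paper data)
    gray = [[230] * 7] + [[230, 20, 20, 20, 20, 20, 230] for _ in range(3)] + [[230] * 7]
    for row in thin(binarize(gray)):
        print("".join("#" if v else "." for v in row))
```

Collecting the pixels to clear and removing them only at the end of each pass keeps the update parallel, which is what the Zhang-Suen conditions assume; clearing in place would let early deletions change later decisions.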
Figure 1: Block Diagram of Script Identification

Figure 2: Script Sample of English Language

Figure 3: Script Sample of Hindi Language

Figure 4: Script Sample of Urdu Language

Figure 5: Combined Sample of multi script

C. Segmentation

Segmentation is an operation that decomposes an image of a sequence of scripts into sub-images of individual symbols. Script segmentation is a central requirement for the utility of conventional systems. The methods used can be classified by the type of text and the strategy followed, such as the straight segmentation method, recognition-based segmentation and the cut classification method. To achieve broad utility, it is important that a segmentation method have the following properties:

1. Capture perceptually important groupings, which often reflect global aspects of the image. Two central issues are to provide precise characterizations of what is perceptually important and to be able to specify what a given segmentation technique does. There should be precise definitions of the properties of a resulting segmentation, in order to better understand the method and to facilitate the comparison of different approaches.

2. Run efficiently. To be of practical use, a method must be fast; segmentation methods that run at several frames per second can be used in video processing applications.

D. Feature extraction

Every script has features that play a big role in pattern recognition, and the English, Hindi and Urdu scripts each have many particular features. Feature extraction is the name given to a family of procedures for measuring the relevant shape information contained in a pattern, so that the task of classifying the pattern is made easy by a formal procedure. The feature extraction stage of an HMSR system analyses each script segment and selects a set of features that can be used to identify the script segment uniquely. This stage is the heart of the HMSR system, because the output depends on these features. Among the different design issues involved in building a recognition system, perhaps the most significant is the selection of the set of features.

Feature extraction for exploratory data projection enables high-dimensional data visualization for better understanding of the data structure and for cluster analysis. In feature extraction for classification, it is desirable to extract highly discriminative features of reduced dimensionality, which reduce the computational requirements of classification. However, feature extraction criteria for exploratory data projection usually aim to minimize an error function, such as the mean square error or the inter-pattern distance difference, whereas feature extraction criteria for classification aim to increase class separability as far as possible. Features calculated for exploratory data projection are therefore not necessarily the optimum features of the image for classification.

III. REPRESENTATION OF SCRIPT FEATURES

After the features are extracted, the data should be represented in one of two ways, either as a boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics such as corners and inflections. Regional representation is appropriate when the focus is on internal properties such as texture or skeleton shape. In some applications, such as script recognition, these representations coexist, and the algorithms often rely on boundary shape as well as on skeletons and other internal properties.

Figure 6. English script: (a) original cropped image, (b) black-and-white image, (c) inverted colors, (d) border components cleared, (e) after thinning, (f) DCT form, (g) DWT form.

Figure 7. Hindi script: (a) original cropped image, (b) black-and-white image, (c) inverted colors, (d) border components cleared, (e) after thinning, (f) DCT form, (g) DWT form.
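The DCT and DWT forms listed in the figure panels can be illustrated for a single image block. The sketch below is written in plain Python and makes several assumptions that the paper does not state: the wavelet is taken to be one-level Haar, and the feature vector (the k x k lowest-frequency DCT coefficients followed by the energy of each wavelet sub-band) is a hypothetical choice for illustration. The names `dct2`, `haar_dwt2` and `block_features` are likewise hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    # cos_table[k][i] = cos((2i + 1) * k * pi / (2n))
    cos_table = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
                 for k in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[y][x] * cos_table[u][y] * cos_table[v][x]
                    for y in range(n) for x in range(n))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def haar_dwt2(block):
    """One-level orthonormal 2-D Haar DWT of a block with even side lengths.

    Returns four sub-bands: the approximation, the top-minus-bottom detail,
    the left-minus-right detail and the diagonal detail.
    """
    rows, cols = len(block) // 2, len(block[0]) // 2
    approx = [[0.0] * cols for _ in range(rows)]
    row_diff = [[0.0] * cols for _ in range(rows)]
    col_diff = [[0.0] * cols for _ in range(rows)]
    diag = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            a, b = block[2 * y][2 * x], block[2 * y][2 * x + 1]
            c, d = block[2 * y + 1][2 * x], block[2 * y + 1][2 * x + 1]
            approx[y][x] = (a + b + c + d) / 2
            row_diff[y][x] = (a + b - c - d) / 2
            col_diff[y][x] = (a - b + c - d) / 2
            diag[y][x] = (a - b - c + d) / 2
    return approx, row_diff, col_diff, diag


def block_features(block, k=2):
    """Hypothetical feature vector: k*k low-frequency DCT coefficients
    followed by the energy (sum of squares) of each Haar sub-band."""
    coeffs = dct2(block)
    low = [coeffs[u][v] for u in range(k) for v in range(k)]
    energies = [sum(v * v for row in band for v in row) for band in haar_dwt2(block)]
    return low + energies


if __name__ == "__main__":
    # A made-up 4x4 block with a vertical stroke in the second column
    stroke = [[0, 1, 0, 0] for _ in range(4)]
    print(block_features(stroke))
```

Both transforms here are orthonormal, so the sum of squares of the transformed values equals that of the block. Concatenating a few low-frequency DCT coefficients with sub-band energies yields a short, fixed-length vector that a neural network classifier can take as input.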
Figure 8. Urdu script: (a) original cropped image, (b) black-and-white image, (c) inverted colors, (d) border components cleared, (e) after thinning, (f) DCT form, (g) DWT form.

IV. RESULTS

Data sets of handwritten scripts were prepared and partitioned into two parts: the first was used to train the system and the second to test it, in a 50-50 split. For each script, features were computed and stored for training the network. The network has three layers: one input layer, one hidden layer and one output layer. Increasing the number of neurons in the hidden layer led to memory allocation problems. The recognition rate obtained across all three scripts is 82.70%.

Table 1. Result of Multiple classifiers

Scripts        No. of samples    Train/test    Recognition result
Hindi          373
English        369
Urdu           320
All scripts                      481/480       82.70%

Table 2. Confusion Matrix

REFERENCES

[1] N. Sharma, U. Pal and R. Jayadevan, "Handwriting recognition in Indian regional scripts: A survey of offline techniques".
[2] R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri and D. K. Basu, "CMATERdb1: a database of unconstrained handwritten Bangla and Bangla-English mixed script document image", International Journal on Document Analysis and Recognition, Volume 15, Number 1, pp. 71-83, doi: 10.1007/s10032-011-0148-6, 2012.
[3] G. G. Rajput and Anita H. B., "Handwritten Script Recognition using DCT and Wavelet Features at Block Level", 2010.
[4] R. Jayadevan, S. R. Kolhe, P. M. Patil and U. Pal, "Offline Recognition of Devanagari Script: A Survey", Volume 41, Issue 6, 2011.
[5] J. H. AlKhateeb, "A new approach for off-line handwritten Arabic word recognition using KNN classifier", 18-19 Nov. 2009.
[6] Y. Assabie, "HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation", 10th International Conference on Document Analysis and Recognition (ICDAR 09), 2009.
[7] Z. Bahmani, F. Alamdar, R. Azmi and S. Haratizadeh, "Off-line Arabic/Farsi handwritten word recognition using RBF neural network and genetic algorithm", IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS), 2010.
[8] U. Pal, R. K. Roy and F. Kimura, "Handwritten street name recognition for Indian postal automation", International Conference on Document Analysis and Recognition (ICDAR), 2011.
[9] L. Peng, C. Liu, X. Ding and H. Wang, "Multilingual document recognition research and its application in China", Second International Conference on Document Image Analysis for Libraries (DIAL06), pp. 126-132, 2006.
[10] U. Pal and B. Chaudhuri, "Automatic identification of English, Chinese, Arabic, Devnagari and Bangla script line", International Conference on Document Analysis and Recognition, pp. 790-794, 2001.
[11] U. Bhattacharya, T. K. Das, A. Datta, S. K. Parui and B. B. Chaudhuri, "A hybrid scheme for hand printed numeral recognition based on a self-organizing network and MLP classifiers", Int. J. Pattern Recognition and Artificial Intelligence, 16 (2002), pp. 845-864.
[12] K. H. Aparna, V. Subramaniam, M. Kasirajan, G. V. Prakash, V. S. Chakravarthy and S. Madhvanath, "Online handwriting recognition for Tamil", Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 438-443, 2004.
[13] C. V. Lakshmi and C. Patvardhan, "A high accuracy OCR system for printed Telugu text", Proceedings of the Conference on Convergent Technologies for Asia-Pacific Region (TENCON 2003), Vol. 2, pp. 725-729, 2003.
[14] L. Han, J. Zhong and A. Voloshin, "Image analysis and data processing of time series fringe pattern of PCBs by using moiré interferometry", Proceedings of HDP'04, pp. 141-145, 2004.
[15] P. Zhong, C. Song and N. Luo, "Method of extracting high-resolution digital moiré fringe in warpage measurement", Physical and Failure Analysis of Integrated Circuits (IPFA), pp. 527-530, 2009.
[16] V. Ablavsky and M. R. Stevens, "Automatic Feature Selection with Applications to Script Identification of Degraded Documents", Proc. Int'l Conf. Document Analysis & Recognition, Edinburgh, pp. 750-754, Aug. 2003.
[17] D. Dhanya, A. G. Ramakrishnan and P. B. Pati, "Script identification in printed bilingual documents", Sadhana, vol. 27, part 1, pp. 73-82, 2002.

AUTHORS PROFILE:

Akhilesh Pandey is an Asst. Professor in the department of computer science and engineering, Shridhar University, Pilani. He completed his MCA at IGNOU in 2002 and then worked as a faculty member at different engineering colleges. He later obtained his M. Tech. (CSE) at Sharda University, Gr. Noida, India. His areas of interest are pattern recognition and neural networks.

Sunita Singh completed her B.Tech. (CSE) at Lord Krishna College, Gaziyabad, and subsequently studied at Sharda University, Gr. Noida, India. She is a member of our team and works with MATLAB, in which she is a highly capable programmer. Her area of interest is image processing.

Rajiv Kumar is an Assistant Professor at the School of Engineering & Technology, Sharda University, Greater Noida, India. He obtained his Master of Technology degree in Information Technology from Bengal Engineering College, Shibpur (DU), West Bengal. His main areas of interest are image processing, pattern recognition and neural networks.

Dr. Amod Tiwari obtained his Bachelor degree in Mathematics and Science from CSJM Kanpur University, Kanpur, and his master degree in Computer Science and Engineering from Bilaspur Central University, Bilaspur (CG), India. He holds a PhD in Computer Science and Engineering, pursued at the Indian Institute of Technology Kanpur and awarded by UPTU Lucknow. He worked at senior level for more than two years at LML Scooter India Ltd, Kanpur, and was associated with the Indian Institute of Technology Kanpur from 2005 to 2010. He is currently working as Associate Professor in the department of Computer Science and Engineering, PSIT Kanpur. Dr. Tiwari has more than 37 publications to his credit.