Script identification from printed document images using statistical
 

    • International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 2, March – April (2013), pp. 607-622, © IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2013): 6.1302 (calculated by GISI), www.jifactor.com

SCRIPT IDENTIFICATION FROM PRINTED DOCUMENT IMAGES USING STATISTICAL FEATURES

M. M. Kodabagi (1), S. R. Karjol (2)

(1) Department of Computer Science and Engineering, Basaveshwar Engineering College, Bagalkot-587102, Karnataka, India
(2) Department of Computer Science and Engineering, Basaveshwar Engineering College, Bagalkot-587102, Karnataka, India

ABSTRACT

Automatic identification of the script in a document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. In this work, a technique for script identification from document images is proposed. The method uses vertical and horizontal run components/objects of the words of a single line of text to distinguish three Indian scripts: Kannada, Hindi and English. Initially, the method segments words from the selected line of text in a document image. Then statistics of horizontal and vertical run objects are determined. Further, a linear discriminant function is used to identify the script of the document image as Kannada, Hindi or English. The method has been tested on 300 document images and found to be robust and efficient. The proposed system achieves 93% identification accuracy for Hindi script, 90% for English script and 86% for Kannada script.

1. INTRODUCTION

In recent years, the escalating use of physical documents has driven progress towards creating electronic documents that are easier to communicate and store. However, physical documents are still prevalent in most communications. The amount of electronic documents created and stored is increasing rapidly with advances in computer technology. Such data include multilingual documents. For example, museums store images of old, fragile documents in typically large databases. These documents have scientific, historical or artistic value and can be written in different scripts. Document analysis systems that help process these stored images are of interest both for efficient archival and for providing access to researchers. Script identification is a key step in document image analysis, especially when the environment is multi-script and multilingual. An automatic script identification scheme is useful to (i) sort document images, (ii) select appropriate script-specific OCRs and (iii) search online archives of document images for those containing a particular script.

India is a multi-script, multilingual country, and hence most documents, including official ones, may contain text printed in more than one script or language. For such multi-script documents, it is necessary to determine the language type of the document before employing a particular OCR on it. In this context, it is proposed to work on the prioritized requirements of a particular region: Karnataka, a state in India.

In a multilingual country like India (India has 18 regional languages derived from 12 different scripts; a script could be a common medium for different languages), documents like bus reservation forms, passport application forms, examination question papers, bank-challenge, language translation books and money-order forms may contain text words in more than one language. For such an environment, a multilingual OCR system is needed to read the multilingual documents.
To make a multilingual OCR system successful, it is necessary to separate the regions of different languages in the document before feeding them to individual OCR systems. In this direction, multilingual document segmentation has strong direct application potential, especially in a multilingual country like India. In the context of Indian languages, some amount of research work has been reported. Further, there is a growing demand for automatically processing documents in every state in India, including Karnataka.

Under the three-language formula adopted by most Indian states, a document in a state may be printed in its official regional language, in the national language Hindi, and in English. Accordingly, documents produced in Karnataka are composed of text in Kannada (the regional language), Hindi and English. Such trilingual documents are found in the majority of the private and Government sectors, railways, airlines, banks and post offices of Karnataka state. For automatic processing of such trilingual documents through the respective OCRs, a pre-processor is necessary that can identify the language type of the text words. So, it is proposed to develop a model to identify the script of documents containing Kannada, Hindi and English text.

Some essential factors need to be considered before choosing or designing a script identification scheme for any multilingual application.
These factors are: (a) complexity in pre-processing, (b) complexity in feature extraction and classification, (c) computational speed of the entire scheme, (d) sensitivity of the scheme to variation in the text of the document (font style, font size and document skew), (e) performance of the scheme, and (f) range of applications in which the scheme could be used. Performance of the scheme includes the accuracy reported and the selection of testing data. Currently, individual approaches are designed such that they can effectively deal with some, but not all, of the factors listed above.

Some of the key challenges identified in script identification works [1-10] from the factors listed above are the presence of document degradation, skew, and varying font size and font type. There are four common types of document degradation: poor image resolution, salt-and-pepper noise, Gaussian noise, and physical document degradation. All of these must be compensated for before script identification. Skew refers to an image that is slanted too far in one direction or is misaligned. Compensating for the dominant skew angle of an entire page image may not be sufficient to allow accurate script identification. In the case of varying font size and font type, the relative offsets are distributed, and it is difficult to estimate results accurately with limited font sizes. It is also difficult to classify documents printed in unfamiliar font types; the difficulty of most images in script identification appears to stem from unfamiliar font types.

As the reported works [1-10] on script identification note, documents produced in Karnataka are usually composed of text in Kannada, Hindi and English. Though a great amount of work has been carried out on identification of the three languages Kannada, Hindi and English, very few works pertain to script identification that processes the document image at word/line level.
By analysing the work carried out on word-level identification of Kannada, English and Hindi, a generalisation of existing work with more accurate results for script identification from document images has been carried out. Also, processing at word/line level reduces the number of computations.

Language identification is one of the vision application problems. Generally, the human visual system identifies the language in a document using visible characteristic features such as texture, horizontal lines and vertical lines, which are visually perceivable. This human visual perception capability has been the motivation for the development of the proposed system. In this context, an attempt has been made to simulate the human visual system and identify the type of script based on visual clues, without reading the contents of the document. This motivated the development of a technique for script identification of Kannada, Hindi and English from printed document images used in Karnataka, with the aim of reporting better recognition.

In this work, a technique for script identification of Kannada, English and Hindi from document images is proposed. The method uses vertical and horizontal run components/objects of the words of a single line of text to identify the script of the document image. Initially, the method segments words from the selected line of text in a document image. Then statistics of horizontal and vertical run objects are determined. Further, a linear discriminant function is used to identify the script of the document image as Kannada, Hindi or English. The method has been tested on 300 document images and found to be robust and efficient. The proposed system achieves 93% identification accuracy for Hindi script, 90% for English script and 86% for Kannada script. The literature survey related to the current work is summarized in the following section.
The rest of the paper is organized as follows. The detailed survey related to script identification from printed document images is described in Section 2. The proposed method is presented in Section 3. The experimental results and discussion are given in Section 4. Section 5 concludes the work and lists future directions.

2. RELATED WORKS

A substantial amount of work has gone into research related to script identification from printed document images. Some of the related works are summarized in the following.

A robust method for determining the script and language content of document images is proposed in [1]. The algorithm determines connected components, locates upward concavities, and then classifies the script into two broad classes: Han-based (Chinese, Japanese and Korean) and Latin-based (English, French, German and Russian) languages. The extraction of rotation-invariant texture features and their use in automatic script identification is carried out in [2]. The method computes features from text blocks using multi-channel Gabor filters and constructs a representative feature vector; a Euclidean distance classifier is used for script identification of six languages (Chinese, English, Greek, Russian, Persian and Malayalam). In [3], multiple-channel Gabor filters and gray-level co-occurrence matrices (GLCMs) are used to extract texture features, and a K-NN classifier is used to classify seven languages: Chinese, English, Greek, Korean, Malayalam, Persian and Russian.

Cluster-based templates are used for automatic script identification from document images in [4]. An evaluation of texture features for script identification is carried out in [5].
A method for automatic identification of English, Chinese, Arabic, Devnagari and Bangla script lines is discussed in [6]. A method for script and language identification in noisy and degraded document images is presented in [7]. Script identification based on morphological reconstruction in document images is described in [8]. A simple technique based on the characteristic features of the top profile and bottom profile of individual text lines, for identification of Kannada, Hindi and English text lines from a printed document, is proposed in [9]. Script identification at both paragraph and word level using appearance-based models is presented in [10].

A survey of script identification techniques for multi-script document images is carried out in [11]. A two-stage approach for word-wise script identification of English (Roman), Devnagari and Bengali (Bangla) scripts is proposed in [12]. Zone-based structural feature extraction to recognize four south Indian scripts, namely Kannada, Telugu, Tamil and Malayalam, along with English and Hindi, is employed in [13]. The technique presented in [14] uses a voting technique for script identification from a trilingual document. The technique presented in [15] extracts features consistent with human perception from the responses of a multi-channel log-Gabor filter bank, designed at an optimal scale and multiple orientations, for script identification from Indian documents.

A simple and efficient technique for script identification of Kannada, Hindi and English text lines from a printed document using the horizontal projection profile is presented in [16]. A method for word-level script identification for scanned document images, in which a Gabor filter is applied and 16 channels of features are extracted during both training and testing, is evaluated in [17]. A multi-script identification technique for Indian languages, in which different text lines of Indian scripts from a document are identified, is presented in [18].

A method found in [19] uses a texture-based approach to identify the script type using wavelet-packet-based features for documents printed in seven scripts: Kannada, Tamil, Telugu, Malayalam, Urdu, Hindi and English.

A technique is proposed in [20] for language identification in document images to discriminate five major Indian languages, Hindi, Marathi, Sanskrit, Assamese and Bengali, which belong to the Devnagari and Bangla scripts. In the current work, by contrast, horizontal and vertical run objects determined from a text line of the document image are used to determine the script of the document. The detailed description of the methodology is given in the following section.

3. PROPOSED METHODOLOGY FOR SCRIPT IDENTIFICATION

The proposed methodology uses horizontal and vertical run objects to determine the script of a document image containing Kannada, Hindi or English text. The methodology comprises five phases: Image Acquisition, Preprocessing, Segmentation, Feature Extraction and Linear Discriminant Analysis. The block diagram of the proposed model is given in Figure 3a. The detailed description of each processing step is presented in the following subsections.

3.1 Image acquisition

The process begins with acquiring document images of the three scripts Kannada, Hindi and English. The document images are scanned images downloaded from the internet. The document images considered as input are skew free and noise free. About 300 sample images, i.e., 100 samples of each script, are collected.

Fig. 3a. Block Diagram of Proposed Model: Input document image -> PREPROCESSING (Binarization and Bounding Box) -> SEGMENTATION (Line and Words segmentation) -> FEATURE EXTRACTION (Horizontal run objects and Vertical run objects) -> LINEAR DISCRIMINANT ANALYSIS -> Identified script as Kannada/English/Hindi
3.2 Preprocessing

In the preprocessing phase, the input text document images are binarized and a bounding box is generated. Binarization converts the image into a binary image in which each pixel is represented by either 0 or 1, i.e., a black-and-white image. The bounding box is generated by applying horizontal and vertical run objects. The purpose of this phase is to make the image easier to handle for feature extraction and classification.

3.3. Segmentation

In this phase, a single line is segmented from the document image, and a bounding box is generated around the segmented line. From the selected line, the words are segmented and bounding boxes are generated around the segmented words. The segmentation of lines and words is described below.

• Segmentation of line

The horizontal projection features are determined to segment a line from the document image. A bounding box is generated around the segmented line. Line segmentation is shown in Figure 3b for Hindi script, Figure 3c for English script and Figure 3d for Kannada script.

Fig. 3b, c, d. Sample Images of segmented lines of Hindi, English and Kannada script

• Segmentation of words

The vertical projection features are determined to extract words from the selected lines. Using the boundary between two consecutive vertical projections, the words are segmented. Bounding boxes are then generated around the segmented words. The segmented words of Figures 3b, 3c and 3d are given in Figures 3e, 3f and 3g respectively.
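The preprocessing and segmentation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is a list of rows of grayscale values, ink pixels become 1 after binarization, the fixed threshold of 128 and the `min_gap` parameter (how many blank columns separate two words) are assumptions, and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 0/1 pixels, 1 = ink (text), 0 = background


def binarize(gray: List[List[int]], threshold: int = 128) -> Image:
    """Map grayscale values to 0/1; pixels darker than the (assumed) threshold become ink."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def nonzero_spans(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of non-zero stretches of a profile.

    Stretches separated by fewer than min_gap zeros are merged into one span.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Horizontal projection: ink count per row; bands of non-empty rows are text lines."""
    return nonzero_spans([sum(row) for row in img])


def segment_words(img: Image, top: int, bottom: int, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Vertical projection inside one line: ink count per column.

    Column bands separated by at least min_gap blank columns are treated as words.
    """
    width = len(img[0])
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return nonzero_spans(profile, min_gap)
```

Each returned pair gives the row range of a line or the column range of a word, so together they describe the bounding boxes mentioned in the text. A real scan would likely need an adaptive threshold and a `min_gap` tuned to the font size.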
Fig. 3e, f, g. Sample Images of Segmented words of Hindi, English and Kannada lines

3.4. Feature extraction

In this phase, the horizontal run objects and vertical run objects of each segmented text word are determined.

Horizontal run object

In the binary image of each text word, a set of consecutive pixels in a row whose length is greater than the threshold value (HT) forms a horizontal run object.

Vertical run object

In the binary image of each text word, a set of consecutive pixels in a column whose length is greater than the threshold value (VT) forms a vertical run object. The numbers of horizontal and vertical run objects are determined and stored in a feature vector Fv as given in equation (1).

$F_v = [\,\text{number of horizontal run objects},\ \text{number of vertical run objects}\,]$    (1)

where Fv is the feature vector, whose two components are the number of horizontal run objects and the number of vertical run objects.
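The run-object counts defined above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes ink pixels are 1, counts each maximal run of ink as one object when its length is strictly greater than the threshold, and uses the hypothetical names `count_runs` and `feature_vector`. The default thresholds HT = 3 and VT = 5 are the values the paper quotes for its Kannada/English conditions.

```python
from typing import List, Tuple


def count_runs(lines: List[List[int]], threshold: int) -> int:
    """Count maximal runs of consecutive ink pixels (1s) whose length exceeds threshold."""
    count = 0
    for line in lines:
        length = 0
        for px in line + [0]:  # trailing 0 closes a run that reaches the end of the line
            if px:
                length += 1
            else:
                if length > threshold:
                    count += 1
                length = 0
    return count


def feature_vector(word: List[List[int]], ht: int = 3, vt: int = 5) -> Tuple[int, int]:
    """Return (horizontal run objects, vertical run objects) for a binary word image."""
    columns = [list(col) for col in zip(*word)]  # transpose so columns can be scanned as rows
    return count_runs(word, ht), count_runs(columns, vt)
```

For a word image consisting of one full-width top stroke and one full-height left stroke, `feature_vector` returns one horizontal and one vertical run object, provided both strokes are longer than their thresholds.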
3.5. Linear Discriminant Analysis

The discriminant analysis phase of the proposed model uses the horizontal and vertical run-object features of Fv to classify the segmented words of the document image as Hindi, Kannada or English script.

Condition 1:
If the length of one of the horizontal run objects in a word is greater than half of the number of columns (n2) in the word, then the script of the word is identified as Hindi (HT is n2/2).

$\text{length of a horizontal run object} > n_2/2 \;\Rightarrow\; \text{word is Hindi script}$    (2)

Condition 2:
If the value of feature ( ) is greater than the value of feature ( ), then the script of the word is identified as Kannada (HT considered is 3 and VT considered is 5).

$(\;) > (\;) \;\Rightarrow\; \text{word is Kannada script}$    (3)

Condition 3:
Else, if the value of feature ( ) is greater than the value of feature ( ), then the script of the word is identified as English (HT considered is 3 and VT considered is 5).

$(\;) > (\;) \;\Rightarrow\; \text{word is English script}$    (4)

After the script of each segmented word has been identified, the script of the document image is classified on the basis of the above conditions.

Condition 4:
If, from the selected line in the document image, the number of words identified as Hindi script by equation (2) is greater than the total number of words in the selected line, then the script of the document image is identified as Hindi script.

Condition 5:
If the document image is not Hindi script, and the number of words in the selected line identified as Kannada script by equation (3) is greater than or equal to the number of words identified as English script by equation (4), then the script of the document image is identified as Kannada script.

Condition 6:
Else, if the document image is not Kannada script, the number of words in the selected line identified as English script by equation (4) is greater than the number identified as Kannada script by equation (3), and hence the script of the document image is identified as English script.
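The six conditions can be sketched as a small rule-based classifier in Python. This is a hedged illustration with two loudly stated assumptions. First, the feature symbols in equations (3) and (4) are not legible here, so which count must be larger for Kannada is unknown; the sketch exposes this as the hypothetical flag `kannada_if_h_greater` instead of fixing it. Second, Condition 4 as printed compares the Hindi word count with the total word count, which it can never exceed; the sketch assumes a majority (more than half) was meant and exposes that as `hindi_fraction`. All function and parameter names are hypothetical.

```python
from typing import List, Optional


def classify_word(longest_h_run: int, n_cols: int, n_h: int, n_v: int,
                  kannada_if_h_greater: bool = True) -> Optional[str]:
    """Word-level rules (Conditions 1 to 3).

    longest_h_run: length of the longest horizontal ink run in the word
    n_cols:        number of columns (n2) of the word image
    n_h, n_v:      horizontal / vertical run-object counts (HT = 3, VT = 5 in the paper)
    """
    # Condition 1: some horizontal run is longer than half the word width.
    if longest_h_run > n_cols / 2:
        return "Hindi"
    # A tie is not covered by Conditions 2 and 3, so no label is assigned.
    if n_h == n_v:
        return None
    # ASSUMPTION: the direction of the comparison is not recoverable from the text;
    # kannada_if_h_greater selects which inequality maps to Kannada.
    h_greater = n_h > n_v
    return "Kannada" if h_greater == kannada_if_h_greater else "English"


def classify_line(word_labels: List[Optional[str]], hindi_fraction: float = 0.5) -> str:
    """Line-level rules (Conditions 4 to 6) applied to the labels of one text line."""
    hindi = word_labels.count("Hindi")
    kannada = word_labels.count("Kannada")
    english = word_labels.count("English")
    # Condition 4. ASSUMPTION: "more than half of the words" instead of the printed
    # "more than the total number of words".
    if hindi > hindi_fraction * len(word_labels):
        return "Hindi"
    # Condition 5: Kannada wins ties with English.
    if kannada >= english:
        return "Kannada"
    # Condition 6: otherwise English.
    return "English"
```

Keeping the two uncertain choices as parameters makes it easy to test either reading against labelled samples before committing to one.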
4. EXPERIMENTAL RESULTS AND DISCUSSION

For the purpose of experimentation, we created our own database of document images. The document images are scanned images downloaded from the internet, and the images considered as input are skew free and noise free. About 300 sample images, i.e., 100 samples of each script (Kannada, Hindi and English), were collected, and the proposed methodology was tested on them. Horizontal and vertical run objects are used for feature extraction. Further, linear discriminant analysis is carried out to identify the script of the document image as Hindi, Kannada or English. Documents having different font sizes have been considered. Exhaustive experiments were done to analyze the performance of the system for different image patterns.

4.1. An Experimental Analysis for a Sample Hindi Document Image

Fig. 4a. Sample Input Document Image

Figure 4a shows a sample input document image. Binarization and bounding-box generation are applied to the input document image. A line is then segmented from the document image; the segmented line is shown in Figure 4b.

Fig. 4b. Segmented Line from Input Image
After segmentation of the line, the words are segmented from the selected line. The segmented words from the line in Figure 4b are given in Figure 4c.

Fig. 4c. Segmented words

Feature extraction and linear discriminant analysis are carried out, and finally the document image is identified as Hindi script. Figure 4d shows the result displayed.

Fig. 4d. Dialog box

4.2. An Experimental Analysis for a Sample English Document Image

Example 2: English sample

Fig. 4e. Sample English Input document image
Figure 4e shows the original English document image. After bounding-box generation and binarization of the image, a line is segmented from the document image. The segmented line is shown in Figure 4f.

Fig. 4f. Segmented line

After segmentation of the line, the words are segmented from the selected line. The segmented words are given in Figure 4g.

Fig. 4g. Segmented words

Feature extraction and linear discriminant analysis are carried out, and finally the document image is classified as English script. Figure 4h shows the result displayed.

Fig. 4h. Dialog box

4.3 System Performance Analysis

The overall system performance of script identification from printed document images is shown in Table 1.

Table 1: Overall System Performance

Tested script     Number of document images   Word-wise classification rate   Line-wise classification rate
Hindi script      100                         94% (987/1053)                  93%
Kannada script    100                         78% (496/636)                   86%
English script    100                         83% (781/936)                   90%
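As a quick arithmetic check, the word-wise percentages in Table 1 can be recomputed from the word counts given in parentheses. The snippet below is only a sketch that reuses the figures from the table; the dictionary and function names are hypothetical.

```python
# (correctly classified words, total words) per script, copied from Table 1
word_counts = {
    "Hindi": (987, 1053),
    "Kannada": (496, 636),
    "English": (781, 936),
}


def rate_percent(correct: int, total: int) -> int:
    """Classification rate as a percentage rounded to the nearest integer."""
    return round(100 * correct / total)


rates = {script: rate_percent(c, t) for script, (c, t) in word_counts.items()}
print(rates)
```

The rounded values agree with the word-wise column of the table (94%, 78% and 83%).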
4.4 An Experimental Analysis dealing with various issues

The proposed methodology has been evaluated on various issues such as variation in font size and style, color, noise, and varying spacing between words. The results of experimentation are given below.

Example 1: Sample noisy document image

Fig. 4i. Input document image
Fig. 4j. Segmented line
Fig. 4k. Extracted words
Fig. 4l. Dialog box

Example 2: Sample image with smaller font size

Fig. 4m. Input document image
Fig. 4n. Segmented line
Fig. 4o. Segmented words
Fig. 4p. Dialog box

5. CONCLUSION

In this work, line-wise and word-wise identification models to identify Kannada, Hindi and English text words from Indian multilingual machine-printed documents have been presented. The proposed model is developed based on visual discriminating features, which serve as useful visual clues for script identification. Horizontal and vertical run objects are used for feature extraction, and a linear discriminant function is used to identify the script of the document image as Kannada, Hindi or English. The methods help to accurately identify and separate the different language portions of Kannada, English and Hindi. The experimental results show that the method is effective enough to identify and separate the three language portions of the document, which further helps to feed individual language regions to the specific OCR system. The method has been tested on 300 document images and found to be robust and efficient. The proposed system achieves 93% identification accuracy for Hindi script, 90% for English script and 86% for Kannada script. The proposed system can also be extended to identify other Indian languages and foreign languages.

REFERENCES

[1] A. L. Spitz, "Determination of script and language content of document images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 3, pp. 235-245, 1997.
[2] T. N. Tan, "Rotation Invariant Texture Features and their use in Automatic Script Identification", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 751-756, 1998.
[3] G. S. Peake and T. N. Tan, "Script and Language Identification from Document Images", Proc. Workshop Document Image Analysis, vol. 1, pp. 10-17, 1997.
[4] J. Hochberg, P. Kelly, T. Thomas, L. Kerns, "Automatic Script Identification from Document Images using Cluster-based Templates", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 176-181, 1997.
[5] Andrew Busch, Wageeh W. Boles and Sridha Sridharan, "Texture for Script Identification", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 11, pp. 1720-1732, November 2005.
[6] U. Pal, B. B. Choudhuri, "Automatic Identification of English, Chinese, Arabic, Devanagari and Bangla Script Line", Proc. 6th International Conference on Document Analysis and Recognition, pp. 790-794, 2001.
[7] Lu Shijian and Chew Lim Tan, "Script and Language Identification in Noisy and Degraded Document Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 1, January 2008.
[8] B. V. Dhandra, H. Mallikarjun, Ravindra Hegadi, V. S. Malemath, "Word-wise Script Identification from Bilingual Documents Based on Morphological Reconstruction", Digital Information Management, 2006 1st International Conference, pp. 389-394, December 2006.
[9] M. C. Padma, P. A. Vijaya, "Language identification of Kannada, Hindi and English Text Words Through Visual Discrimination Features", International Journal of Computational Intelligence Systems, Vol. 1, No. 2, pp. 116-126, May 2008.
[10] T. N. Vikram and D. S. Guru, "Appearance based models in document script identification", International School of Information Management and Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysore, India, 2006.
[11] S. Abirami, D. Manjula, "A Survey of Script Identification Techniques for Multi-Script Document Images", International Journal of Recent Trends in Engineering, Vol. No. 2, May 2009.
[12] Sukalpa Chanda, Srikanta Pal, Katrin Franke, Umapada Pal, "Two-stage Approach for Word-wise Script Identification", IEEE 10th International Conference on Document Analysis and Recognition (ICDAR), pp. 926-930, 2009.
[13] Rajesh Gopakumar, N V Subbareddy, Krishnamoorthi Makkithaya, U Dinesh Acharya, "Zone-based Structural feature extraction for Script Identification from Indian Documents", 5th International Conference on Industrial and Information Systems, pp. 420-425, Jul 29 - Aug 01, 2010.
[14] M. C. Padma and P. A. Vijaya, "Script Identification of Text Words from a Tri Lingual Document using Voting Technique", International Journal of Image Processing, Volume 4, Issue 1, pp. 35-52, 2010.
[15] Gopal Datt Joshi, Saurabh Garg and Jayanthi Sivaswamy, "Script Identification from Indian Documents", in Proceedings of the Seventh IAPR Workshop on Document Analysis Systems, New Zealand, pp. 255-267, 2006.
[16] Prakash K. Aithal, Rajesh G., Dinesh U. Acharya, Krishnamoorthi M., Subbareddy N. V., "Text Line Script Identification for a Tri-lingual Document", IEEE 2010 Second International Conference on Computing, Communication and Networking Technologies, pp. 1-3, 2010.
[17] Huanfeng Ma and David Doerman, "Word level Script Identification for scanned document images", in SPIE Conference Document Recognition and Retrieval (San Jose, CA), in press, 2004.
[18] U. Pal, S. Sinha, B. B. Choudhuri, "Multi-Script Line Identification from Indian Documents", Proc. 7th International Conference on Document Analysis and Recognition (ICDAR 2003), vol. 2, pp. 880-884, 2003.
[19] M. C. Padma and P. A. Vijaya, "Global Approach for script identification using Wavelet Packet Based Features", International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 3, No. 3, September 2010.
[20] Mallikarjun Hangarge and B. V. Dhandra, "Shape and Morphological Transformation based Features for Language Identification in Indian Document Images", First International Conference on Emerging Trends in Engineering and Technology (IEEE Comput. Soc. Press), pp. 1175-1180, July 2008.
[21] M. M. Kodabagi, S. A. Angadi and Chetana R. Shivanagi, "Character Recognition of Kannada Text In Scene Images Using Neural Network", International Journal of Graphics and Multimedia (IJGM), Volume 4, Issue 1, 2013, pp. 9-19, ISSN Print: 0976 – 6448, ISSN Online: 0976 – 6456.
[22] Gunjan Singh, Avinash Pokhriyal and Sushma Lehri, "Fuzzy Rule Based Classification and Recognition of Handwritten Hindi Curve Script", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 1, 2013, pp. 337-357, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.