National Congress on Communications and Computer Aided Electronic Systems (CCAES 2012)

A Novel Method of Differentiating Palm Leaf Scribers Using 2D Correlation

Panyam Narahari Sastry*, Dr. N.V.Srinivasulu**
*Department of Electronics and Communication Engineering, CBIT, Hyderabad, India
**Department of Mechanical Engineering, C.B.I.T., Hyderabad, India.
E-Mail: *firstname.lastname@example.org

Abstract- Character Recognition (CR) is one of the oldest applications of automatic pattern recognition. Recognizing Hand-Written Characters (HWC) is an effortless task for humans, but for a computer it is an extremely tricky job. Research in character recognition is very popular because of its application potential in banks, post offices, defense organizations, reading aids for the blind, library automation, language processing and multi-media design. Although epigraphical work on stone inscriptions has been carried out, it has been done largely manually and on 2D traces. A large collection of palm leaf manuscripts is available in classical Indian languages such as Sanskrit, Tamil and Pali, as well as in more modern languages such as Telugu. These palm leaves contain religious texts and treatises on a host of subjects such as art, medicine, astronomy, astrology, mathematics, law and music. However, they have not been exploited in the manner they deserve. While the reasons for this are manifold, one of the primary reasons is the scarcity of methods applicable to the specific purpose of Palm Leaf Character Recognition (PLCR) and digitization. The characters on a palm leaf have an additional property, depth, which can be gainfully exploited in character recognition. This paper describes a method that uses 2D correlation values to find out whether the palm leaves of two different folios or sets have been mixed up. The results obtained show very distinct 2D correlation values between the test samples and the database samples.

Key Words- Palm Leaf Character recognition, 2D Correlation, Folio, Pattern recognition.

I. INTRODUCTION

Palm leaf manuscripts were one of the popular written documents for over a thousand years in South and Southeast Asia [1, 2]. In Indian history, dried palm leaves have been used to record Buddhist teachings and doctrines, folklore, knowledge and use of indigenous medicines, stories of dynasties, traditional arts and architecture, astrology, astronomy and techniques of traditional massage. Recently, several universities and institutes, including medical departments and religious organizations, have initiated projects to collect, recover and preserve Indian palm leaf manuscripts. It is recognized that these documents contain invaluable knowledge, history, culture and local wisdom of Indian civilization. In particular, knowledge concerning indigenous medicines has been studied with great attention because of its potential in treating many ailments and diseases.

With the passage of time, most of these palm leaves, if left unattended, will deteriorate, as they are coming to the end of their natural lifetime and face destructive elements such as dampness, fungus, bacteria, ants and cockroaches. For this reason, Rashtriya Sanskrit Vidyapeeth (RSVP), Tirupati, Andhra Pradesh, India and Oriental Research Institute (ORI) are establishing the Palm Leaf Manuscript Preservation Project for the discovery, preservation and protection of palm leaf manuscripts and to extract knowledge from the ancient world [3, 4]. Currently, computer technology can store and process ancient image documents in multimedia systems. It is possible to collect and access those manuscripts and preserve them in digital formats on a computer. Although current storage systems can store document images, there is no specific system to retrieve the knowledge from these ancient documents. It is recognized that this is not an easy task, as there are many styles of traditional handwriting, noise on the images, and fragmentation or cracks due to the fragility of the aged leaves. Images of the collected ancient documents are commonly of poor quality because insufficient attention was paid to the storage conditions and to the quality of the written material. As a result, the foreground and background in the scanned images are difficult to separate. Many of the palm leaf images have varying contrast and illumination, smudges, smears, stains, and contamination due to ink seeping from the other side of the palm leaf.

Character Recognition (CR) is one of the oldest applications of automatic pattern recognition. Recognizing Hand-Written Characters (HWC) is an effortless task for humans, but for a computer it is an extremely tricky job. This is mainly due to the vast differences, or the imprecision, associated with handwritten patterns written by different individuals [6, 7]. Machine recognition involves the ability of a computer to receive input from sources such as paper and other documents, photographs, touch screens and other devices, and is an ongoing research area.

II. DATA ACQUISITION METHOD

Palm leaves were provided by Oriental Research Institute (ORI), S.V. University Campus, Tirupati, Andhra Pradesh. For the present research, we have chosen palm leaves of two different scribers. Photographs of the palm leaves are shown in Figure 1, in which the red arrow marks the holes in the folio, which help to hold the leaves between the wooden boards.
Fig. 1 Palm leaves chosen for the study

III. PROPOSED SCRIBER RECOGNITION METHOD

In the proposed method, 2D correlation is used to find whether a leaf belongs to a specific scriber's folio or was written by a different scriber. In handwritten documents on paper, writers can be differentiated distinctly on the basis of the appearance of the letters. However, because a counterfeiter can easily copy the appearance of characters, attributing a piece of writing or a signature to its writer can still be a tricky affair. Lipikaras were highly trained professionals, and scribing on palm leaves was an extremely serious affair. Thus distinguishing the scribings on the basis of the appearance of the letters is not simple. However, the pressure applied at the various pixel points of a character differs from scriber to scriber. It is presumed that pressure is directly proportional to the depth of indentation (in microns), which is available to us from the Z axis data. Using this concept, images were compared in the YZ plane for two different scribers for the various Telugu palm leaf characters. For the same character, 5 samples for 2 different scribers were considered for testing and training. Table No. 1 and Table No. 2 show the co-ordinates for the Telugu characters "Aa" and "Tha".

Our basis for the differentiation was to compare the correlation value obtained for a character scribed by the same author at different positions with the correlation of the same character when scribed by a different author. Thus, if the test image of a particular character, say "Ae", belongs to scriber 1, then the correlation coefficient obtained between the test image and any other sample image of "Ae" of scriber 1 should be distinctly greater than the correlation coefficient obtained between the test image and the "Ae" samples of scriber 2. Recognizing the scriber of a certain document is a great challenge in terms of pattern recognition but is also of immense value.

Traditional paper based documents are being replaced by digital documents for official and legal purposes. Hence authenticating these digital documents is extremely critical. Authentication of security documents, including banknotes, passports, etc., which may be printed on paper or any other support, is a very important application of Digital Document Analysis. An automatic document authentication system consists of an image acquisition system, such as a CCD camera, and a processor whose job is to compare the acquired intensity profile with a pre-stored reference image. The document handling device, which is connected to the comparing processor, accepts or rejects the document depending on the match. E-mails are electronic documents that have replaced paper documents because of the need for quick response and faster means of communication, but they still lack accountability. Digital watermarking and public key encryption-based authentication are the most common methods used for authentication of digital documents. It is possible to apply this concept of pen pressure to online signature verification in addition to two dimensional character matching.

Table No. 1 Co-ordinates of Aa

Pixel point   X (mm)   Y (mm)    Z (mm)
1             1.091    0.16      25
2             1.456    0.49      24
3             0.925    0.999     26
4             0.338    0.725     29
5             0        0         29
6             0.338    -0.547    28
7             1.832    -0.825    27
8             2.797    -0.547    28
9             3.002    0.396     29
10            2.51     0.756     33
11            2.281    0.306     28
12            3.042    -0.087    34
13            2.098    -0.047    34
14            0.741    -0.06     36
15            0.741    -0.08     38

Table No. 2 Co-ordinates of Tha

Pixel point   X (mm)   Y (mm)    Z (mm)
1             0.308    -0.400    26
2             0.687    0.066     26
3             0.300    0.327     34
4             0.000    0.000     31
5             0.188    -0.426    20
6             0.418    -0.789    25
7             0.833    -0.658    25
8             1.428    -0.618    39
9             1.620    -0.423    38
10            1.670    -0.144    35
11            1.180    0.412     38
12            1.310    -0.127    39
13            1.400    0.390     28
14            0.842    0.798     34
15            0.284    0.876     96
16            0.842    0.812     94
17            1.345    1.342     48
18            1.949    1.554     59
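The idea of comparing characters in the YZ plane can be illustrated with the co-ordinates of Table No. 1. The following Python sketch is only an illustration and not the authors' procedure: the paper does not state how the 50x50 images were produced from the measured points, so the function name `rasterize_yz`, the min-max scaling and the grid orientation are all assumptions.

```python
# (Y, Z) pairs for the character "Aa", copied from Table No. 1.
AA_POINTS_YZ = [
    (0.16, 25), (0.49, 24), (0.999, 26), (0.725, 29), (0.0, 29),
    (-0.547, 28), (-0.825, 27), (-0.547, 28), (0.396, 29), (0.756, 33),
    (0.306, 28), (-0.087, 34), (-0.047, 34), (-0.06, 36), (-0.08, 38),
]


def rasterize_yz(points, size=50):
    """Mark each (Y, Z) point on a size x size grid of 0/1 values.

    Hypothetical helper: Y is scaled onto the columns and Z onto the rows
    with plain min-max scaling. The paper does not describe this step.
    """
    ys = [y for y, _ in points]
    zs = [z for _, z in points]
    y_min, z_min = min(ys), min(zs)
    # Guard against a zero span when all points share one co-ordinate.
    y_span = (max(ys) - y_min) or 1.0
    z_span = (max(zs) - z_min) or 1.0

    grid = [[0] * size for _ in range(size)]
    for y, z in points:
        col = round((y - y_min) / y_span * (size - 1))
        row = round((z - z_min) / z_span * (size - 1))
        grid[row][col] = 1
    return grid


grid = rasterize_yz(AA_POINTS_YZ)
print(sum(map(sum, grid)), "pixels set in a", len(grid), "x", len(grid[0]), "grid")
```

Because the depth (Z) axis is one of the two image axes, two scribers who trace the same outline with different pressure would produce different YZ images under this scheme, which is the property the method relies on.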
IV. IMPLEMENTATION OF SCRIBER RECOGNITION METHOD

The flow chart of the proposed method of implementing scriber recognition is shown in Figure 2.

Step 1  Load the test images and database images of size 50x50 pixels
Step 2  Read all the images
Step 3  Convert each image into binary type using a threshold value of 0.7
Step 4  Find the average correlation co-efficient of the test images belonging to scriber 1 with all the scriber 1 database images of the same character
Step 5  Find the average correlation co-efficient of the test images belonging to scriber 1 with all the scriber 2 database images of the same character
Step 6  Allot the test character to the scriber displaying the higher average correlation coefficient
Step 7  Check the number of matched and mismatched characters

Fig. 2 Implementation of scriber authentication algorithm

V. RESULTS AND DISCUSSIONS

All the experiments were carried out on a PC with a P4 3 GHz CPU and 512 MB of RAM under the Matlab 7.0 platform. The database consists of 4 different images of each class, and hence 29 x 4 = 116 images. These images are of size 50x50 pixels and are in the .tiff format. More than 300 character images were tested. All the database images and the images to be tested were projections onto the YZ plane (XY data failed to differentiate the characters between authors). All the 28 different Telugu characters (classes) were used as test characters to test the accuracy of the proposed method. The database images were of both Scriber No. 1 and Scriber No. 2, whereas the test characters (images) were of Scriber No. 1 (one from each class).

Each test character image was tested using correlation with all the available database images (of both Scriber 1 and Scriber 2) for that particular character. The average correlation co-efficient of the test character with Scriber 1 and with Scriber 2 was determined separately, and the results are tabulated in Table No. 3. The time taken to run the program is recorded in the same table. The test character set and the training character set are disjoint sets in this work.

CC* in the table denotes the correlation coefficient value (r), calculated using equation (1):

\[
r = \frac{\sum_{m}\sum_{n} (A_{mn} - \bar{A})(B_{mn} - \bar{B})}{\sqrt{\left(\sum_{m}\sum_{n} (A_{mn} - \bar{A})^{2}\right)\left(\sum_{m}\sum_{n} (B_{mn} - \bar{B})^{2}\right)}} \qquad (1)
\]

where A and B are the matrices of images of the same size, \bar{A} and \bar{B} are the mean values of their elements, and r is the correlation coefficient, whose magnitude lies in the range 0 to 1.
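Steps 3 to 6 of the flow chart can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' Matlab program: it assumes that the correlation coefficient is the standard 2D correlation coefficient (the quantity MATLAB's `corr2` computes), that pixel intensities lie in [0, 1], and that a pixel becomes 1 when it is strictly greater than the 0.7 threshold. The function names and the dictionary layout of the database are hypothetical.

```python
from math import sqrt


def binarize(image, threshold=0.7):
    """Step 3: map intensities in [0, 1] to 0/1 (assumed rule: pixel > threshold)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def corr2(a, b):
    """2D correlation coefficient of two equal-size matrices (lists of rows)."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = sqrt(sum((x - mean_a) ** 2 for x in flat_a)
               * sum((y - mean_b) ** 2 for y in flat_b))
    # Assumption: a constant image carries no pattern, so report 0.0
    # instead of dividing by zero.
    return num / den if den else 0.0


def average_cc(test_image, database_images):
    """Steps 4 and 5: mean correlation of one test image with a set of images."""
    return sum(corr2(test_image, ref) for ref in database_images) / len(database_images)


def assign_scriber(test_image, database_by_scriber):
    """Step 6: allot the test character to the scriber with the higher average."""
    averages = {name: average_cc(test_image, images)
                for name, images in database_by_scriber.items()}
    return max(averages, key=averages.get), averages


# Toy 2x2 example; the real images are 50x50.
test = binarize([[0.9, 0.1], [0.2, 0.8]])
database = {"scriber 1": [[[1, 0], [0, 1]]], "scriber 2": [[[0, 1], [1, 0]]]}
print(assign_scriber(test, database))
```

Step 7 then amounts to counting how often the allotted scriber agrees with the known scriber of each test character.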
Table No. 3 Scriber Authentication results

S.No.  Test       Author 1     Author 2     Difference of  % Difference    Program time  Scriber recognized
       character  average CC*  average CC*  average CC     of correlation  in seconds    (Yes / No)
1.     a          0.2096       0.0277       0.1819         86.78435115     0.89          Y
2.     aa         0.1182       0.0051       0.1131         95.68527919     0.39          Y
3.     ala        0.1093       0.0253       0.084          76.85269899     0.46          Y
4.     bra        0.1505       0.0049       0.1456         96.74418605     0.44          Y
5.     khaa       0.2414       0.0204       0.221          91.54929577     0.53          Y
6.     la         0.0706       0.0614       0.0092         13.03116147     0.39          Y
7.     tha        0.1614       0.1154       0.046          28.50061958     0.48          Y
8.     ae         0.0384       0.001        0.0374         97.39583333     0.46          Y
9.     gha        0.1433       0.1284       0.0149         10.39776692     0.39          Y
10.    haa        0.1022       0.0067       0.0955         93.44422701     0.46          Y
11.    na         0.0898       0.0199       0.0699         77.83964365     0.4           Y
12.    pa         0.2665       0.0115       0.255          95.684803       0.46          Y
13.    sa         0.1255       0.0744       0.0511         40.71713147     0.46          Y
14.    shaa       0.3062       0.0005       0.3057         99.83670803     0.46          Y
15.    va         0.1858       0.0371       0.1487         80.03229279     0.45          Y
16.    ya         0.2905       0.1143       0.1762         60.65404475     0.45          Y
17.    ka         0.1223       0.0038       0.1185         96.89288635     0.47          Y
18.    ksha       0.1633       0.0044       0.1589         97.30557257     0.46          Y
19.    ba         0.1104       0.0657       0.0447         40.48913043     0.45          Y
20.    bha        0.0996       0.0091       0.0905         90.86345382     0.46          Y
21.    ja         0.127        0.0406       0.0864         68.03149606     0.46          Y
22.    ru         0.3121       0.2251       0.087          27.87568087     0.4           Y
23.    da         0.1697       0.0571       0.1126         66.35238656     0.47          Y
24.    cha        0.0905       0.0402       0.0503         55.5801105      0.39          Y
25.    dha        0.128        0.0074       0.1206         94.21875        0.46          Y
26.    ee         0.0538       0.0439       0.0099         18.40148699     0.45          Y
27.    ga         0.058        0.0244       0.0336         57.93103448     0.51          Y
28.    saa        0.0311       0.0165       0.0146         46.94533762     0.45          Y

Characters in red are the lowest CC values and characters in blue have a CC difference greater than 95%.
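The two derived columns of Table No. 3 can be reproduced from the two average CC columns. The tabulated values are consistent with the percentage difference being taken relative to the Author 1 average, for example 0.1819 / 0.2096 x 100 = 86.78 for the character "a". The Python sketch below works under that inferred formula; the function names are hypothetical, only a few rows of the table are copied in, and the 30% cutoff is the figure quoted in the conclusions.

```python
# (Author 1 average CC, Author 2 average CC) for a few rows of Table No. 3.
TABLE_3_SAMPLE = {
    "a": (0.2096, 0.0277),
    "la": (0.0706, 0.0614),
    "tha": (0.1614, 0.1154),
    "gha": (0.1433, 0.1284),
    "shaa": (0.3062, 0.0005),
    "ru": (0.3121, 0.2251),
    "ee": (0.0538, 0.0439),
}


def percent_difference(cc_author1, cc_author2):
    """Difference of the average CCs as a percentage of the Author 1 average."""
    return (cc_author1 - cc_author2) / cc_author1 * 100.0


def weak_differentiators(rows, cutoff=30.0):
    """Characters whose percentage difference falls below the cutoff."""
    return sorted(name for name, (cc1, cc2) in rows.items()
                  if percent_difference(cc1, cc2) < cutoff)


for name, (cc1, cc2) in TABLE_3_SAMPLE.items():
    print(f"{name:5s} {percent_difference(cc1, cc2):6.2f} %")
print("below 30 %:", weak_differentiators(TABLE_3_SAMPLE))
```

A small percentage difference means the test character correlates almost as well with the other scriber's samples as with its own, so such characters are poor choices for telling the two scribers apart.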
VI. CONCLUSIONS

1. All the test characters belonging to Scriber 1 had a higher average correlation co-efficient when tested against Scriber 1 characters located at other positions on the leaf than when tested against characters of Scriber 2. The test and the training character sets are disjoint sets.
2. The characters La, Tha, Gha, Ru and Ee have shown less than 30% difference of average correlation between the test character and the database characters of Scriber 1 and Scriber 2.
3. The time taken for identification of the scriber is very low: less than 1 second.
4. If the right characters are selected as test characters, then the scriber identification is 100%.
5. A rigorous test of this idea needs to be further established with data from a greater number of samples/scribers, which is beyond the scope of the present work. If the above mentioned characters remain poor differentiators, we can select specific characters (the characters in blue in Table No. 3) which can be used for differentiation/authentication more accurately.

ACKNOWLEDGMENT

The author wholeheartedly acknowledges the co-operation extended by Sri S. Anand, Finance Officer, RSVP (Rashtriya Sanskrit Vidyapeeth), Tirupati, in procuring the palm leaves from Oriental Research Institute, Tirupati, A.P., India. Further, the author expresses sincere gratitude to Dr. Vally Maya, who actively participated in the technical discussions and offered appropriate suggestions at every stage of the work.

REFERENCES

[1] O. Surinta and R. Chamchong, "Image Segmentation of Historical Handwriting from Palm Leaf Manuscripts," in 5th IFIP International Conference on Intelligent Information Processing, Beijing, China, 2008, p. 280.
[2] Z. Shi, S. Setlur, and V. Govindaraju, "Digital Enhancement of Palm Leaf Manuscript Images using Normalization Techniques," in 5th International Conference On Knowledge Based Computer Systems, Hyderabad, India, 2004.
[3] Panyam Narahari Sastry, Ramakrishnan Krishnan, and Bhagavatula Venkata Sanker Ram, "Telugu Character Recognition on Palm Leaves - A three dimensional Approach," Technology Spectrum (JNTU Hyderabad), Vol. 2, No. 3, pp. 19-26, November 2008.
[4] Panyam Narahari Sastry, Ramakrishnan Krishnan, and Bhagavatula Venkata Sanker Ram, "Classification and Identification of Telugu hand written characters extracted from palm leaves using decision tree approach," ARPN Journal of Engineering and Applied Sciences, Vol. 5, No. 3, March 2010.
[5] Panyam Narahari Sastry, Ramakrishnan Krishnan, and T.V. Rajanikant, "Palm Leaf Telugu Character Recognition using Hough Transform," International Conference on Advanced Computing Methodologies, Elsevier, pp. 21-28, December 2011.
[6] V.N. Manjunath Aradhya, G. Hemantha Kumar, and S. Noushath, "Multilingual OCR system for South Indian Scripts and English documents: An approach based on Fourier transform and PCA," Engineering Applications of Artificial Intelligence, Elsevier, pp. 658-668, 2008.
[7] B.B. Chaudhuri and Ujjwal Bhattacharya, "Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 3, pp. 444-457, March 2009.
[8] Senior and Robinson, "An Off-Line Cursive Handwriting Recognition System," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3, pp. 309-321, 1998.