Classification of Fonts and Calligraphy Styles based on Complex Wavelet Transform

Classification of Fonts and Calligraphy Styles based on Complex Wavelet Transform, by Alican Bozkurt

  1. Alican Bozkurt, Pınar Duygulu Şahin, A. Enis Çetin. Bilkent University. GRC 2013
  2. OFR as a means: Optical Character Recognition (OCR)
     • As of August 2010, there are 129,864,880 books in the world¹.
     • Only 20 million of them have been digitized.
     • Digitization ≠ scanning
       – Image vs. context
       – Additional processing
     • Optical Character Recognition
     ¹ http://booksearch.blogspot.com/2010/08/books-of-world-stand-up-and-be-counted.html
  3. OFR as a means: Optical Character Recognition (OCR)
     • Inter-typeface variability
       – Vast number of typefaces (>50000)
     • OCR is like finding a needle in a haystack
     • Knowing the font significantly reduces the size of the haystack
  4. OFR as an end: Dead Sea Scrolls
     • Digitized by Google
     • Currently 5 scrolls are available
     • Classification of new scripts
  5. OFR as an end: Identifont
     • Font search service
     • Fonts are expensive! ($25-$1000)
     • Finding cheaper alternatives, e.g. Museo (free) instead of Adelle ($599)
  6. How to Recognize Fonts?
     Local
     • Information from individual letters
     • Higher resolution (decision per word/letter)
     • Needs OCR as preprocessing
     Global
     • Information from blocks of words
     • Faster
     • Lower resolution (decision per block)
  7. Dual Tree Complex Wavelet Transform (DT-CWT)
  8. Dual Tree Complex Wavelet Transform (DT-CWT)
     • Why CWT?
       – Directional selectivity
     [Figure: DWT vs. CWT; orientation labels on the slide: Real, 90, 45(?), 0 (deg)]
  9. Dual Tree Complex Wavelet Transform (DT-CWT)
     • Why CWT?
       – Directional selectivity
       – Shift invariance
     [Figure: DWT vs. CWT]
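The shift-invariance bullet can be motivated with a small sketch. It does not implement the DT-CWT; it only shows, under the assumption of a one-level Haar DWT on a 1-D step edge, that the energy in the detail band of an ordinary decimated DWT changes when the input moves by a single sample. The function names are illustrative.

```python
import math

def haar_detail(signal):
    """Level-1 Haar detail coefficients of an even-length sequence."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal), 2)]

def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)

# A step edge, and the same edge moved one sample to the right.
step = [0, 0, 0, 0, 1, 1, 1, 1]
shifted = [0] + step[:-1]

e_step = energy(haar_detail(step))        # edge falls between sample pairs
e_shifted = energy(haar_detail(shifted))  # edge falls inside a sample pair
print(e_step, e_shifted)
```

The first energy is zero and the second is about 0.5, although both inputs contain the same edge. Statistics computed from such coefficients would depend on where the text happens to sit in the subsample, which is the problem an approximately shift-invariant transform is meant to reduce.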
  10. Demonstration
      Train images
      – Printscreens
      – No noise
      – White background
      – ~1900x750 px image size
      – 168x480 px sample size
      – One paragraph per font
      Test image
      – Random image for “typewriter”
      – Real noise
      – Colored background
      – 1169x1142 px image size
      – 96x96 px sample size
  11. Demonstration
      • Smaller subsample size
        – Different height/width ratio
      • Noise
      • Different background
      • Not exact font
      • 96% success rate (125/130)
        – Blue: Courier New Regular
        – Red: Bookman Regular
  12. Demonstration
      [Figures: train image for “Courier New regular”; test image; train image for “Bookman regular”]
  13. Feature extraction
      • Step 0: Input image
  14. Feature extraction
      • Step 0: Input image
      • Step 1: Convert image to binary using Otsu’s method
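Step 1 can be sketched as follows. This is a minimal standard-library version of Otsu's threshold selection (pick the gray level that maximizes the between-class variance of the histogram), not the authors' code. The flat pixel list, the `levels` parameter, and the choice to map bright pixels to 1 are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    pixels: iterable of ints in [0, levels). Class 0 is p <= t, class 1 is p > t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0
    sum0 = 0  # sum of intensities in class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break             # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def binarize(pixels, t):
    """Map pixels above the threshold to 1 and the rest to 0 (polarity is an assumption)."""
    return [1 if p > t else 0 for p in pixels]

# Toy data: four dark and four bright pixels.
gray = [10, 12, 11, 200, 205, 198, 13, 202]
t = otsu_threshold(gray)
binary = binarize(gray, t)
print(t, binary)
```

On the toy data the threshold lands between the dark and bright groups, so the four dark pixels map to 0 and the four bright ones to 1.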
  15. Feature extraction
      • Step 0: Input image
      • Step 1: Convert image to binary using Otsu’s method
      • Step 2: Divide the image into subsamples
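One way to picture Step 2 is a tiling of the binary image. The slides do not say how subsamples are chosen, so the sketch below assumes non-overlapping tiles of a fixed size with partial tiles at the right and bottom edges dropped; `split_into_subsamples` is a hypothetical helper name.

```python
def split_into_subsamples(image, tile_h, tile_w):
    """Cut a 2-D list into non-overlapping tile_h x tile_w blocks.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - tile_h + 1, tile_h):
        for c in range(0, cols - tile_w + 1, tile_w):
            tiles.append([row[c:c + tile_w] for row in image[r:r + tile_h]])
    return tiles

# Toy 4x6 "image" whose pixel values encode their own position.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
tiles = split_into_subsamples(image, tile_h=2, tile_w=3)
print(len(tiles), tiles[0], tiles[-1])
```

Each tile would then be processed independently, which is what allows a separate font decision per subsample.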
  16. Feature extraction
      • Step 0: Input image
      • Step 1: Convert image to binary using Otsu’s method
      • Step 2: Divide the image into subsamples
      • For each subsample:
        – Step 3: 3-level DT-CWT
      [Figure: a subsample and its DT-CWT subbands at levels 1, 2, and 3, at angles 15, 45, and 75]
  17. Feature Extraction
      • Step 0: Input image
      • Step 1: Convert image to binary using Otsu’s method
      • Step 2: Divide the image into subsamples
      • For each subsample:
        – Step 3: 3-level DT-CWT
        – Step 4: Mean and std
        – Step 5: Concatenate: Φ = [μ1, μ2, μ3, σ1, σ2, σ3] (1x36 feature vector)
      Example values (six per level):
      Level 1 | μ1: 0.082091  0.084891  0.060045  0.080689  0.085836  0.060873
              | σ1: 0.14791   0.15201   0.11201   0.14617   0.15402   0.11424
      Level 2 | μ2: 0.22597   0.24064   0.11976   0.23731   0.24072   0.12753
              | σ2: 0.36203   0.35692   0.17401   0.37765   0.34842   0.19024
      Level 3 | μ3: 0.49943   0.54883   0.35954   0.55623   0.56736   0.30949
              | σ3: 0.6949    0.65361   0.46078   0.72141   0.68851   0.39779
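Steps 4 and 5 can be sketched as follows, leaving the transform itself out. The input layout `subbands[level][orientation]` is hypothetical, and taking statistics of coefficient magnitudes with a population standard deviation is an assumption; the slides only say "mean and std".

```python
import statistics

def cwt_feature_vector(subbands):
    """Build [mu1, mu2, mu3, sigma1, sigma2, sigma3] from complex subbands.

    subbands[level][orientation] is a flat list of complex coefficients
    (hypothetical layout: 3 levels x 6 orientations gives 36 features).
    """
    means, stds = [], []
    for level in subbands:
        for coeffs in level:
            mags = [abs(c) for c in coeffs]          # magnitude of each coefficient
            means.append(statistics.mean(mags))
            stds.append(statistics.pstdev(mags))     # population std (assumption)
    return means + stds                              # all means first, then all stds

# Toy input: every subband holds the two coefficients 3+4j and 0.
toy = [[[complex(3, 4), complex(0, 0)] for _ in range(6)] for _ in range(3)]
phi = cwt_feature_vector(toy)
print(len(phi), phi[0], phi[18])
```

With three levels and six orientations the vector has 18 means followed by 18 standard deviations, matching the 1x36 shape on the slide.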
  18. Results: English Font Recognition
      • Dataset
        – Printscreen, small natural noise, artificial noise, large natural noise
        – 1 paragraph per font/emphasis pair
        – 8 fonts: Arial, Bookman, Century Gothic, Comic Sans, Courier, Computer Modern, Impact, Times New Roman
  19. Results: English Font Recognition
      • Competition
      Algorithm   | Preprocessing?                                        | Subsampling                 | Feature                           | Classifier
      Proposed    | Otsu’s method                                         | Variable                    | Mean, std of CWT                  | SVM (one against one)
      Aviles-Cruz | Text line detection, normalization, texture formation | 100 random 64x64 subsamples | Skewness & kurtosis               | EM-trained Bayes classifier
      Ramanathan  | Normalization, Otsu’s method                          | 3x3 grid                    | Mean, std, max of Gabor responses | SVM (one against all)
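The "one against one" entry in the classifier column means one binary classifier per pair of fonts, with the final label chosen by vote. The sketch below shows only that voting scheme. `pairwise_decide` is a hypothetical stand-in for a trained binary SVM, and the 1-D prototypes are invented toy data, not features from the slides.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(classes, pairwise_decide, sample):
    """Run every pairwise classifier and return the class with the most votes."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, sample)] += 1
    return votes.most_common(1)[0][0]

fonts = ["Arial", "Bookman", "Century Gothic", "Comic Sans",
         "Courier", "Computer Modern", "Impact", "Times New Roman"]

# Toy stand-in for trained SVMs: each font gets a 1-D prototype value and
# the pairwise decision picks whichever prototype is closer to the sample.
prototypes = {font: float(i) for i, font in enumerate(fonts)}

def pairwise_decide(a, b, sample):
    return a if abs(sample - prototypes[a]) <= abs(sample - prototypes[b]) else b

n_pairs = len(list(combinations(fonts, 2)))
predicted = one_vs_one_predict(fonts, pairwise_decide, sample=2.2)
print(n_pairs, predicted)
```

For k classes this scheme trains k(k-1)/2 binary classifiers (28 for the eight English fonts), whereas one against all trains k.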
  20. Results: English Font Recognition
      Low Natural Noise
      Font | Proposed | Aviles-Cruz | Ramanathan
      A    | 96.88    | 81.75       | 100
      B    | 100      | 87          | 100
      CG   | 98.45    | 69.75       | 97.22
      CS   | 100      | 75.5        | 100
      C    | 100      | 96.25       | 100
      I    | 100      | 99          | 100
      M    | 100      | 97          | 100
      T    | 100      | 91          | 100
      Mean | 99.41625 | 87.15625    | 99.6525
      [Chart: per-font accuracy for Proposed, Aviles-Cruz, and Ramanathan, low natural noise]
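The "Mean" entries in the low-natural-noise results are consistent with an unweighted average over the eight fonts. A short check using the per-font values from the slide (the dictionary layout is only for illustration):

```python
# Per-font accuracies (%) from the low-natural-noise slide:
# (Proposed, Aviles-Cruz, Ramanathan)
low_noise = {
    "A":  (96.88, 81.75, 100.0),
    "B":  (100.0, 87.0, 100.0),
    "CG": (98.45, 69.75, 97.22),
    "CS": (100.0, 75.5, 100.0),
    "C":  (100.0, 96.25, 100.0),
    "I":  (100.0, 99.0, 100.0),
    "M":  (100.0, 97.0, 100.0),
    "T":  (100.0, 91.0, 100.0),
}
methods = ("Proposed", "Aviles-Cruz", "Ramanathan")

# Unweighted mean over fonts for each method.
means = {
    name: sum(row[i] for row in low_noise.values()) / len(low_noise)
    for i, name in enumerate(methods)
}
for name in methods:
    print(name, round(means[name], 5))
```

The computed averages agree with the means reported on the slide, so each font contributes equally regardless of how many subsamples it produced.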
  21. Results: English Font Recognition
      Low Natural Noise + Artificial Noise
      Font | Proposed | Aviles-Cruz | Ramanathan
      A    | 95.31    | 78.25       | 97.22
      B    | 100      | 83          | 100
      CG   | 98.44    | 67.5        | 97.22
      CS   | 100      | 73          | 100
      C    | 100      | 91.5        | 97.22
      I    | 98.44    | 98.5        | 100
      M    | 100      | 91.25       | 100
      T    | 98.44    | 79.25       | 97.22
      Mean | 98.82875 | 82.78125    | 98.61
      [Chart: per-font accuracy for Proposed, Aviles-Cruz, and Ramanathan, low natural noise + artificial noise]
  22. Results: English Font Recognition
      High Natural Noise
      Font | Proposed | Aviles-Cruz | Ramanathan
      A    | 98.44    | -           | 91.67
      B    | 98.44    | -           | 88.89
      CG   | 92.19    | -           | 94.44
      CS   | 100      | -           | 97.22
      C    | 100      | -           | 94.44
      I    | 100      | -           | 94.44
      M    | 98.44    | -           | 88.88
      T    | 98.44    | -           | 100
      Mean | 98.24375 | -           | 93.7475
      [Chart: per-font accuracy for Proposed, Aviles-Cruz, and Ramanathan, high natural noise]
  23. Results: English Font Recognition
      Recognition Means
      Dataset                              | Proposed | Aviles-Cruz | Ramanathan
      Printscreen                          | 100      | 100         | 100
      Low natural noise                    | 99.41625 | 87.15625    | 99.6525
      Low natural noise + artificial noise | 98.82875 | 82.78125    | 98.61
      High natural noise                   | 98.24375 | -           | 93.7475
  24. Results: Farsi Font Recognition
      • Dataset
        – Small natural noise
        – 1 paragraph per font/emphasis pair
        – 10 fonts: Homa, Lotus, Mitra, Nazanin, Tahoma, Times New Roman, Titr, Traffic, Yaghut, and Zar
      [Figures: a: Lotus italic; b: Homa bold italic; c: Times New Roman bold]
  25. Results: Farsi Font Recognition
      • Competition
      Algorithm             | Preprocessing?                                        | Subsampling        | Feature                                | Classifier
      Proposed              | Otsu’s method                                         | Variable           | Mean, std of CWT                       | SVM (one against one)
      Khosravi and Kabir    | Text line detection, normalization, texture formation | 4x4 grid           | Mean, std of Sobel-Roberts             | AdaBoost
      Senobari and Khosravi | Yes, but not explained                                | 128x128 subsamples | PCA of Sobel, Roberts, Symlet wavelets | MLP classifier
  26. Results: Farsi Font Recognition
      Low Natural Noise
      Font | Proposed | Khosravi | Senobari
      L    | 92.2     | 92.2     | 90.7
      M    | 95.3     | 93.4     | 93.7
      N    | 90.6     | 85.2     | 92
      TR   | 98.4     | 97.6     | 95.9
      Y    | 96.9     | 97.6     | 98.5
      Z    | 92.2     | 87.4     | 90.9
      H    | 100      | 99.2     | 99.8
      TI   | 100      | 95.2     | 97
      T    | 100      | 96.6     | 98.3
      TN   | 98.4     | 97.2     | 98.8
      Mean | 96.41    | 94.16    | 95.56
      [Chart: per-font accuracy for Proposed, Khosravi, and Senobari]
  27. Results: Arabic Font Recognition
      • Dataset
        – ALPH-REGIM database
        – 749 samples of varying size and length
        – 10 fonts: Ahsa, Andalus, Arabic Transparent, Badr, Buryidah, Dammam, Hada, Kharj, Koufi, Naskh
      [Figures: a: Ahsa; b: Badr; c: Naskh; d: Dammam]
  28. Results: Arabic Font Recognition
      • Competition
      Algorithm  | Preprocessing? | Subsampling | Feature          | Classifier
      Proposed   | Otsu’s method  | Variable    | Mean, std of CWT | SVM (one against all)
      Ben Moussa | No             | No          | Fractal based    | NN
  29. Results: Arabic Font Recognition
      ALPH-REGIM Database
      Font | Proposed | Ben Moussa
      AH   | 99.633   | 94
      AN   | 98.1595  | 94
      AT   | 99.734   | 92
      B    | 99.5968  | 100
      BU   | 98.2955  | 100
      D    | 99.8592  | 100
      H    | 90.4424  | 100
      K    | 90.4037  | 88
      KO   | 99.3478  | 98
      N    | 98.2418  | 98
      Mean | 97.3714  | 96.4
      [Chart: per-font accuracy for Proposed and Ben Moussa]
  30. Results: Speed Test
  31. Results: Ottoman Style Recognition
      • Dataset
        – Ottoman Archives
        – 6 pages per style
        – Different backgrounds
        – 5 styles: Divani, Nesih, Matbu, Talik, Rika
      [Figures: a: Divani; b: Matbu; c: Nesih; d: Rika; e: Talik]
  32. Results: Ottoman Font Recognition
  33. Conclusion
      • New feature for font recognition:
        – Mean and std of 3-level CWT
        – Higher accuracy than the state of the art on English, Farsi, and Arabic fonts
        – Faster than the state of the art
        – Robust to noise
        – Performs well on Ottoman texts
  34. References
      [1] Abuhaiba, I., 2004. Arabic font recognition using decision trees built from common words. Journal of Computing and Information Technology 13 (3), 211–224.
      [2] Amin, A., 1998. Off-line Arabic character recognition: the state of the art. Pattern Recognition 31 (5), 517–530.
      [3] Aviles-Cruz, C., Rangel-Kuoppa, R., Reyes-Ayala, M., Andrade-Gonzalez, A., Escarela-Perez, R., 2005. High-order statistical texture analysis-font recognition applied. Pattern Recognition Letters 26 (2), 135–145.
      [4] Ben Moussa, S., Zahour, A., Benabdelhafid, A., Alimi, A., 2008. Fractal-based system for Arabic/Latin, printed/handwritten script identification. In: Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, pp. 1–4.
      [5] Borji, A., Hamidi, M., 2007. Support vector machine for Persian font recognition. International Journal of Intelligent Systems and Technologies, 184–187.
      [6] Boser, B., Guyon, I., Vapnik, V., 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on Computational learning theory. ACM, pp. 144–152.
      [7] Cai, S., Li, K., Selesnick, I., n.d. Matlab implementation of wavelet transforms. Tech. rep., Polytechnic University.
      [8] Chang, C., Lin, C., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3), 27.
      [9] Chaudhuri, B., Garain, U., 1998. Automatic detection of italic, bold and all-capital words in document images. In: Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on. Vol. 1. IEEE, pp. 610–612.
      [10] Cortes, C., Vapnik, V., Sep. 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297.
      [11] Duan, K., Keerthi, S., 2005. Which is the best multiclass SVM method? An empirical study. Multiple Classifier Systems, 732–760.
      [12] Hsu, C., Chang, C., Lin, C., et al., 2003. A practical guide to support vector classification.
      [13] Jung, M., Shin, Y., Srihari, S., 1999. Multifont classification using typographical attributes. In: Document Analysis and Recognition, 1999. ICDAR’99. Proceedings of the Fifth International Conference on. IEEE, pp. 353–356.
      [14] Khosravi, H., Kabir, E., 2010. Farsi font recognition based on Sobel-Roberts features. Pattern Recognition Letters 31 (1), 75–82.
      [15] Kingsbury, N., 1997. Image processing with complex wavelets. Phil. Trans. Royal Society London A 357, 2543–2560.
      [16] Kingsbury, N., 1998. The dual-tree complex wavelet transform: a new efficient tool for image restoration and enhancement. In: Proc. EUSIPCO. Vol. 98. pp. 319–322.
      [17] Kingsbury, N., 2000. A dual-tree complex wavelet transform with improved orthogonality and symmetry properties. In: Image Processing, 2000. Proceedings. 2000 International Conference on. Vol. 2. IEEE, pp. 375–378.
      [18] Ma, H., Doermann, D., 2003. Gabor filter based multi-class classifier for scanned document images. In: 7th International Conference on Document Analysis and Recognition (ICDAR). pp. 968–972.
      [19] Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics 9 (1), 62–66.
      [20] Petkov, N., Wieling, M., 2008. Gabor filter for image processing and computer vision. Tech. rep., University of Groningen.
      [21] Ramanathan, R., Soman, K., Thaneshwaran, L., Viknesh, V., Arunkumar, T., Yuvaraj, P., Oct. 2009. A novel technique for English font recognition using support vector machines. In: Advances in Recent Technologies in Communication and Computing, 2009. ARTCom ’09. International Conference on. pp. 766–769.
      [22] Rashedi, E., Nezamabadi-pour, H., Saryzadi, S., 2007. Farsi font recognition using correlation coefficients (in Farsi). In: 4th Conf. on Machine Vision and Image Processing, Ferdosi Mashhad.
      [23] Selesnick, I., Baraniuk, R., Kingsbury, N., 2005. The dual-tree complex wavelet transform. Signal Processing Magazine, IEEE 22 (6), 123–151.
      [24] Villegas-Cortez, J., Aviles-Cruz, C., 2005. Font recognition by invariant moments of global textures. In: Proceedings of international workshop VLBV05 (very low bit-rate video-coding 2005). pp. 15–16.
      [25] Zhu, Y., Tan, T., Wang, Y., Oct. 2001. Font recognition based on global texture analysis. IEEE Trans. Pattern Anal. Mach. Intell. 23 (10), 1192–1200.
      [26] Zramdini, A., Ingold, R., 1998. Optical font recognition using typographical features. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 877–882.
