IMPACT Final Conference - Research Parallel Sessions02 research session_ncsr_tools

IMPACT Tools Developed by NCSR IMPACT Final Conference 2011 24-25 October 2011, London, UK B. Gatos Computational Intelligence Laboratory Institute of Informatics and Telecommunications National Center for Scientific Research ( NCSR ) "Demokritos" GR-153 10 Agia Paraskevi, Athens, Greece

Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

Computational Intelligence Laboratory (CIL) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],[object Object],Institute activities IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],[object Object],[object Object],Institute Personnel CIL Personnel ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

Computational Intelligence Laboratory (CIL) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Computational Intelligence Laboratory Institute of Informatics and Telecommunications N ational C enter for S cientific R esearch "Demokritos" GR-153 10 Agia Paraskevi, Athens, Greece IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

Recent OCR projects Computational Intelligence Laboratory Institute of Informatics and Telecommunications N ational C enter for S cientific R esearch "Demokritos" GR-153 10 Agia Paraskevi, Athens, Greece IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

Information gain web ontology language Image Video Visual Information Non Visual Information Text Audio Video OCR http://www.casam-project.eu/ IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK Fusion Low-level analysis Interpre tation

Video Logo Detection IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

Border_Detection_v4 [0|1] [infile] [outfile1] [outfile2] parameter [0|1]: 0 -> only border removal, 1 -> border removal & page split parameter [infile]: Input filename (b/w or gray scale image) parameters [outfile1] [outfile2]: Output filenames (b/w or gray scale image) + web service implementation IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK ,[object Object],[object Object]

N. Stamatopoulos, B. Gatos, T. Georgiou, “ Page frame detection for double page document images ”, 9th IAPR International Workshop on Document Analysis Systems (DAS 2010) , pp. 401-408, Cambridge, MA, USA, June 2010. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK ,[object Object],[object Object],[object Object]

1 (Bad) 2 3 4 5 (Good) Av=4.3 Av=3.6 1. Final image almost destroyed! 2. Big part of text is missing 3. Small part of text is missing 4. All text is there, border not completely removed. 5. All text is there, border has been completely removed. 1. Final image almost destroyed! 2. Big part of text is missing 3. Small part of text is missing 4. All text is there, border not completely removed. 5. All text is there, border has been completely removed. 21709 images to test border removal 3003 newspaper images to test border removal

1 (Bad) 2 3 4 5 (Good) Av=3.3 1. Page split fails! 2 Page split with problems. 3. Page split is correct, large parts of noise remains or text is removed 4. Page split is correct, small parts of noise remains or text is removed 5. Page split is correct, only black noise has been removed IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK 3009 images to test page split (results on 50%)

IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK SET A: 38718 randomly selected historical images BL BNE BNF BSB JSI NLB ONB TOTAL #images 3632 11126 12251 4784 4430 706 1789 38718 IMPACT Prec (%) 99.49 99.89 98.88 98.10 98.91 99.86 97.29 99.08 Rec (%) 98.83 99.26 99.40 96.07 99.06 99.73 97.82 98.79 FM (%) 99.16 99.58 99.14 97.07 98.99 99.79 97.55 98.93 D.X Le Prec (%) 94.98 99.68 98.67 97.70 97.35 99.80 97.19 98.30 Rec (%) 99.31 90.65 99.24 95.58 99.21 99.81 99.19 96.63 FM (%) 97.10 94.95 98.26 96.63 98.27 99.80 98.18 97.30 BookRestorer Prec (%) 91.13 96.88 98.08 97.29 94.50 99.79 95.12 96.47 Rec (%) 99.56 91.57 99.77 97.43 99.40 99.85 99.61 97.06 FM (%) 95.16 94.15 98.91 97.36 96.89 99.82 97.31 96.76 WiseBook Prec (%) 86.93 88.57 91.20 95.76 90.69 99.46 80.37 90.20 Rec (%) 98.37 99.47 99.10 96.40 97.29 98.45 98.63 98.56 FM (%) 92.30 93.71 94.99 96.08 93.87 98.95 88.57 94.20 ScanFix Prec (%) 81.65 92.87 91.29 95.62 91.00 99.24 84.52 91.17 Rec (%) 94.97 98.66 98.66 97.81 95.66 80.81 96.98 97.46 FM (%) 87.81 95.68 94.83 96.70 93.27 89.08 90.32 94.21

IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK SET B: 22383 images with noisy black border BL BNE BNF BSB JSI NLB ONB TOTAL #images 1631 7543 7677 2417 1416 315 1384 22383 IMPACT Prec (%) 98.94 99.88 98.29 96.86 98.01 99.98 96.63 98.62 Rec (%) 98.18 99.27 99.26 93.14 99.15 99.87 98.24 98.46 FM (%) 98.56 99.57 98.77 94.96 98.58 99.92 97.43 98.54 D.X Le Prec (%) 88.89 99.58 97.98 96.08 93.20 99.85 96.48 97.28 Rec (%) 99.05 86.64 98.86 91.53 99.09 99.97 99.06 94.01 FM (%) 93.70 92.66 98.42 93.75 96.05 99.91 97.75 95.62 BookRestorer Prec (%) 80.30 95.46 97.00 95.27 84.26 99.83 93.76 94.11 Rec (%) 99.36 93.02 99.68 95.22 99.56 99.96 99.62 96.92 FM (%) 88.82 94.22 98.32 95.24 91.27 99.89 96.60 95.50 WiseBook Prec (%) 70.98 83.24 86.10 92.24 72.44 99.18 74.77 83.32 Rec (%) 99.38 99.49 99.58 95.19 98.36 98.61 99.09 98.94 FM (%) 82.81 90.64 92.35 93.69 83.43 98.89 85.23 90.46 ScanFix Prec (%) 59.23 89.57 86.23 91.96 73.38 99.04 80.14 85.00 Rec (%) 95.42 98.78 99.03 96.54 98.55 80.61 97.67 98.04 FM (%) 73.09 93.95 92.19 94.19 84.12 88.88 88.04 91.05

IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK 3009 images to test page split

IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK 458 images from BNF to test page split

Page_Curl_Correction _v4 [0|1] [infile] [outfile] parameter [0|1]: 0 -> coarse & fine correction, 1 -> only coarse correction parameter [infile]: Input filename (b/w or gray scale image) parameters [outfile] : Output filename (b/w or gray scale image) + web service implementation IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

N. Stamatopoulos, B. Gatos, I. Pratikakis, S.J. Perantonis, “ Goal-oriented Rectification of Camera-Based Document Images ”, IEEE Transactions on Image Processing , DOI: 10.1109/TIP.2010.2080280, 2010 IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK ,[object Object],[object Object]

Only Coarse Correction Coarse + Fine Correction Av=4.2 Av=4.5 ,[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK 14706 images to test Page Curl Correction

Only Coarse Correction Coarse + Fine Correction Av=3.9 ,[object Object],[object Object],[object Object],[object Object],[object Object],Av=3.7 IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK 3009x2 = 6018 images to test page split

IMPACT Page Curl Correction v.4 87.78% (81.98% only coarse correction) BookRestorer 80.87% N. Stamatopoulos, B. Gatos and I. Pratikakis, " A Methodology for Document Image Dewarping Techniques Performance Evaluation ", 10th International Conference on Document Analysis and Recognition (ICDAR'09) , pp. 956-960, Barcelona, Spain, July 2009. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

0.21 0.91 Character_Segmentation_v3 [WordImageFilename] [XMLOutputFilename] parameter [WordImageFilename]: An image containing a word parameter [XMLOutputFilename] : several character segmentation variations encoded following the XML schema of IBM used in TR3 (Adaptive OCR) IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

Merged characters Broken characters Overlapped characters Noise IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

Segmentation variations encoded following the XML schema used in TR3 N. Nikolaou, M. Makridis, B. Gatos, N. Stamatopoulos, N. Papamarkos, " Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths ", Image and Vision Computing, Vol. 28 , Issue 4, pp. 590-604, 2010. ,[object Object],[object Object],[object Object],[object Object]

0.61 0.79 0.85 0.98 0.94 IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

0.83 0.63 0.73 0.89 0.90 IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

0.61 0.79 0.94 Evaluation of the result with the highest confidence IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

0.61 0.79 0.94 Evaluation of the best possible result IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

Features based on word profile projections Features based on zones Hybrid features by projections and zones ,[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK x y x y y=y t Upper Boundary Lower Boundary x y x y x y

Features by centers of masses ,[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],(a) Synthetic query word (b) Initial ranking of segmented words. The highlighted words denote correct words selected by the user (c) Ranking after user’s feedback. (a) (b) (c) IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

A. L. Kesidis, E. Galiotou, B. Gatos and I. Pratikakis, “ A word spotting framework for historical machine-printed documents ”, International Journal on Document Analysis and Recognition, DOI: 10.1007/s10032-010-0134-4, pp. 1-14, 2010. A. L. Kesidis, E. Galiotou, B. Gatos, A. Lampropoulos, I. Pratikakis, I. Manolessou and A. Ralli, " Accessing the content of Greek historical documents ", 3rd Workshop on Analytics for Noisy Unstructured Text Data (AND'09), pp. 55-62, Barcelona, Spain, July 2009 IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK Search Word Spotting Query by Example Free Text Query User management + settings Guest √ √ √ Administrator √ √ √ √ √

IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK Query by Keyword Query by Example Free Text OFFLINE PREPARATION – ADMINISTRATIVE TASKS Page segmentation and features extraction Admin Admin Admin Keywords definition Admin Letter templates definition Admin Admin Word Spotting by User ’ s feedback Admin ONLINE USAGE Searching All Users All Users All Users

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object], (a) (b) Average precision vs recall diagrams of word spotting in relation to feature extraction methods for (a) Book A and (b) Book B. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

[object Object],[object Object], (a) (b) Average precision vs recall diagrams of word spotting in relation to the user’s feedback involvement for (a) Book A and (b) Book B . IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

http://users.iit.demokritos.gr/~bgat/H-DocPro/ H-DocPro v.1 IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 Step 1: Select the directory with your images or copy your images to directory [Install Dir]/images. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 Step 2: Select the directory for saving the results after pressing the "Settings" button. (default save directory: [Install Dir]/Results ) IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 Step 3: Select one or more document images. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 ,[object Object],[object Object],[object Object],[object Object],IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 Step 5: Select the method for every processing module by pressing "<" or ">" on every module at the workflow line. Right click on the module at the workflow line and deselect "Do not recalculate if result exists" if you want to recalculate an existing result. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 Step 6: Execute workflow by pressing "Apply Processes" IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 Step 7: View results on the preview window or right click on any module at the workflow line and select "View Result". If you right click on the right-most module you will view the final result otherwise you will view the intermediate results. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 - Document Image Processing Components Binarization NCSR: Based on "B. Gatos, I. Pratikakis and S. J. Perantonis, Adaptive Degraded Document Image Binarization, Pattern Recognition, Vol. 39, pp. 317-327, 2006" FR8.1: From FineReader Engine v. 8.1. IMPORTANT NOTICES: (a) You must have the engine already intalled. (b) You must edit file [Install Dir]/temp/Binarization/FRkey.txt and add your FineReader license key code IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 - Document Image Processing Components Border Removal Auto: Based on projection profiles and connected component analysis. Auto_Edit: Press inside the marked area and adjust it by draging the black points. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

H-DocPro v.1 - Document Image Processing Components Page Split Auto: Based on "N. Stamatopoulos, B. Gatos, T. Georgiou, Page frame detection for double page document images, 9th IAPR International Workshop on Document Analysis Systems (DAS 2010), pp. 401-408, Cambridge, MA, USA, June 2010" Auto_Edit: Press inside the left or right marked area and adjust it by dragging the black points. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK

ASM 2011, 12-13 April 2011, Munich, Germany H-DocPro v.1 - Document Image Processing Components Dewarping Auto: Based on "N. Stamatopoulos, B. Gatos, I. Pratikakis and S.J. Perantonis, Goal-oriented Rectification of Camera-Based Document Images, IEEE Transactions on Image Processing, vol. 20, no. 4, pp. 910-920, 2011." IMPORTANT NOTICES: (a) It needs the MATLAB Component Runtime Installer, (b) it can be applied only to single column documents. Auto_Edit: Manually correct the position of the two lines and the two curves that delimit the text area by draging the corresponding black points. Press ">" button to test the result.

IMPACT Final Conference - Research Parallel Sessions02 research session_ncsr_tools

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (17)

Similar to IMPACT Final Conference - Research Parallel Sessions02 research session_ncsr_tools

Similar to IMPACT Final Conference - Research Parallel Sessions02 research session_ncsr_tools (20)

More from IMPACT Centre of Competence

More from IMPACT Centre of Competence (20)

Recently uploaded

Recently uploaded (20)

IMPACT Final Conference - Research Parallel Sessions02 research session_ncsr_tools

Editor's Notes