IMPACT Final Conference - NCSR - Wordspotting

    1. IMPACT Tools Developed by NCSR
       IMPACT Final Conference 2011, 24-25 October 2011, London, UK
       B. Gatos
       Computational Intelligence Laboratory, Institute of Informatics and Telecommunications
       National Center for Scientific Research (NCSR) "Demokritos", GR-153 10 Agia Paraskevi, Athens, Greece
    2.
         • Develop an alternative technique for historical document indexing
             • based on spotting words directly on document images
             • avoiding the conventional OCR procedure
         • Provide three methods for word spotting:
             • Selecting the query from a predefined list of keywords
             • Query by example
             • Free text query
         • Incorporate the whole word spotting functionality in a GUI tool
    3. The main operational parts of the Word Spotting application are:
         • Page segmentation
         • Feature extraction
         • Marking character templates
         • Word matching
         • User feedback
         • Query selection by example
         • Free text synthetic query creation
         • Searching
         • User access control
    4. Main steps (1/2)
         • Document pages
             • Select pages from the document corpus
             • Apply word segmentation to the pages
             • Apply feature extraction to all segmented words
         • Query
             • Define the list of keywords
             • Select the query keyword from the list
             • Mark the character templates
             • Create a synthetic query image
             • Apply feature extraction to the query image
    5. Main steps (2/2)
         • Matching and user feedback
             • Word matching
             • User feedback
             • Selecting the final results
    6. Marking character templates
         • Applied directly on a text image
         • Character baseline adjustment
         • Performed once and then reused for entire books or collections with similar text characteristics
    7. Feature extraction & word matching
         • Describe each word (synthetic or real) by a set of features
         • Normalize
         • Match by checking similarity based on features
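
The describe, normalize, and match steps on this slide can be sketched as follows. The slide does not say what is normalized or which similarity measure is used, so this sketch assumes unit-length scaling of the feature vector and Euclidean distance; the function names `normalize`, `distance`, and `rank_words` are hypothetical.

```python
import math


def normalize(features):
    """Scale a feature vector to unit Euclidean length (assumed scheme)."""
    norm = math.sqrt(sum(v * v for v in features))
    # An all-zero vector cannot be scaled, so it is returned unchanged.
    return [v / norm for v in features] if norm > 0 else list(features)


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_words(query, words):
    """Return word ids ordered from most to least similar to the query.

    `words` maps a word id to its feature vector (hypothetical layout).
    """
    q = normalize(query)
    scored = [(distance(q, normalize(f)), wid) for wid, f in words.items()]
    return [wid for _, wid in sorted(scored)]


if __name__ == "__main__":
    corpus = {"a": [2, 0], "b": [0, 3], "c": [1, 1]}
    print(rank_words([1, 0], corpus))
```

Smaller distance means higher similarity, so the first id in the returned list is the best match.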
    8. Feature extraction & word matching
         • Features based on word profile projections
         • Features based on zones
         • Hybrid features by projections and zones
         [Figure: feature diagrams on x and y axes, showing the upper and lower word boundaries and the line y = y_t]
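
A minimal sketch of the three feature families named on this slide, assuming a binary word image stored as a list of rows with 1 for ink and 0 for background. The exact definitions used in the tool are not given in the slides; here the profiles measure the distance from the top and bottom image edges to the nearest ink pixel in each column, and the zone features are ink densities on a regular grid. All function names are hypothetical.

```python
def upper_profile(img):
    """For each column, number of rows from the top edge to the first ink pixel."""
    height, width = len(img), len(img[0])
    return [
        next((y for y in range(height) if img[y][x]), height)
        for x in range(width)
    ]


def lower_profile(img):
    """For each column, number of rows from the bottom edge to the lowest ink pixel."""
    height, width = len(img), len(img[0])
    return [
        next((height - 1 - y for y in range(height - 1, -1, -1) if img[y][x]), height)
        for x in range(width)
    ]


def zone_densities(img, rows, cols):
    """Fraction of ink pixels in each cell of a rows x cols grid, row by row."""
    height, width = len(img), len(img[0])
    feats = []
    for i in range(rows):
        y0, y1 = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            x0, x1 = j * width // cols, (j + 1) * width // cols
            area = (y1 - y0) * (x1 - x0)
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feats.append(ink / area if area else 0.0)
    return feats


def hybrid_features(img, rows=2, cols=2):
    """Concatenate profile and zone features (one possible hybrid layout)."""
    return upper_profile(img) + lower_profile(img) + zone_densities(img, rows, cols)
```

Because the profile length equals the image width, word images would need to be resized to a common width before their feature vectors can be compared.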
    9. Feature extraction & word matching
         • Features by centers of masses
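
The slide names center-of-mass features without defining them. One simple reading, shown here purely as an assumption, is to compute the centroid of the ink pixels for the whole word and for a few vertical strips of it. The image is a list of rows with 1 for ink; `center_of_mass` and `strip_centers` are hypothetical names.

```python
def center_of_mass(img, x0=0, x1=None):
    """Centroid (mean x, mean y) of ink pixels in columns x0..x1-1, or None if empty."""
    height, width = len(img), len(img[0])
    if x1 is None:
        x1 = width
    points = [(x, y) for y in range(height) for x in range(x0, x1) if img[y][x]]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def strip_centers(img, n_strips):
    """Centroid of each of n_strips equal-width vertical strips of the image."""
    width = len(img[0])
    return [
        center_of_mass(img, i * width // n_strips, (i + 1) * width // n_strips)
        for i in range(n_strips)
    ]
```

A strip with no ink yields `None`, which a caller would have to replace with a default value before building a fixed-length feature vector.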
    10. User feedback example
         [Figure, three panels]
         (a) Synthetic query word
         (b) Initial ranking of segmented words. The highlighted words denote correct words selected by the user
         (c) Ranking after user's feedback
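
The slides do not describe how the user's selections change the ranking. As one plausible illustration only, the sketch below treats each word the user confirmed as an additional reference and scores every segmented word by its distance to the nearest reference (the synthetic query or any confirmed word). The names `dist` and `rerank` and the nearest-reference rule are assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rerank(query, words, confirmed_ids=()):
    """Rank word ids by distance to the nearest reference vector.

    References are the query plus the features of the words the user marked
    as correct. With no confirmed words this is the initial ranking.
    """
    refs = [query] + [words[wid] for wid in confirmed_ids]

    def score(wid):
        return min(dist(words[wid], r) for r in refs)

    return sorted(words, key=lambda wid: (score(wid), wid))


if __name__ == "__main__":
    words = {"w1": [0.0, 0.0], "w2": [5.0, 5.0], "w3": [5.0, 6.0], "w4": [2.0, 0.0]}
    print(rerank([0.0, 0.0], words))          # initial ranking
    print(rerank([0.0, 0.0], words, ["w2"]))  # after the user confirms w2
```

In the example, confirming `w2` pulls its close neighbour `w3` ahead of `w4`, which mirrors the move from panel (b) to panel (c).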
    11. Searching
         • Allow the user to search the image corpus for instances of query keywords that have already undergone the user feedback process.
         • The user selects one of the processed keywords and the application shows all instances of this keyword in the images of the corpus.
         • The user can navigate through the results at the instance level (one instance at a time) or at the page level (all instances on a page).
    12.
         • A. L. Kesidis, E. Galiotou, B. Gatos and I. Pratikakis, "A word spotting framework for historical machine-printed documents", International Journal on Document Analysis and Recognition, DOI: 10.1007/s10032-010-0134-4, pp. 1-14, 2010.
         • A. L. Kesidis, E. Galiotou, B. Gatos, A. Lampropoulos, I. Pratikakis, I. Manolessou and A. Ralli, "Accessing the content of Greek historical documents", 3rd Workshop on Analytics for Noisy Unstructured Text Data (AND'09), pp. 55-62, Barcelona, Spain, July 2009.
    13. Main steps
         • Document pages (applied once)
         • Query
             • Select (by cropping) a query word image from a page
             • Apply feature extraction to the query image
         • Matching
             • Match the query features to the features of all segmented words
             • Rank the segmented words by similarity
             • Return the most similar segmented words
             • No user feedback!
    14. Main steps
         • Document pages (applied once)
         • Query
             • Type the query text
             • Construct a synthetic query image using letter templates (provided by an administrator)
             • Apply feature extraction to the query image
         • Matching
             • Match the query features to the features of all segmented words
             • Rank the segmented words by similarity
             • Return the most similar segmented words
             • No user feedback!
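
Constructing a synthetic query image from letter templates can be sketched as a horizontal concatenation of per-character images. This sketch assumes each template is a binary image (list of rows, 1 for ink) whose bottom row sits on the baseline, so shorter templates are padded at the top; descenders and the tool's actual baseline adjustment are not modelled. The function name `synthesize` and the `gap` parameter are hypothetical.

```python
def synthesize(text, templates, gap=1):
    """Build a word image by placing letter templates side by side.

    `templates` maps a character to its binary image. Templates are aligned
    on their bottom row and separated by `gap` blank columns.
    """
    if not text:
        raise ValueError("query text must not be empty")
    glyphs = [templates[ch] for ch in text]  # KeyError if a letter has no template
    height = max(len(g) for g in glyphs)
    rows = [[] for _ in range(height)]
    for i, glyph in enumerate(glyphs):
        pad = height - len(glyph)  # blank rows added above a shorter letter
        width = len(glyph[0])
        if i > 0:
            for row in rows:
                row.extend([0] * gap)
        for y in range(height):
            rows[y].extend(glyph[y - pad] if y >= pad else [0] * width)
    return rows


if __name__ == "__main__":
    templates = {"a": [[1, 1], [1, 1]], "l": [[1], [1], [1]]}
    for row in synthesize("al", templates):
        print(row)
```

The resulting image can then go through the same feature extraction as the segmented words.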
    15. Two levels: Guest and Administrator

        Function                      Guest   Administrator
        Search                        √       √
        Word Spotting                 -       √
        Query by Example              √       √
        Free Text Query               √       √
        User management + settings    -       √
    16.
        Task                                         Query by Keyword   Query by Example   Free Text
        OFFLINE PREPARATION: ADMINISTRATIVE TASKS
        Page segmentation and features extraction    Admin              Admin              Admin
        Keywords definition                          Admin              -                  -
        Letter templates definition                  Admin              -                  Admin
        Word Spotting by User's feedback             Admin              -                  -
        ONLINE USAGE
        Searching                                    All Users          All Users          All Users
    17. Document corpus
         • French book (153 pages, 47836 words): Le Dernier fils de France, ou le Duc de Normandie, fils de Louis XVI et de Marie-Antoinette, par A. , 1838
         • German book (126 pages, 24596 words): Aufschlüsse zur Magie aus geprüften Erfahrungen über verborgene philosophische Wissenschaften und verdeckte Geheimnisse der Natur, 1788
        Segmentation
         • Projections
         • RLSA
         • USAL1 (Connected components)
         • USAL2 (Projections)
        Features
         • Hybrid (Projections+Zones)
         • Center of Masses
        Keywords
         • 5 keywords per book
        Overall 80 experiments
         • Each experiment performed
             • Without user feedback
             • With 1, 2, and 3 user selected words
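
RLSA (run-length smoothing) is one of the segmentation methods listed above. A minimal horizontal version, assuming a binary text-line image (list of rows, 1 for ink), fills every background gap of at most `threshold` pixels that lies between two ink pixels in the same row, so that the letters of a word merge into one blob. Reading word spans off the column projection of the smoothed line is a simplification added here, not necessarily how the tool does it; the function names are hypothetical.

```python
def rlsa_row(row, threshold):
    """Fill background gaps of length <= threshold between two ink pixels."""
    out = list(row)
    last_ink = None
    for x, value in enumerate(row):
        if value:
            if last_ink is not None and 0 < x - last_ink - 1 <= threshold:
                for k in range(last_ink + 1, x):
                    out[k] = 1
            last_ink = x
    return out


def rlsa_horizontal(img, threshold):
    """Apply horizontal run-length smoothing to every row of a binary image."""
    return [rlsa_row(row, threshold) for row in img]


def word_spans(line_img, threshold):
    """Column spans (start, end exclusive) of words in one text-line image."""
    smoothed = rlsa_horizontal(line_img, threshold)
    width = len(line_img[0])
    has_ink = [any(row[x] for row in smoothed) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans
```

The threshold would be chosen between the typical gap inside a word and the typical gap between words, which depends on the font and scan resolution.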
    18. Feature extraction
         • Hybrid features gave better results and were faster than Center of Masses
         [Figure: Average precision vs recall diagrams of word spotting in relation to feature extraction methods for (a) Book A and (b) Book B.]
    19. User feedback
         • User feedback improved the results
         [Figure: Average precision vs recall diagrams of word spotting in relation to the user's feedback involvement for (a) Book A and (b) Book B.]
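
The precision vs recall diagrams are built from ranked result lists. As a reminder of how one such curve is obtained for a single keyword, the sketch below computes a (recall, precision) point at every rank cutoff, given the ranked word ids and the set of ground-truth instances. Averaging the curves over keywords, as the captions imply, is not shown; the function name is hypothetical.

```python
def precision_recall_points(ranked, relevant):
    """Return (recall, precision) after each position of a ranked result list.

    `ranked` is the list of word ids in ranking order; `relevant` holds the
    ids that are true instances of the keyword.
    """
    relevant = set(relevant)
    if not relevant:
        raise ValueError("at least one relevant item is required")
    hits = 0
    points = []
    for k, wid in enumerate(ranked, start=1):
        if wid in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points


if __name__ == "__main__":
    for recall, precision in precision_recall_points(["a", "x", "b", "y"], {"a", "b"}):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```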
    20. Segmentation issues
        Query by Example + Free Text Query
         • In Query by Example the performance is similar to User Feedback when one relevant instance is selected
         • In both methods the results are related to the similarity threshold
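
The dependence on the similarity threshold can be illustrated with a small filter over precomputed query-to-word distances. This assumes the threshold is applied as a maximum allowed distance, which the slides do not state; `select_matches` and the example values are hypothetical.

```python
def select_matches(distances, max_distance):
    """Return ids of words whose distance to the query is within the threshold.

    `distances` maps a word id to its distance from the query; results are
    ordered from most to least similar.
    """
    kept = [wid for wid, d in distances.items() if d <= max_distance]
    return sorted(kept, key=lambda wid: (distances[wid], wid))


if __name__ == "__main__":
    distances = {"w1": 0.1, "w2": 0.4, "w3": 0.9}
    print(select_matches(distances, 0.5))  # stricter: fewer results
    print(select_matches(distances, 1.0))  # looser: more results
```

A stricter threshold returns fewer words and tends to favour precision, while a looser one returns more words and tends to favour recall.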