First Thesis Presentation

More information about our thesis can be found on our blog: http://augmentjapan.wordpress.com/

Published in: Education, Technology, Business

  • We found a good number of papers on finding text in images. A common problem among them, however, is a lack of detail about their methods and about the parameters and thresholds they use. Other methods use Support Vector Machines and other machine learning techniques, for which we have neither the data required to train them nor the knowledge to use them properly. Our last hope lies with a method called the Stroke Width Transform. We found a C++ implementation of a variation of this method and are currently testing whether we can use it.
  • Masao Uchiyama Hitoshi Isahara
  • First Thesis Presentation

    1. 1. Matthias Vandenbussche, Annelies Van der Borght. Promoter: Erik Duval. Supervisor: Sten Govaerts. Stand-in Supervisor: Gonzalo Parra. http://augmentjapan.wordpress.com/
    2. 2.
        • Introduction
        • Application
        • Related Work
        • Survey
        • Locating Text
        • Preprocessing Text Regions
        • Optical Character Recognition (OCR)
        • Translations
        • Paper Prototype
        • Technologies
        • Schedule
        • Statistics
    3. 6.
        • Mobile application
        • Camera feed
        • Find text
        • Translate
        • Augmented reality display
    4. 7.
        • Input: image / video
        • Locating text: automatic / user input / combination
        • Translation: word per word / sign
    5. 8. 225
    6. 17. Problems:
        • Most lack detail
        • Most use Support Vector Machines (SVM) or Neural Networks (NN)
    7. 20.
        • TiRG
        • Stroke Width Transform (B. Epshtein et al., 2010)
        • Requiring user interaction
    8. 23. NHOCR Tesseract WiseTREND Aspire  ABBYY Finereader
    9. 24.
        • Best recognition rates: NHOCR 39.5%, Tesseract 49%, ABBYY 65.5%
        • Best perfect match rates: NHOCR 11%, Tesseract 3%, ABBYY 31%
    10. 25.
        • Translation services: Google Translate, Bing Translator, SYSTRAN, InterTran, WordLingo, myGengo, OneHourTranslation, Apertium
        • Metrics: F-measure [0;1], BLEU [0;1], NIST ? [0;11]
    11. 30.
        • “We are very sorry for the inconvenience”
        • “Sorry, my wife might read over the evil described in 菩”
    12. 35.
        • Pre-test questionnaire
        • Briefing
        • 4 scenarios
        • Post-test questionnaire
        • System Usability Scale (SUS)
    13. 37.
        • Video feed on startup
          ► Information message on startup
        • Option menu flow on Android
          ► Remove buttons
        • “X” in iPhone option menu
          ► Remove or rename to “done”
        • “Languages” / “Translation Services”
          ► Renaming of options
    14. 38.
        • Access camera feed
        • Alter camera feed
        • Locate text
        • Track text
        • Access to accelerometer
    15. 39. [Table: support for Camera, Augment video, Locating text, Tracking and Sensor on iPhone and Android phone, for HTML5, PhoneGap and Native app]
    16. 44.
        • Now to March: finishing implementation
        • Early March: first iteration
        • Early April: second iteration
        • ~16 April: comparative user tests
        • Early May to end: writing final text
    17. 45.
        • Own blog:
            • 19 posts
            • 15 comments
        • Matthias:
            • 26 comments on other blogs
            • 172 #thesis11
            • 243h 45min worked
        • Annelies:
            • 15 comments on other blogs
            • 110 #thesis11
            • 247h 45min worked
    18. 46.
        • Camera-based Kanji OCR for Mobile-phones: Practical Issues (M. Koga et al., 2005)
        • Translation camera on mobile phone (Y. Watanabe et al., 2003)
        • TranslatAR: A mobile augmented reality translator (V. Fragoso et al., 2011)
        • Automatic detection and translation of text from natural scenes (J. Yang et al., 2002)
        • TiRG: http://sourceforge.net/projects/tirg/
        • Detecting Text in Natural Scenes with Stroke Width Transform (B. Epshtein et al., 2010). Implementation at: https://sites.google.com/site/roboticssaurav/strokewidthnokia
        • Text/Graphics Separation and Skew Correction of Text Regions of Business Card Images for Mobile Devices (A. F. Mollah et al., 2010)
        • Correction of perspective text image based on gradient method (L. Tong and Y. Zhang, 2010)
        • Reliable measures for aligning Japanese-English news articles and sentences (M. Utiyama and H. Isahara, 2003). The set can be found at: http://mastarpj.nict.go.jp/%7Emutiyama/jea/
        • F-measure: Evaluation of machine translation and its evaluation (J. Turian et al., 2003)
        • NIST: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics (G. Doddington, 2002)
        • BLEU: a method for automatic evaluation of machine translation (K. Papineni et al., 2002)
        • SUS - A quick and dirty usability scale (J. Brooke, 1996)
    19. 47.
        • A comprehensive method for multilingual video text detection, localization, and extraction (M. R. Lyu et al., 2005)
        • A real-time tracker for markerless augmented reality (A. I. Comport et al., 2003)
        • A robust text detection algorithm in images and video frames (Qixiang Ye et al., 2003)
        • Automatic detection and recognition of signs from natural scenes (Xilin Chen et al., 2004)
        • Automatic detection and translation of text from natural scenes (Jie Yang et al., 2002)
        • Automatic text location in images and video frames (A. K. Jain and Bin Yu, 1998)
        • Camera-based Kanji OCR for Mobile-phones: Practical Issues (M. Koga et al., 2005)
        • Comparative Evaluation of Online Machine Translation Systems with Legal Texts (Chunyu Kit and Tak Ming Wong, 2008)
        • Correction of perspective text image based on gradient method (Lijing Tong and Yan Zhang, 2010)
        • Design-based research: what we learn when we engage in design of interactive systems (Željko Obrenović, 2007)
        • Detecting Text in Natural Scenes with Stroke Width Transform (Boris Epshtein, 2010)
        • Detection of Text on Road Signs From Video (W. Wu et al., 2005)
        • Evaluation of machine translation and its evaluation (Joseph P. Turian et al., 2003)
        • Fast and robust text detection in images and video frames (Q. Ye et al., 2005)
        • Kanji recognition in scene images without detection of text fields - robust against variation of viewpoint, contrast, and background texture (A. Suzuki et al., 2004)
        • Markerless augmented reality with a real-time affine region tracker (V. Ferrari et al., 2001)
        • Multiple target detection and tracking with guaranteed framerates on mobile phones (D. Wagner et al., 2009)
        • Performance Evaluation for Text Localization Algorithms: An Empirical Study (Yi-Feng Pan and Cheng-Lin Liu, 2010)
        • Real-time vision-based camera tracking for augmented reality applications (Dieter Koller et al., 1997)
        • Robust text detection in natural images with edge-enhanced maximally stable extremal regions (Huizhong Chen et al., 2011)
        • Sequential correction of perspective warp in camera-based documents (Camille Monnier et al., 2005)
        • Snoopertext: A multiresolution system for text detection in complex visual scenes (R. Minetto et al., 2010)
        • Text Detection on Nokia N900 Using Stroke Width Transform (Saurav Kumar and Andrew Perrault, 2010)
        • Text extraction of street level images (J. Fabrizio et al., 2009)
        • Text information extraction in images and video: a survey (K. Jung, 2004)
        • Text locating from natural scene images using image intensities (Jisoo Kim et al., 2005)
        • Text/Graphics Separation and Skew Correction of Text Regions of Business Card Images for Mobile Devices (Ayatullah Faruk Mollah et al., 2010)
        • TranslatAR: A mobile augmented reality translator (Victor Fragoso et al., 2011)
        • Translation and the Internet: Evaluating the Quality of Free Online Machine Translators (Stephen Hampshire and Carmen Porta Salvia, 2010)
        • Translation camera on mobile phone (Y. Watanabe et al., 2003)
        • Video text recognition using feature compensation as category-dependent feature extraction (M. Mori, 2003)
    20. 48.
        • A Fast Skew Correction Technique for Camera Captured Business Card Images (A. F. Mollah, 2009)
        • A new robust algorithm for video text extraction (E. Wong, 2003)
        • An evaluation tool for machine translation: Fast evaluation for MT research (S. Nießen et al., 2000)
        • An Overview of the Tesseract OCR Engine (Ray Smith, 2007)
        • Automatic evaluation of machine translation quality using n-gram co-occurrence statistics (George Doddington, 2002)
        • Automatic location of text in video frames (Xian-Sheng Hua et al., 2001)
        • BLEU: a Method for Automatic Evaluation of Machine Translation (Kishore Papineni et al., 2002)
        • Camera-based analysis of text and documents: a survey (Jian Liang et al., 2005)
        • Character extraction of license plates from video (Y. T. Cui and Q. Huang, 1997)
        • Color Edge Detection Using Multiscale Quaternion Convolution (Jiangyan Xu et al., 2010)
        • Connected components labeling - algorithms in Mathematica, Java, C++ and C# (Mariusz Jankowski and Jens-Peer Kuska, 2004)
        • End-to-End Scene Text Recognition (Kai Wang et al., 2011)
        • Error Evaluation and Applicability of OCR Systems (V. Alexandrov, 2003)
        • Extraction of illusory linear clues in perspectively skewed documents (M. Pilu, 2001)
        • Fast, cheap, and creative: Evaluating translation quality using Amazon's Mechanical Turk (Chris Callison-Burch, 2009)
        • From Mirroring to Guiding: A Review of State of the Art Technology for Supporting Collaborative Learning (Amy Soller et al., 2005)
        • Improvement of video text recognition by character selection (T. Mita and O. Hori, 2001)
        • JEIDA's Test-Sets for Quality Evaluation of MT Systems: Technical Evaluation from the Developer's Point of View (Hitoshi Isahara, 1995)
        • Kanji Character Detection from Complex Real Scene Images based on Character Properties (Lianli Xu et al., 2008)
        • Localizing and segmenting text in images and videos (Rainer Lienhart and Axel Wernicke, 2002)
        • Locating text in complex color images (Y. Zhong et al., 1995)
        • Marker-less Vision Based Tracking for Mobile Augmented Reality (D. Beier et al., 2003)
        • Objective evaluation criteria for machine translation (A. J. Petit, 1977)
        • Perspective Correction Methods for Camera-Based Document Analysis (L. Jagannathan and C. V. Jawahar, 2005)
        • Re-evaluating machine translation results with paraphrase support (Liang Zhou et al., 2006)
        • Re-evaluating the Role of BLEU in Machine Translation Research (Chris Callison-Burch et al., 2006)
        • Reliable measures for aligning Japanese-English news articles and sentences (Masao Utiyama and Hitoshi Isahara, 2003)
        • SUS - A quick and dirty usability scale (John Brooke, 1996)
        • Text detection and segmentation in complex color images (C. Garcia and X. Apostolidis, 2000)
        • Text scanner with text detection technology on image sequences (T. Kurata and M. Kourogi, 2002)
        • TextFinder: An Automatic System to Detect and Recognize Text In Images (Victor Wu et al., 1999)
        • Using multiple edit distances to automatically grade outputs from Machine translation systems (Yasuhiro Akiba et al., 2006)
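
The OCR comparison on slide 24 reports a recognition rate and a perfect match rate per engine, but the slides do not say how either was computed. The sketch below shows one plausible reading, and every definition in it is an assumption: the recognition rate is taken as character-level accuracy (one minus total edit distance over total reference characters), and the perfect match rate as the share of samples whose OCR output equals the reference exactly. The function names and the sample strings are hypothetical.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recognition_rate(pairs: List[Tuple[str, str]]) -> float:
    """Assumed definition: 1 - (total edit distance / total reference characters)."""
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("reference text must not be empty")
    total_errors = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    return max(0.0, 1.0 - total_errors / total_chars)


def perfect_match_rate(pairs: List[Tuple[str, str]]) -> float:
    """Assumed definition: fraction of samples recognised without any error."""
    if not pairs:
        raise ValueError("at least one sample is required")
    return sum(ref == hyp for ref, hyp in pairs) / len(pairs)


# Hypothetical (reference, OCR output) pairs, for illustration only.
samples = [("東京駅", "東京駅"), ("出口", "出日"), ("禁煙", "禁煙")]
print(f"recognition rate:   {recognition_rate(samples):.1%}")
print(f"perfect match rate: {perfect_match_rate(samples):.1%}")
```

Under these definitions an engine can score well on characters yet poorly on perfect matches, which fits the pattern on the slide where Tesseract beats NHOCR on recognition rate but not on perfect matches.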

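
The user test on slide 35 ends with the System Usability Scale (SUS). As a minimal sketch of how one questionnaire is scored, following the standard scoring rule from Brooke (1996): each of the ten items is answered on a 1 to 5 scale, odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name `sus_score` is hypothetical and the sample answers are made up.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one SUS questionnaire (10 answers, each from 1 to 5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution is answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution is 5 - answer.
            total += 5 - answer
    return total * 2.5


# Made-up answers from a single participant, for illustration only.
print(sus_score([3, 3, 3, 3, 3, 3, 3, 3, 3, 3]))  # all-neutral answers -> 50.0
```

Scores from several participants are usually averaged per prototype, which is how the comparative user tests in the schedule could be summarised.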