[RakutenTechConf2013] [C4-1] Text detection in product images


Rakuten Technology Conference 2013
"Text detection in product images"
Naoki Chiba (Rakuten)

Published in: Technology, Art & Photos
  • Hello, my name is Naoki Chiba. Today I am going to talk about text detection in product images.
  • These are examples of product images, which contain sales pitches such as price, store name, and shipping information. Applications of text detection include content retrieval and filtering, character recognition, and translation of text into other languages for international sales.
  • Here is the outline of today’s talk. After an overview of text detection, I am going to review current methods, and then I will talk about Rakuten’s approach.
  • In academia, text detection has been an active area of research for a long time, starting from traditional scanned OCR, in which documents are scanned with a flatbed scanner. As a matter of fact, current text detection is different. Because of the popularity of imaging devices such as mobile cameras, images may contain illumination variations and perspective distortion, and the text is shorter than before. Text images can be categorized into two types: digital-born text, which was inserted by an editor, and natural-scene text, which is getting a lot of attention in academia.
  • Product images have two different purposes. The first is to show sales pitches. The second is to show a product list that represents product variations. Depending on the purpose, the role of text is different.
  • This is an example of a product list. If the image contains store-specific information such as the merchant’s name, price, or shipping information, that might not be good.
  • Another example is what we call “Now printing” images. We use this type of image when product photos are not yet available even though the product has been released or we are taking pre-orders. These images are supposed to be updated when the product photo becomes available, but we need to detect them first. The problem here is that they are provided by our merchants, not by Rakuten, because of the online marketplace model. We do not know beforehand what images they are going to use.
  • In summary, product images can contain natural-scene text in addition to digital-born text. The two are mixed and difficult to detect.
  • Next, I am going to show some current methods in academia to detect text in images.
  • Current methods can be categorized into three types: texture-based, region-based, and hybrids of the two.
  • The texture-based method looks for the special texture of text by scanning a window across the image. It then classifies each window with a classifier such as a Support Vector Machine, AdaBoost, or a neural network. But it has two problems. First, it is scale and rotation variant. Second, the computational cost is high.
  • The second method is called the region-based method. It examines local features, either edges or color clusters, followed by connected component analysis, text line grouping, and word separation. The problem is that it may produce a lot of false candidates.
  • Therefore, the third type is the hybrid method, which is getting a lot of attention these days. It starts from a region-based method, using either edges or color clustering, and then uses a machine-learning classifier to confirm whether the detected regions are text.
  • Still, there are some problems. We would like to solve the following two. One is character/word annotation, which is a time-consuming task, especially when we have a lot of data. The other is transparent text, which is hard to detect.
  • For example, character annotation means placing rectangles on top of text characters by hand. If an image contains a lot of characters, annotation by a human operator is very time-consuming, especially when we have a lot of images.
  • Another problem we would like to solve is transparent text, which is difficult to detect because the edges are weak. But once we detect it, it may be possible to recover the background behind the text.
  • So, to solve these problems, I would like to show what RIT, Rakuten Institute of Technology, is doing.
  • To avoid character/word annotation, we built a text image classifier by using only image-wise annotation, which is much more efficient. We are also working on transparent text detection and background recovery. I am going to show the details of the two.
  • Our text image detection is based on image-wise annotation, which takes much less time than character or word annotation. By clustering detected regions with a machine learning technique, we can get a measure of text likeliness.
  • When each detected region can be represented by image features f1 and f2, we cluster the regions by those features. Based on the image-wise annotation, we can compute a probability of being text for each cluster. For example, red dots show regions that appeared in text images and blue dots show regions that appeared in non-text images. Cluster C4 contains only regions that appeared in non-text images, so it is unlikely to be text.
  • We measured the performance against a typical previous method, a traditional region-based method, on 500 Rakuten images. Ours was significantly better: accuracy increased by around 20%.
  • Another problem we are solving is transparent text and background recovery.
  • We propose adaptive edge detection based on analyzing image content. To recover the background, we estimate the text color and its opacity, that is, how transparent the text is.
  • These are examples of detected edges. Compared with traditional edge detectors such as Sobel or Canny, ours produces less noise and preserves weak edges better.
  • Let me introduce how we do our detection. We measure image complexity as texture strength by analyzing image content. We can measure it by eigenspace analysis. Based on the texture strength, we can set the edge detection thresholds adaptively.
  • To detect text, we use a hybrid method: a region-based step using the edge-based stroke width transform, combined with a machine-learning classifier.
  • Here is the system flow. After adaptive edge detection, we perform component analysis and detect text. Once we detect text, we can recover the background.
  • Here are examples. Our system was able to detect transparent text.
  • Transparent text can be represented by this formula. The observed pixel values, I, are a mixture of the background values, O, and the text color, T. The mixing ratio is determined by the opacity, gamma. Assuming that the text color and opacity are uniform within the text, we can solve for these parameters by the least squares method when we have two or more sets of data, because the number of unknown parameters is two.
  • This is an example of a recovered image.
  • We also compared our method with a previous method called InPainting, which tries to fill the hole left by the text using the surrounding pixel pattern. InPainting cannot recover the original content, in this case a small hole, but ours was able to recover it.
  • Thank you for your attention. The details will be presented at the Asian Conference on Pattern Recognition next month.
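The per-cluster text probability described in the notes on image-wise annotation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RIT implementation: it assumes the regions have already been assigned to clusters by some clustering step, and the function name `cluster_text_probability` and the input layout are hypothetical. The example counts mirror the slide figures P(C1) = 3/4 and P(C4) = 0/3.

```python
from collections import defaultdict


def cluster_text_probability(cluster_ids, from_text_image):
    """Return {cluster_id: fraction of its regions that came from text images}.

    cluster_ids:     cluster label of each detected region (clustering is assumed done)
    from_text_image: True if the region was detected in an image annotated as "text"
    """
    totals = defaultdict(int)
    text_counts = defaultdict(int)
    for cid, is_text in zip(cluster_ids, from_text_image):
        totals[cid] += 1
        if is_text:
            text_counts[cid] += 1
    return {cid: text_counts[cid] / totals[cid] for cid in totals}


# Hypothetical data: C1 holds 4 regions (3 from text images),
# C4 holds 3 regions (none from text images).
cluster_ids = ["C1", "C1", "C1", "C1", "C4", "C4", "C4"]
from_text_image = [True, True, True, False, False, False, False]
probs = cluster_text_probability(cluster_ids, from_text_image)
print(probs)
```

Only the image-level label is needed for each region, which is why this kind of measure avoids drawing rectangles around individual characters.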

    1. Text detection in product images. 10/26/2013. Naoki Chiba, Lead Scientist, Rakuten Institute of Technology, Rakuten Inc. http://rit.rakuten.co.jp/
    2. Product images: Sales pitches in images. Applications: • Content retrieval/filtering • Recognition • Translation
    3. RIT Text Detector: Far more accurate. Works like magic.
    4. Outline: 1 Text detection overview 2 Current methods 3 RIT’s approach
    5. Outline: 1 Text detection overview 2 Current methods 3 RIT’s approach
    6. Academic Research • Natural scene OCR ≠ traditional scanned OCR – Camera captured – Illumination variations – Perspective distortion – Short text • Digital-born text vs. natural-scene text. Source: ICDAR Text locating competition
    7. Product Images - Two Purposes: Text’s role is different. 1. Sales pitches 2. Product list
    8. Product list. Sales pitch (Merchant’s names, Price, Shipping)
    9. “Now Printing” images: Showing image unavailability, but.. Not Updated
    10. Text detection for product images: More accurate. Much faster.
    11. Outline: 1 Text detection overview 2 Current methods 3 RIT’s approach
    12. Current methods: 1. Texture based (Classifier-based) 2. Region based (Connected components) 3. Hybrids
    13. 1. Texture-based method • Special texture • Scan • Classifier (SVM, AdaBoost or Neural network) Problems: • Scale/Rotation variant • High computation
    14. 2. Region-based method • Local features (edges or color clustering) • Connected component analysis • Text lines and word separation. Figure: Output of Stroke width transform. Problem: • False candidates
    15. 3. Hybrid method • Classifier: SVM, Random Forest, AdaBoost • Region based: Edge (Stroke Width Transform), Color clustering
    16. Problems: 1. Character/word annotation: Time-consuming task 2. Transparent text: Hard to detect
    17. Problem 1: Character/word annotation. Time consuming for many images
    18. Problem 2: Transparent text • Weak edges (difficult to detect)
    19. Outline: 1 Text detection overview 2 Current methods 3 RIT’s approach
    20. RIT’s Approach: 1. Character/word annotation (Time-consuming task): Text image classifier using image-wise annotation 2. Transparent text (Hard to detect): Transparent text detection and background recovery
    21. 1. Text image classifier using image-wise annotation • Text image detection (not char/word) – Image-wise annotation (less time) – Clustering detected regions (measure text likeliness)
    22. Image-wise Annotation. Character-wise: draw rectangles (example text: 送料無料, “free shipping”). Image-wise: classify text/non-text
    23. Clustering detected regions. [Scatter plot over features f1 and f2: clusters C1 to C5 with cluster centers (x); points mark regions in text images and regions in non-text images. P(C1) = 3/4, P(C4) = 0/3]
    24. Comparison: Better than a typical method. [Bar chart: accuracy, Current vs. Proposed] • Rakuten 500 images • Compared w/a traditional region-based method
    25. RIT’s Approach: 1. Character/word annotation (Time-consuming task): Text image classifier using image-wise annotation 2. Transparent text (Hard to detect): Transparent text detection and background recovery
    26. 2. Transparent text detection and background recovery • Edge detection with adaptive threshold – Image content analysis • Background recovery – Text color/opacity estimation
    27. Edge detection with adaptive thresholds • Less noise • Weak edges are better preserved
    28. Texture strength: Measuring image complexity. Image patches: Direction and energy: eigenvectors and eigenvalues [1]. Texture strength: [1] Xiang Zhu and Peyman Milanfar, “Automatic parameter selection for denoising algorithms using a no-reference measure of image content,” IEEE Transactions on Image Processing, pp. 3116–32, 2010.
    29. Proposed text detection: 1. Texture based (Classifier based): SVM/Random Forest/AdaBoost 2. Region based (Connected components): Edge/Color Clustering 3. Hybrids: Region (Edge Stroke Width) + Texture (AdaBoost)
    30. System flow: Input image → Adaptive edge detection → Components analysis (Stroke width transform and Connected component) → Detected text
    31. Detection result: (a) constant threshold (b) proposed
    32. System flow: Input image → Adaptive edge detection → Components analysis (Stroke width transform and Connected component) → Detected text → Background recovery
    33. Transparent Text: $I = O(1 - \gamma) + \gamma T$, where I: observed pixel value, O: original pixel value, T: text color, $\gamma$: opacity • 2 unknowns • 2 or more equations • Least squares solution
    34. Extraction result: (a) original (b) recovered
    35. Comparison with InPainting: Original, InPainting, Rakuten. Magic. Patented!
    36. Details: ACPR 2013. Thank you!
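
The transparent-text model on slide 33, I = O(1 - γ) + γT, is linear in O with slope (1 - γ) and intercept γT, so the two unknowns can be fitted with an ordinary least-squares line. The sketch below shows that idea for a single color channel. It is an illustration under assumptions, not the presented system: the talk does not say how the (O, I) data sets are obtained, so the code simply assumes that background values are known for a few pixels, and the function names are hypothetical.

```python
def estimate_text_params(background, observed):
    """Fit I = (1 - gamma) * O + gamma * T by least squares.

    background: background values O at a few pixels (assumed known here)
    observed:   observed values I at the same pixels
    Returns (gamma, text_color). Needs at least two pairs with differing O.
    """
    n = len(background)
    if n < 2 or n != len(observed):
        raise ValueError("need at least two matching (O, I) pairs")
    mean_o = sum(background) / n
    mean_i = sum(observed) / n
    var_o = sum((o - mean_o) ** 2 for o in background)
    if var_o == 0:
        raise ValueError("background values must not all be equal")
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(background, observed))
    slope = cov / var_o                  # slope = 1 - gamma
    intercept = mean_i - slope * mean_o  # intercept = gamma * T
    gamma = 1.0 - slope
    if abs(gamma) < 1e-12:
        raise ValueError("opacity is zero, so the text color is undetermined")
    return gamma, intercept / gamma


def recover_background(observed, gamma, text_color):
    """Invert the blend: O = (I - gamma * T) / (1 - gamma)."""
    if gamma >= 1.0:
        raise ValueError("fully opaque text hides the background completely")
    return [(i - gamma * text_color) / (1.0 - gamma) for i in observed]


# Synthetic check: white text (T = 255) at opacity 0.4 over three background values.
true_o = [10.0, 100.0, 200.0]
blended = [0.6 * o + 0.4 * 255.0 for o in true_o]
gamma, text_color = estimate_text_params(true_o, blended)
restored = recover_background(blended, gamma, text_color)
```

Once γ and T are estimated, the same inversion can be applied to every pixel covered by the text, which is what makes background recovery possible when the text is not fully opaque.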