Text extraction from natural scene image, a survey

Text extraction from natural scene image: A survey
Honggang Zhang, Kaili Zhao, Yi-Zhe Song, Jun Guo
Neurocomputing 122 (2013)

Natural images everywhere
• We want to detect text in natural images

Overview
Input images -> Pre-processing -> Text detection & localization -> Text enhancement & segmentation -> Text recognition (OCR) -> Text
• Text detection & localization
  • Detect text locations and bounding boxes
• Text enhancement & segmentation
  • Text regions are low-resolution and noisy
  • Segment text from the background

Text detection & localization
a. Edge based methods
b. Texture based methods
c. Connected component (CC) based methods
d. Stroke based methods
e. Others

Edge based text detection
• Idea: Scene text is designed to be easy to read, so it has strong edges
• Methods: An edge detector (e.g. the Canny operator) and a binarization method are used to extract text and eliminate non-text regions
+ Efficient and simple!
- Sensitive to shadows and highlights
N. Ezaki, M. Bulacu, and L. Schomaker, "Text detection from natural scene images: Towards a system for visually impaired persons," in Int. Conf. on Pattern Recognition, Cambridge, UK, Aug. 2004, pp. 683-686.
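A minimal sketch of the edge based idea, assuming a grayscale image stored as a list of rows of 0-255 integers. It swaps in a plain Sobel gradient for the Canny operator, binarizes the edge map with a fixed threshold, and flags rows with a high edge density. The function names and both thresholds are hypothetical choices for illustration, not the cited authors' method.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude (|gx| + |gy|) with 3x3 Sobel kernels."""
    h, w = len(img), len(img[0])
    mag = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = abs(gx) + abs(gy)
    return mag


def text_candidate_rows(img, edge_thresh=200, density_thresh=0.3):
    """Binarize the edge map and return the rows with a high edge density."""
    mag = sobel_magnitude(img)
    width = len(img[0])
    rows = []
    for y, row in enumerate(mag):
        density = sum(1 for v in row if v >= edge_thresh) / width
        if density >= density_thresh:
            rows.append(y)
    return rows


# Toy image: bright background with three dark, stroke-like bars in rows 3-6.
img = [[200] * 12 for _ in range(10)]
for y in range(3, 7):
    for x in (2, 5, 8):
        img[y][x] = 20

candidate_rows = text_candidate_rows(img)
print(candidate_rows)
```

The flagged band is one row taller than the bars on each side because the 3x3 kernels respond to the bar ends. A real system would follow this with the non-text elimination step the slide mentions.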

Texture based text detection
• Idea: Text has textural properties that distinguish it from non-text regions (background)
• Methods: Gaussian filtering, Histogram of Oriented Gradients (HOG), wavelet decomposition, Fourier transform, Discrete Cosine Transform (DCT), Local Binary Pattern (LBP)
  • Extract features over a certain region
  • Identify the existence of text with a classifier
+ Detects and localizes text accurately, even in noisy images
- Relatively slow; sensitive to text alignment and orientation
• Some advanced techniques:
  • Coarse-to-fine strategy -> fast
  • Local Haar Binary Pattern (LHBP) -> preserves and unifies inconsistent text-background contrasts
Figure: (a) input image (640 x 480) (b) texture classification result
Kim, Kwang In, Keechul Jung, and Jin Hyung Kim. "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm." IEEE Transactions on Pattern Analysis and Machine Intelligence 25.12 (2003): 1631-1639.
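As a rough illustration of "extract features over a region", the sketch below computes a basic 8-neighbor Local Binary Pattern (LBP) histogram for a grayscale window given as a list of rows. The function names are hypothetical, and the classifier that would consume the histogram (an SVM in the cited paper) is left out.

```python
def lbp_code(img, y, x):
    """8-neighbor LBP: set one bit for every neighbor >= the center pixel."""
    center = img[y][x]
    # Neighbors are visited clockwise, starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over the interior pixels."""
    h, w = len(img), len(img[0])
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            hist[lbp_code(img, y, x)] += 1
    total = (h - 2) * (w - 2)
    return [count / total for count in hist]


# A flat window and a striped (stroke-like) window give different histograms.
flat = [[128] * 5 for _ in range(5)]
striped = [[0 if x % 2 else 255 for x in range(5)] for _ in range(5)]

flat_hist = lbp_histogram(flat)
striped_hist = lbp_histogram(striped)
```

In a full detector this histogram would be computed over a sliding window and passed to a trained classifier, which is where most of the runtime cost noted above comes from.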

Connected component-based text detection
• Idea: Segment candidate text components by edge detection or color clustering, and prune non-text components with classifiers
• Methods:
  • Group small components into successively larger components until all regions in the image are identified (bottom-up approach)
  • Identify text components and group them to localize text regions
  • Block adjacency graph (BAG): connected component extraction
  • Priority adaptive segmentation (PAS): character segmentation
+ Low computation cost; the results can be used directly for text recognition
- Cannot segment accurately without prior knowledge (text position, scale)
- Designing a fast and reliable connected component analyzer is difficult because of the many confusing non-text regions
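The extract-then-prune flow above can be sketched as follows, assuming the image has already been binarized into a list of rows of 0/1 values. Breadth-first labeling stands in for BAG, and two hand-picked geometric rules (minimum area, maximum aspect ratio) stand in for the classifier; all names and thresholds are hypothetical.

```python
from collections import deque


def connected_components(binary):
    """Return the bounding box and area of each 4-connected foreground region."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append({"box": (min(xs), min(ys), max(xs), max(ys)),
                               "area": len(ys)})
    return components


def prune_non_text(components, min_area=3, max_aspect=5.0):
    """Drop components that are too small or too elongated to be characters."""
    kept = []
    for comp in components:
        x0, y0, x1, y1 = comp["box"]
        width, height = x1 - x0 + 1, y1 - y0 + 1
        aspect = max(width, height) / min(width, height)
        if comp["area"] >= min_area and aspect <= max_aspect:
            kept.append(comp)
    return kept


binary = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],   # the lone pixel at x=3 is noise
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],   # a long line, e.g. a sign border
]
components = connected_components(binary)
kept = prune_non_text(components)
```

Only the "C"-shaped component survives the pruning here. Fixed rules like these are exactly what breaks down without prior knowledge of text scale, which is the first drawback listed above.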

Stroke based text detection
• Idea: Text = a combination of stroke components
• Methods:
1) Extract text stroke candidates by segmentation (Gabor filter, Stroke Width Transform (SWT))
2) Verify the candidates by feature extraction and classification
3) Group the verified candidates by clustering
+ Provides robust and nearly constant stroke features (e.g. stroke width)
+ Intuitive and simple, therefore easy to implement
- Complex backgrounds can be a problem
Figure: Text tends to maintain a fixed stroke width
Epshtein, Boris, Eyal Ofek, and Yonatan Wexler. "Detecting text in natural scenes with stroke width transform." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
Figure: An example of SWT based text detection
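To make the "nearly constant stroke width" cue concrete, here is a deliberately simplified sketch. It is not the SWT of Epshtein et al., which traces rays along gradient directions between edge pairs; instead it measures horizontal run lengths in a binary image as a crude stroke-width proxy and checks how much they vary. The function names and the `max_ratio` threshold are hypothetical.

```python
from statistics import mean, pvariance


def horizontal_stroke_widths(binary):
    """Lengths of horizontal runs of foreground (1) pixels in each row."""
    widths = []
    for row in binary:
        run = 0
        for value in list(row) + [0]:   # trailing 0 closes a run at the row end
            if value == 1:
                run += 1
            elif run:
                widths.append(run)
                run = 0
    return widths


def looks_like_text(binary, max_ratio=0.5):
    """Heuristic: stroke widths vary little relative to their mean."""
    widths = horizontal_stroke_widths(binary)
    if not widths:
        return False
    return pvariance(widths) / mean(widths) <= max_ratio


# Three vertical strokes of width 2 versus a triangle-shaped blob.
bars = [[1, 1, 0, 1, 1, 0, 1, 1] for _ in range(4)]
blob = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
```

Horizontal runs overestimate the width of horizontal strokes (the crossbar of an "H", for instance), which is one reason the real SWT measures width perpendicular to the stroke edge.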

Others
1. Hybrid approaches that deal with the many variations in text
2. Detecting text of arbitrary orientation with rotation-invariant features based on SWT
3. Color reduction: reduce the total number of colors in each RGB component
4. Small letter detection in images: limited to some standard font sizes (anything smaller than 10 pixels is removed) …
Yao, Cong, et al. "Detecting texts of arbitrary orientations in natural images." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
Kumar, Manoj, Young Chul Kim, and Guee Sang Lee. "Text detection using multilayer separation in real scene images." Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on. IEEE, 2010.
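The color reduction item above can be illustrated with uniform per-channel quantization. This is only one plausible reading of "reduce the number of colors in each RGB component", not necessarily the scheme used in the cited work; `reduce_colors` and its `levels` parameter are hypothetical.

```python
def reduce_colors(pixels, levels=4):
    """Quantize each RGB channel (0-255) to `levels` evenly spaced values."""
    step = 256 // levels
    reduced = []
    for r, g, b in pixels:
        quantized = []
        for channel in (r, g, b):
            # Clamp the bin index so 255 never spills into an extra bin.
            index = min(channel // step, levels - 1)
            # Represent the bin by its center value.
            quantized.append(index * step + step // 2)
        reduced.append(tuple(quantized))
    return reduced


# Two slightly different shades collapse to the same reduced color.
shades = [(250, 10, 130), (255, 0, 128)]
reduced = reduce_colors(shades)
```

After reduction, pixels of one text color tend to fall into the same bin, which makes later color clustering or component extraction simpler.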

Text enhancement & segmentation
• Traditional OCR software struggles with low-resolution natural scene images
• Text has to be enhanced and segmented from complex backgrounds (noisy images)
• Many advanced binarization algorithms have been proposed for text enhancement
e.g. Transform the gray level of each pixel into a new domain
Figure: (a) Badly illuminated document images (b) binarization
Valizadeh, M., et al. "A novel hybrid algorithm for binarization of badly illuminated document images." Computer Conference, 2009. CSICC 2009. 14th International CSI. IEEE, 2009.
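To show why badly illuminated images need more than a single global threshold, the sketch below uses plain local-mean adaptive thresholding. This is a swapped-in, simpler technique, not the hybrid algorithm of Valizadeh et al.; the function name and the `radius` and `offset` parameters are hypothetical.

```python
def adaptive_binarize(gray, radius=1, offset=10):
    """Mark a pixel as text (1) if it is darker than its local mean by `offset`."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighborhood, clipped at the image border.
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return out


# Background brightens from left to right; dark strokes sit at x=1 and x=6.
row = [60 + 20 * x - (50 if x in (1, 6) else 0) for x in range(8)]
gray = [row[:] for _ in range(3)]
binary = adaptive_binarize(gray)
```

In this toy image the stroke at x=6 (value 130) is brighter than the background at x=0 (value 60), so no global threshold separates text from background, while the local comparison recovers both strokes.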

Further survey - OCR with deep learning
• OCR with a Convolutional Neural Network (CNN) on some challenging images
• 8 datasets (from sports video, Google Street View, Google image search, natural scene images, news images); 9 million images in total (900k validation set)
• Outperforms existing state-of-the-art approaches (90-98% accuracy)
• e.g. BBC news text search
Jaderberg, Max, et al. "Reading Text in the Wild with Convolutional Neural Networks." arXiv preprint arXiv:1412.1842 (2014).
Figure: Result sample. Many word bounding box proposals are generated; false positives (FP) are reduced by a random forest classifier

Public datasets
A. ICDAR 2003/2005 Text Localization Contest trial test database
• 251 images, with ground truth for the word bounding boxes
• Most widely used database
- Most of the text is horizontal
- All the text is in English
B. KAIST Scene Text Database
• 3000 images in different environments (outdoors, indoors, under different lighting conditions)
• Captured with either a high-resolution camera or a low-resolution mobile phone camera
• Scene text is in Korean, English, and mixed languages
C. The Street View Text (SVT) dataset
• Google Street View images
• High variation, low resolution
D. NEOCR (Natural Environment OCR dataset)
• 659 real-world images with 5238 annotated bounding boxes

Applications
• Google Goggles: translates the world into text information
• Baidu translation
