5. Text detection & localization
a. Edge based methods
b. Texture based methods
c. Connected component (CC)-based methods
d. Stroke based methods
e. Others
6. Edge based text detection
Idea : Scene text is designed to be easily read, so it has strong edges
Methods : An edge detector (e.g. the Canny operator) and a binarization method are
used to extract text and eliminate non-text regions
+ Efficient and simple !
- Sensitive to shadows and highlights
N. Ezaki, M. Bulacu, and L. Schomaker, “Text detection from natural scene images: Towards a system for
visually impaired persons,” in Int. Conf. on Pattern Recognition, Cambridge, UK, Aug. 2004, pp. 683–686
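A minimal sketch of the edge-based idea in plain Python. It is not the method of Ezaki et al.: a Sobel gradient with a fixed threshold stands in for the Canny operator plus binarization, and the names `sobel_edge_map`, `edge_density` and the default threshold of 100 are assumptions chosen for illustration.

```python
def sobel_edge_map(gray, threshold=100):
    """Return a binary edge map (1 = edge) for a 2-D list of gray levels.

    Border pixels are left at 0 because the 3x3 Sobel window does not fit there.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            # L1 gradient magnitude followed by a fixed threshold (binarization).
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


def edge_density(edges):
    """Fraction of pixels marked as edges; high values hint at text-like regions."""
    total = sum(len(row) for row in edges)
    return sum(sum(row) for row in edges) / total
```

A block of the image whose `edge_density` is high could be kept as a text candidate, while flat regions score 0. Because the threshold is fixed, shadows and highlights shift the result, which matches the weakness listed above.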
7. Texture based text detection
Idea : Text regions have textural properties distinct from non-text regions (background)
Methods : Gaussian filtering, Histogram of oriented gradients (HOG), Wavelet
decomposition, Fourier transform, Discrete Cosine Transform (DCT), Local Binary Pattern
(LBP)
Extract features over a certain region
Identify the presence of text with a classifier
+ Can detect and localize text accurately, even in noisy images
- Relatively slow, sensitive to text alignment & orientation
Some advanced techniques:
Coarse-to-fine strategy -> fast
Local Haar Binary Pattern (LHBP) -> preserves & normalizes inconsistent text-background contrasts
(a) input image (640 × 480) (b) texture classification result
Kim, Kwang In, Keechul Jung, and Jin Hyung Kim. "Texture-based approach for text detection in images using support vector machines and
continuously adaptive mean shift algorithm." Pattern Analysis and Machine Intelligence, IEEE Transactions on 25.12 (2003): 1631-1639.
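As an illustration of "extract features over a certain region", here is a sketch of the basic 8-neighbour Local Binary Pattern (LBP) in plain Python. The function names and the clockwise bit order are assumptions for this example, and the classifier step is not shown. This is not the SVM-based method of Kim et al.

```python
def lbp_code(gray, y, x):
    """Basic 8-neighbour LBP code (0..255) of the pixel at row y, column x.

    A neighbour that is >= the centre value sets its bit. Bits are assigned
    clockwise starting from the top-left neighbour (an arbitrary but fixed order).
    """
    center = gray[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if gray[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(gray):
    """256-bin histogram of LBP codes over all interior pixels of a region."""
    hist = [0] * 256
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            hist[lbp_code(gray, y, x)] += 1
    return hist
```

The histogram of a window serves as its texture feature vector; a classifier trained on text and non-text windows would then decide whether the window contains text.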
8. Connected component-based text detection
Idea : Segment candidate text components by edge detection or color clustering,
and prune non-text components with classifiers
Methods :
Group small components into successively larger components until all regions in the
image are identified (bottom-up approach)
Identify text components and group them to localize text regions
Block adjacency graph (BAG) – connected component extraction
Priority adaptive segmentation (PAS) – character segmentation
+ Low computation cost, can be directly used for text recognition
- Cannot segment accurately without prior knowledge (text position, scale)
- Designing a fast and reliable connected component analyzer is difficult because of the
many confusing non-text regions
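The bottom-up idea can be sketched as follows: label 4-connected components in a binary mask with a breadth-first search, then prune candidates. The size filter `prune_small` is only a stand-in for the classifier mentioned above, and all names and the `min_pixels` default are assumptions for illustration. BAG and PAS are not implemented here.

```python
from collections import deque


def connected_components(mask):
    """Return a list of components; each is a list of (row, col) pixels with value 1.

    Uses 4-connectivity and scans the mask in row-major order.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or seen[sy][sx]:
                continue
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Bounding box of a component as (x_min, y_min, x_max, y_max)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def prune_small(components, min_pixels=3):
    """Drop components with fewer than min_pixels pixels (a crude non-text filter)."""
    return [c for c in components if len(c) >= min_pixels]
```

The surviving bounding boxes would then be grouped into words or lines, for example by merging boxes of similar height that lie close together on the same row.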
9. Stroke based text detection
Idea : Text = a combination of stroke components
Methods :
1) Segmentation: extract text stroke candidates
(Gabor filter, Stroke Width Transform (SWT))
2) Verification by feature extraction and classification
3) Grouping by clustering
+ Provides robust and nearly constant stroke features
(e.g. stroke width)
+ Intuitive & simple, therefore easy to implement
- Complex backgrounds can be a problem
Text tends to maintain a fixed stroke width
Epshtein, Boris, Eyal Ofek, and Yonatan Wexler. "Detecting text in natural scenes with stroke width
transform." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
An example of SWT based text detection
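To make the "fixed stroke width" observation concrete, here is a much simplified stand-in for SWT: it measures horizontal run lengths of foreground pixels in a binary mask and takes the most frequent one as the stroke width. The real SWT of Epshtein et al. instead follows the gradient direction from each edge pixel to the opposite edge; the function names below are assumptions for this sketch.

```python
from collections import Counter


def horizontal_run_lengths(mask):
    """Lengths of all maximal horizontal runs of 1-pixels, row by row."""
    runs = []
    for row in mask:
        length = 0
        for value in row:
            if value == 1:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:  # run that reaches the right border
            runs.append(length)
    return runs


def dominant_stroke_width(mask):
    """Most frequent horizontal run length, or None if the mask is empty."""
    runs = horizontal_run_lengths(mask)
    if not runs:
        return None
    return Counter(runs).most_common(1)[0][0]
```

For a character drawn with a constant pen width, most runs cross a vertical stroke and share the same length, while horizontal bars produce a few long outliers. Components whose run lengths vary widely are less likely to be text.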
10. Others
1. Hybrid approaches that deal with the many variations in text
2. Detect text of arbitrary orientation with rotation-invariant features based on SWT
3. Color reduction method: reduce the total number of colors in each RGB component
4. Small letter detection in images, limited to some standard font sizes (remove less than 10
pixels) …
Yao, Cong, et al. "Detecting texts of arbitrary orientations in natural images." Computer
Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
Kumar, Manoj, Young Chul Kim, and Guee Sang Lee. "Text detection using multilayer separation in real scene
images." Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on. IEEE, 2010.
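The slide does not say how colors are reduced, so the sketch below assumes the simplest option: uniform quantization of each RGB channel into a fixed number of levels. The function name `reduce_colors` and the `levels` parameter are hypothetical, and this is not necessarily the procedure of Kumar et al.

```python
def reduce_colors(pixels, levels=4):
    """Quantize each channel of a list of (r, g, b) tuples to `levels` values.

    Each 0..255 channel value is mapped to the centre of its bin, so an image
    has at most levels**3 distinct colors afterwards.
    """
    step = 256 // levels

    def quantize(value):
        bin_index = min(value // step, levels - 1)  # clamp the top bin
        return bin_index * step + step // 2

    return [tuple(quantize(c) for c in pixel) for pixel in pixels]
```

With fewer colors, pixels that belong to the same character tend to fall into the same color layer, which makes later separation of text from background easier.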
11. Text enhancement & segmentation
Traditional OCR software struggles with natural scene and low-resolution
images
Enhancing and segmenting text against complex backgrounds (noisy images)
Many advanced binarization algorithms have been proposed for text enhancement
ex) Transform the gray level of each pixel to a new domain
(a) Badly illuminated document images (b) binarization
Valizadeh, M., et al. "A novel hybrid algorithm for binarization of badly illuminated document
images." Computer Conference, 2009. CSICC 2009. 14th International CSI. IEEE, 2009.
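For reference, a sketch of Otsu's global threshold, a common baseline binarization. It is not the hybrid algorithm of Valizadeh et al.: a single global threshold tends to fail under uneven illumination, which is the case the advanced methods target. The function names are assumptions, and gray levels are assumed to be integers in 0..255.

```python
def otsu_threshold(gray_values):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_bright = total - w_dark
        if w_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        between_var = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray_values, threshold):
    """Map each gray level to 1 (brighter than threshold) or 0."""
    return [1 if v > threshold else 0 for v in gray_values]
```

The functions take a flat list of gray levels, so a 2-D image would be flattened first. Adaptive methods typically compute such a threshold per local window instead of once per image.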
12. Further survey - OCR with deep learning
OCR with a Convolutional Neural Network (CNN) on some challenging images
8 datasets (from sports video, Google Street View, Google image search, natural scene
images, news images) – 9 million images in total (900k validation set)
Outperforms existing state-of-the-art approaches (90~98% accuracy)
Ex) BBC news text search
Jaderberg, Max, et al. "Reading Text in the Wild with Convolutional Neural Networks." arXiv preprint arXiv:1412.1842 (2014).
Result sample
Many word bounding box proposals -> reduce false positives (FP) with a random forest classifier
13. Public dataset
A. 2003/2005 ICDAR Text Localization Contest trial test database
251 images, ground truth of the word bounding boxes
Most widely used database
- Most of the texts are horizontal.
- All the texts are in English
B. KAIST Scene Text Database
3000 images in different environments (outdoors, indoors, under different lighting conditions)
Captured with either a high-resolution camera or a low-resolution mobile phone camera
Scene text is in Korean, English, or a mix of both
C. The Street View Text (SVT) dataset
Google Street View images
High variation, low resolution
D. NEOCR (Natural Environment OCR dataset)
659 real-world images with 5238 annotated bounding boxes