Khmer ocr itc

Khmer OCR
LONG Seangmeng
Lecturer and researcher, GIC - ITC
seangmeng@itc.edu.kh
17/02/2012

Khmer OCR
• OCR System
• Khmer OCR Project
• State of the Art
• Work Done
• Current Work

Khmer OCR Project
• 2011-2012
• Team
– 1 researcher
– 1 intern student (5th year)
• Develop a Khmer OCR system
– Font independent
– Size independent

State of the Art
Author | Limitation | Result
C. Chey, P. Kumhom and K. Chamnongthai | 10 characters (បពជកភណឃសវទ) | 92%
C. Chey, P. Kumhom and K. Chamnongthai | 20 fonts | 92.85% (size 22), 91.66% (size 18), 89.27% (size 12)
L. Ing and A. Muaz | Limon R1, size 22 | 98.88%
V. Kruy | Font and size independent | 97%
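The percentages above are recognition rates. The slides do not say how each cited work measured them, so the following is only a minimal sketch of one common convention: character-level accuracy derived from edit distance. The function names `edit_distance` and `recognition_rate` are hypothetical and are not taken from any of the cited systems.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference character
                            curr[j - 1] + 1,      # extra character in the OCR output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def recognition_rate(ref: str, hyp: str) -> float:
    """Character accuracy in percent (assumed metric, clamped at 0)."""
    if not ref:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(ref, hyp)
    return max(0.0, 100.0 * (len(ref) - errors) / len(ref))


if __name__ == "__main__":
    # One wrong character out of ten gives 90.0
    print(recognition_rate("abcdefghij", "abcdefghix"))
```

The comparison here is per Unicode code point. For Khmer, counting per character cluster would give different figures.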

Tesseract
• One of the top 3 engines in the 1995 UNLV OCR accuracy test
• Most accurate open source OCR engine

Work Done
• Training Tesseract for Khmer font
– Khmer OS font
– 2210 character clusters
– 11 MB
• Problems
– Some characters not detected
– Some characters misdetected
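The two problems above (clusters not detected, clusters misdetected) can be told apart by aligning the recognized text with a ground-truth transcription at the character-cluster level. The sketch below is an assumption, not the project's actual evaluation code. The cluster pattern is a simplified approximation of Khmer syllable structure, and `split_clusters` and `classify_errors` are hypothetical names.

```python
import re
from difflib import SequenceMatcher

# Simplified cluster pattern (assumption, not the full Unicode rules):
# a base consonant or independent vowel (U+1780..U+17B3), followed by any mix of
# COENG (U+17D2) + consonant pairs (subscripts) and dependent vowels or signs.
CLUSTER = re.compile(
    "[\u1780-\u17B3]"
    "(?:\u17D2[\u1780-\u17B3]|[\u17B6-\u17D1\u17D3\u17DD])*"
)


def split_clusters(text):
    """Return the Khmer character clusters in text; other characters are skipped."""
    return CLUSTER.findall(text)


def classify_errors(reference, recognized):
    """Align cluster sequences and separate missed from confused clusters."""
    ref = split_clusters(reference)
    hyp = split_clusters(recognized)
    missed = []    # clusters in the reference with no counterpart in the output
    confused = []  # (expected clusters, recognized clusters) pairs
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            missed.extend(ref[i1:i2])
        elif tag == "replace":
            confused.append((ref[i1:i2], hyp[j1:j2]))
    return missed, confused


if __name__ == "__main__":
    print(split_clusters("\u1781\u17D2\u1798\u17C2\u179A"))  # the word "Khmer": 2 clusters
    print(classify_errors("\u1780\u1781\u1782", "\u1780\u1782"))
```

Counting which clusters fall into each list most often would show where additional training samples are likely to help.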

Current Work
• Improve on the work done by Vanna Kruy
– Improve performance
– Create an easy-to-use GUI
– Make it easy to add new fonts

Thanks for your attention!
Questions?
