SlideShare a Scribd company logo
1 of 35
Support Vector Machine (SVM) Based
Classifier For Khmer Printed
Character-set Recognition
ការប្រើ្ាស់បវិធើសាស្រសត
របស់បSVM ក្នុងការបសម្គា ល់ប
តួអក្សរខ្មែរបបៅក្នុងបរូបភាព
Mr. SOK Pongsametrey
28th October 2014
Content
•Introduction
•Related Research
•Research Methodology
•Experiment and Discussion
•Conclusion and Future Works
•References
•Demo
•Q & A
INTRODUCTION
•Already some researches on Optical Character
Recognition (OCR) on Khmer language
•Some methods are used on different part of the
OCR system such as on Segmentation or
Recognition but no yet full system from image
processing to output currently in use or produce.
•Support Vector Machine (SVM) is a new method
for classification and recognition for Khmer OCR
but it does not apply for Segmentation
RELATED RESEARCHES
RELATED RESEARCHES
Problem Statement
•There are some methods already introduced for
Khmer OCR in text segmentation and recognition
but speed, memory and accuracy is still a concern
for a real Khmer OCR system.
•Khmer language has five levels; one main level in
the middle, two levels superscript and two levels
subscript.
Research Scope
• A new method, Support Vector Machine (SVM) for
Khmer OCR via Khmer Printed Character-set in Bitmap
format
• One Khmer Font name: Khmer OS Content (or Kh OS
Content) for training size of 32pt and target of the
document of size: 28pt, 32pt & 36pt
• Apply for any image files
• Segmentation uses Edge Detection from previous
research of Ieh Setha et al, 2012
About SVM
• SVM is a binary classifier
• Linear SVM is firstly introduced but data is not always linear
• Non-linear SVM introduces to solve non-linear data but
performance is not well performed
• Sequential Minimal Optimization (SMO) is a learning algorithm
• SMO is an optimal algorithm to work with Non-linear SVM
About SVM : Classifier
• Non-linear SVM with following Kernel Trick:
• Linear SVM Kernel
• Polynomial SVM Kernel
• Gaussian SVM Kernel
• Why SVM?
• In concept, training less dataset both size &
font face with acceptable accuracy
• Minimal consuming time and resource in both
training & recognition
METHODOLOGIES
•Two parts have been covered in this section
Raw Data
Training
Java Implementation
KhmerOCR Application with Accord.NET, C# Implementation
Data Training
• Training the Unicode Character or piece of the character, font size 32pt
• Text rendering is using ASCII (Limon)
• Define Levels of Khmer Characters
• Classified Types of Khmer Characters
Data preparation
• A Java application to produce different
text file of all charset, main, superscript,
subscript character files
• SVM Data Training: Feature Extraction
Training Module
Data Training: 5 vertical Levels of Cluster
• Middle level characters
• Subscript level characters (another level is lower)
• Superscript level characters (another level is higher)
Data Training – Classified Types
• Connected-point Single Character
• Disconnected-point Single
Characters
• Connected-point Combined-
characters
• Disconnected-point Combined-
character
Data Training – Module
• The process of training is to
convert the image character
file into the string of binary
data (0, 1) of matrix 32x32
dimension
• Small Java Application to
convert TrueType font
character into image file and
generate the matrix
Data Training – SVM
•SVM Data Training
•There are 500 support vectors
• All characters set: 250
• Main level character: 177
• Superscript character: 21
• Subscript character: 52
•Feature extract is an array of 1024 blocks, extracted
from binary matrix of 32 x 32
Data Training – SVM
• SVM Data Training
Optical Character Recognition (OCR)
OCR
Segmentation
Single
Character
Feature
Extraction
SVM
Recognition
Character
Assembling
OCR Process
Input Image Extract Image
Feature
Process Matrix
Analysis
Character
Assembling
Process Trained
SVM Kernel
Output ASCII
Character
Single Character
Feature
Limon to
Unicode
Converter
OCR - Segmentation
• Segmentation uses Edge Detection in order to separate line and
character.
• Line Separation Segmentation is a process to separate line by line
of Khmer text in the document
• Character Segmentation is a process to segment the character in a
series of character set
OCR – Classification & Recognition
• Uses Multi-class SVM with One-Against-One (OAO) because SVM
is binary classifier
• OAO can represent by discarding the redundant option
𝑘 =
𝑛(𝑛 − 1)
2
OCR – Character Assembling
• Two issues to solve here
EXPERIMENT
•Experiment Environment
Item Description
Operating System Microsoft Windows
Development Tools Microsoft Visual Studio 2012
Programming
Language
C#
Running Environment .Net Framework 3.5
Framework Accord.NET 2.9 (latest version 2.11)
Experiment Inputs Bitmap files (*.bmp) – Font size: 28pt, 32pt, 36pt
Algorithms Edge Detection (Segmentation)
SVM (Multi-class, Non-Linear, SMO)
Experiment Outputs Text files (training set, log)
Machine-code character
Testing Participation 4 Persons
Experiment Process
• Sample Experiment Data around 3000 characters
• Test on font size: 24pt, 28pt, 32pt, 36pt, 48pt, 60pt
• The testing focus on 3 SVM kernels (Linear, Polynomial & Gaussian)
with training algorithms of SMO
Load
Binary Text
Files
Select
Kernel &
Parameters
SVM
Training
with SMO
Load Data
to Test,
and Click
SVM Analys
Assembling
and Output
ASCII
(Limon)
Experiment: Example
Experiment - Accuracy
95.50%
96.00%
96.50%
97.00%
97.50%
98.00%
98.50%
99.00%
Gaussian Linear Polynomial
Accuracy, font size 36pt
Accuracy
98.54%
98.44%
96.78%
Experiment – CPU Usage
0%
10%
20%
30%
40%
50%
60%
70%
Gaussian Linear Polynomial
CPU
CPU
43%
62%
53%
Experiment – Timespent
Gaussian spent 3 sec. 690ms on training and 3sec. 286ms on recognition
3690.47 3285.857
1837.00
30314.42857
1030
3102.286
0.00
5000.00
10000.00
15000.00
20000.00
25000.00
30000.00
35000.00
Training Timespent Recognition Timespent
Gaussian
Linear
Polynomial
Evaluation
• Evaluates on Speed, Memory & Accuracy
• According to the experiment, Gaussian Kernel is more better than
other kernel so it’s chosen for this evaluation
• Gaussian Kernel, 98.54% accuracy, Avg. CPU of 43% and time spent
of more than 3 seconds
Evaluation – Gaussian with other font sizes
• Font size 28pt has 98.17%; 32pt has 98.62%; 36pt has 98.54%;
48pt has 98.61%; 60pt has 99.33%;
• More clear pixels, the more accuracy (segmentation)
98.17%
98.62%
98.54%
98.61%
99.33%
97.40%
97.60%
97.80%
98.00%
98.20%
98.40%
98.60%
98.80%
99.00%
99.20%
99.40%
99.60%
28pt 32pt 36pt 48pt 60pt
Accuracy of different font sizes
Discussion
• SVM with Gaussian Kernel is the best choice for SVM based Khmer
OCR.
• Gaussian kernel is a preferred kernel when we don’t know much
about the data we are trying to model.
• Some SVM error that can’t solve
Found but
wrong
Character
Expected
Issued on Tested
Font Size Explain
ំ ំ 28pt, 36pt
Error because of the segmentation,
application error
ជ ផ 32pt, 36pt SVM classification error
ខ ន 28pt, 32pt SVM classification error
ណ ញ 32pt SVM classification error
ឍ ញ 28pt SVM classification error
ំំor ំ ំ 28pt, 32pt
SVM classification error &
Segmentation issue
CONCLUSION
• Khmer characters in bitmap document are well recognized by SVM
method
• SVM method uses less CPU (12% better) and time spent on
recognition (16 seconds faster)
• The research study with training data of 32pt and test on font size
28pt, 32pt and 36pt got accuracy of 98.17%, 98.62% and 98.54%
accordingly
FUTURE WORKS
• Binary data to represent Khmer character seem too limit for
classification in SVM, so a study on each Character’s feature to use
in SVM will bring more better result
• Apply Multiple Type of Khmer Unicode fonts
• Apply the Scanned document
• Improve Current Segmentation Method
REFERENCES
• Thesis - Iech Setha and Taing Nguonly (2012, December), A
Combined Edge Detection and Template Matching for Khmer Printed
Character-set Recognition.
• Book - Peter H. (2012). Machine Learning in Action.
• Webpages. Accord.NET Framework. (2014, October). Retrieved
from: http://accord-framework.net/
DEMO
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set Recognition

More Related Content

What's hot

Binary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine LearningBinary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine LearningPaxcel Technologies
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Machine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMachine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMohsin Ul Haq
 
SVM Tutorial
SVM TutorialSVM Tutorial
SVM Tutorialbutest
 
Ml5 svm and-kernels
Ml5 svm and-kernelsMl5 svm and-kernels
Ml5 svm and-kernelsankit_ppt
 
sentiment analysis using support vector machine
sentiment analysis using support vector machinesentiment analysis using support vector machine
sentiment analysis using support vector machineShital Andhale
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaMacha Pujitha
 
Support Vector Machine (SVM)
Support Vector Machine (SVM)Support Vector Machine (SVM)
Support Vector Machine (SVM)Sana Rahim
 
Support Vector Machine (Classification) - Step by Step
Support Vector Machine (Classification) - Step by StepSupport Vector Machine (Classification) - Step by Step
Support Vector Machine (Classification) - Step by StepManish nath choudhary
 
Event classification & prediction using support vector machine
Event classification & prediction using support vector machineEvent classification & prediction using support vector machine
Event classification & prediction using support vector machineRuta Kambli
 
Support Vector Machine without tears
Support Vector Machine without tearsSupport Vector Machine without tears
Support Vector Machine without tearsAnkit Sharma
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesXavier Rafael Palou
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
Support Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom DatasetSupport Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom DatasetPawandeep Kaur
 

What's hot (19)

Binary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine LearningBinary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine Learning
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Machine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMachine Learning using Support Vector Machine
Machine Learning using Support Vector Machine
 
SVM Tutorial
SVM TutorialSVM Tutorial
SVM Tutorial
 
Ml5 svm and-kernels
Ml5 svm and-kernelsMl5 svm and-kernels
Ml5 svm and-kernels
 
sentiment analysis using support vector machine
sentiment analysis using support vector machinesentiment analysis using support vector machine
sentiment analysis using support vector machine
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using Weka
 
Support Vector Machine (SVM)
Support Vector Machine (SVM)Support Vector Machine (SVM)
Support Vector Machine (SVM)
 
Support Vector Machine (Classification) - Step by Step
Support Vector Machine (Classification) - Step by StepSupport Vector Machine (Classification) - Step by Step
Support Vector Machine (Classification) - Step by Step
 
Event classification & prediction using support vector machine
Event classification & prediction using support vector machineEvent classification & prediction using support vector machine
Event classification & prediction using support vector machine
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
 
Support vector machine-SVM's
Support vector machine-SVM'sSupport vector machine-SVM's
Support vector machine-SVM's
 
Support Vector Machine without tears
Support Vector Machine without tearsSupport Vector Machine without tears
Support Vector Machine without tears
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniques
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Support Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom DatasetSupport Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom Dataset
 

Viewers also liked

Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector MachineShao-Chuan Wang
 
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Treparel
 
Ensemble of Exemplar-SVM for Object Detection and Beyond
Ensemble of Exemplar-SVM for Object Detection and BeyondEnsemble of Exemplar-SVM for Object Detection and Beyond
Ensemble of Exemplar-SVM for Object Detection and Beyondzukun
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector MachineDerek Kane
 
Authorship attribution
Authorship attributionAuthorship attribution
Authorship attributionReza Ramezani
 
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...PyData
 
Authorship analysis using function words forensic linguistics
Authorship analysis using function words forensic linguisticsAuthorship analysis using function words forensic linguistics
Authorship analysis using function words forensic linguisticsVlad Mackevic
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk Vijay Ganti
 
Automatic handwriting recognition
Automatic handwriting recognitionAutomatic handwriting recognition
Automatic handwriting recognitionBIJIT GHOSH
 
Machine Learning for NLP
Machine Learning for NLPMachine Learning for NLP
Machine Learning for NLPbutest
 
Support vector machine
Support vector machineSupport vector machine
Support vector machineMusa Hawamdah
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition systemVijay Apurva
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Chiranjeevi Adi
 
Flag segmentation, feature extraction & identification using support vector m...
Flag segmentation, feature extraction & identification using support vector m...Flag segmentation, feature extraction & identification using support vector m...
Flag segmentation, feature extraction & identification using support vector m...R M Shahidul Islam Shahed
 
Recognition of handwritten digits using rbf neural network
Recognition of handwritten digits using rbf neural networkRecognition of handwritten digits using rbf neural network
Recognition of handwritten digits using rbf neural networkeSAT Journals
 

Viewers also liked (20)

Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
 
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
 
Ensemble of Exemplar-SVM for Object Detection and Beyond
Ensemble of Exemplar-SVM for Object Detection and BeyondEnsemble of Exemplar-SVM for Object Detection and Beyond
Ensemble of Exemplar-SVM for Object Detection and Beyond
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector Machine
 
Authorship attribution
Authorship attributionAuthorship attribution
Authorship attribution
 
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand...
 
Authorship analysis using function words forensic linguistics
Authorship analysis using function words forensic linguisticsAuthorship analysis using function words forensic linguistics
Authorship analysis using function words forensic linguistics
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk
 
Automatic handwriting recognition
Automatic handwriting recognitionAutomatic handwriting recognition
Automatic handwriting recognition
 
Machine Learning for NLP
Machine Learning for NLPMachine Learning for NLP
Machine Learning for NLP
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition system
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
Flag segmentation, feature extraction & identification using support vector m...
Flag segmentation, feature extraction & identification using support vector m...Flag segmentation, feature extraction & identification using support vector m...
Flag segmentation, feature extraction & identification using support vector m...
 
Recognition of handwritten digits using rbf neural network
Recognition of handwritten digits using rbf neural networkRecognition of handwritten digits using rbf neural network
Recognition of handwritten digits using rbf neural network
 

Similar to Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set Recognition

Intro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft VenturesIntro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft Venturesmicrosoftventures
 
Khmer ocr using gfd_seminar_day
Khmer ocr using gfd_seminar_dayKhmer ocr using gfd_seminar_day
Khmer ocr using gfd_seminar_daySolin TEM
 
Contribution of recurrent connectionist language models in improving lstm bas...
Contribution of recurrent connectionist language models in improving lstm bas...Contribution of recurrent connectionist language models in improving lstm bas...
Contribution of recurrent connectionist language models in improving lstm bas...anna8885
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Offline Omni Font Arabic Optical Text Recognition System using Prolog Classif...
Offline Omni Font Arabic Optical Text Recognition System using Prolog Classif...Offline Omni Font Arabic Optical Text Recognition System using Prolog Classif...
Offline Omni Font Arabic Optical Text Recognition System using Prolog Classif...GBM
 
Compeition-Level Code Generation with AlphaCode.pptx
Compeition-Level Code Generation with AlphaCode.pptxCompeition-Level Code Generation with AlphaCode.pptx
Compeition-Level Code Generation with AlphaCode.pptxSan Kim
 
online examination portal project presentation
online examination portal project presentationonline examination portal project presentation
online examination portal project presentationShobhit Jain
 
OCR by Abdullah Ahmed Abu Rtima
OCR by Abdullah Ahmed Abu Rtima OCR by Abdullah Ahmed Abu Rtima
OCR by Abdullah Ahmed Abu Rtima as af
 
Design and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontDesign and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontIRJET Journal
 
License Plate Recognition
License Plate RecognitionLicense Plate Recognition
License Plate RecognitionGilbert
 
Optical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based RetrievalOptical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based RetrievalBiniam Asnake
 
Improving code completion for Pharo
Improving code completion for PharoImproving code completion for Pharo
Improving code completion for PharoMarcus Denker
 
Improving Code Completion in Pharo
Improving Code Completion in PharoImproving Code Completion in Pharo
Improving Code Completion in PharoESUG
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsSanghamitra Deb
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneIvo Andreev
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionZachary S. Brown
 

Similar to Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set Recognition (20)

Intro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft VenturesIntro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft Ventures
 
Khmer ocr using gfd_seminar_day
Khmer ocr using gfd_seminar_dayKhmer ocr using gfd_seminar_day
Khmer ocr using gfd_seminar_day
 
Contribution of recurrent connectionist language models in improving lstm bas...
Contribution of recurrent connectionist language models in improving lstm bas...Contribution of recurrent connectionist language models in improving lstm bas...
Contribution of recurrent connectionist language models in improving lstm bas...
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Offline Omni Font Arabic Optical Text Recognition System using Prolog Classif...
Offline Omni Font Arabic Optical Text Recognition System using Prolog Classif...Offline Omni Font Arabic Optical Text Recognition System using Prolog Classif...
Offline Omni Font Arabic Optical Text Recognition System using Prolog Classif...
 
Compeition-Level Code Generation with AlphaCode.pptx
Compeition-Level Code Generation with AlphaCode.pptxCompeition-Level Code Generation with AlphaCode.pptx
Compeition-Level Code Generation with AlphaCode.pptx
 
Assignment-1-NF.docx
Assignment-1-NF.docxAssignment-1-NF.docx
Assignment-1-NF.docx
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 
online examination portal project presentation
online examination portal project presentationonline examination portal project presentation
online examination portal project presentation
 
OCR by Abdullah Ahmed Abu Rtima
OCR by Abdullah Ahmed Abu Rtima OCR by Abdullah Ahmed Abu Rtima
OCR by Abdullah Ahmed Abu Rtima
 
Design and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontDesign and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English Font
 
License Plate Recognition
License Plate RecognitionLicense Plate Recognition
License Plate Recognition
 
Optical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based RetrievalOptical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based Retrieval
 
Improving code completion for Pharo
Improving code completion for PharoImproving code completion for Pharo
Improving code completion for Pharo
 
Improving Code Completion in Pharo
Improving Code Completion in PharoImproving Code Completion in Pharo
Improving Code Completion in Pharo
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
Chatbot
ChatbotChatbot
Chatbot
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set Recognition

  • 1. Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set Recognition ការប្រើ្ាស់បវិធើសាស្រសត របស់បSVM ក្នុងការបសម្គា ល់ប តួអក្សរខ្មែរបបៅក្នុងបរូបភាព Mr. SOK Pongsametrey 28th October 2014
  • 2. Content •Introduction •Related Research •Research Methodology •Experiment and Discussion •Conclusion and Future Works •References •Demo •Q & A
  • 3. INTRODUCTION •Already some researches on Optical Character Recognition (OCR) on Khmer language •Some methods are used on different part of the OCR system such as on Segmentation or Recognition but no yet full system from image processing to output currently in use or produce. •Support Vector Machine (SVM) is a new method for classification and recognition for Khmer OCR but it does not apply for Segmentation
  • 6. Problem Statement •There are some methods already introduced for Khmer OCR in text segmentation and recognition but speed, memory and accuracy is still a concern for a real Khmer OCR system. •Khmer language has five levels; one main level in the middle, two levels superscript and two levels subscript.
  • 7. Research Scope • A new method, Support Vector Machine (SVM) for Khmer OCR via Khmer Printed Character-set in Bitmap format • One Khmer Font name: Khmer OS Content (or Kh OS Content) for training size of 32pt and target of the document of size: 28pt, 32pt & 36pt • Apply for any image files • Segmentation uses Edge Detection from previous research of Ieh Setha et al, 2012
  • 8. About SVM • SVM is a binary classifier • Linear SVM is firstly introduced but data is not always linear • Non-linear SVM introduces to solve non-linear data but performance is not well performed • Sequential Minimal Optimization (SMO) is a learning algorithm • SMO is an optimal algorithm to work with Non-linear SVM
  • 9. About SVM : Classifier • Non-linear SVM with following Kernel Trick: • Linear SVM Kernel • Polynomial SVM Kernel • Gaussian SVM Kernel • Why SVM? • In concept, training less dataset both size & font face with acceptable accuracy • Minimal consuming time and resource in both training & recognition
  • 10. METHODOLOGIES •Two parts have been covered in this section Raw Data Training Java Implementation KhmerOCR Application with Accord.NET, C# Implementation
  • 11. Data Training • Training the Unicode Character or piece of the character, font size 32pt • Text rendering is using ASCII (Limon) • Define Levels of Khmer Characters • Classified Types of Khmer Characters Data preparation • A Java application to produce different text file of all charset, main, superscript, subscript character files • SVM Data Training: Feature Extraction Training Module
  • 12. Data Training: 5 vertical Levels of Cluster • Middle level characters • Subscript level characters (another level is lower) • Superscript level characters (another level is higher)
  • 13. Data Training – Classified Types • Connected-point Single Character • Disconnected-point Single Characters • Connected-point Combined- characters • Disconnected-point Combined- character
  • 14. Data Training – Module • The process of training is to convert the image character file into the string of binary data (0, 1) of matrix 32x32 dimension • Small Java Application to convert TrueType font character into image file and generate the matrix
  • 15. Data Training – SVM •SVM Data Training •There are 500 support vectors • All characters set: 250 • Main level character: 177 • Superscript character: 21 • Subscript character: 52 •Feature extract is an array of 1024 blocks, extracted from binary matrix of 32 x 32
  • 16. Data Training – SVM • SVM Data Training
  • 17. Optical Character Recognition (OCR) OCR Segmentation Single Character Feature Extraction SVM Recognition Character Assembling
  • 18. OCR Process Input Image Extract Image Feature Process Matrix Analysis Character Assembling Process Trained SVM Kernel Output ASCII Character Single Character Feature Limon to Unicode Converter
  • 19. OCR - Segmentation • Segmentation uses Edge Detection in order to separate line and character. • Line Separation Segmentation is a process to separate line by line of Khmer text in the document • Character Segmentation is a process to segment the character in a series of character set
  • 20. OCR – Classification & Recognition • Uses Multi-class SVM with One-Against-One (OAO) because SVM is binary classifier • OAO can represent by discarding the redundant option 𝑘 = 𝑛(𝑛 − 1) 2
  • 21. OCR – Character Assembling • Two issues to solve here
  • 22. EXPERIMENT •Experiment Environment Item Description Operating System Microsoft Windows Development Tools Microsoft Visual Studio 2012 Programming Language C# Running Environment .Net Framework 3.5 Framework Accord.NET 2.9 (latest version 2.11) Experiment Inputs Bitmap files (*.bmp) – Font size: 28pt, 32pt, 36pt Algorithms Edge Detection (Segmentation) SVM (Multi-class, Non-Linear, SMO) Experiment Outputs Text files (training set, log) Machine-code character Testing Participation 4 Persons
  • 23. Experiment Process • Sample Experiment Data around 3000 characters • Test on font size: 24pt, 28pt, 32pt, 36pt, 48pt, 60pt • The testing focus on 3 SVM kernels (Linear, Polynomial & Gaussian) with training algorithms of SMO Load Binary Text Files Select Kernel & Parameters SVM Training with SMO Load Data to Test, and Click SVM Analys Assembling and Output ASCII (Limon)
  • 25. Experiment - Accuracy 95.50% 96.00% 96.50% 97.00% 97.50% 98.00% 98.50% 99.00% Gaussian Linear Polynomial Accuracy, font size 36pt Accuracy 98.54% 98.44% 96.78%
  • 26. Experiment – CPU Usage 0% 10% 20% 30% 40% 50% 60% 70% Gaussian Linear Polynomial CPU CPU 43% 62% 53%
  • 27. Experiment – Timespent Gaussian spent 3 sec. 690ms on training and 3sec. 286ms on recognition 3690.47 3285.857 1837.00 30314.42857 1030 3102.286 0.00 5000.00 10000.00 15000.00 20000.00 25000.00 30000.00 35000.00 Training Timespent Recognition Timespent Gaussian Linear Polynomial
  • 28. Evaluation • Evaluates on Speed, Memory & Accuracy • According to the experiment, Gaussian Kernel is more better than other kernel so it’s chosen for this evaluation • Gaussian Kernel, 98.54% accuracy, Avg. CPU of 43% and time spent of more than 3 seconds
  • 29. Evaluation – Gaussian with other font sizes • Font size 28pt has 98.17%; 32pt has 98.62%; 36pt has 98.54%; 48pt has 98.61%; 60pt has 99.33%; • More clear pixels, the more accuracy (segmentation) 98.17% 98.62% 98.54% 98.61% 99.33% 97.40% 97.60% 97.80% 98.00% 98.20% 98.40% 98.60% 98.80% 99.00% 99.20% 99.40% 99.60% 28pt 32pt 36pt 48pt 60pt Accuracy of different font sizes
  • 30. Discussion • SVM with Gaussian Kernel is the best choice for SVM based Khmer OCR. • Gaussian kernel is a preferred kernel when we don’t know much about the data we are trying to model. • Some SVM error that can’t solve Found but wrong Character Expected Issued on Tested Font Size Explain ំ ំ 28pt, 36pt Error because of the segmentation, application error ជ ផ 32pt, 36pt SVM classification error ខ ន 28pt, 32pt SVM classification error ណ ញ 32pt SVM classification error ឍ ញ 28pt SVM classification error ំំor ំ ំ 28pt, 32pt SVM classification error & Segmentation issue
  • 31. CONCLUSION • Khmer characters in bitmap document are well recognized by SVM method • SVM method uses less CPU (12% better) and time spent on recognition (16 seconds faster) • The research study with training data of 32pt and test on font size 28pt, 32pt and 36pt got accuracy of 98.17%, 98.62% and 98.54% accordingly
  • 32. FUTURE WORKS • Binary data to represent Khmer character seem too limit for classification in SVM, so a study on each Character’s feature to use in SVM will bring more better result • Apply Multiple Type of Khmer Unicode fonts • Apply the Scanned document • Improve Current Segmentation Method
  • 33. REFERENCES • Thesis - Iech Setha and Taing Nguonly (2012, December), A Combined Edge Detection and Template Matching for Khmer Printed Character-set Recognition. • Book - Peter H. (2012). Machine Learning in Action. • Webpages. Accord.NET Framework. (2014, October). Retrieved from: http://accord-framework.net/
  • 34. DEMO