Project Proposal Form
CS791A – Machine Learning
Program Name: Fused Optical Character Recognition System (FOCRS)
Program Participants: Nick Bartlow, Nathan Kalka New:_X_ Continuation:____
Description: According to [1], “Optical character recognition (OCR), as understood in the following, is the whole process of
transforming a document image (machine printed or handwritten) into a corresponding ASCII text. Many steps are necessary to
perform this task, e.g. layout analysis, image preprocessing, line segmentation, character recognition, contextual
postprocessing... Modifying one of them may lead to completely different results.” Much research has gone into
developing applications that provide (semi-)automatic OCR, and as the technology has matured, the performance of such
applications has improved dramatically. That said, measured performance necessarily depends on the testing environment
and the data repositories used. In addition, applications often approach the OCR problem with (semi-)orthogonal
methodologies. Because such packages are less likely to make the same errors, data fusion methodologies may lead to
promising results if multiple OCR packages are combined appropriately.
Experimental Plan: We intend to take a series of freely distributed OCR packages (gocr, tesseract, ocrad, ocropus,
etc.) and develop a framework for combining their output, with the aim of achieving higher accuracy than any
individual package alone. Techniques such as boosting, cascading, and adaptive fusion frameworks may be
investigated to this end. Besides applying the chosen techniques to machine-generated datasets and to samples from
scanned paper documents, we will also collect a dataset of handwriting samples gathered electronically
through a tablet PC. Formal performance comparisons will cover recognition of individual characters as well as passages
of text.
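As a rough illustration of the kind of output combination described above, the sketch below fuses the text returned by several OCR packages with a per-position plurality vote. It is only a minimal sketch under strong assumptions: the function name `plurality_vote` is hypothetical, the outputs are assumed to be already aligned to the same length (real OCR outputs would first need an alignment step), and ties are broken in favor of the engine listed first.

```python
from collections import Counter


def plurality_vote(outputs):
    """Fuse pre-aligned, equal-length OCR outputs by per-position plurality vote.

    Assumption: every string in `outputs` has the same length, so position i
    refers to the same character in each engine's output.
    Ties go to the engine that appears earliest in the list.
    """
    if not outputs:
        return ""
    length = len(outputs[0])
    if any(len(text) != length for text in outputs):
        raise ValueError("outputs must be pre-aligned to equal length")

    fused = []
    for chars in zip(*outputs):
        counts = Counter(chars)
        best = max(counts.values())
        # The first character (in engine order) that reaches the top count wins.
        fused.append(next(c for c in chars if counts[c] == best))
    return "".join(fused)


if __name__ == "__main__":
    # Three hypothetical engine outputs for the same word.
    print(plurality_vote(["he1lo", "hello", "hcllo"]))
```

Each engine in the example makes a different single-character mistake, so the vote recovers the intended word; when engines make the same mistake, voting alone cannot help.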
Related Work Elsewhere: [1] incorporates geometrical criteria to prevent incorrect character
segmentations and improves performance through classical combination rules such as Borda Count or
Plurality Vote. [2] focuses on obtaining a tradeoff between speed and recognition accuracy through a
cascade of classifiers. [3] investigates the utility of string alignment algorithms in merging
outputs from multiple OCR classifiers.
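To make the Borda Count rule mentioned above concrete, here is a minimal sketch. It is not the method of [1]; it only shows the classical rule under the assumption that each OCR engine returns a ranked list of candidate characters (best first). The function name `borda_count` and the sample rankings are hypothetical.

```python
from collections import defaultdict


def borda_count(rankings):
    """Combine ranked candidate lists with the classical Borda count.

    Each ranking lists candidates from best to worst. A candidate in
    position p of a list of length n earns n - 1 - p points; candidates
    missing from a list earn nothing from it. Returns all candidates
    ordered by total score, with ties broken alphabetically so the
    result is deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    # Hypothetical ranked guesses from three engines for one glyph.
    rankings = [
        ["O", "0", "Q"],
        ["0", "O", "D"],
        ["O", "D", "0"],
    ]
    print(borda_count(rankings))
```

Unlike a plurality vote, this rule uses the lower-ranked guesses as well, so a candidate that is consistently second can outrank one that is first for a single engine only.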
How ours is Different: To the best of our knowledge, no experiments have applied OCR technologies to
handwriting databases collected electronically through a tablet PC.
Although data collected in this format is arguably
similar to scanned handwriting samples, we anticipate different
recognition challenges with data acquired in this manner. Besides
natural differences in quality related to the capture device (tablet
PC vs. paper), the dynamics of the writing process also change.
We expect this change in “writing style” to be observed in varying
degrees from individual to individual.
Related Work in:
[1] E. Wilczok, W. Lellmann, “Adaptive Combination of Commercial OCR Systems,” Lecture Notes in
Computer Science, Vol. 2956, 2004.
[2] K. Chellapilla, M. Shilman, P. Simard, “Optimally combining a cascade of classifiers,”
Proceedings of SPIE, 2006.
[3] J.C. Handley, “Improving OCR accuracy through combination: A survey,” Proc. IEEE Int. Conf.
Syst. Man Cybern., Vol. 5, pp. 4330-4333, 1998.
Milestones:
(1) Construct / collect a database of testing images including
machine-generated text, scanned handwritten text, and
electronically gathered text.
(2) Construct a cascade of classifiers and/or a fusion framework for
individual OCR packages.
(3) Analyze results on the data collected in (1).
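For the analysis in milestone (3), one common way to score a recognized passage against its ground truth is the character error rate, computed from the Levenshtein edit distance. The sketch below is an assumption about how such a comparison could be done, not a metric the proposal commits to; the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete ref_char
                current[j - 1] + 1,      # insert hyp_char
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(character_error_rate("hello", "he1lo"))
```

Because the distance tolerates insertions and deletions, the same function works for passages where an engine drops or adds characters, which a position-by-position comparison would not handle.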
Deliverables: The milestones will result in a technical
report / presentation.
Budget: Total: ~$40,000. Students: $30,000 (get us while
we’re still cheap), Travel: $5,000, Other (software, office supplies):
$5,000 (we need new machines).
Progress to Date: Various individual OCR programs installed / sanity tested.
Knowledge Transfer Target Date: 2 months, Fall 2007.
