2008 Klosters Talk Kt3

From tmbdev, 5 months ago


Tags

nlp modeling language statistical page layout analysis document recognition pattern


Slideshow transcript

Slide 1: Recognizing 1017 “Letters”. Thomas M. Breuel, DFKI & U. Kaiserslautern

Slide 2: We're building an OCR system.

Slide 3: OCR 2 Browser and Design Testing There are multiple implementations of HTML rendering engines; some common ones are Microsoft's Internet Explorer, Mozilla's Gecko, Apple's Safari, Opera's browser, and KDE's KHTML. Each of these renders web pages differently due to bugs and incomplete specifications of web standards. Common defects are missing text, text that is unintentionally rendered overlapping, text that unintentionally overlaps graphical elements, bad font substitutions, bad spacing, and unreadable choices of foreground and background colors. Our approach to this problem is to render the HTML into an image-based representation and then subject