OCRFeeder LinuxTag 2011

The slides for the presentation about OCRFeeder given at LinuxTag 2011.

Transcript

  • 1. OCRFeeder. Converting printed documents into digital formats. Joaquim Rocha, jrocha@igalia.com. Berlin, May 2011
  • 2. What is it? Document Analysis and Optical Character Recognition for GNOME. Joaquim Rocha (Igalia) · OCRFeeder · LinuxTag 2011
  • 3. Why? Paper has a number of problems. No applications for GNU/Linux do a fair job.
  • 4. Paper problems: Security. CC Photo by: http://www.flickr.com/photos/badwsky/
  • 5. Paper problems: Preservation. CC Photo by: http://www.flickr.com/photos/98469445@N00/
  • 6. Paper problems: Data processing. CC Photo by: http://www.flickr.com/photos/hugovk/
  • 7. Paper problems: Ecology. CC Photo by: http://www.flickr.com/photos/pranavsingh/
  • 8. No fair conversion apps for GNU/Linux apart from OCR engines, but...
  • 9. OCR != Document Conversion (it only deals with chars) (does not consider the layout) (does not distinguish contents)
  • 10. What's needed is Document Analysis and Recognition (conversion of documents to an electronic format) (first projects in the 80s)
  • 11. Where are we at? * Some closed solutions * Only for proprietary systems * Various prices * still... arguable results
  • 12. How
  • 13. So many layouts... CC Photo by: http://www.flickr.com/photos/uber-tuber/
  • 14. Layouts vary with the type of document. What works for detecting one won't work on others.
  • 15. OCRFeeder focuses on contents, not on layouts!
  • 16. Key concept: If a document image can be divided into windows of 1 (content) or 0 (not content), then it is possible to group all the 1s and outline the contents.
  • 17. [Image-only slide]
  • 18. Recognition: System-wide OCR engines are used. Engines are configured from the GUI or XML files.
  • 19. [Image-only slide]
  • 20. The best-known free OCR engines are detected and configured automatically: * Tesseract * GOCR * OCRAD * Cuneiform
  • 21. Export formats: ODT, HTML, plain text
  • 22. User interaction: Users can edit everything and review the algorithms' results. So, the UI can work in attended and unattended ways. The CLI only works in an unattended mode.
  • 23. [Image-only slide]
  • 24. Demo time!
  • 25. Other features: * PDF import * Unpaper preprocessor * Font style editing * Image deskewing * OCR results cleaning * Project saving/loading
  • 26. A11y: * OCRFeeder is a very useful tool for visually impaired users * Last year, the main target of its development was to improve a11y
  • 27. Future: * Integrate Ocropus as an alternative analysis backend * More export formats: HOCR, PDF, etc. * Make OCR engine management easier
  • 28. Webpage: http://live.gnome.org/OCRFeeder · git: http://git.gnome.org/ocrfeeder · Bugzilla: http://bugzilla.gnome.org (product: OCRFeeder)
  • 29. Manual in German: http://wiki.ubuntuusers.de/OCRFeeder
  • 30. Thank you!
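
Sketch for slide 16 (key concept). The slide says a page image can be divided into windows marked 1 (content) or 0 (not content), and that grouping the 1s outlines the contents. The following is a minimal illustration of that idea, not OCRFeeder's actual code. The function names, the grayscale list-of-rows image representation, the "any pixel darker than a threshold" test, and the 4-connected grouping are all assumptions made for the example.

```python
from collections import deque


def window_grid(image, window, threshold=128):
    """Return a grid with 1 for each window x window tile that holds a dark pixel.

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    The darkness test is an assumed criterion; the slides do not specify one.
    """
    height, width = len(image), len(image[0])
    rows = (height + window - 1) // window
    cols = (width + window - 1) // window
    grid = [[0] * cols for _ in range(rows)]
    for y in range(height):
        for x in range(width):
            if image[y][x] < threshold:
                grid[y // window][x // window] = 1
    return grid


def content_boxes(grid, window):
    """Group 4-connected windows marked 1 and return their bounding boxes.

    Boxes are (left, top, right, bottom) in pixels, found in row-major order.
    If the image size is not a multiple of `window`, a box may extend past
    the image edge and should be clipped by the caller.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over neighbouring content windows.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((left * window, top * window,
                          (right + 1) * window, (bottom + 1) * window))
    return boxes
```

Each returned box could then be cropped from the page and handed to an OCR engine or kept as an image area. Smaller windows follow the contents more tightly but split loosely spaced text into more groups.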
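
Sketch for slide 18 (engines configured from XML files). The slides do not show the configuration format, so the XML layout below (`engine`, `name`, `path`, `arguments`, and the `$IMAGE` placeholder) is hypothetical and should not be read as OCRFeeder's real schema. It only illustrates how a system-wide engine might be described in XML and turned into a command line.

```python
import shlex
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical engine description; the tag names are assumptions for this sketch.
EXAMPLE_CONFIG = """
<engine>
  <name>Tesseract</name>
  <path>/usr/bin/tesseract</path>
  <arguments>$IMAGE stdout</arguments>
</engine>
"""


def build_command(config_xml, image_path):
    """Return the argument list that runs the configured engine on one image."""
    root = ET.fromstring(config_xml)
    path = root.findtext("path")
    arguments = Template(root.findtext("arguments", default=""))
    # Quote the image path so that names containing spaces stay one argument.
    filled = arguments.substitute(IMAGE=shlex.quote(image_path))
    return [path] + shlex.split(filled)
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True)` to collect the recognized text from the engine's standard output, assuming the engine is installed at the configured path and writes its result there.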
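
Sketch for slide 20 (automatic engine detection). One plausible way to detect the four engines named on the slide is to look for their executables on the `PATH`. This is an assumption about the approach, not a description of how OCRFeeder does it. The executable names used are the usual command names of these engines, and the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil

# Engine names from the slide mapped to their usual executable names.
ENGINE_EXECUTABLES = {
    "Tesseract": "tesseract",
    "GOCR": "gocr",
    "OCRAD": "ocrad",
    "Cuneiform": "cuneiform",
}


def detect_engines(which=shutil.which):
    """Return {engine name: executable path} for every engine found on PATH."""
    found = {}
    for name, executable in ENGINE_EXECUTABLES.items():
        path = which(executable)
        if path is not None:
            found[name] = path
    return found
```

Calling `detect_engines()` on a machine with only Tesseract installed would return a dictionary with the single key `"Tesseract"`. An empty dictionary means none of the four executables were found.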