Yerbabuena Software ~ 2010
Nuxeo World 2010
Yerbabuena applications for Nuxeo DM
and some successful cases
Francisco José González Barea
Victor Manuel Sánchez Sánchez
Francisco José González – Victor Manuel Sánchez Yerbabuena Software ~ 2010
Nuxeo World 2010 - Summary
✔ Who is Yerbabuena Software?
✔ Yerbabuena Applications for Nuxeo DM
➔ Using OCR smartly
➔ Intelligent Document Management
➢ Auto-tagging
➢ Semantic Features
➔ Mobile Clients
➢ Windows Mobile
➢ iPhone
➢ Android
Who is Yerbabuena Software?
● Company founded in 2005
● ~ 20 workers
● Activity
➢ Development of Nuxeo DM applications
➢ Support
➢ Training
➢ R&D projects
✔ Nuxeo 2010 Eureka project
Using OCR smartly
● SCR = Smart Character Recognition
● Not the same as ICR (Intelligent Character Recognition)
● Architecture
➢ Image Treatment
➢ OCR
➢ Text Treatment
● Adaptability to customer needs
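The three-stage architecture above can be read as a simple pipeline. The following is a minimal Python sketch of that idea, not Yerbabuena's implementation: the class and function names are hypothetical, the image is modelled as a plain grid of grayscale values, and the OCR stage is a stub standing in for a real engine.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "page" is modelled as a grid of grayscale pixel values (0-255).
Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> Image:
    """Image treatment: map each pixel to pure black (0) or white (255)."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


def normalize_text(raw: str) -> str:
    """Text treatment: collapse whitespace runs left over from recognition."""
    return " ".join(raw.split())


def fake_ocr(image: Image) -> str:
    """Stub OCR stage (assumption): a real engine would be called here."""
    return "  Invoice   No. 42 \n"


@dataclass
class ScrPipeline:
    """Chains image treatment -> OCR -> text treatment.

    Every stage is a plain callable, so a stage can be swapped per customer.
    """
    image_treatment: Callable[[Image], Image]
    ocr: Callable[[Image], str]
    text_treatment: Callable[[str], str]

    def run(self, image: Image) -> str:
        cleaned = self.image_treatment(image)
        raw_text = self.ocr(cleaned)
        return self.text_treatment(raw_text)


pipeline = ScrPipeline(binarize, fake_ocr, normalize_text)
result = pipeline.run([[10, 200], [130, 90]])
```

Keeping each stage as an independent callable is one possible way to support the adaptability mentioned in the last bullet: a customer-specific text treatment can replace `normalize_text` without touching the other stages.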
Using OCR smartly → SCR
● Successful cases:
✔ Extraction of invoice fields
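To illustrate what a text-treatment step for this case might look like, here is a minimal Python sketch that pulls a few fields out of OCR output with regular expressions. The field names, the patterns, and the sample text are assumptions made for the example; the slides do not describe how the real extraction works.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns; real invoices would need per-customer tuning.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*:?\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*(\d+(?:[.,]\d{2})?)", re.IGNORECASE),
}


def extract_invoice_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when a field is absent."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Invented sample of recognized text.
sample = "ACME S.L.\nInvoice No: A-2010/17\nDate: 15/11/2010\nTotal: 1250,00 EUR"
fields = extract_invoice_fields(sample)
```

Returning `None` for missing fields, rather than raising, lets a caller decide whether an incomplete invoice should go to manual review.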
Intelligent Document Management
● Intelligent ~ Automatic
✔ Increase machine work
✔ Decrease human work
● Two different ways
✔ Classify and search documents
✔ Identify and work with documents
Intelligent DM. Auto-Tagging
● Extract the document's full text
➢ Run OCR first if the document is an image
● OpenCalais[1] analysis
➢ External web service
➢ Extracts tags from plain text based on the meaning of the content
➢ Returns an RDF[2] file as the result
Intelligent DM. Auto-Tagging
● Extract relevant words
➢ Depending on document type
➢ Depending on style features
● DBpedia[3] analysis
➢ Semantic Wikipedia (RDF)
➢ Semantic query to extract the fields related to each relevant word
➔ SPARQL[4]
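The semantic query step can be sketched as follows. This is a minimal Python example that only builds a SPARQL query and the request URL for one relevant word; it does not contact the network. The mapping from a word to a DBpedia resource URI is deliberately naive, and the function names and the `format` parameter value are assumptions for illustration, not details taken from the slides.

```python
from urllib.parse import quote, urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_related_fields_query(word: str, lang: str = "en", limit: int = 20) -> str:
    """Build a query listing property/value pairs for the resource named by `word`."""
    # Naive assumption: the resource name is the word with spaces replaced
    # by underscores and the first letter upper-cased.
    name = word.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    resource = "http://dbpedia.org/resource/" + quote(name)
    return (
        "SELECT ?property ?value WHERE {\n"
        f"  <{resource}> ?property ?value .\n"
        # Keep IRIs, plus literals that are untagged or in the requested language.
        f'  FILTER (!isLiteral(?value) || lang(?value) = "" || lang(?value) = "{lang}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request; the JSON results format is an assumption."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = build_related_fields_query("invoice")
url = build_request_url(query)
```

The resulting URL could then be fetched with any HTTP client, and the returned property/value pairs treated as candidate related fields for the word.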
Intelligent DM. Semantic Features
● New Nuxeo service
● Based on Semantic Web Technologies[5]
● Needs:
✔ Language to describe the DM world to a machine → OWL[6]
✔ External tools → Jena[7], Pellet[8], etc.
✔ R&D: database storage instead of RAM → Persistent Reasoner
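The idea of keeping facts in a database rather than in RAM can be illustrated with a very small example. This Python sketch stores subject/predicate/object triples in SQLite and lets the database derive all superclasses of a class through a recursive query. It is only an assumption-laden illustration of the concept: it is not how Jena, Pellet, or the Persistent Reasoner work, it handles a single `subClassOf` relation instead of OWL, and the class names are invented.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the triple store; pass a file path to persist it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        "subject TEXT, predicate TEXT, object TEXT, "
        "UNIQUE (subject, predicate, object))"
    )
    return conn


def add_triple(conn: sqlite3.Connection, s: str, p: str, o: str) -> None:
    """Insert one fact, silently skipping exact duplicates."""
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))
    conn.commit()


def superclasses(conn: sqlite3.Connection, cls: str) -> List[str]:
    """Infer every direct and indirect superclass of `cls` inside the database."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(name) AS (
            SELECT object FROM triples
            WHERE subject = ? AND predicate = 'subClassOf'
            UNION
            SELECT t.object FROM triples t
            JOIN ancestors a ON t.subject = a.name
            WHERE t.predicate = 'subClassOf'
        )
        SELECT name FROM ancestors ORDER BY name
        """,
        (cls,),
    ).fetchall()
    return [row[0] for row in rows]


store = open_store()
add_triple(store, "Invoice", "subClassOf", "FinancialDocument")
add_triple(store, "FinancialDocument", "subClassOf", "Document")
inferred = superclasses(store, "Invoice")
```

Because the inference runs inside the database, the set of facts is not limited by available memory, and `UNION` (rather than `UNION ALL`) keeps the recursion from looping if the class hierarchy contains a cycle.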
Intelligent DM. Semantic Features
● Architecture:
Intelligent DM. Semantic Features
● So, what have we achieved?
Now, Nuxeo:
✔ is able to tag documents automatically
✔ is able to identify document types automatically
✔ is able to classify documents automatically
✔ is continuously learning
✔ is able to start operations on documents automatically (e.g. workflows)
Intelligent Document Management
● Successful cases:
✔ Document auto-tagging
✔ Detecting document type
✔ Learning to identify document types
✔ Automatic operations on documents (workflows)
Mobile Clients
● REST (Windows Mobile)
● CMIS specification[9] (Android & iPhone)
● Features
➢ Multiple servers
➢ Viewing documents
➢ Sharing documents in various ways (e-mail, QR code, etc.)
➢ Uploading documents from the phone camera
➢ Creating notes and folders
➢ Favourite documents
➢ Document search (full-text and title search)
Mobile Client – Windows Mobile
Video Demonstration
Mobile Client – iPhone
Video Demonstration
Mobile Client – Android
Video Demonstration
References
[1] OpenCalais home page: http://www.opencalais.com/
[2] RDF Concepts: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
[3] DBpedia home page: http://dbpedia.org/About
[4] SPARQL Query Language for RDF: http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
[5] W3C: http://www.w3.org/
[6] OWL Quick Reference Guide: http://www.w3.org/TR/2009/REC-owl2-quick-reference-20091027/
[7] Jena on SourceForge: http://jena.sourceforge.net/
[8] Pellet OWL reasoner: http://clarkparsia.com/pellet
[9] CMIS on Wikipedia: http://en.wikipedia.org/wiki/Content_Management_Interoperability_Services
Yerbabuena Software on the Web
Main web: http://www.yerbabuena.es
Spanish blog: http://blog.yerbabuena.es/
English blog: http://blog.yerbabuenasoftware.com/
Research blog: http://yerbabuenaresearch.blogspot.com/
YouTube channel: http://www.youtube.com/user/YerbabuenaSoftware
Nuxeo World 2010 - Questions
Thank you