Domeo, Text Mining, UIMA and Clerezza

Paolo Ciccarese and Tommaso Teofili

These slides present:
- current facilities and future plans for the Domeo Annotation Toolkit, relating specifically to text mining use cases;
- details of the integration of the Domeo Annotation Toolkit with Apache UIMA through Apache Clerezza.

Domeo, Text Mining, UIMA and Clerezza

  1. DOMEO ANNOTATION TOOLKIT AND TEXT MINING: CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS
     Paolo Ciccarese, PhD (paolo.ciccarese@gmail.com)
     January 30th 2012, W3C Scientific Discourse Call
  2. The Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments.
     - It can export and exchange all annotations in Annotation Ontology (AO) RDF format.
     - The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in the browser.
     - http://annotationframework.org/
  3. ANNOTATION ONTOLOGY (http://purl.org/ao/home)
     OWL vocabulary for representing and sharing annotation and semantic annotation of digital resources and their fragments:
     - Is orthogonal to the domain(s) of interest
     - Supports stand-off annotation
     - Offers tools for identifying fragments
     - Designed with extension points
     - Defines basic annotation containers
     - Supports versioning
     - Tracks provenance
  4. DOMEO AND TEXT MINING SERVICES
     - Domeo can trigger text mining algorithms when they are available through web services.
     - Software connectors have to be developed to translate the results into a suitable format.
     - The results are displayed in the web documents.
     - Users can record their feedback/judgment through customizable user interfaces.
  5. NCBO ANNOTATOR (http://www.bioontology.org/annotator-service)
     - Web service that annotates textual metadata (e.g. a journal abstract) with relevant ontology concepts.
     - The ontologies of interest can be preselected as one of the many parameters.
  6. DOMEO AND THE NCBO ANNOTATOR (http://www.bioontology.org/annotator-service)
     - Domeo allows automatic/manual annotation with terms coming from selected ontologies managed by the BioPortal.
  7. RUNNING NCBO ANNOTATOR
     - Additional text mining services will be listed here.
  8. NCBO ANNOTATOR RESULTS IN DOMEO
     - List of recognized entities
  9. RESULTS CURATION
     - Customizable
  10. CUMULATIVE RESULTS CURATION
      - One item only
      - All instances with the same text match
      - All instances independently of the text match
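The three curation scopes on slide 10 can be illustrated with a small sketch. This is not Domeo code: the `Result` fields, the `Scope` names and the `curate` function are hypothetical, and it assumes that "independently of the text match" means every instance recognized as the same ontology term, whatever text was matched.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    SAME_ENTITY = "all instances independently of the text match"


@dataclass
class Result:
    """One text mining result (hypothetical fields, for illustration only)."""
    text: str                       # text matched in the document
    entity_uri: str                 # ontology term the text was recognized as
    judgment: Optional[str] = None  # curator feedback, if any


def curate(results: List[Result], target: Result, judgment: str, scope: Scope) -> int:
    """Record a judgment on the results selected by scope; return how many changed."""
    if scope is Scope.ONE_ITEM:
        selected = [r for r in results if r is target]
    elif scope is Scope.SAME_TEXT:
        # Assumption: same recognized term AND same matched text.
        selected = [r for r in results
                    if r.entity_uri == target.entity_uri and r.text == target.text]
    else:
        # Assumption: same recognized term, any matched text.
        selected = [r for r in results if r.entity_uri == target.entity_uri]
    for r in selected:
        r.judgment = judgment
    return len(selected)
```

With this reading, rejecting one occurrence of a term with the widest scope also rejects its synonyms elsewhere in the document, which is why the scope has to be an explicit choice of the curator.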
  11. SERIALIZATION IN AO/RDF
  12. SOFTWARE CONNECTORS
      At the current stage:
      - For each text mining service we have to write a specific connector, which normally translates offset and range into prefix and postfix.
      - And keep it up to date!
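The translation mentioned on slide 12 can be sketched as follows. This is a minimal illustration, not a Domeo connector: the `PrefixPostfixSelector` class, the function names and the `context` length of 20 characters are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixPostfixSelector:
    """Hypothetical container: the matched text plus the text around it."""
    prefix: str
    exact: str
    postfix: str


def to_prefix_postfix(text: str, offset: int, length: int,
                      context: int = 20) -> PrefixPostfixSelector:
    """Convert an (offset, length) match into a prefix/exact/postfix triple."""
    if offset < 0 or length <= 0 or offset + length > len(text):
        raise ValueError("match lies outside the document text")
    end = offset + length
    return PrefixPostfixSelector(
        prefix=text[max(0, offset - context):offset],
        exact=text[offset:end],
        postfix=text[end:end + context],
    )


def locate(text: str, selector: PrefixPostfixSelector) -> int:
    """Find the match again from its context; return its offset, or -1 if absent."""
    needle = selector.prefix + selector.exact + selector.postfix
    found = text.find(needle)
    return -1 if found < 0 else found + len(selector.prefix)
```

A prefix/postfix description keeps working when the character offsets reported by a service and the offsets seen by the client differ, as long as the surrounding text is unchanged; an offset alone does not.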
  13. UIMA, CLEREZZA AND AO: OSS BASED INFRASTRUCTURE FOR TEXT MINING OVER ONTOLOGIES
      Tommaso Teofili and Paolo Ciccarese (tommaso@apache.org)
  14. APACHE UIMA (http://uima.apache.org/)
      - Architectural framework for unstructured information management (UIM)
      - OASIS standard
      - Build, deploy and run text mining pipelines
      - Scaling capabilities for large volumes of data
      - NLP/TM algorithms wrapped as Analysis Engines
  15. UIMA TYPES
      - Defining annotation domain in Typesystems
      - Types and features are just declared
      - Existing Typesystems can be imported/exported/enhanced
      - Ease data exchange between AEs
      - Two "main" types: TOP and Annotation
  16. APACHE CLEREZZA (http://incubator.apache.org/clerezza/)
      - Service platform for linked data
      - OSGi-based
      - RDF API
      - RESTful Web Service Framework
      - TripleStore independent
      - Integrated with Apache UIMA
  17. UIMA/CLEREZZA CONVENTION
      - Devs can create custom types / typesystems
      - Need to manage URIs
      - Integration of services vs ontology sharing
      - ClerezzaTypeSystem
        - ClerezzaBaseAnnotation: uri
        - ClerezzaBaseEntity: uri, label (rdfs:label), references (annotations referring to this entity)
      - Service-specific annotation and entity types are defined by subclassing the above
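The convention on slide 17 can be pictured with plain Python classes. This is only a model of the inheritance idea, not a UIMA type system descriptor and not the Clerezza API: the `begin`/`end` fields are an assumption mirroring the offsets of UIMA's built-in Annotation type, and `GeneMention` is a hypothetical service-specific subtype.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClerezzaBaseAnnotation:
    """Base annotation type: every annotation carries a URI."""
    uri: str
    begin: int = 0  # assumption: offsets as in UIMA's Annotation type
    end: int = 0


@dataclass
class ClerezzaBaseEntity:
    """Base entity type: URI, rdfs:label and the annotations referring to it."""
    uri: str
    label: str = ""
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


@dataclass
class GeneMention(ClerezzaBaseAnnotation):
    """Hypothetical service-specific type; the uri field is inherited."""
    symbol: str = ""
```

Because every subtype inherits `uri`, a generic converter can emit RDF for any service-specific type without knowing it in advance, which appears to be the point of the "URI field inherited" slides that follow.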
  18. CLEREZZABASEANNOTATION DESCRIPTOR
  19. CLEREZZABASEENTITY DESCRIPTOR
  20. BEFORE
  21. AFTER (URI FIELD INHERITED)
  22. CONVERSION STRATEGIES
      - UIMA annotations stored inside CAS
      - Services "talking" via web services + RDF
      - CAS to RDF mapping via Clerezza
      - Pluggable mapping strategies
        - Clerezza Default
        - AnnotationOntology
        - …
  23. CONVERSION STRATEGIES
      - Change mapping strategies via XML/Eclipse plugin
      - Or in the descriptor directly:
        <nameValuePair>
          <name>mappingStrategy</name>
          <value><string>ao</string></value>
        </nameValuePair>
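The idea of selecting a pluggable mapping strategy by name, as the `mappingStrategy` parameter with value `ao` does, can be sketched in a few lines. This is not the Clerezza implementation: only the key `ao` comes from the slide, while the key `default`, the dictionary-based annotation, the function names and the `example.org` class IRIs are placeholders invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Annotation = Dict[str, str]  # stand-in for one annotation read from a CAS
Strategy = Callable[[Annotation], List[Triple]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def default_strategy(ann: Annotation) -> List[Triple]:
    """Placeholder default mapping: type the resource after its UIMA type name."""
    return [(ann["uri"], RDF_TYPE, "http://example.org/uima/" + ann["type"])]


def ao_strategy(ann: Annotation) -> List[Triple]:
    """Placeholder AO-style mapping; the class IRI is NOT the real AO vocabulary."""
    return [(ann["uri"], RDF_TYPE, "http://example.org/ao-placeholder/Annotation")]


# Registry of strategies keyed by the value of the mappingStrategy parameter.
STRATEGIES: Dict[str, Strategy] = {"default": default_strategy, "ao": ao_strategy}


def map_annotations(annotations: List[Annotation],
                    mapping_strategy: str = "default") -> List[Triple]:
    """Convert annotations to triples with the strategy selected by name."""
    try:
        strategy = STRATEGIES[mapping_strategy]
    except KeyError:
        raise ValueError(f"unknown mapping strategy: {mapping_strategy!r}") from None
    triples: List[Triple] = []
    for ann in annotations:
        triples.extend(strategy(ann))
    return triples
```

Keeping the strategies behind a name lookup means a new RDF vocabulary can be supported by registering one more function, without touching the analysis engines that produce the annotations.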
  24. CLEREZZA WEB SERVICES EXAMPLE
  25. LOOKING AHEAD: DOMEO TOOLKIT V. 2
      Paolo Ciccarese, PhD
  26. DOMEO ANNOTATION TOOLKIT V.2
      - Domeo Annotation Toolkit v.2 is planned by the end of the first quarter of 2012.
      - It will consist of a major refactoring to improve modularity and make writing plug-ins easier.
      - It will include various new features and will be the first step towards a federated architecture.
      - It will be open source!
  27. DOMEO FEDERATION
      - We currently have two instances of the Domeo Toolkit and the number of instances is going to increase.
      - We need to define a clean architecture that supports communication between instances or nodes.
      - Instances should be able to access each other's annotations in multiple ways.
  28. DOMEO FEDERATION
      [Diagram: annotation flow between Domeo Nodes 1 to 4 and their web clients, through web services and through SPARQL against a triplestore.]
      Ex: DT3 retrieves annotation from DT1 through a web service and from DT2 through a SPARQL query against its triplestore.
  29. SOFTWARE ANNOTATION ACCESS
      Nodes can access annotations of other nodes through:
      - Web Services
        - Annotation by User
        - Annotation by Group
        - Annotation by Document
        - Annotation by Corpora
        - …
      - SPARQL queries, when a SPARQL endpoint is available
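The federation scenario (one node gathering annotations on a document from peers reached in different ways) can be sketched with in-memory stand-ins. Nothing here is a Domeo interface: the `Source` signature, the dictionary fields and the function names are hypothetical, and the stand-in source only marks where a real web service call or SPARQL query would go.

```python
from typing import Callable, Dict, Iterable, List

Annotation = Dict[str, str]
Source = Callable[[str], Iterable[Annotation]]  # document URL -> annotations


def make_in_memory_source(annotations: List[Annotation]) -> Source:
    """Stand-in for a remote peer; a real one would call a web service
    or run a SPARQL query against the peer's triplestore."""
    def by_document(document_url: str) -> List[Annotation]:
        return [a for a in annotations if a["document"] == document_url]
    return by_document


def collect(document_url: str, peers: Dict[str, Source]) -> List[Annotation]:
    """Gather annotations on one document from every peer, dropping duplicate URIs."""
    seen = set()
    merged: List[Annotation] = []
    for name, source in peers.items():
        for ann in source(document_url):
            if ann["uri"] not in seen:
                seen.add(ann["uri"])
                merged.append({**ann, "node": name})  # remember where it came from
    return merged
```

Hiding the access method behind one function signature lets the collecting node treat a web service peer and a SPARQL peer identically, and recording the originating node keeps some provenance for the merged annotations.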
  30. USERS ANNOTATION ACCESS
      Users can export their own annotation in AO RDF:
      - Annotation by document
      - Annotation by corpora
      - All of the annotation
  31. CURRENT DOMEO ARCHITECTURE
      [Diagram. Components: Domeo Web Client, Annotation Web Services (AO-RDF), Domeo, User, MySQL, Annotation Export, Text Mining UI, Connector, NCBO Web Service, NCBO Annotator. Labelled flows: Request, Annotation.]
  32. DOMEO NODE ARCHITECTURE > ACCESSING EXTERNAL ANNOTATION
      [Diagram, steps 1 and 2. Components: other external triplestore (SPARQL), Domeo Node (AO-RDF), Domeo Web Client (AO-RDF), Annotation Web Services, Triple Store Connector, Domeo v.2 Node, User, MySQL, Annotation Export, Text Mining UI, Connector, NCBO Web Service, NCBO Annotator.]
  33. DOMEO NODE ARCHITECTURE > ADDING A SPARQL ENDPOINT
      [Diagram. Same components as the previous slide, plus a Triplestore with a SPARQL endpoint on the Domeo v.2 Node.]
  34. DOMEO NODE ARCHITECTURE > TEXT MINING ALGORITHMS INTEGRATION
      [Diagram, steps 1 to 4. Same components as the previous slide, plus: Clerezza Connector, Text Mining Connector, Clerezza Web Service, Text Mining Library Manager, UIMA Algorithm, Text Mining Algorithm.]
  35. DOMEO AND TEXT MINING IN SUMMARY
      - Run algorithms within Domeo
        - Making the algorithms available through Web Services
        - Integrating the algorithms, as libraries, within the Domeo architecture
      - Run algorithms separately and then
        - Load the results into a Domeo node through web services
        - Store the results directly in the (a) triplestore
        - Store the results directly in the database
  36. W3C COMMUNITY GROUP: OPEN ANNOTATION (http://www.w3.org/community/openannotation/)
      - Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging
      - Unified model for representing and sharing annotation in RDF
  37. THANK YOU!
      If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website http://annotationframework.org or contact paolo.ciccarese -at- gmail.com
