Domeo, Text Mining, UIMA and Clerezza


Creating, visualizing, curating and sharing text mining results

Published in: Technology

  1. DOMEO ANNOTATION TOOLKIT AND TEXT MINING
      CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS
      Paolo Ciccarese, PhD (paolo.ciccarese@gmail.com)
      January 30th 2012, W3C Scientific Discourse Call
  2. The Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments. It can export and exchange all annotations in Annotation Ontology (AO) RDF format. The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in the browser.
  3. ANNOTATION ONTOLOGY
      OWL vocabulary for representing and sharing annotation and semantic annotation of digital resources and their fragments:
      • Is orthogonal to the domain(s) of interest
      • Supports stand-off annotation
      • Offers tools for identifying fragments
      • Designed with extension points
      • Defines basic annotation containers
      • Supports versioning
      • Tracks provenance
  4. DOMEO AND TEXT MINING SERVICES
      • Domeo can trigger text mining algorithms when they are available through web services
      • Software connectors have to be developed to translate the results into a suitable format
      • The results are displayed in the web documents
      • Users can record their feedback/judgment through customizable user interfaces
  5. NCBO ANNOTATOR
      • Web service that annotates textual metadata (e.g. a journal abstract) with relevant ontology concepts
      • The ontologies of interest can be preselected through one of the many parameters
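
As a rough illustration of preselecting ontologies through a request parameter, the sketch below only builds a request URL and does not call the service. The endpoint and the parameter names (`text`, `ontologies`, `apikey`) are assumptions modeled on the present-day BioPortal REST API; the service shown in the 2012 slides may have used a different address and different parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint, modeled on the present-day BioPortal REST API.
ANNOTATOR_ENDPOINT = "https://data.bioontology.org/annotator"


def build_annotator_url(text, ontologies, api_key, endpoint=ANNOTATOR_ENDPOINT):
    """Return a GET URL asking the annotator to tag `text`.

    `ontologies` is a list of ontology acronyms that restricts the match.
    The parameter names are assumptions, not taken from the slides.
    """
    params = {
        "text": text,
        "ontologies": ",".join(ontologies),
        "apikey": api_key,
    }
    return endpoint + "?" + urlencode(params)
```

A connector would send this request and then translate the response into annotations, which is the step the later connector slides describe.
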
  6. DOMEO AND THE NCBO ANNOTATOR
      • Domeo allows automatic/manual annotation with terms coming from selected ontologies managed by the BioPortal
  7. RUNNING NCBO ANNOTATOR
      • Additional text mining services will be listed here
  8. NCBO ANNOTATOR RESULTS IN DOMEO
      • List of recognized entities
  9. RESULTS CURATION
      • Customizable
  10. CUMULATIVE RESULTS CURATION
      • One item only
      • All instances with the same text match
      • All instances, regardless of the text match
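
The three curation scopes listed on the slide can be sketched as follows. This is a hypothetical model, not Domeo's code: `Result`, `Scope` and `curate` are illustrative names, and "all instances" is assumed here to mean every result pointing at the same concept.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    ALL_INSTANCES = "all instances, regardless of the text match"


@dataclass
class Result:
    match: str                     # text matched in the document
    concept: str                   # identifier of the recognized concept
    verdict: Optional[str] = None  # curator feedback, if any


def curate(results: List[Result], target: Result, verdict: str, scope: Scope) -> None:
    """Record `verdict` on `target` and on the other results covered by `scope`."""
    for r in results:
        same_concept = r.concept == target.concept  # assumed meaning of "instance"
        if scope is Scope.ONE_ITEM:
            selected = r is target
        elif scope is Scope.SAME_TEXT:
            selected = same_concept and r.match == target.match
        else:
            selected = same_concept
        if selected:
            r.verdict = verdict
```
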
  12. SOFTWARE CONNECTORS
      At the current stage:
      • For each text mining service we have to write a specific connector, which normally translates offset and range into prefix and postfix
      • And keep it up to date!
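
The offset-and-range to prefix-and-postfix translation a connector performs might look like the sketch below. It is a minimal illustration under stated assumptions: the function names, the dictionary keys and the `context` width are hypothetical and not taken from Domeo.

```python
def to_selector(text, offset, length, context=20):
    """Translate an (offset, length) match into prefix/exact/postfix strings."""
    end = offset + length
    if offset < 0 or length <= 0 or end > len(text):
        raise ValueError("match lies outside the text")
    return {
        "prefix": text[max(0, offset - context):offset],
        "exact": text[offset:end],
        "postfix": text[end:end + context],
    }


def locate(text, selector):
    """Find the offset of the selector's exact match in (possibly changed) text.

    Returns -1 when the prefix + exact + postfix sequence does not occur.
    """
    needle = selector["prefix"] + selector["exact"] + selector["postfix"]
    start = text.find(needle)
    return -1 if start < 0 else start + len(selector["prefix"])
```

One reason to prefer the textual form: a character offset breaks as soon as anything is inserted earlier in the document, while the prefix/exact/postfix triple can still be found again with `locate`.
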
  14. APACHE UIMA
      • Architectural framework for UIM (Unstructured Information Management)
      • OASIS standard
      • Build, deploy and run text mining pipelines
      • Scaling capabilities for large volumes of data
      • NLP/TM algorithms wrapped as Analysis Engines
  15. UIMA TYPES
      • Annotation domains are defined in Typesystems
      • Types and features are just declared
      • Existing Typesystems can be imported/exported/enhanced
      • Ease data exchange between AEs
      • Two "main" types
          • TOP
          • Annotation
  16. APACHE CLEREZZA
      • Service platform for linked data
      • OSGi-based RDF API
      • RESTful Web Service Framework
      • TripleStore independent
      • Integrated with Apache UIMA
  17. UIMA/CLEREZZA CONVENTION
      • Devs can create custom types / typesystems
      • Need to manage URIs
      • Integration of services vs ontology sharing
      • ClerezzaTypeSystem
          • ClerezzaBaseAnnotation
              • uri
          • ClerezzaBaseEntity
              • uri
              • label (rdfs:label)
              • references (annotations referring to this entity)
      • Service-specific annotation and entity types are defined by subclassing the above
  20. BEFORE
  22. CONVERSION STRATEGIES
      • UIMA annotations stored inside CAS
      • Services "talking" via web services + RDF
      • CAS to RDF mapping via Clerezza
      • Pluggable mapping strategies
          • Clerezza Default
          • AnnotationOntology
          • …
  23. CONVERSION STRATEGIES
      • Change mapping strategies via XML/Eclipse plugin
      • Or in the descriptor directly:
        <nameValuePair>
          <name>mappingStrategy</name>
          <value><string>ao</string></value>
        </nameValuePair>
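
The idea of pluggable mapping strategies selected by a `mappingStrategy` value can be sketched as a small registry. This is not the UIMA/Clerezza implementation: the dictionary-based annotation stand-in, the function names, the `urn:example:` predicates and the AO namespace and property names are all assumptions made for illustration, and the AO terms should be checked against the Annotation Ontology specification.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Annotation = Dict[str, str]  # stand-in for a UIMA annotation read from the CAS

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
AO = "http://purl.org/ao/"  # assumed Annotation Ontology namespace


def default_strategy(a: Annotation) -> List[Triple]:
    """Hypothetical default mapping using made-up example predicates."""
    return [
        (a["uri"], RDF_TYPE, "urn:example:Annotation"),
        (a["uri"], "urn:example:refersTo", a["entity"]),
    ]


def ao_strategy(a: Annotation) -> List[Triple]:
    """Mapping to AO-style triples (property names are assumptions)."""
    return [
        (a["uri"], RDF_TYPE, AO + "Annotation"),
        (a["uri"], AO + "annotatesResource", a["document"]),
        (a["uri"], AO + "hasTopic", a["entity"]),
    ]


# Keys play the role of the mappingStrategy value in the descriptor.
STRATEGIES: Dict[str, Callable[[Annotation], List[Triple]]] = {
    "default": default_strategy,
    "ao": ao_strategy,
}


def cas_to_rdf(annotations: List[Annotation], mapping_strategy: str = "default") -> List[Triple]:
    """Convert annotations to triples with the strategy named in the configuration."""
    if mapping_strategy not in STRATEGIES:
        raise ValueError(f"unknown mapping strategy: {mapping_strategy!r}")
    strategy = STRATEGIES[mapping_strategy]
    return [triple for a in annotations for triple in strategy(a)]
```

Adding a new output vocabulary then means registering one more function, without touching the analysis engines that produced the annotations.
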
  25. LOOKING AHEAD: DOMEO TOOLKIT V.2
      Paolo Ciccarese, PhD
  26. DOMEO ANNOTATION TOOLKIT V.2
      • Domeo Annotation Toolkit v.2 is planned for the end of the first quarter of 2012
      • It will consist of a major refactoring to improve modularity and make writing plug-ins easier
      • It will include various new features and will be the first step towards a federated architecture
      • It will be open source!
  27. DOMEO FEDERATION
      • We currently have two instances of the Domeo Toolkit, and the number of instances is going to increase
      • We need to define a clean architecture that supports communication between instances or nodes
      • Instances should be able to access each other's annotations in multiple ways
  28. DOMEO FEDERATION
      [Diagram: annotation flow among Domeo Nodes 1 to 4 and their web clients; labels: Web Service, SPARQL, Triplestore]
      Ex: DT3 retrieves annotation from DT1 through a web service and from DT2 through a SPARQL query against its triplestore
  29. SOFTWARE ANNOTATION ACCESS
      Nodes can access annotations of other nodes through:
      • Web Services
          • Annotation by User
          • Annotation by Group
          • Annotation by Document
          • Annotation by Corpora
          • …
      • SPARQL queries, when a SPARQL end-point is available
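
For the SPARQL route, an "annotation by document" request could be expressed as a query like the one built below. This is a sketch under assumptions: the `ao:` namespace and the `annotatesResource` and `hasTopic` properties are taken on trust from the Annotation Ontology and should be verified, and the function name is hypothetical. The code only builds the query text and does not contact an endpoint.

```python
AO = "http://purl.org/ao/"  # assumed Annotation Ontology namespace

# Characters that must not appear inside a SPARQL IRI reference.
_UNSAFE = '<>"{}|^`\\ \n'


def annotations_by_document_query(document_uri, limit=100):
    """Build a SPARQL SELECT listing annotations (and topics) on one document."""
    if any(c in document_uri for c in _UNSAFE):
        raise ValueError("document_uri is not a safe IRI")
    return (
        f"PREFIX ao: <{AO}>\n"
        "SELECT ?annotation ?topic WHERE {\n"
        f"  ?annotation ao:annotatesResource <{document_uri}> ;\n"
        "              ao:hasTopic ?topic .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )
```

Rejecting unsafe characters before interpolating the URI keeps a caller from injecting extra query text through the document address.
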
  30. USERS ANNOTATION ACCESS
      Users can export their own annotation in AO RDF:
      • Annotation by document
      • Annotation by corpora
      • All of the annotation
  31. CURRENT DOMEO ARCHITECTURE
      [Diagram; labels: Request, Annotation, Domeo Web Client, AO-RDF, Annotation Web Services, Domeo, User, MySQL, Annotation, Export, Text Mining UI, Connector, NCBO Web Service, NCBO Annotator]
  32. DOMEO NODE ARCHITECTURE > ACCESSING EXTERNAL ANNOTATION
      [Diagram with callouts 1 and 2; labels: Other Domeo Node, External Triplestore, Domeo Web Client, AO-RDF, SPARQL, Annotation Web Services, Triple Store Connector, Domeo v.2 Node, User, MySQL, Annotation, Export, Text Mining UI, Connector, NCBO Web Service, NCBO Annotator]
  33. DOMEO NODE ARCHITECTURE > ADDING A SPARQL ENDPOINT
      [Diagram; labels: Other Domeo Node, External Triplestore, Domeo Web Client, AO-RDF, SPARQL, Annotation Web Services, Triple Store Connector, SPARQL, Triplestore, Domeo v.2 Node, User, MySQL, Annotation, Export, Text Mining UI, Connector, NCBO Web Service, NCBO Annotator]
  34. DOMEO NODE ARCHITECTURE > TEXT MINING ALGORITHMS INTEGRATION
      [Diagram with callouts 1 to 4; labels: Other Domeo Node, External Triplestore, Domeo Web Client, AO-RDF, SPARQL, Annotation Web Services, Triple Store Connector, SPARQL, Triplestore, Domeo v.2 Node, MySQL, User, Annotation, Export, Text Mining UI, Connector, Clerezza Connector, Text Mining Connector, NCBO Web Service, Clerezza Web Service, Text Mining Library Manager, NCBO Annotator, UIMA Algorithm, Text Mining Algorithm]
  35. DOMEO AND TEXT MINING IN SUMMARY
      • Run algorithms within Domeo
          • Making the algorithms available through Web Services
          • Integrating the algorithms, as libraries, within the Domeo architecture
      • Run algorithms separately and then
          • Load the results into a Domeo node through web services
          • Store the results directly in the (a) triplestore
          • Store the results directly in the database
  36. W3C COMMUNITY GROUP OPEN ANNOTATION
      • Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging
      • Unified model for representing and sharing annotation in RDF
  37. THANK YOU!
      If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website or contact paolo.ciccarese -at-