Domeo, Text Mining, UIMA and Clerezza

Paolo Ciccarese and Tommaso Teofili

These slides present:
- current facilities and future plans for the Domeo Annotation Toolkit, specifically those relating to text mining use cases;
- details of the integration of the Domeo Annotation Toolkit with Apache UIMA through Apache Clerezza.


Transcript

  • 1. DOMEO ANNOTATION TOOLKIT AND TEXT MINING: CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS
    Paolo Ciccarese, PhD, paolo.ciccarese@gmail.com
    January 30th 2012, W3C Scientific Discourse Call
  • 2. The Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments
    - It can export and exchange all annotations in Annotation Ontology (AO) RDF format
    - The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in your browser
    - http://annotationframework.org/
  • 3. ANNOTATION ONTOLOGY
    OWL vocabulary for representing and sharing annotation and semantic annotation of digital resources and their fragments (http://purl.org/ao/home):
    - Is orthogonal to the domain(s) of interest
    - Supports stand-off annotation
    - Offers tools for identifying fragments
    - Designed with extension points
    - Defines basic annotation containers
    - Supports versioning
    - Tracks provenance
  • 4. DOMEO AND TEXT MINING SERVICES
    - Domeo can trigger text mining algorithms when they are available through web services
    - Software connectors have to be developed to translate the results into a suitable format
    - The results are displayed in the web documents
    - Users can record their feedback/judgment through customizable user interfaces
  • 5. NCBO ANNOTATOR
    http://www.bioontology.org/annotator-service
    - Web service that annotates textual metadata (e.g. journal abstract) with relevant ontology concepts
    - The ontologies of interest can be preselected as one of the many parameters
  • 6. DOMEO AND THE NCBO ANNOTATOR
    http://www.bioontology.org/annotator-service
    - Domeo allows automatic/manual annotation with terms coming from selected ontologies managed by the BioPortal
  • 7. RUNNING NCBO ANNOTATOR
    Additional text mining services will be listed here
  • 8. NCBO ANNOTATOR RESULTS IN DOMEO
    List of recognized entities
  • 9. RESULTS CURATION
    Customizable
  • 10. CUMULATIVE RESULTS CURATION
    - One item only
    - All instances with the same text match
    - All instances independently of the text match
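
The three curation scopes listed on slide 10 can be sketched as follows. This is only an illustration, not Domeo code: the `Annotation` record, the `Scope` names and the `curate` function are hypothetical, and reading "same text match" as "same recognized entity and same matched text" is an assumption about what the slide means.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    SAME_ENTITY = "all instances, whatever the text match"


@dataclass
class Annotation:
    ident: str                      # hypothetical local identifier
    exact: str                      # text span that was matched
    entity: str                     # URI of the recognized ontology term
    judgment: str = "unreviewed"


def curate(annotations, target, judgment, scope):
    """Record `judgment` on every annotation selected by `scope`.

    Returns the identifiers of the annotations that were updated.
    """
    touched = []
    for ann in annotations:
        if scope is Scope.ONE_ITEM:
            hit = ann is target
        elif scope is Scope.SAME_TEXT:
            hit = ann.entity == target.entity and ann.exact == target.exact
        else:  # Scope.SAME_ENTITY
            hit = ann.entity == target.entity
        if hit:
            ann.judgment = judgment
            touched.append(ann.ident)
    return touched
```

With two annotations matching "APP" and one matching "amyloid precursor protein", all pointing at the same term, `Scope.SAME_TEXT` would update the first two and `Scope.SAME_ENTITY` all three.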
  • 11. SERIALIZATION IN AO/RDF
  • 12. SOFTWARE CONNECTORS
    At the current stage:
    - For each text mining service we have to write a specific connector, which normally translates offset and range into prefix and postfix
    - And keep it up to date!
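
The core job of such a connector, turning an offset/range hit into a prefix/postfix description of the same span, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Domeo connector: the function names and the `context` width (how many surrounding characters to keep) are hypothetical.

```python
def offset_range_to_prefix_postfix(text, offset, length, context=30):
    """Convert an (offset, range) hit into (prefix, exact, postfix).

    `context` is an assumption of this sketch, not a value from the slides.
    """
    if offset < 0 or length <= 0 or offset + length > len(text):
        raise ValueError("offset/range falls outside the text")
    end = offset + length
    prefix = text[max(0, offset - context):offset]
    exact = text[offset:end]
    postfix = text[end:end + context]
    return prefix, exact, postfix


def locate(text, prefix, exact, postfix):
    """Recover the offset of `exact` from its surrounding context, or -1."""
    hit = text.find(prefix + exact + postfix)
    return -1 if hit < 0 else hit + len(prefix)


sentence = "Mutations in the APP gene cause early-onset disease."
start = sentence.index("APP")
prefix, exact, postfix = offset_range_to_prefix_postfix(sentence, start, 3, context=5)
```

Describing a span by its surrounding text rather than by character position lets it be found again with `locate` even if content is added before it, which is one plausible reason for the translation step.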
  • 13. UIMA, CLEREZZA AND AO: OSS BASED INFRASTRUCTURE FOR TEXT MINING OVER ONTOLOGIES
    Tommaso Teofili and Paolo Ciccarese, tommaso@apache.org
  • 14. APACHE UIMA
    - Architectural framework for Unstructured Information Management (UIM)
    - OASIS standard
    - Build, deploy and run text mining pipelines
    - Scaling capabilities for large volumes of data
    - NLP/TM algorithms wrapped as Analysis Engines
    - http://uima.apache.org/
  • 15. UIMA TYPES
    - Defining annotation domain in Typesystems
    - Types and features are just declared
    - Existing Typesystems can be imported/exported/enhanced
    - Ease data exchange between AEs
    - Two “main” types
      - TOP
      - Annotation
  • 16. APACHE CLEREZZA
    - Service platform for linked data
    - OSGi-based
    - RDF API
    - RESTful Web Service Framework
    - TripleStore independent
    - Integrated with Apache UIMA
    - http://incubator.apache.org/clerezza/
  • 17. UIMA/CLEREZZA CONVENTION
    - devs can create custom types / typesystems
    - need to manage URIs
    - integration of services vs ontology sharing
    - ClerezzaTypeSystem
      - ClerezzaBaseAnnotation
        - uri
      - ClerezzaBaseEntity
        - uri
        - label (rdfs:label)
        - references (annotations referring to this entity)
    - service specific annotations and entity types are defined by subclassing the above
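
The subclassing convention above can be mirrored in plain Python to show what is inherited. This is an analogy only: real UIMA types are declared in a typesystem descriptor, not as Python classes, and `GeneMention`, `Gene`, `confidence` and `symbol` are hypothetical service-specific names invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClerezzaBaseAnnotation:
    begin: int   # span start, as on any UIMA Annotation
    end: int     # span end
    uri: str     # URI that identifies this annotation in RDF


@dataclass
class ClerezzaBaseEntity:
    uri: str
    label: str   # exposed as rdfs:label
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


# Hypothetical service-specific types: they declare only what is new,
# while uri (and label/references for entities) is inherited.
@dataclass
class GeneMention(ClerezzaBaseAnnotation):
    confidence: float = 1.0


@dataclass
class Gene(ClerezzaBaseEntity):
    symbol: str = ""


mention = GeneMention(begin=17, end=20, uri="http://example.org/ann/1")
gene = Gene(uri="http://example.org/gene/APP", label="APP",
            references=[mention], symbol="APP")
```

Because every service-specific type carries a `uri`, generic code that converts annotations to RDF only needs to know about the two base types.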
  • 18. CLEREZZABASEANNOTATION DESCRIPTOR
  • 19. CLEREZZABASEENTITY DESCRIPTOR
  • 20. BEFORE
  • 21. AFTER (URI FIELD INHERITED)
  • 22. CONVERSION STRATEGIES
    - UIMA annotations stored inside CAS
    - Services “talking” via web services + RDF
    - CAS to RDF mapping via Clerezza
    - Pluggable mapping strategies
      - Clerezza Default
      - AnnotationOntology
      - …
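
One way to picture pluggable mapping strategies is a registry of functions, each turning an annotation into RDF triples. The sketch below is not the Clerezza API: annotations are plain dictionaries, the `default` key and the `http://example.org/uima#` vocabulary are placeholders, and the choice of the AO terms `ao:Annotation`, `ao:annotatesResource` and `ao:hasTopic` is illustrative. Only the strategy name `ao` comes from the slides.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
AO = "http://purl.org/ao/"
EX = "http://example.org/uima#"   # placeholder vocabulary for the default mapping


def default_strategy(ann):
    """Generic mapping: type plus span, in a made-up example vocabulary."""
    return [
        (ann["uri"], RDF_TYPE, EX + ann["type"]),
        (ann["uri"], EX + "begin", ann["begin"]),
        (ann["uri"], EX + "end", ann["end"]),
    ]


def ao_strategy(ann):
    """Annotation Ontology flavoured mapping (term choice is illustrative)."""
    return [
        (ann["uri"], RDF_TYPE, AO + "Annotation"),
        (ann["uri"], AO + "annotatesResource", ann["document"]),
        (ann["uri"], AO + "hasTopic", ann["entity"]),
    ]


# Strategies are looked up by name, so new ones can be plugged in.
STRATEGIES = {"default": default_strategy, "ao": ao_strategy}


def cas_to_triples(annotations, mapping_strategy="default"):
    """Map every annotation to triples with the selected strategy."""
    try:
        strategy = STRATEGIES[mapping_strategy]
    except KeyError:
        raise ValueError(f"unknown mapping strategy: {mapping_strategy!r}") from None
    triples = []
    for ann in annotations:
        triples.extend(strategy(ann))
    return triples
```

Keeping the strategy behind a name means the same annotations can be exported in different RDF shapes without touching the text mining code.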
  • 23. CONVERSION STRATEGIES
    Change mapping strategies via XML/Eclipse plugin, or in the descriptor directly:
      <nameValuePair>
        <name>mappingStrategy</name>
        <value><string>ao</string></value>
      </nameValuePair>
  • 24. CLEREZZA WEB SERVICES EXAMPLE
  • 25. LOOKING AHEAD: DOMEO TOOLKIT V. 2
    Paolo Ciccarese, PhD
  • 26. DOMEO ANNOTATION TOOLKIT V.2
    - Domeo Annotation Toolkit v.2 is planned for the end of the first quarter of 2012
    - It will consist of a major refactoring to improve modularity and make writing plug-ins easier
    - It will include various new features and will be the first step towards a federated architecture
    - It will be open source!
  • 27. DOMEO FEDERATION
    - We currently have two instances of the Domeo Toolkit and the number of instances is going to increase
    - We need to define a clean architecture that supports communication between instances or nodes
    - Instances should be able to access each other's annotations in multiple ways
  • 28. DOMEO FEDERATION
    [Diagram: annotation flow between Domeo nodes 1 to 4 and their web clients, via web service, SPARQL and triplestore]
    Ex: DT3 retrieves annotation from DT1 through a web service and from DT2 through a SPARQL query against its triplestore
  • 29. SOFTWARE ANNOTATION ACCESS
    Nodes can access annotations of other nodes through:
    - Web Services
      - Annotation by User
      - Annotation by Group
      - Annotation by Document
      - Annotation by Corpora
      - …
    - SPARQL queries, when a SPARQL end-point is available
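
For the SPARQL route, a node would need to build a query such as "all annotations on a given document". The sketch below only constructs the query text and does not contact any endpoint. It assumes the remote triplestore holds AO-style data using `ao:annotatesResource` and `ao:hasTopic`; the function name, the selected variables and the `limit` parameter are hypothetical.

```python
AO_PREFIX = "PREFIX ao: <http://purl.org/ao/>"

# Characters that must not appear inside an IRI reference in SPARQL.
_UNSAFE = '<>"{}|\\^` \n\t'


def annotations_by_document_query(document_uri, limit=100):
    """Build a SPARQL SELECT for annotations that target `document_uri`."""
    if any(ch in document_uri for ch in _UNSAFE):
        raise ValueError("not a safe IRI: %r" % document_uri)
    return (
        f"{AO_PREFIX}\n"
        "SELECT ?annotation ?topic WHERE {\n"
        f"  ?annotation ao:annotatesResource <{document_uri}> .\n"
        "  OPTIONAL { ?annotation ao:hasTopic ?topic }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = annotations_by_document_query("http://example.org/paper.html")
```

Rejecting unsafe characters before interpolating the URI is a small guard against malformed or injected queries when the document address comes from another node.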
  • 30. USERS ANNOTATION ACCESS
    Users can export their own annotation in AO RDF:
    - Annotation by document
    - Annotation by corpora
    - All of the annotation
  • 31. CURRENT DOMEO ARCHITECTURE
    [Architecture diagram; components named: Domeo Web Client, AO-RDF, Annotation Web Services, User, MySQL, Annotation Export, Text Mining UI, Connector, NCBO Web Service, NCBO Annotator]
  • 32. DOMEO NODE ARCHITECTURE > ACCESSING EXTERNAL ANNOTATION
    [Architecture diagram of a Domeo v.2 node; in addition to the components of slide 31 it names: other Domeo node, external triplestore, Triple Store Connector, SPARQL]
  • 33. DOMEO NODE ARCHITECTURE > ADDING A SPARQL ENDPOINT
    [Architecture diagram of a Domeo v.2 node; in addition to the components of slide 32 it names: SPARQL, Triplestore]
  • 34. DOMEO NODE ARCHITECTURE > TEXT MINING ALGORITHMS INTEGRATION
    [Architecture diagram of a Domeo v.2 node; in addition to the components of slide 33 it names: Clerezza Connector, Clerezza Web Service, UIMA Algorithm, Text Mining Connector, Text Mining Library Manager, Text Mining Algorithm]
  • 35. DOMEO AND TEXT MINING: IN SUMMARY
    - Run algorithms within Domeo
      - Making the algorithms available through Web Services
      - Integrating the algorithms, as libraries, within the Domeo architecture
    - Run algorithms separately and then
      - Load the results into a Domeo node through web services
      - Store the results directly in the (a) triplestore
      - Store the results directly in the database
  • 36. W3C COMMUNITY GROUP: OPEN ANNOTATION
    - Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging
    - Unified model for representing and sharing annotation in RDF
    - http://www.w3.org/community/openannotation/
  • 37. THANK YOU!
    If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website http://annotationframework.org or contact paolo.ciccarese -at- gmail.com