Domeo, Text Mining, UIMA and Clerezza

Creating, visualizing, curating and sharing text mining results


    1. DOMEO ANNOTATION TOOLKIT AND TEXT MINING: CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS. Paolo Ciccarese, PhD, paolo.ciccarese@gmail.com. January 30th, 2012, W3C Scientific Discourse Call.
    2. The Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments. It can export and exchange all annotations in Annotation Ontology (AO) RDF format. The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in the browser. http://annotationframework.org/
    3. ANNOTATION ONTOLOGY (http://purl.org/ao/home): an OWL vocabulary for representing and sharing annotations and semantic annotations of digital resources and their fragments. It:
       • is orthogonal to the domain(s) of interest
       • supports stand-off annotation
       • offers tools for identifying fragments
       • is designed with extension points
       • defines basic annotation containers
       • supports versioning
       • tracks provenance
    4. DOMEO AND TEXT MINING SERVICES: Domeo can trigger text mining algorithms when they are available as web services. Software connectors have to be developed to translate the results into a suitable format. The results are displayed in the web documents, and users can record their feedback/judgment through customizable user interfaces.
    5. NCBO ANNOTATOR (http://www.bioontology.org/annotator-service): a web service that annotates textual metadata (e.g. a journal abstract) with relevant ontology concepts. The ontologies of interest can be preselected through one of its many parameters.
    6. DOMEO AND THE NCBO ANNOTATOR (http://www.bioontology.org/annotator-service): Domeo allows automatic and manual annotation with terms from selected ontologies managed by BioPortal.
    7. RUNNING NCBO ANNOTATOR: Additional text mining services will be listed here.
    8. NCBO ANNOTATOR RESULTS IN DOMEO: list of recognized entities.
    9. RESULTS CURATION: customizable.
    10. CUMULATIVE RESULTS CURATION: a judgment can be applied to one item only, to all instances with the same text match, or to all instances independently of the text match.
    11. SERIALIZATION IN AO/RDF
    12. SOFTWARE CONNECTORS: At the current stage, we have to write a specific connector for each text mining service. The connector normally translates offset and range into prefix and postfix, and it has to be kept up to date!
    13. UIMA, CLEREZZA AND AO: OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER ONTOLOGIES. Tommaso Teofili and Paolo Ciccarese, tommaso@apache.org
    14. APACHE UIMA (http://uima.apache.org/): an architectural framework for Unstructured Information Management (UIM) and an OASIS standard. It is used to build, deploy and run text mining pipelines, has scaling capabilities for large volumes of data, and wraps NLP/TM algorithms as Analysis Engines (AEs).
    15. UIMA TYPES: The annotation domain is defined in type systems, where types and features are just declared. Existing type systems can be imported, exported and enhanced, which eases data exchange between AEs. There are two “main” types: TOP and Annotation.
    16. APACHE CLEREZZA (http://incubator.apache.org/clerezza/): a service platform for linked data. It offers an OSGi-based RDF API and a RESTful web service framework, is triplestore independent, and is integrated with Apache UIMA.
    17. UIMA/CLEREZZA CONVENTION: Developers can create custom types and type systems, which raises two concerns: managing URIs, and integration of services vs. ontology sharing. The ClerezzaTypeSystem defines:
       • ClerezzaBaseAnnotation, with a uri feature
       • ClerezzaBaseEntity, with uri, label (rdfs:label) and references (the annotations referring to this entity)
       Service-specific annotation and entity types are defined by subclassing the above.
    18. CLEREZZABASEANNOTATION DESCRIPTOR
    19. CLEREZZABASEENTITY DESCRIPTOR
    20. BEFORE
    21. AFTER (URI FIELD INHERITED)
    22. 22. CONVERSION STRATEGIES UIMA annotations stored inside CAS Services “talking” via webservices + RDF CAS to RDF mapping via Clerezza Pluggable mapping strategies  Clerezza Default  AnnotationOntology  …
    23. CONVERSION STRATEGIES: Change mapping strategies via XML/Eclipse plugin, or directly in the descriptor:
       <nameValuePair>
         <name>mappingStrategy</name>
         <value><string>ao</string></value>
       </nameValuePair>
    24. CLEREZZA WEB SERVICES EXAMPLE
    25. LOOKING AHEAD: DOMEO TOOLKIT V.2. Paolo Ciccarese, PhD
    26. DOMEO ANNOTATION TOOLKIT V.2: Domeo Annotation Toolkit v.2 is planned for the end of the first quarter of 2012. It will consist of a major refactoring to improve modularity and make writing plug-ins easier. It will include various new features and will be the first step towards a federated architecture. It will be open source!
    27. DOMEO FEDERATION: We currently have two instances of the Domeo Toolkit, and the number of instances is going to increase. We need to define a clean architecture that supports communication between instances, or nodes. Instances should be able to access each other's annotations in multiple ways.
    28. DOMEO FEDERATION [Diagram: Domeo nodes 1 to 4, each with a Domeo web client, connected through an Annotation Flow Web Service and through SPARQL queries against a triplestore.] Example: DT3 retrieves annotations from DT1 through a web service and from DT2 through a SPARQL query against its triplestore.
    29. SOFTWARE ANNOTATION ACCESS: Nodes can access the annotations of other nodes:
       • through web services: annotation by user, by group, by document, by corpora, …
       • through SPARQL queries, when a SPARQL endpoint is available
    30. USER ANNOTATION ACCESS: Users can export their own annotations in AO RDF: by document, by corpora, or all of their annotations.
    31. CURRENT DOMEO ARCHITECTURE [Diagram components: Domeo Web Client, User, Annotation Web Services, MySQL, AO-RDF Annotation Export, Text Mining UI, NCBO Connector, NCBO Web Service (NCBO Annotator).]
    32. DOMEO NODE ARCHITECTURE > ACCESSING EXTERNAL ANNOTATION [Diagram: the Domeo v.2 Node (Domeo Web Client, User, Annotation Web Services, MySQL, AO-RDF Annotation Export, Text Mining UI, NCBO Connector, NCBO Web Service/NCBO Annotator) plus a Triple Store Connector exchanging AO-RDF with other Domeo nodes and external triplestores via web services and SPARQL.]
    33. DOMEO NODE ARCHITECTURE > ADDING A SPARQL ENDPOINT [Diagram: the same Domeo v.2 Node, with a SPARQL Triplestore added so that other nodes can query its annotations.]
    34. DOMEO NODE ARCHITECTURE > TEXT MINING ALGORITHMS INTEGRATION [Diagram: the Domeo v.2 Node with three text mining integration paths: NCBO Connector → NCBO Web Service → NCBO Annotator; Clerezza Connector → Clerezza Web Service → UIMA Algorithm; Text Mining Connector → Text Mining Library Manager → Text Mining Algorithm.]
    35. DOMEO AND TEXT MINING IN SUMMARY:
       • Run algorithms within Domeo, either by making the algorithms available through web services, or by integrating the algorithms, as libraries, within the Domeo architecture.
       • Run algorithms separately and then load the results into a Domeo node through web services, store the results directly in a triplestore, or store the results directly in the database.
    36. W3C COMMUNITY GROUP: OPEN ANNOTATION. Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging into a unified model for representing and sharing annotation in RDF. http://www.w3.org/community/openannotation/
    37. THANK YOU! If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website http://annotationframework.org or contact paolo.ciccarese -at- gmail.com.
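
Slide 10 lists three scopes for applying a curation judgment. The Python sketch below shows one way a client could implement them. It is not Domeo's code: the record fields and the `Scope`/`apply_feedback` names are hypothetical, and it assumes that "same text match" means the same matched text for the same concept, while "independently from the text match" means every annotation of the same concept.

```python
# Minimal sketch of the three cumulative-curation scopes from slide 10.
# ASSUMPTIONS: "same text match" = same matched text AND same concept;
# "independently from the text match" = every annotation of the same concept.
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    SAME_CONCEPT = "all instances, independently of the text match"


@dataclass
class TermAnnotation:
    ann_id: str
    exact: str                      # text matched in the document
    concept: str                    # URI of the recognized ontology term
    judgment: Optional[str] = None  # e.g. "accept" or "reject"; None = not curated


def apply_feedback(annotations: List[TermAnnotation], target_id: str,
                   judgment: str, scope: Scope) -> List[str]:
    """Record a curator's judgment and return the ids of the updated annotations."""
    by_id = {a.ann_id: a for a in annotations}
    target = by_id[target_id]  # raises KeyError if the id is unknown
    if scope is Scope.ONE_ITEM:
        selected = [target]
    elif scope is Scope.SAME_TEXT:
        selected = [a for a in annotations
                    if a.concept == target.concept and a.exact == target.exact]
    else:  # Scope.SAME_CONCEPT
        selected = [a for a in annotations if a.concept == target.concept]
    for a in selected:
        a.judgment = judgment
    return [a.ann_id for a in selected]
```

The comparison of matched text is case-sensitive here; whether Domeo normalizes case is not stated in the slides.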
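
Slides 3 and 11 describe exporting annotations as AO RDF. The following Python sketch writes one semantic annotation as Turtle, anchored through a prefix/exact/postfix selector. The namespace URIs and the terms `ao:Qualifier`, `ao:hasTopic`, `ao:onSourceDocument`, `ao:context`, `aos:PrefixPostfixSelector`, `aos:prefix`, `aos:exact` and `aos:postfix` are recalled from the AO documentation and should be checked against http://purl.org/ao/home; all example URIs are hypothetical.

```python
# Minimal sketch: serialize one AO semantic annotation as Turtle.
# ASSUMPTION: the AO namespaces and term names below are recalled from the
# AO documentation (http://purl.org/ao/home); verify them before use.

AO_PREFIXES = (
    "@prefix ao:  <http://purl.org/ao/> .\n"
    "@prefix aos: <http://purl.org/ao/selectors/> .\n"
)


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping characters Turtle forbids raw."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def ao_qualifier_turtle(ann_uri: str, selector_uri: str, doc_uri: str,
                        topic_uri: str, prefix: str, exact: str,
                        postfix: str) -> str:
    """Return Turtle for an annotation tying a text fragment to an ontology term."""
    return AO_PREFIXES + "\n".join([
        "",
        f"<{ann_uri}> a ao:Qualifier ;",
        f"    ao:hasTopic <{topic_uri}> ;",
        f"    ao:onSourceDocument <{doc_uri}> ;",
        f"    ao:context <{selector_uri}> .",
        "",
        f"<{selector_uri}> a aos:PrefixPostfixSelector ;",
        f"    aos:prefix {turtle_literal(prefix)} ;",
        f"    aos:exact {turtle_literal(exact)} ;",
        f"    aos:postfix {turtle_literal(postfix)} .",
        "",
    ])


if __name__ == "__main__":
    print(ao_qualifier_turtle(
        "http://example.org/ann/1", "http://example.org/sel/1",
        "http://example.org/doc/1", "http://example.org/concept/BRCA1",
        "ations in ", "BRCA1", " increase "))
```

Building Turtle by string formatting keeps the sketch dependency-free; a real exporter would more likely use an RDF library, which also takes care of IRI validation.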
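
Slide 12 says a connector normally translates offset and range into prefix and postfix. A minimal Python sketch of that translation follows; the function names and the 10-character context window are assumptions, and offsets are assumed to index the same plain text the service analyzed.

```python
# Minimal sketch of the offset/range -> prefix/exact/postfix translation that a
# slide-12 connector performs. ASSUMPTIONS: offsets count characters in the same
# plain text the service analyzed, and 10 characters of context are enough.
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixPostfixSelector:
    prefix: str   # text immediately before the match
    exact: str    # the matched text itself
    postfix: str  # text immediately after the match


def to_prefix_postfix(text: str, offset: int, length: int,
                      context: int = 10) -> PrefixPostfixSelector:
    """Convert a service result given as (offset, range) into a selector."""
    end = offset + length
    if offset < 0 or length < 0 or end > len(text):
        raise ValueError("offset/range falls outside the document text")
    return PrefixPostfixSelector(
        prefix=text[max(0, offset - context):offset],  # clipped at the start
        exact=text[offset:end],
        postfix=text[end:end + context],               # slicing clips at the end
    )


def locate(text: str, sel: PrefixPostfixSelector) -> int:
    """Re-anchor a selector: return the offset of `sel.exact`, or -1 if not found."""
    pos = text.find(sel.prefix + sel.exact + sel.postfix)
    return -1 if pos == -1 else pos + len(sel.prefix)


if __name__ == "__main__":
    doc = "Mutations in BRCA1 increase breast cancer risk."
    sel = to_prefix_postfix(doc, 13, 5)
    print(sel)               # prefix='ations in ', exact='BRCA1', postfix=' increase '
    print(locate(doc, sel))  # 13
```

Re-finding the match by string search, as `locate` does, does not depend on character offsets, which is presumably why the prefix/postfix form suits annotations displayed on HTML documents.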
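
Slides 17 to 21 describe the ClerezzaTypeSystem convention. The sketch below mirrors it with Python dataclasses purely for illustration; real UIMA types are declared in XML type system descriptors. It assumes that ClerezzaBaseAnnotation derives from UIMA's Annotation (a begin/end span) and that ClerezzaBaseEntity is not a span; `GeneMention` and `link` are hypothetical names.

```python
# Minimal sketch of the UIMA/Clerezza type convention from slides 17-21,
# mirrored as Python dataclasses. ASSUMPTIONS: ClerezzaBaseAnnotation derives
# from UIMA's Annotation (a begin/end span), ClerezzaBaseEntity is not a span,
# and GeneMention is a hypothetical service-specific subtype.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """Stand-in for UIMA's built-in Annotation type: a span of the document text."""
    begin: int
    end: int


@dataclass
class ClerezzaBaseAnnotation(Annotation):
    uri: str = ""  # every annotation carries its own URI


@dataclass
class ClerezzaBaseEntity:
    uri: str
    label: str  # serialized as rdfs:label
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


@dataclass
class GeneMention(ClerezzaBaseAnnotation):  # hypothetical service-specific type
    symbol: str = ""


def link(entity: ClerezzaBaseEntity, ann: ClerezzaBaseAnnotation) -> None:
    """Record that `ann` refers to `entity` (the 'references' feature)."""
    if ann not in entity.references:
        entity.references.append(ann)
```

`GeneMention` gets `uri` without redeclaring it, which corresponds to the "URI field inherited" point of slide 21.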
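
Slides 22 and 23 describe pluggable CAS-to-RDF mapping strategies selected through the `mappingStrategy` parameter. This Python sketch reads that parameter from a descriptor fragment and dispatches to a strategy. Only the `nameValuePair`/`name`/`value`/`string` layout comes from the slide; the strategy name `default`, the dict-based annotations and the triple layouts are hypothetical stand-ins for what Clerezza actually produces.

```python
# Minimal sketch of pluggable CAS-to-RDF mapping strategies selected by the
# `mappingStrategy` descriptor parameter (slides 22-23). ASSUMPTIONS: the
# strategy name "default", the triple layouts, and the dict-based annotations
# are illustrative stand-ins, not Clerezza's actual implementation.
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def default_mapping(ann: dict) -> List[Triple]:
    # Hypothetical generic layout: one triple per annotation feature.
    return [(ann["uri"], f"urn:feature:{k}", str(v))
            for k, v in sorted(ann.items()) if k != "uri"]


def ao_mapping(ann: dict) -> List[Triple]:
    # AO-flavoured layout: the annotation qualifies a resource with a topic.
    return [(ann["uri"], "rdf:type", "ao:Qualifier"),
            (ann["uri"], "ao:hasTopic", ann["topic"])]


STRATEGIES: Dict[str, Callable[[dict], List[Triple]]] = {
    "default": default_mapping,
    "ao": ao_mapping,
}


def _local(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]  # strip an "{namespace}" prefix, if any


def _child(elem: Optional[ET.Element], name: str) -> Optional[ET.Element]:
    if elem is None:
        return None
    return next((c for c in elem if _local(c.tag) == name), None)


def read_mapping_strategy(descriptor_xml: str, fallback: str = "default") -> str:
    """Return the string value of the mappingStrategy nameValuePair, if present."""
    for pair in ET.fromstring(descriptor_xml).iter():
        if _local(pair.tag) != "nameValuePair":
            continue
        name = _child(pair, "name")
        value = _child(_child(pair, "value"), "string")
        if (name is not None and (name.text or "").strip() == "mappingStrategy"
                and value is not None and value.text):
            return value.text.strip()
    return fallback


def convert(annotations: List[dict], descriptor_xml: str) -> List[Triple]:
    """Map every annotation to triples with the strategy named in the descriptor."""
    strategy = STRATEGIES[read_mapping_strategy(descriptor_xml)]
    return [t for ann in annotations for t in strategy(ann)]
```

Keeping the strategies in a registry keyed by name means a new mapping can be added without touching `convert`; matching on local tag names lets the same reader handle descriptors with or without the UIMA namespace.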
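
Slide 28's example has DT3 query DT2's triplestore with SPARQL. The sketch below builds such a query for the annotations on one document, plus the GET URL defined by the SPARQL 1.1 Protocol (a `query` parameter). It assumes the triplestore holds AO RDF using `ao:onSourceDocument` and `ao:hasTopic`, which should be checked against http://purl.org/ao/home; the endpoint URL is hypothetical.

```python
# Minimal sketch of the SPARQL access path on slides 28-29: build a query for
# the annotations on one document and the HTTP GET URL used to send it.
# ASSUMPTIONS: the remote triplestore holds AO RDF using ao:onSourceDocument and
# ao:hasTopic (verify against http://purl.org/ao/home); endpoint URLs are
# hypothetical. The GET form with a `query` parameter follows SPARQL 1.1 Protocol.
from urllib.parse import urlencode

# Characters that may not appear inside a SPARQL <IRIREF> (plus controls/space).
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')


def annotations_by_document_query(document_uri: str, limit: int = 100) -> str:
    """Return a SELECT query listing annotations and topics for one document."""
    if any(c in _FORBIDDEN_IRI_CHARS or ord(c) <= 0x20 for c in document_uri):
        raise ValueError("document URI contains characters not allowed in an IRI")
    return (
        "PREFIX ao: <http://purl.org/ao/>\n"
        "SELECT ?annotation ?topic WHERE {\n"
        f"  ?annotation ao:onSourceDocument <{document_uri}> ;\n"
        "              ao:hasTopic ?topic .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a SPARQL 1.1 GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"
```

A client would send the URL with an `Accept: application/sparql-results+json` header to get JSON results. Rejecting characters that are illegal in an IRI keeps the document URI from breaking out of the `<...>` in the query.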
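
Slides 29 and 30 list access to annotations by user, group, document and corpora. The in-memory Python sketch below composes those filters in one function; the record fields and the `select_annotations` name are hypothetical, and a real node would answer these queries from its database or triplestore behind web services.

```python
# Minimal sketch of the "annotation by user / group / document / corpora" access
# patterns of slides 29-30, over an in-memory store. ASSUMPTION: the record
# fields below are hypothetical; a real node would back this with its database
# or triplestore and expose it through web services.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class StoredAnnotation:
    uri: str
    creator: str
    groups: FrozenSet[str]   # groups the annotation is shared with
    document: str            # URL of the annotated document
    corpora: FrozenSet[str]  # corpora the document belongs to


def select_annotations(store: Iterable[StoredAnnotation], *,
                       user: Optional[str] = None, group: Optional[str] = None,
                       document: Optional[str] = None,
                       corpus: Optional[str] = None) -> List[StoredAnnotation]:
    """Return annotations matching every criterion given; None means 'any'."""
    return [a for a in store
            if (user is None or a.creator == user)
            and (group is None or group in a.groups)
            and (document is None or a.document == document)
            and (corpus is None or corpus in a.corpora)]
```

Passing only `user=` returns everything a user created, which corresponds to exporting "all of the annotation" on slide 30.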
