Domeo, Text Mining, UIMA and Clerezza

Paolo Ciccarese and Tommaso Teofili

These slides present:
- current facilities and future plans for the Domeo Annotation Toolkit, specifically for text mining use cases;
- details of the integration of the Domeo Annotation Toolkit with Apache UIMA through Apache Clerezza.

Usage Rights

© All Rights Reserved

Presentation Transcript

  • DOMEO ANNOTATION TOOLKIT AND TEXT MINING: CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS. Paolo Ciccarese, PhD (paolo.ciccarese@gmail.com). January 30th 2012, W3C Scientific Discourse Call.
  • The Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments. It can export and exchange all annotations in the Annotation Ontology (AO) RDF format. The Domeo client is the browser-based user interface for producing manual and semi-automatic annotations of HTML documents. http://annotationframework.org/
  • ANNOTATION ONTOLOGY OWL vocabulary for representing and sharing annotation and semantic annotationof digital resources and their fragments:  Is orthogonal to the domain(s) of interest http://purl.org/ao/home  Supports Stand-off annotation  Offers tools for identifying fragments  Designed with extension points  Defines basic annotation containers  Supports versioning  Tracks provenance View slide
  • DOMEO AND TEXT MINING SERVICES: Domeo can trigger text mining algorithms that are available as web services. A software connector has to be developed for each service to translate its results into a suitable format. The results are displayed within the web documents, and users can record their feedback or judgment through customizable user interfaces.
  • NCBO ANNOTATOR (http://www.bioontology.org/annotator-service): a web service that annotates textual metadata (e.g. a journal abstract) with relevant ontology concepts. The ontologies of interest can be preselected through one of its many parameters.
  • DOMEO AND THE NCBO ANNOTATOR (http://www.bioontology.org/annotator-service): Domeo supports automatic and manual annotation with terms from selected ontologies managed by BioPortal.
  • RUNNING THE NCBO ANNOTATOR: additional text mining services will be listed here.
  • NCBO ANNOTATOR RESULTS IN DOMEO: list of recognized entities.
  • RESULTS CURATION: customizable.
  • CUMULATIVE RESULTS CURATION: a judgment can be applied to
    - one item only;
    - all instances with the same text match;
    - all instances, independently of the text match.
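The three cumulative curation options can be read as three selection rules over the annotations on a page. A minimal sketch, assuming that each annotation carries the ontology concept it was tagged with and the exact matched text, and that "instances" means annotations with the same concept as the one being curated; Annotation, Scope and curate are hypothetical names, not Domeo's API.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """The three curation scopes listed on the slide."""
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    ANY_TEXT = "all instances independently of the text match"


@dataclass
class Annotation:
    concept: str                 # ontology term assigned by the text mining service
    exact: str                   # text span that matched the term
    judgment: str = "unreviewed"


def curate(annotations, target, judgment, scope):
    """Record `judgment` on `target` and on the annotations its scope covers.

    Assumption: "instances" are annotations carrying the same concept as
    `target`; SAME_TEXT additionally requires the same matched text.
    """
    for ann in annotations:
        if scope is Scope.ONE_ITEM:
            selected = ann is target          # identity: this item only
        elif scope is Scope.SAME_TEXT:
            selected = ann.concept == target.concept and ann.exact == target.exact
        else:  # Scope.ANY_TEXT
            selected = ann.concept == target.concept
        if selected:
            ann.judgment = judgment
    return annotations
```

ONE_ITEM compares by identity rather than equality, so two annotations with identical concept and text are still curated separately unless a wider scope is chosen.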
  • SERIALIZATION IN AO/RDF
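A minimal sketch of what an AO/RDF serialization of a single text mining result might look like, written as Turtle by plain string building. The namespaces (http://purl.org/ao/core/ and http://purl.org/ao/selectors/) and the terms ao:Qualifier, ao:hasTopic, ao:context, ao:onResource and aos:PrefixPostfixSelector with aos:prefix, aos:exact and aos:postfix reflect our reading of the Annotation Ontology and should be checked against http://purl.org/ao/home before use; the function names and all URIs are hypothetical.

```python
AO = "http://purl.org/ao/core/"        # assumed AO core namespace
AOS = "http://purl.org/ao/selectors/"  # assumed AO selectors namespace


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def annotation_to_turtle(ann_uri, selector_uri, doc_uri, topic_uri,
                         prefix, exact, postfix) -> str:
    """Serialize one ontology-term annotation anchored by prefix/exact/postfix."""
    return "\n".join([
        f"@prefix ao: <{AO}> .",
        f"@prefix aos: <{AOS}> .",
        "",
        # The annotation links an ontology term (topic) to a document fragment.
        f"<{ann_uri}> a ao:Qualifier ;",
        f"    ao:hasTopic <{topic_uri}> ;",
        f"    ao:context <{selector_uri}> .",
        "",
        # Stand-off anchoring: the fragment is identified by its surrounding text.
        f"<{selector_uri}> a aos:PrefixPostfixSelector ;",
        f"    aos:prefix {turtle_literal(prefix)} ;",
        f"    aos:exact {turtle_literal(exact)} ;",
        f"    aos:postfix {turtle_literal(postfix)} ;",
        f"    ao:onResource <{doc_uri}> .",
    ])
```

A production serializer would normally go through an RDF library rather than string templates; the template form just makes the shape of the graph visible.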
  • SOFTWARE CONNECTORS: at the current stage, for each text mining service we have to write a specific connector, which normally translates offset and range into prefix and postfix, and keep it up to date!
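The core translation such a connector performs can be sketched as follows, assuming the service reports a match as a start offset plus a length (a range) over the same plain text that Domeo annotates. The 32-character context width and the helper names are illustrative assumptions; from_closed_interval shows one detail a connector may have to track per service, namely positions reported as first and last character, possibly 1-based.

```python
def offset_to_prefix_postfix(text, start, length, context=32):
    """Turn an (offset, range) match into (prefix, exact, postfix).

    `context` is how many characters of surrounding text to keep on each
    side; 32 is an arbitrary illustrative value.
    """
    end = start + length
    if start < 0 or length <= 0 or end > len(text):
        raise ValueError("match falls outside the document text")
    prefix = text[max(0, start - context):start]  # clipped at document start
    return prefix, text[start:end], text[end:end + context]


def from_closed_interval(text, first, last, one_based=True, context=32):
    """Hypothetical adapter for services reporting [first, last] positions."""
    start = first - 1 if one_based else first
    return offset_to_prefix_postfix(text, start, last - first + 1, context)
```

Storing prefix and postfix instead of raw offsets is what lets the annotation be re-anchored when offsets computed by the service do not line up with the rendered document.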
  • UIMA, CLEREZZA AND AO: OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER ONTOLOGIES. Tommaso Teofili and Paolo Ciccarese (tommaso@apache.org).
  • APACHE UIMA Architecturalframework for UIM OASIS standard Build, deploy and run text mining pipelines Scaling capabilities for large volumes of data NLP/TM algorithms wrapped as Analysis Engines http://uima.apache.org/
  • UIMA TYPES Defining annotation domain in Typesystems Types and features are just declared Existing Typesystemscan be imported/exported/enhanced Ease data exchange between AEs Two “main” types  TOP  Annotation
  • APACHE CLEREZZA Service platform for linked data OSGi-based RDF API RESTful Web Service Framework TripleStore independent Integrated with Apache UIMA http://incubator.apache.org/clerezza/
  • UIMA/CLEREZZA CONVENTION: developers can create custom types and type systems, but then need to manage URIs (integration of services vs. ontology sharing). The convention:
    - the ClerezzaTypeSystem defines ClerezzaBaseAnnotation (uri) and ClerezzaBaseEntity (uri; label, as rdfs:label; references, the annotations referring to this entity);
    - service-specific annotation and entity types are defined by subclassing these.
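To illustrate the subclassing convention, here is a minimal sketch that mirrors the two base types as Python dataclasses. It is not UIMA code: real types are declared in a UIMA type system descriptor and accessed through the CAS. The service-specific subclasses OntologyTermAnnotation and OntologyConcept, and their begin, end, concept and ontology fields, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClerezzaBaseAnnotation:
    uri: str


@dataclass
class ClerezzaBaseEntity:
    uri: str
    label: str  # exported as rdfs:label
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


# Hypothetical service-specific types: subclassing the base types means
# they inherit the uri field (and, for entities, label and references).
@dataclass
class OntologyTermAnnotation(ClerezzaBaseAnnotation):
    begin: int = 0
    end: int = 0
    concept: str = ""


@dataclass
class OntologyConcept(ClerezzaBaseEntity):
    ontology: str = ""
```

Because every service type derives from the same two base types, a generic CAS-to-RDF mapper can rely on uri, label and references being present without knowing each service's own features.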
  • CLEREZZABASEANNOTATION DESCRIPTOR
  • CLEREZZABASEENTITY DESCRIPTOR
  • BEFORE
  • AFTER (URI FIELD INHERITED)
  • CONVERSION STRATEGIES UIMA annotations stored inside CAS Services “talking” via webservices + RDF CAS to RDF mapping via Clerezza Pluggable mapping strategies  Clerezza Default  AnnotationOntology  …
  • CONVERSION STRATEGIES: change mapping strategies via XML or the Eclipse plugin, or directly in the descriptor:
      <nameValuePair>
        <name>mappingStrategy</name>
        <value><string>ao</string></value>
      </nameValuePair>
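Pluggable mapping strategies amount to a registry of CAS-to-RDF mapping functions selected by the mappingStrategy parameter. A minimal sketch, assuming annotations have already been extracted from the CAS as plain dicts; only the value "ao" comes from the descriptor above, while the "default" key, the example.org vocabulary and the function names are hypothetical, and the AO namespace is assumed.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
AO = "http://purl.org/ao/core/"              # assumed AO namespace
DEFAULT_NS = "http://example.org/clerezza#"  # hypothetical stand-in vocabulary

# Each annotation is a dict with "uri", "text" and "concept" keys; a real
# implementation would read these values from the CAS.


def clerezza_default(ann: dict) -> List[Triple]:
    return [
        (ann["uri"], DEFAULT_NS + "coveredText", ann["text"]),
        (ann["uri"], DEFAULT_NS + "entity", ann["concept"]),
    ]


def annotation_ontology(ann: dict) -> List[Triple]:
    return [
        (ann["uri"], RDF_TYPE, AO + "Annotation"),
        (ann["uri"], AO + "hasTopic", ann["concept"]),
    ]


STRATEGIES: Dict[str, Callable[[dict], List[Triple]]] = {
    "default": clerezza_default,  # hypothetical key
    "ao": annotation_ontology,    # the value set in the descriptor
}


def convert(annotations: List[dict], mapping_strategy: str = "default") -> List[Triple]:
    """Map every annotation with the strategy named by mappingStrategy."""
    try:
        strategy = STRATEGIES[mapping_strategy]
    except KeyError:
        raise ValueError(f"unknown mappingStrategy: {mapping_strategy!r}") from None
    return [triple for ann in annotations for triple in strategy(ann)]
```

Adding a new target vocabulary then means registering one more function, without touching the analysis engines that produce the annotations.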
  • CLEREZZA WEB SERVICES EXAMPLE
  • LOOKING AHEAD: DOMEO TOOLKIT V.2. Paolo Ciccarese, PhD
  • DOMEO ANNOTATION TOOLKIT V.2: Domeo Annotation Toolkit v.2 is planned for the end of the first quarter of 2012. It will consist of a major refactoring to improve modularity and make writing plug-ins easier. It will include various new features and will be the first step towards a federated architecture. It will be open source!
  • DOMEO FEDERATION We currently have two instances of the Domeo Toolkit and the number of instances is going to increase We need to define a clean architecture that supports communication between instances or nodes Instances should be able to access each other annotations in multiple ways
  • DOMEO FEDERATION (diagram components: Domeo Nodes 1 to 4 with their Domeo Web Clients, a Triplestore, annotation flow through a Web Service and through SPARQL). Example: DT3 retrieves annotations from DT1 through a web service and from DT2 through a SPARQL query against its triplestore.
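The DT3 example can be sketched as an aggregator over per-node fetch functions: one node reached through its web service, another through a SPARQL query, both hidden behind the same callable signature. A minimal sketch; federated_annotations and the fetcher convention (document URI in, annotation dicts with a "uri" key out) are hypothetical, and de-duplicating by annotation URI while skipping unreachable nodes is one possible design choice, not a described Domeo behaviour.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A fetcher returns the annotations a node holds for a document URI,
# whether it calls the node's web service or queries its SPARQL endpoint.
Fetcher = Callable[[str], Iterable[dict]]


def federated_annotations(document_uri: str,
                          sources: Dict[str, Fetcher]) -> Tuple[List[dict], List[str]]:
    """Collect annotations from several nodes; return (merged, failed_nodes)."""
    merged: Dict[str, dict] = {}
    failed: List[str] = []
    for node, fetch in sources.items():
        try:
            results = list(fetch(document_uri))
        except OSError:  # network-level failure: skip the node and report it
            failed.append(node)
            continue
        for ann in results:
            # The first node returning a given annotation URI wins;
            # later copies of the same annotation are dropped.
            merged.setdefault(ann["uri"], {**ann, "source": node})
    return list(merged.values()), failed
```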
  • SOFTWARE ANNOTATION ACCESS: nodes can access the annotations of other nodes through:
    - web services: annotation by user, by group, by document, by corpora, …;
    - SPARQL queries, when a SPARQL endpoint is available.
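For nodes exposing a SPARQL endpoint, "annotation by document" could translate into a query like the one built below. A minimal sketch, assuming the AO core namespace and that each annotation points (via ao:context) to a selector whose ao:onResource is the annotated document; the function names are hypothetical. The GET URL uses the query parameter of the SPARQL 1.1 Protocol, and the IRI check rejects characters not allowed inside <...> so that a document URI cannot alter the query.

```python
from urllib.parse import urlencode

AO = "http://purl.org/ao/core/"  # assumed AO core namespace
_FORBIDDEN = set('<>"{}|^`\\')   # characters not allowed inside a SPARQL <IRI>


def _checked_iri(iri: str) -> str:
    if not iri or any(c in _FORBIDDEN or ord(c) <= 0x20 for c in iri):
        raise ValueError(f"unsafe IRI: {iri!r}")
    return iri


def annotations_by_document_query(document_uri: str) -> str:
    """SELECT every annotation (and its topic) anchored on the given document."""
    doc = _checked_iri(document_uri)
    return (
        f"PREFIX ao: <{AO}>\n"
        "SELECT ?annotation ?topic WHERE {\n"
        "  ?annotation ao:context ?selector ;\n"
        "              ao:hasTopic ?topic .\n"
        f"  ?selector ao:onResource <{doc}> .\n"
        "}"
    )


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL Protocol GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"
```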
  • USERS ANNOTATION ACCESS: users can export their own annotations in AO RDF:
    - annotations by document;
    - annotations by corpora;
    - all of their annotations.
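The three export options can be sketched as one filter over a user's stored annotations, after which the selection would be serialized to AO RDF. A minimal sketch, assuming each stored annotation records its creator, its document and the corpus that document belongs to; StoredAnnotation and select_for_export are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class StoredAnnotation:
    uri: str
    creator: str   # user who created the annotation
    document: str  # annotated document
    corpus: str    # corpus the document belongs to


def select_for_export(annotations: Sequence[StoredAnnotation], user: str,
                      document: Optional[str] = None,
                      corpus: Optional[str] = None) -> List[StoredAnnotation]:
    """Pick a user's own annotations: by document, by corpus, or all of them."""
    if document is not None and corpus is not None:
        raise ValueError("export by document or by corpus, not both")
    selected = []
    for ann in annotations:
        if ann.creator != user:
            continue  # users export only their own annotations
        if document is not None and ann.document != document:
            continue
        if corpus is not None and ann.corpus != corpus:
            continue
        selected.append(ann)
    return selected
```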
  • CURRENT DOMEO ARCHITECTURE (diagram components: Domeo Web Client, Annotation Web Services, User, MySQL, Annotation Export (AO-RDF), Text Mining UI, Connector, NCBO Web Service, NCBO Annotator).
  • DOMEO NODE ARCHITECTURE > ACCESSING EXTERNAL ANNOTATION (diagram components: a Domeo v.2 Node with Domeo Web Client, Annotation Web Services, Triple Store Connector, User, MySQL, Annotation Export, Text Mining UI and NCBO connector; other external Domeo nodes and an external triplestore, reached through AO-RDF and SPARQL).
  • DOMEO NODE ARCHITECTURE > ADDING A SPARQL ENDPOINT (diagram components: the same Domeo v.2 Node plus its own Triplestore exposed through a SPARQL endpoint, alongside other external Domeo nodes and triplestores).
  • DOMEO NODE ARCHITECTURE > TEXT MINING ALGORITHMS INTEGRATION (diagram components: the Domeo v.2 Node with Text Mining UI and NCBO, Clerezza and Text Mining connectors; NCBO Web Service and NCBO Annotator; Clerezza Web Service and UIMA algorithm; Text Mining Library Manager and text mining algorithm).
  • DOMEO AND TEXT MINING IN SUMMARY:
    - run algorithms within Domeo, by
        - making the algorithms available through web services, or
        - integrating the algorithms, as libraries, within the Domeo architecture;
    - run algorithms separately and then
        - load the results into a Domeo node through web services,
        - store the results directly in the (or a) triplestore, or
        - store the results directly in the database.
  • W3C COMMUNITY GROUP: OPEN ANNOTATION. Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging into a unified model for representing and sharing annotation in RDF. http://www.w3.org/community/openannotation/
  • THANK YOU! If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website http://annotationframework.org or contact paolo.ciccarese -at- gmail.com