DOMEO ANNOTATION TOOLKIT
AND TEXT MINING


CREATING, VISUALISING, CURATING AND SHARING
TEXT MINING RESULTS

Paolo Ciccarese, PhD
paolo.ciccarese@gmail.com


January 30th 2012, W3C Scientific Discourse Call
 The Domeo Annotation Toolkit is a collection of software
  components for creating and sharing annotations of
  web documents and their fragments
 It can export and exchange all annotations in
  Annotation Ontology (AO) RDF format
 The Domeo client is the user interface for producing
  manual and semi-automatic annotations of HTML
  documents directly in the browser


                              http://annotationframework.org/
ANNOTATION ONTOLOGY
   OWL vocabulary for representing and sharing
    annotation and semantic annotation of digital
    resources and their fragments:
       Is orthogonal to the domain(s) of interest
       Supports stand-off annotation
       Offers tools for identifying fragments
       Designed with extension points
       Defines basic annotation containers
       Supports versioning
       Tracks provenance

                                                     http://purl.org/ao/home
DOMEO AND TEXT MINING SERVICES
 Domeo can trigger text mining algorithms
  when they are available through web services
 Software connectors have to be developed to
  translate the results into a suitable format
 The results are displayed in the web documents
 Users can record their feedback/judgment through
  customizable user interfaces
NCBO ANNOTATOR




                                                            http://www.bioontology.org/annotator-service
 Web service that annotates textual metadata (e.g.
  a journal abstract) with relevant ontology concepts
 The ontologies of interest can be preselected
  through one of its many parameters
DOMEO AND THE NCBO ANNOTATOR




                                                       http://www.bioontology.org/annotator-service
   Domeo allows automatic and manual annotation with
    terms from selected ontologies managed by
    BioPortal
RUNNING NCBO ANNOTATOR




 Additional text mining services
 will be listed here
NCBO ANNOTATOR RESULTS IN DOMEO




List of recognized
entities
RESULTS CURATION

                   Customizable
CUMULATIVE RESULTS CURATION
 One item only
 All instances with the same text match
 All instances, regardless of the text match
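
The three curation scopes listed above can be sketched as a small filter. This is only an illustration, not Domeo code: the `Result` fields, the `Scope` names, and the reading of "all instances" as "all results pointing to the same concept" are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    ONE_ITEM = 1        # one item only
    SAME_TEXT = 2       # all instances with the same text match
    ALL_INSTANCES = 3   # all instances, regardless of the text match


@dataclass
class Result:
    id: int
    match: str                       # text recognized in the document
    concept: str                     # hypothetical identifier of the ontology term
    accepted: Optional[bool] = None  # None means "not curated yet"


def in_scope(result: Result, target: Result, scope: Scope) -> bool:
    """Decide whether a curation decision on `target` also covers `result`."""
    if scope is Scope.ONE_ITEM:
        return result.id == target.id
    if result.concept != target.concept:
        return False
    return scope is Scope.ALL_INSTANCES or result.match == target.match


def curate(results: List[Result], target: Result,
           scope: Scope, accepted: bool) -> List[int]:
    """Apply the curator's judgment and return the ids that were updated."""
    touched = []
    for result in results:
        if in_scope(result, target, scope):
            result.accepted = accepted
            touched.append(result.id)
    return touched
```

With this sketch, rejecting one "APP" result with `Scope.SAME_TEXT` would also reject every other "APP" match for the same concept, while matches of a longer synonym stay untouched.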
SERIALIZATION IN AO/RDF
SOFTWARE CONNECTORS
At the current stage
 For each text mining service we have to write a
  specific connector, which normally translates offset
  and range into prefix and postfix
 And keep it up to date!
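
The translation step a connector performs can be sketched as follows. This is a minimal illustration, not the Domeo connector code: the class name, the function names, and the 20-character context window are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixPostfixSelector:
    """Locates a text span by its surrounding context instead of by offsets."""
    prefix: str
    exact: str
    postfix: str


def offset_to_selector(text: str, offset: int, length: int,
                       context: int = 20) -> PrefixPostfixSelector:
    """Convert an (offset, length) match from a text mining service
    into a prefix/exact/postfix selector."""
    if offset < 0 or length <= 0 or offset + length > len(text):
        raise ValueError("match lies outside the text")
    end = offset + length
    return PrefixPostfixSelector(
        prefix=text[max(0, offset - context):offset],
        exact=text[offset:end],
        postfix=text[end:end + context],
    )


def locate(text: str, selector: PrefixPostfixSelector) -> int:
    """Return the offset of the exact match in `text`, or -1 if the
    context no longer occurs (for example after the page changed)."""
    needle = selector.prefix + selector.exact + selector.postfix
    start = text.find(needle)
    return -1 if start < 0 else start + len(selector.prefix)
```

A selector built this way can still be resolved when text is inserted earlier in the document, which raw character offsets cannot survive.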
UIMA, CLEREZZA AND AO
OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER
ONTOLOGIES

Tommaso Teofili and Paolo Ciccarese
tommaso@apache.org
APACHE UIMA
 Architectural framework for Unstructured Information Management (UIM)
 OASIS standard

 Build, deploy and run text mining pipelines

 Scaling capabilities for large volumes of data

 NLP/TM algorithms wrapped as Analysis Engines




                                   http://uima.apache.org/
UIMA TYPES
 Defining annotation domain in Typesystems
 Types and features are just declared

 Existing Typesystems can be
  imported/exported/enhanced
 Ease data exchange between AEs

 Two “main” types
   TOP
   Annotation
APACHE CLEREZZA
 Service platform for linked data
 OSGi-based

 RDF API

 RESTful Web Service Framework

 TripleStore independent

 Integrated with Apache UIMA




                          http://incubator.apache.org/clerezza/
UIMA/CLEREZZA CONVENTION
 Devs can create custom types / typesystems
 Need to manage URIs
 Integration of services vs ontology sharing
 ClerezzaTypeSystem
     ClerezzaBaseAnnotation
         uri
     ClerezzaBaseEntity
         uri
         label (rdfs:label)
         references (annotations referring to this entity)
     Service-specific annotation and entity types are defined
      by subclassing the above
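
The inheritance convention above can be illustrated with a Python analogue. Real UIMA types are declared in XML type system descriptors, so this is only a sketch of the shape of the hierarchy: `GeneAnnotation`, `GeneEntity` and the `symbol` field are hypothetical service-specific types, and `begin`/`end` stand in for the span features of the standard UIMA Annotation type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClerezzaBaseAnnotation:
    uri: str        # identifier used when the annotation is mapped to RDF
    begin: int = 0  # span start, as in the UIMA Annotation type
    end: int = 0    # span end


@dataclass
class ClerezzaBaseEntity:
    uri: str
    label: str = ""  # exported as rdfs:label
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


@dataclass
class GeneAnnotation(ClerezzaBaseAnnotation):
    """Hypothetical service-specific annotation; inherits the uri field."""


@dataclass
class GeneEntity(ClerezzaBaseEntity):
    """Hypothetical service-specific entity with one extra feature."""
    symbol: str = ""


mention = GeneAnnotation(uri="urn:example:annotation/1", begin=4, end=7)
gene = GeneEntity(uri="urn:example:gene/APP", label="APP", symbol="APP")
gene.references.append(mention)  # the entity lists the annotations referring to it
```

Because every subtype carries a `uri`, a generic converter can map any service's output to RDF without knowing the service-specific features.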
CLEREZZABASEANNOTATION DESCRIPTOR
CLEREZZABASEENTITY DESCRIPTOR
BEFORE
AFTER (URI FIELD INHERITED)
CONVERSION STRATEGIES
 UIMA annotations are stored inside the CAS
 Services “talking” via web services + RDF

 CAS to RDF mapping via Clerezza

 Pluggable mapping strategies
   Clerezza Default
   AnnotationOntology
   …
CONVERSION STRATEGIES
Change mapping strategies via XML/Eclipse plugin




Or in the descriptor directly
 <nameValuePair>
   <name>mappingStrategy</name>
   <value><string>ao</string></value>
 </nameValuePair>
CLEREZZA WEB SERVICES EXAMPLE
LOOKING AHEAD
DOMEO TOOLKIT V. 2

Paolo Ciccarese, PhD
DOMEO ANNOTATION TOOLKIT V.2
 Domeo Annotation Toolkit v.2 is planned for the end
  of the first quarter of 2012
 It will consist of a major refactoring to improve
  modularity and make writing plug-ins easier
 It will include various new features and will be the
  first step towards a federated architecture
 It will be open source!
DOMEO FEDERATION
 We currently have two instances of the Domeo
  Toolkit and the number of instances is going to
  increase
 We need to define a clean architecture that
  supports communication between instances or
  nodes
 Instances should be able to access each other's
  annotations in multiple ways
DOMEO FEDERATION
[Diagram: four Domeo nodes (Node 1 to Node 4) with their web clients,
exchanging annotation through web services and SPARQL queries.
Legend: annotation flow, web service, triplestore.]

Ex: DT3 retrieves annotation from DT1 through a web service
and from DT2 through a SPARQL query against its triplestore
SOFTWARE ANNOTATION ACCESS
Nodes can access annotations of other nodes through
 Web services
       Annotation by User
       Annotation by Group
       Annotation by Document
       Annotation by Corpora
       …
 SPARQL queries, when a SPARQL endpoint is available
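
The two access paths can be sketched as request builders. This is a sketch under stated assumptions, not the Domeo API: the `/annotations` path, the query parameter names, and the example host are hypothetical, and the AO terms in the query (`ao:Annotation`, `ao:annotatesResource`) should be checked against the AO specification.

```python
from urllib.parse import urlencode

AO = "http://purl.org/ao/"
SELECTORS = {"user", "group", "document", "corpus"}


def web_service_url(node_base: str, by: str, value: str) -> str:
    """Build a (hypothetical) web service request for another node's annotations."""
    if by not in SELECTORS:
        raise ValueError("unsupported selector: " + by)
    return node_base.rstrip("/") + "/annotations?" + urlencode({by: value})


def sparql_by_document(document_uri: str) -> str:
    """Build a SPARQL query for the annotations of one document."""
    # Reject characters that could break out of the <...> IRI reference.
    if not document_uri or any(c in document_uri for c in "<> \t\n\""):
        raise ValueError("not a usable IRI: " + repr(document_uri))
    return (
        "PREFIX ao: <" + AO + ">\n"
        "SELECT ?annotation WHERE {\n"
        "  ?annotation a ao:Annotation ;\n"
        "              ao:annotatesResource <" + document_uri + "> .\n"
        "}"
    )


# Example: the same document requested through both paths.
doc = "http://example.org/doc1"
print(web_service_url("http://node1.example.org/", "document", doc))
print(sparql_by_document(doc))
```

The web service path hides the storage layer behind fixed selectors, while the SPARQL path lets the requesting node phrase its own query when an endpoint is exposed.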
USERS ANNOTATION ACCESS
Users can export their own annotation in AO RDF
   Annotation by document
   Annotation by corpora
   All of the annotation
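CURRENT DOMEO ARCHITECTURE
[Diagram: the Domeo Web Client exchanges AO-RDF with Domeo through
the Annotation Web Services. Components shown inside Domeo: MySQL,
User Annotation Export UI, Text Mining Connector. The Text Mining
Connector leads to the NCBO Web Service and the NCBO Annotator.
Legend: request, annotation.]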
DOMEO NODE ARCHITECTURE
> ACCESSING EXTERNAL ANNOTATION
[Diagram: a Domeo v.2 node. Components shown: Domeo Web Client,
Annotation Web Services, Triple Store Connector, MySQL, User
Annotation Export UI, Text Mining Connector, NCBO Web Service, NCBO
Annotator. AO-RDF arrives from another Domeo node and from an
external triplestore queried through SPARQL. Numbered callouts: 1, 2.]
DOMEO NODE ARCHITECTURE
> ADDING A SPARQL ENDPOINT
[Diagram: the same Domeo v.2 node (Domeo Web Client, Annotation Web
Services, Triple Store Connector, MySQL, User Annotation Export UI,
Text Mining Connector, NCBO Web Service, NCBO Annotator, other Domeo
node, external triplestore), with a triplestore and a SPARQL endpoint
added to the node.]
DOMEO NODE ARCHITECTURE
> TEXT MINING ALGORITHMS INTEGRATION
[Diagram: the Domeo v.2 node with its triplestore and SPARQL endpoint
(Domeo Web Client, Annotation Web Services, Triple Store Connector,
MySQL, User Annotation Export UI, other Domeo node, external
triplestore) and three text mining paths:
  Text Mining Connector > NCBO Web Service > NCBO Annotator
  Clerezza Connector > Clerezza Web Service > UIMA Algorithm
  Text Mining Connector > Text Mining Library Manager > Text Mining Algorithm
Numbered callouts: 1 to 4.]
DOMEO AND TEXT MINING
IN SUMMARY
   Run algorithms within Domeo
     Making the algorithms available through web services
     Integrating the algorithms, as libraries, within the
      Domeo architecture
   Run algorithms separately and then
     Load the results into a Domeo node through web
      services
     Store the results directly in a triplestore
     Store the results directly in the database
W3C COMMUNITY GROUP
OPEN ANNOTATION
 Annotation Ontology (AO) and Open Annotation
  Collaboration (OAC) are merging
 Unified model for representing and sharing
  annotation in RDF




                 http://www.w3.org/community/openannotation/
THANK YOU!
If you are interested in using, or contributing to,
the Domeo Annotation Toolkit, visit our website
http://annotationframework.org or contact
paolo.ciccarese -at- gmail.com

 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Domeo, Text Mining, UIMA and Clerezza

  • 1. DOMEO ANNOTATION TOOLKIT AND TEXT MINING: CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS. Paolo Ciccarese, PhD (paolo.ciccarese@gmail.com). January 30th 2012, W3C Scientific Discourse Call.
  • 2. The Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments. It can export and exchange all annotations in Annotation Ontology (AO) RDF format. The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in the browser. http://annotationframework.org/
  • 3. ANNOTATION ONTOLOGY (http://purl.org/ao/home): an OWL vocabulary for representing and sharing annotation and semantic annotation of digital resources and their fragments. It is orthogonal to the domain(s) of interest; supports stand-off annotation; offers tools for identifying fragments; is designed with extension points; defines basic annotation containers; supports versioning; tracks provenance.
  • 4. DOMEO AND TEXT MINING SERVICES: Domeo can trigger text mining algorithms when they are available through web services. Software connectors have to be developed to translate the results into a suitable format. The results are displayed in the web documents. Users can record their feedback/judgment through customizable user interfaces.
  • 5. NCBO ANNOTATOR (http://www.bioontology.org/annotator-service): a web service that annotates textual metadata (e.g. a journal abstract) with relevant ontology concepts. The ontologies of interest can be preselected as one of the many parameters.
  • 6. DOMEO AND THE NCBO ANNOTATOR (http://www.bioontology.org/annotator-service): Domeo allows automatic/manual annotation with terms from selected ontologies managed by BioPortal.
  • 7. RUNNING NCBO ANNOTATOR: Additional text mining services will be listed here
  • 8. NCBO ANNOTATOR RESULTS IN DOMEO: List of recognized entities
  • 9. RESULTS CURATION: Customizable
  • 10. CUMULATIVE RESULTS CURATION: one item only; all instances with the same text match; all instances regardless of the text match.
  • 12. SOFTWARE CONNECTORS: At the current stage, each text mining service needs its own connector, which typically translates offset and range into prefix and postfix. Each connector also has to be kept up to date!
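The offset-and-range to prefix-and-postfix translation mentioned on the connectors slide can be sketched as follows. This is a minimal illustration, not Domeo's connector code: the `PrefixPostfixSelector` class, the `offset_to_selector` and `locate` functions, and the 20-character context window are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixPostfixSelector:
    """Hypothetical selector: the matched text plus its surrounding context."""
    prefix: str
    exact: str
    postfix: str


def offset_to_selector(text: str, start: int, length: int,
                       context: int = 20) -> PrefixPostfixSelector:
    """Turn an (offset, range) span into a prefix/exact/postfix selector.

    `context` (an assumed default) is how many characters of surrounding
    text are kept on each side of the match.
    """
    if start < 0 or length <= 0 or start + length > len(text):
        raise ValueError("span is outside the document text")
    end = start + length
    return PrefixPostfixSelector(
        prefix=text[max(0, start - context):start],
        exact=text[start:end],
        postfix=text[end:end + context],
    )


def locate(text: str, selector: PrefixPostfixSelector) -> int:
    """Find the start offset of the exact match again, or -1 if not found."""
    needle = selector.prefix + selector.exact + selector.postfix
    pos = text.find(needle)
    return -1 if pos < 0 else pos + len(selector.prefix)
```

A selector built this way can still be resolved if the absolute offsets shift, as long as the matched text and its immediate context are unchanged, which is one plausible reason to prefer it over raw offsets for web documents.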
  • 13. UIMA, CLEREZZA AND AO: OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER ONTOLOGIES. Tommaso Teofili and Paolo Ciccarese (tommaso@apache.org)
  • 14. APACHE UIMA (http://uima.apache.org/): architectural framework for UIM (Unstructured Information Management); OASIS standard; build, deploy and run text mining pipelines; scaling capabilities for large volumes of data; NLP/TM algorithms wrapped as Analysis Engines.
  • 15. UIMA TYPES: the annotation domain is defined in Typesystems; types and features are just declared; existing Typesystems can be imported/exported/enhanced; they ease data exchange between AEs (Analysis Engines); two “main” types: TOP and Annotation.
  • 16. APACHE CLEREZZA (http://incubator.apache.org/clerezza/): service platform for linked data; OSGi-based; RDF API; RESTful Web Service Framework; TripleStore independent; integrated with Apache UIMA.
  • 17. UIMA/CLEREZZA CONVENTION: devs can create custom types/typesystems; need to manage URIs; integration of services vs. ontology sharing. ClerezzaTypeSystem: ClerezzaBaseAnnotation (uri); ClerezzaBaseEntity (uri; label (rdfs:label); references (annotations referring to this entity)). Service-specific annotation and entity types are defined by subclassing the above.
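The subclassing convention on the slide above can be mirrored in a few lines of Python. This is only an illustrative sketch: real UIMA types are declared in a typesystem descriptor rather than as Python classes, the `begin`/`end` fields are an assumption borrowed from UIMA's Annotation type, and `GeneMention` and `Gene` are hypothetical service-specific types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClerezzaBaseAnnotation:
    """Base annotation: every annotation carries a URI."""
    uri: str
    begin: int  # assumption: UIMA Annotation-style start offset
    end: int    # assumption: UIMA Annotation-style end offset


@dataclass
class ClerezzaBaseEntity:
    """Base entity: URI, label (rdfs:label), and the annotations referring to it."""
    uri: str
    label: str
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


@dataclass
class GeneMention(ClerezzaBaseAnnotation):
    """Hypothetical service-specific annotation; the uri field is inherited."""
    matched_text: str = ""


@dataclass
class Gene(ClerezzaBaseEntity):
    """Hypothetical service-specific entity; uri, label, references are inherited."""
    symbol: str = ""
```

Because every service-specific type inherits `uri`, generic code can serialize any annotation or entity to RDF without knowing which service produced it.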
  • 21. AFTER (URI FIELD INHERITED)
  • 22. CONVERSION STRATEGIES: UIMA annotations are stored inside the CAS; services “talk” via web services + RDF; CAS-to-RDF mapping via Clerezza; pluggable mapping strategies: Clerezza Default, AnnotationOntology, …
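The idea of pluggable mapping strategies can be sketched as a registry of functions selected by name. This is a hedged illustration, not the Clerezza/UIMA implementation: the registry, the dictionary-shaped annotation, and every predicate under `http://example.org/vocab#` are placeholders invented for the example. A real AO strategy would emit Annotation Ontology terms instead.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Annotation = Dict[str, object]
MappingStrategy = Callable[[Annotation], List[Triple]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/vocab#"  # placeholder vocabulary, not a real ontology

_STRATEGIES: Dict[str, MappingStrategy] = {}


def register(name: str):
    """Decorator that registers a mapping strategy under a short name."""
    def decorator(fn: MappingStrategy) -> MappingStrategy:
        _STRATEGIES[name] = fn
        return fn
    return decorator


@register("default")
def default_mapping(ann: Annotation) -> List[Triple]:
    """Plain mapping: type plus begin/end offsets."""
    uri = str(ann["uri"])
    return [
        (uri, RDF_TYPE, EX + "Annotation"),
        (uri, EX + "begin", str(ann["begin"])),
        (uri, EX + "end", str(ann["end"])),
    ]


@register("ao")
def ao_like_mapping(ann: Annotation) -> List[Triple]:
    """AO-flavoured mapping: type plus a link to the annotated topic."""
    uri = str(ann["uri"])
    return [
        (uri, RDF_TYPE, EX + "Annotation"),
        (uri, EX + "hasTopic", str(ann["topic"])),
    ]


def to_triples(ann: Annotation, strategy: str = "default") -> List[Triple]:
    """Convert one annotation with the strategy selected by name."""
    try:
        mapper = _STRATEGIES[strategy]
    except KeyError:
        raise ValueError(f"unknown mapping strategy: {strategy!r}") from None
    return mapper(ann)
```

Selecting the strategy by a short string such as `"ao"` keeps the choice in configuration rather than in code, so a new output vocabulary only requires registering one more function.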
  • 23. CONVERSION STRATEGIES: Change mapping strategies via the XML/Eclipse plugin, or directly in the descriptor: <nameValuePair> <name>mappingStrategy</name> <value><string>ao</string></value> </nameValuePair>
  • 25. LOOKING AHEAD DOMEO TOOLKIT V. 2 Paolo Ciccarese, PhD
  • 26. DOMEO ANNOTATION TOOLKIT V.2: Domeo Annotation Toolkit v.2 is planned for the end of the first quarter of 2012. It will consist of a major refactoring to improve modularity and make writing plug-ins easier. It will include various new features and will be the first step towards a federated architecture. It will be open source!
  • 27. DOMEO FEDERATION: We currently have two instances of the Domeo Toolkit, and the number of instances is going to increase. We need to define a clean architecture that supports communication between instances, or nodes. Instances should be able to access each other's annotations in multiple ways.
  • 28. DOMEO FEDERATION [diagram: annotation flow between Domeo Nodes 1 to 4 and their web clients, via web services and via SPARQL against a triplestore]. Ex: DT3 retrieves annotation from DT1 through a web service and from DT2 through a SPARQL query against its triplestore
  • 29. SOFTWARE ANNOTATION ACCESS: Nodes can access the annotations of other nodes through (a) web services: annotation by user, by group, by document, by corpora, …; (b) SPARQL queries, when a SPARQL endpoint is available.
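The two access routes between nodes can be sketched as two request builders. Everything concrete here is an assumption made for illustration: the slides do not specify the web service paths or parameters, so the `/annotations` path, the scope names, and the `http://example.org/vocab#annotates` predicate are hypothetical placeholders rather than Domeo's API or AO terms.

```python
from urllib.parse import urlencode

# Assumed scope names, loosely following the slide: user, group, document, corpora.
ALLOWED_SCOPES = {"user", "group", "document", "corpus"}


def web_service_url(node_base: str, scope: str, value: str) -> str:
    """Build a (hypothetical) web service URL asking a node for annotations."""
    if scope not in ALLOWED_SCOPES:
        raise ValueError(f"unsupported scope: {scope!r}")
    return f"{node_base.rstrip('/')}/annotations?{urlencode({scope: value})}"


def sparql_by_document(document_uri: str,
                       predicate: str = "http://example.org/vocab#annotates") -> str:
    """Build a SPARQL query for annotations on one document.

    The predicate is a placeholder; a real node would use its own vocabulary.
    """
    for iri in (document_uri, predicate):
        # Reject characters that are not allowed inside a SPARQL IRI reference.
        if any(c in iri for c in '<>"{}|^`\\') or any(c.isspace() for c in iri):
            raise ValueError(f"not a safe IRI: {iri!r}")
    return (
        "SELECT ?annotation WHERE { "
        f"?annotation <{predicate}> <{document_uri}> . "
        "}"
    )
```

The web service route lets a node expose a fixed set of views (by user, group, document, corpus), while the SPARQL route leaves the query shape to the requesting node, which only works when the remote node publishes an endpoint.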
  • 30. USERS ANNOTATION ACCESS: Users can export their own annotation in AO RDF: annotation by document; annotation by corpora; all of the annotation.
  • 31. CURRENT DOMEO ARCHITECTURE [diagram; labeled components: Domeo Web Client, Annotation Request, Annotation Web Services, AO-RDF Annotation Export, MySQL (User, Annotation), Text Mining UI, NCBO Connector, NCBO Web Service, NCBO Annotator]
  • 32. DOMEO NODE ARCHITECTURE > ACCESSING EXTERNAL ANNOTATION [diagram; labeled components: Other Domeo Node, External Triplestore, Domeo Web Client, AO-RDF, SPARQL, Annotation Web Services, Triple Store Connector, Domeo v.2 Node, MySQL (User, Annotation), Annotation Export, Text Mining UI, NCBO Connector, NCBO Web Service, NCBO Annotator]
  • 33. DOMEO NODE ARCHITECTURE > ADDING A SPARQL ENDPOINT [diagram; as in slide 32, with a SPARQL endpoint and Triplestore added to the Domeo v.2 Node]
  • 34. DOMEO NODE ARCHITECTURE > TEXT MINING ALGORITHMS INTEGRATION [diagram; as in slide 33, with added components: Clerezza Connector, Clerezza Web Service, UIMA Algorithm, Text Mining Connector, Text Mining Library Manager, Text Mining Algorithm]
  • 35. DOMEO AND TEXT MINING IN SUMMARY: (1) Run algorithms within Domeo, either by making the algorithms available through web services or by integrating them, as libraries, within the Domeo architecture. (2) Run algorithms separately and then load the results into a Domeo node through web services, store the results directly in the (a) triplestore, or store the results directly in the database.
  • 36. W3C COMMUNITY GROUP OPEN ANNOTATION (http://www.w3.org/community/openannotation/): Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging; unified model for representing and sharing annotation in RDF.
  • 37. THANK YOU! If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website http://annotationframework.org or contact paolo.ciccarese -at- gmail.com
