Semantic Intensity Spectrum and Semantic Integration Algorithms


Highlights of the CROSI project on semantic integration, with an introduction to semantic alignment algorithms.

Published in: Technology

  1. 1. Semantic Intensity Spectrum <ul><li>At least 25 systems have been developed for ontology alignment and matching. </li></ul><ul><li>A classification technique for ontology alignment approaches </li></ul><ul><li>Based on semantic intensity </li></ul><ul><ul><li>Semantics: the intended meanings of ontological entities </li></ul></ul><ul><ul><li>Some methods consider only syntactic features (semantically poor) </li></ul></ul><ul><ul><ul><li>E.g. string distance, string equality </li></ul></ul></ul><ul><ul><li>Semantics are added via: </li></ul></ul><ul><ul><ul><li>Meanings of words provided by external lexicons </li></ul></ul></ul><ul><ul><ul><li>Positions in taxonomies </li></ul></ul></ul><ul><ul><ul><li>Relations with other ontological entities </li></ul></ul></ul><ul><ul><ul><li>Logical entailment </li></ul></ul></ul><ul><ul><ul><li>Classified instance data </li></ul></ul></ul><ul><li>Project aims & targets </li></ul><ul><li>Timeline and deliverables </li></ul><ul><li>Semantic Intensity Spectrum </li></ul><ul><li>Modular architecture </li></ul><ul><li>Algorithms </li></ul><ul><li>CMS </li></ul><ul><li>Evaluation </li></ul><ul><li>Lessons learnt & future work </li></ul>
  2. 2. Semantic Intensity Spectrum: SIS diagram, http://www.aktors.org/crosi/si-spectrum/
  3. 3. Semantic Intensity Spectrum: Existing systems <ul><li>Duplicated effort in developing largely overlapping algorithms </li></ul><ul><ul><li>Re-implemented algorithms, e.g. string distance </li></ul></ul><ul><ul><li>Similar heuristic rules </li></ul></ul><ul><li>Different performance and different results on the same test sets </li></ul>
  4. 4. Semantic Intensity Spectrum: Diversity of existing systems
  5. 5. Semantic Intensity Spectrum: A possible solution <ul><li>Combine existing systems to minimise development effort </li></ul><ul><li>Systems can be combined in a meaningful way </li></ul><ul><ul><li>Many systems produce output in compatible formats </li></ul></ul><ul><ul><li>Heterogeneous outputs need to be normalised using heuristic rules, e.g. converting “more general than” into a numeric value </li></ul></ul><ul><li>Reuse available packages </li></ul><ul><ul><li>E.g. SecondString for computing string distance </li></ul></ul>
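The normalisation step mentioned on this slide could look roughly like the sketch below. The relation labels and the numeric values assigned to them are hypothetical placeholders, not the heuristic rules used in the project.

```python
# Hypothetical scores for qualitative relations; the values are illustrative only.
RELATION_SCORES = {
    "equivalent": 1.0,
    "more general than": 0.5,
    "more specific than": 0.5,
    "disjoint": 0.0,
}


def normalise(result):
    """Convert a matcher result (a number or a relation label) to a score in [0, 1]."""
    if isinstance(result, (int, float)):
        # Numeric outputs are clamped into the common [0, 1] range.
        return max(0.0, min(1.0, float(result)))
    # Qualitative outputs are looked up in the heuristic table.
    return RELATION_SCORES[result.strip().lower()]
```

Once every matcher reports a score on the same scale, results from different systems can be compared or combined.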
  6. 6. A principled architecture
  7. 7. A principled architecture: Signature Extraction <ul><li>Ontologies can be captured with a set of ontological signatures </li></ul><ul><ul><li>Local signatures: </li></ul></ul><ul><ul><ul><li>Labels, IDs, and URIs </li></ul></ul></ul><ul><ul><ul><li>Declared properties, property domains and ranges </li></ul></ul></ul><ul><ul><ul><li>Equivalent and complement classes, inverse and functional properties </li></ul></ul></ul><ul><ul><ul><li>Instantiated classes </li></ul></ul></ul><ul><ul><li>Global signatures: </li></ul></ul><ul><ul><ul><li>Super- and sub-classes, super- and sub-properties </li></ul></ul></ul><ul><ul><ul><li>Disjoint classes </li></ul></ul></ul><ul><ul><ul><li>Sibling classes </li></ul></ul></ul><ul><ul><ul><li>Comments, version information </li></ul></ul></ul>
  8. 8. A principled architecture: Multiple matchers <ul><li>Specialised internal matchers targeting particular signatures </li></ul><ul><ul><li>Name matchers </li></ul></ul><ul><ul><ul><li>String-distance-based matchers </li></ul></ul></ul><ul><ul><ul><li>WordNet-based matchers </li></ul></ul></ul><ul><ul><li>Class matchers </li></ul></ul><ul><ul><ul><li>Taxonomy-based matchers </li></ul></ul></ul><ul><ul><ul><li>Definition-based matchers </li></ul></ul></ul><ul><li>Existing ontology matching/alignment systems are invoked as external matchers </li></ul><ul><ul><li>FOAM API </li></ul></ul><ul><ul><li>INRIA Alignment API </li></ul></ul>
  9. 9. CMS design commitments <ul><li>Avoid reinventing the wheel </li></ul><ul><ul><li>Use existing packages to enhance internal matchers </li></ul></ul><ul><ul><li>Use existing mapping/alignment systems as external matchers </li></ul></ul><ul><li>Semantically enriched matchers based on the definition of concepts </li></ul><ul><ul><li>Propagate similarity along concept hierarchies </li></ul></ul><ul><ul><li>Refine concept similarity by taking into account the names, domains and ranges of declared properties </li></ul></ul><ul><ul><li>Compute similarity using WordNet hierarchies </li></ul></ul>
  10. 10. String distance <ul><li>Reuse of existing packages </li></ul><ul><ul><li>SecondString metrics </li></ul></ul><ul><ul><ul><li>Jaro, MongeElkan, NeedlemanWunsch, etc. </li></ul></ul></ul><ul><ul><li>Soundex metrics </li></ul></ul><ul><li>Consider only the local names of ontological entities </li></ul><ul><ul><li>The namespace is ignored </li></ul></ul><ul><ul><li>Names of superclasses and subclasses are ignored </li></ul></ul>
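A minimal sketch of name matching on local names only is given below. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the SecondString metrics named on the slide, so the scores are not those the Java package would produce; the helper names and example URIs are made up for illustration.

```python
from difflib import SequenceMatcher


def local_name(uri):
    """Return the part of a URI after the last '#' or '/', dropping the namespace."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


def name_similarity(uri_a, uri_b):
    """Compare two entities by local name only, ignoring case and namespace."""
    a = local_name(uri_a).lower()
    b = local_name(uri_b).lower()
    # Stand-in metric: ratio of matching characters, in [0, 1].
    return SequenceMatcher(None, a, b).ratio()
```

For example, `name_similarity("http://a.example/onto#Person", "http://b.example/schema/person")` compares only `Person` with `person`.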
  11. 11. WordNet-based algorithms <ul><li>Use JWNL (Java WordNet Library) </li></ul><ul><li>Names only </li></ul><ul><ul><li>Synonyms are retrieved and compared with string equality or string distance </li></ul></ul><ul><ul><li>Composite names are split and stop words are removed </li></ul></ul><ul><ul><ul><li>E.g. “has_name” => “name” </li></ul></ul></ul><ul><li>WordNet hierarchy </li></ul><ul><ul><li>Calculate the distance between two words </li></ul></ul>
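The name preprocessing described on this slide (splitting composite names and dropping stop words) might be sketched as follows. The stop-word list and the camelCase handling are assumptions; the slide only gives the “has_name” example.

```python
import re

# Hypothetical stop-word list; the list actually used is not given on the slide.
STOP_WORDS = {"has", "is", "of", "the", "a", "an"}


def tokenise(name):
    """Split a composite name into lower-case tokens and drop stop words."""
    # Insert a space at camelCase boundaries, e.g. "hasName" -> "has Name".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Split on underscores, hyphens and whitespace.
    parts = re.split(r"[_\-\s]+", spaced)
    return [p.lower() for p in parts if p and p.lower() not in STOP_WORDS]
```

The remaining tokens are what would then be looked up in WordNet for synonyms.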
  12. 12. WordNet-based algorithms: WordNet hierarchy <ul><li>h: the distance between “Word” and Root </li></ul><ul><li>h’: the distance between “Word’” and Root </li></ul><ul><li>H: the distance between Root and the common subsumer of “Word” and “Word’” </li></ul><ul><li>The similarity between “Word” and “Word’” is computed as </li></ul><ul><li>2H / (h + h’) </li></ul>[Diagram: WordNet hierarchy with Root, Common Subsumer, Word and Word’, annotated with the distances H, h and h’]
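The formula 2H / (h + h’) can be tried out on a toy taxonomy, as in the sketch below. The `PARENT` table is a made-up single-inheritance hierarchy, not WordNet data; real WordNet words have several senses and may have multiple hypernym paths, which this sketch ignores.

```python
# Toy taxonomy (child -> parent); hypothetical data standing in for WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}


def path_to_root(word):
    """Return the chain of nodes from the word up to the root, inclusive."""
    path = [word]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(word):
    """Distance (number of edges) between the word and the root."""
    return len(path_to_root(word)) - 1


def similarity(word_a, word_b):
    """Compute 2H / (h + h'), where H is the depth of the deepest common subsumer."""
    ancestors = set(path_to_root(word_a))
    # Walking up from word_b, the first shared node is the deepest common subsumer.
    subsumer = next(node for node in path_to_root(word_b) if node in ancestors)
    h, h_prime, big_h = depth(word_a), depth(word_b), depth(subsumer)
    if h + h_prime == 0:
        return 1.0  # both words are the root itself
    return 2 * big_h / (h + h_prime)
```

With this table, `dog` and `cat` share the subsumer `animal` (H = 1, h = h’ = 2), giving 0.5, while `dog` and `plant` meet only at the root and score 0.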
  13. 13. Canonical Name <ul><li>C => A.B.D.C </li></ul><ul><li>C’ => A’.B’.D’.E’.C’ </li></ul><ul><li>Compute the similarity between C and C’, as well as the similarity between every pair of superclasses of C and C’ </li></ul><ul><li>Penalise the similarity between C and C’ with the similarities of their superclasses </li></ul>[Diagram: two class hierarchies, A, B, D, C and A’, B’, D’, E’, C’]
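The slide does not say how the superclass similarities penalise the class similarity, so the sketch below assumes one simple rule: multiply the name similarity of C and C’ by the average best-match similarity of C's superclasses against those of C’. Both that rule and the `difflib` string metric are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def sim(a, b):
    """Stand-in name similarity in [0, 1] (not one of the project's metrics)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def canonical_similarity(canonical_a, canonical_b):
    """Compare two classes given as dot-separated canonical names, e.g. 'A.B.D.C'."""
    *supers_a, class_a = canonical_a.split(".")
    *supers_b, class_b = canonical_b.split(".")
    base = sim(class_a, class_b)
    if not supers_a or not supers_b:
        return base  # nothing to penalise with
    # For each superclass of the first class, keep its best match in the second path.
    best = [max(sim(a, b) for b in supers_b) for a in supers_a]
    context = sum(best) / len(best)
    # Assumed penalty rule: scale the class similarity by the superclass agreement.
    return base * context
```

Under this rule two classes with identical names but unrelated ancestors, such as `Thing.Vehicle.Car` and `Object.Toy.Car`, score below 1.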
  14. 14. Structure algorithm: f(name similarity, domain similarity, range similarity) [Diagram: properties P1(C, B), P2, P3 and P1’(C’, B’), P2’, P3’, with the class hierarchies forming the domain and range of P1 and of P1’]
  15. 15. Structure algorithm, cont’d <ul><li>Structure </li></ul><ul><ul><li>Retrieve declared properties </li></ul></ul><ul><ul><li>For each property, retrieve its domains and ranges </li></ul></ul><ul><ul><li>Compare each property’s name, domain and range </li></ul></ul><ul><li>StructurePlus </li></ul><ul><ul><li>Also compare the superclasses and subclasses of each property’s domain and range </li></ul></ul><ul><li>When comparing domains and ranges, use existing name-matching techniques </li></ul>
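The Structure steps can be illustrated with the sketch below. The function f is not specified on the slides, so an unweighted mean of name, domain and range similarity is assumed here, and each property is assumed to have a single domain and range class; the `Prop` type and the `difflib` metric are likewise illustrative.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Prop(NamedTuple):
    """A declared property with one domain class and one range class (simplification)."""
    name: str
    domain_cls: str
    range_cls: str


def sim(a, b):
    """Stand-in name matcher in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def property_similarity(p, q):
    """Assumed f: unweighted mean of name, domain and range similarity."""
    return (sim(p.name, q.name)
            + sim(p.domain_cls, q.domain_cls)
            + sim(p.range_cls, q.range_cls)) / 3


def structure_similarity(props_a, props_b):
    """Average, over the first class's properties, of the best match in the second."""
    if not props_a or not props_b:
        return 0.0
    best = [max(property_similarity(p, q) for q in props_b) for p in props_a]
    return sum(best) / len(best)
```

A StructurePlus-style variant would extend `property_similarity` to compare the superclasses and subclasses of the domain and range as well.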
  16. 16. Implemented matchers, powered by existing Java libraries
  17. 17. Post alignment <ul><li>Aggregator </li></ul><ul><ul><li>Aggregation based on a weighted average </li></ul></ul><ul><ul><li>Weights are set manually by users </li></ul></ul><ul><li>Evaluator </li></ul><ul><ul><li>Nothing is more qualified than a human inspector with domain knowledge and experience </li></ul></ul>
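Weighted-average aggregation of matcher scores can be sketched as below. The matcher names and weight values in the usage note are examples only; in CMS the weights are chosen by the user.

```python
def aggregate(scores, weights):
    """Combine per-matcher scores into one value using a weighted average.

    scores:  dict mapping matcher name -> similarity in [0, 1]
    weights: dict mapping matcher name -> non-negative, user-chosen weight
    """
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("at least one matcher must have a positive weight")
    return sum(scores[m] * weights[m] for m in scores) / total
```

For instance, `aggregate({"name": 0.8, "structure": 0.4}, {"name": 3, "structure": 1})` weights the name matcher three times as heavily as the structure matcher and yields roughly 0.7.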
  18. 18. CMS: deployment options <ul><li>Run CMS from command line </li></ul><ul><ul><li>A batch file is provided </li></ul></ul><ul><li>Invoke CMS as an API </li></ul><ul><li>Run CMS as a service </li></ul><ul><ul><li>via JSP interface </li></ul></ul>
  19. 19. Demo <ul><li>Ontologies: web directories, small in size and simple in structure </li></ul><ul><li>Run with different weights </li></ul><ul><li>Output in different formats </li></ul><ul><ul><li>OWL, SKOS, HTML, XML (OAEI) </li></ul></ul>
  20. 20. Use of SIS <ul><li>Claimed functionalities of existing systems can be justified against this spectrum </li></ul><ul><li>A reference for selecting the right mapping techniques for a particular problem </li></ul><ul><li>A designer’s aid for navigating different mapping approaches, with emphasis on the use of semantics </li></ul>
