ESWC 2011 BLOOMS+

Presentation during ESWC 2011 for BLOOMS+

Published in: Technology, Education
  • A bit of introduction about us. How the work came about as a result of our collaboration between Kno.e.sis, Accenture and Ontotext.
  • Some introduction to LOD, since the track is not an LOD-specific track.
  • Transcript of "ESWC 2011 BLOOMS+ "

    1. Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton
       Prateek Jain, Peter Z. Yeh, Kunal Verma, Reymonrod Vasquez, Mariana Damova, Pascal Hitzler and Amit P. Sheth
       Kno.e.sis, Wright State University, Dayton, OH; Ontotext, Sofia, Bulgaria; Accenture Technology Labs, San Jose, CA
    2. Outline
       • Introduction
       • Background
       • Challenges
       • Existing Approaches
       • BLOOMS+ Approach
       • Conclusion & Future Work
       • References
    4. Web of Data
    5. Linked Open Data
       • “The term Linked Data is used to describe a method of exposing, sharing, and connecting data via de-referenceable URIs on the Web.” (Wikipedia)
       • Datasets that are part of Linked Open Data include
         – Geographical datasets
         – Movies
         – Life science, genes, proteins
         – General information (Wikipedia), customer reviews, …
         – US Census, senator voting records, …
       • Links are primarily at the instance level, asserting equality between entities.
         Example: linkedMDB:film/77 owl:sameAs dbpedia:resource/Pulp_Fiction
       • As of September 2010, LOD was estimated to contain 25 billion RDF triples, interlinked by around 395 million RDF links.
    7. If everything is nice, why am I here?
       • Lack of conceptual description of datasets
       • Absence of schema-level links
       • Lack of expressivity
       • Difficulties with respect to querying using SPARQL
         – Schema heterogeneity
         – Entity disambiguation
         – Ranking of results
    8. What can be done?
       • Relationships are at the heart of semantics.
       • LOD captures instance-level relationships, but lacks class-level relationships.
         – Superclass
         – Subclass
         – Equivalence
       • How to find these relationships?
         – Perform a matching of the LOD ontologies using state-of-the-art schema matching tools.
       • Desirable
         – Considering the size of LOD, at least produce results that a human can curate.
    9. Schema Matching
       • Schema matching is the process of identifying that two objects are semantically related.
       • Given two schemas, DB1.Student (Name, SSN, Level, Major, Marks) and DB2.Grad-Student (Name, ID, Major, Grades), possible matches would be DB1.Student ≈ DB2.Grad-Student, DB1.SSN = DB2.ID, etc. Possible transformations or mappings would be DB1.Marks to DB2.Grades (100-90 A; 90-80 B; ..).
       • Large enterprises need high-quality data for querying and analytics.
       • Schema mapping provides a way of resolving discrepancies in data.
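The match and transformation on this slide can be illustrated with a minimal sketch. The band boundaries are an assumption: the slide writes "100-90 A; 90-80 B", which leaves the value 90 ambiguous, and the bands below 80 are elided, so the sketch treats 90 as an A and returns `None` for anything under 80. The function names are hypothetical.

```python
from typing import Optional

# Assumed lower bounds per letter grade; the slide leaves 90 ambiguous and
# elides every band below 80, so only these two bands are encoded.
GRADE_BANDS = [(90, "A"), (80, "B")]


def marks_to_grade(marks: float) -> Optional[str]:
    """Transform a DB1.Marks value into a DB2.Grades letter (None if unknown)."""
    if not 0 <= marks <= 100:
        raise ValueError(f"marks out of range: {marks}")
    for lower_bound, letter in GRADE_BANDS:
        if marks >= lower_bound:
            return letter
    return None  # band not given on the slide


def student_to_grad_student(row: dict) -> dict:
    """Apply the matches DB1.SSN = DB2.ID and DB1.Marks -> DB2.Grades."""
    return {
        "Name": row["Name"],
        "ID": row["SSN"],
        "Major": row["Major"],
        "Grades": marks_to_grade(row["Marks"]),
    }
```

Note that DB1.Level has no counterpart in DB2.Grad-Student, so the sketch simply drops it.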
    10. Why does it matter?
        • Massive amounts of data within an enterprise refer to the same entities but use different terminology.
        • Enterprise information asset awareness.
        • Finding relevant and related schemata.
        • Project planning.
          – Can project-specific requirements be fulfilled with the data at one's disposal?
        • Generating an exchange schema.
          – Collaboration with clients who use different schemas.
        Reference: K. Smith, P. Mork, L. Seligman, A. Rosenthal, M. Morse, D. Allen, and M. Li. The Role of Schema Matching in Large Enterprises. CIDR, 2009.
    12. Existing Approaches
        “A survey of approaches to automatic schema matching” by Erhard Rahm and Philip A. Bernstein, The VLDB Journal 10: 334–350 (2001)
    14. Our Approach
        • Use knowledge contributed by users
        • Structured knowledge contributed by users
        • To improve
    15. Rabbit out of a hat?
        • Traditional auxiliary data sources such as WordNet and upper-level ontologies have limited coverage and are insufficient for LOD datasets.
          • LOD datasets cover diverse domains.
        • Community-generated data, although noisy, is rich in
          • Content
          • Structure
          and has a “self-healing” property.
        • Problems like schema matching have a dimension of context associated with them. Because community-generated data is created by a diverse set of people, it captures diverse contexts.
    16. Wikipedia
        • The English version alone contains more than 2.9 million articles.
        • It is continually expanded by approximately 100,000 active volunteer editors worldwide.
        • It allows multiple points of view to be mentioned with their proper contexts.
        • Article creation and correction is an ongoing activity with no downtime.
    17. Schema Matching on LOD using Wikipedia Categorization
        • On Wikipedia, categories are used to organize the entire project.
        • Wikipedia's category system consists of overlapping trees.
        • Simple rules for categorization
          – “If logical membership of one category implies logical membership of a second, then the first category should be made a subcategory”
          – “Pages are not placed directly into every possible category, only into the most specific one in any branch”
          – “Every Wikipedia article should belong to at least one category.”
    18. BLOOMS+ Approach – Step 1
        • Pre-process the input schema
          • Remove property restrictions
          • Remove individuals and properties
        • Tokenize the class names
          • Remove underscores, hyphens and other delimiters
          • Break down complex class names
            – Example: SemanticWeb => Semantic Web
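The tokenization half of Step 1 can be sketched with a few regular expressions. This is a minimal illustration, not the BLOOMS+ code: the function name and the exact delimiter and camel-case rules are assumptions.

```python
import re


def tokenize_class_name(name: str) -> str:
    """Turn a raw class name into space-separated words."""
    # Replace underscores, hyphens and other non-alphanumeric delimiters.
    spaced = re.sub(r"[^A-Za-z0-9]+", " ", name)
    # Split at lower-to-upper camel-case boundaries: "SemanticWeb" -> "Semantic Web".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    # Split an acronym from a following capitalised word: "RDFGraph" -> "RDF Graph".
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    # Collapse any repeated whitespace left behind by the substitutions.
    return " ".join(spaced.split())
```

For the slide's own example, `tokenize_class_name("SemanticWeb")` yields `"Semantic Web"`.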
    19. BLOOMS+ Approach – Step 2
        • For each concept name processed in the previous step
          – Identify the article in Wikipedia corresponding to the concept.
          – Each article related to the concept indicates a sense of the usage of the word.
        • For each article found in the previous step
          – Identify the Wikipedia categories to which it belongs.
          – For each category found, find its parent categories up to level 4.
        • Once the “BLOOMS tree” for each sense of the source concept is created (Ti), utilize it for comparison with the “BLOOMS trees” of the target concepts (Tj).
          – BLOOMS trees are created for individual senses of the concepts.
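The tree construction in Step 2 can be sketched as a breadth-first walk up a category graph, cut off at level 4. The sketch below is illustrative only: `PARENT_CATEGORIES` is a hand-made, hypothetical hierarchy rather than Wikipedia data, the function name is invented, and treating the article itself as depth 0 is an assumption about how levels are counted.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy hierarchy: node -> parent categories (not real Wikipedia data).
PARENT_CATEGORIES: Dict[str, List[str]] = {
    "Jaguar": ["Felines"],
    "Felines": ["Carnivorans"],
    "Carnivorans": ["Mammals"],
    "Mammals": ["Vertebrates"],
    "Vertebrates": ["Animals"],
    "Animals": ["Organisms"],
}


def build_tree(article: str, parents: Dict[str, List[str]],
               max_level: int = 4) -> Dict[str, int]:
    """Return {node: depth} for one sense, following parents up to max_level."""
    depths = {article: 0}  # assumption: the article (sense) is the root at depth 0
    queue = deque([article])
    while queue:
        node = queue.popleft()
        if depths[node] >= max_level:
            continue  # do not expand beyond the configured level
        for parent in parents.get(node, []):
            if parent not in depths:  # categories overlap, so visit each once
                depths[parent] = depths[node] + 1
                queue.append(parent)
    return depths
```

With the toy data, `build_tree("Jaguar", PARENT_CATEGORIES)` stops at "Vertebrates" (depth 4) and never reaches "Animals". A real run would replace the dictionary with lookups against Wikipedia's category system.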
    20. BLOOMS+ Approach – Step 3
        • In the tree Ti, find n (the number of common nodes, i.e. nodes that also occur in Tj).
        • Compute the overlap Os between the source and target tree.
        • Exponentiation of the inverse depth of a common node gives less weight to nodes which appear lower in the hierarchy (generic nodes).
        • Taking the log of the tree size avoids bias against large trees.
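The overlap formula itself is not in the transcript. One form that is consistent with the two remarks on the slide (an exponential of the inverse depth, and a log of the tree size) is Overlap(Ti, Tj) = log Σ (1 + e^(1/d(n) − 1)) / log(2·|Ti|), summed over the common nodes n, with d(n) the depth of n in Ti. The sketch below implements that assumed form; depths are taken to start at 1 for the root, which is also an assumption.

```python
import math
from typing import Dict


def overlap(tree_i: Dict[str, int], tree_j: Dict[str, int]) -> float:
    """Assumed overlap of tree_i with tree_j; trees map node -> depth (root = 1)."""
    if not tree_i:
        return 0.0
    common = [node for node in tree_i if node in tree_j]
    if not common:
        return 0.0
    # Each common node contributes between 1 and 2; deeper (more generic)
    # nodes contribute less because exp(1/d - 1) shrinks as d grows.
    total = sum(1.0 + math.exp(1.0 / tree_i[node] - 1.0) for node in common)
    # Dividing by log(2 * |Ti|) keeps the score at most 1 and damps the
    # effect of tree size.
    return math.log(total) / math.log(2 * len(tree_i))
```

Under this form the measure is asymmetric: `overlap(Ti, Tj)` and `overlap(Tj, Ti)` generally differ, because depths and tree size are taken from the first argument.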
    21. Contextual Similarity
        • BLOOMS+ computes the contextual similarity between a source class C and a target class D to further determine whether they should be aligned.
        • Information about the super classes of C and D is a good source of contextual information.
        • If the super classes agree, it is a good alignment; otherwise it should be penalized.
        • For example, if Jaguar has super classes such as Car and Vehicle, and Cat has super classes such as Feline and Mammal, then an alignment between them should be penalized because its contextual similarity is low.
    22. BLOOMS+ Approach – Step 4
        • BLOOMS+ retrieves all super classes of C and D up to level 2 (this can be changed). The sets of super classes are N(C) and N(D).
        • For each BLOOMS+ tree pair (Ti, Tj) between C and D, BLOOMS+ determines the number of super classes in N(C) and N(D) that are supported by Tj and Ti, respectively.
        • A super class c ∈ N(C) is supported by Tj if either of the following conditions is satisfied:
          – The name of c matches a node in Tj.
          – The Wikipedia article (or article category) corresponding to c, based on a Wikipedia search web service call using the name of c, matches a node in Tj.
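The support check in Step 4 can be sketched as follows. Only the first condition (name match) is implemented; the Wikipedia search web service call is left out, and the normalization rule and function names are assumptions made for illustration.

```python
from typing import Iterable


def normalise(name: str) -> str:
    """Lower-case a name and unify underscores and whitespace (assumed rule)."""
    return " ".join(name.lower().replace("_", " ").split())


def supported_fraction(super_classes: Iterable[str],
                       tree_nodes: Iterable[str]) -> float:
    """Fraction of super classes whose name matches a node of the given tree."""
    nodes = {normalise(node) for node in tree_nodes}
    names = [normalise(cls) for cls in super_classes]
    if not names:
        return 0.0  # no super classes means no contextual evidence
    supported = sum(1 for cls in names if cls in nodes)
    return supported / len(names)
```

For the Jaguar and Cat example, `supported_fraction(["Car", "Vehicle"], ["Felines", "Mammals"])` is 0.0, which is the kind of weak support the contextual similarity is meant to penalize.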
    23. BLOOMS+ Approach – Step 5
        • BLOOMS+ computes the overall contextual similarity between C and D with respect to Ti and Tj using the harmonic mean.
        • We chose the harmonic mean to emphasize super class neighborhoods that are not well supported (and hence should significantly lower the overall contextual similarity).
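The instantiated formula is missing from the transcript. A plausible reading, stated here as an assumption, is the harmonic mean of two support fractions: R_C, the share of super classes of C that are supported, and R_D, the same for D. A minimal sketch:

```python
def contextual_similarity(r_c: float, r_d: float) -> float:
    """Harmonic mean of the two support fractions (assumed inputs in [0, 1])."""
    if r_c + r_d == 0:
        return 0.0  # neither neighborhood is supported at all
    return 2.0 * r_c * r_d / (r_c + r_d)
```

The harmonic mean is pulled toward the smaller value: for R_C = 0.5 and R_D = 1.0 it gives about 0.67, below the arithmetic mean of 0.75, and it drops to 0 as soon as either side has no support.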
    24. BLOOMS+ Approach – Step 6
        • BLOOMS+ computes the overall similarity between classes C and D w.r.t. BLOOMS+ trees Ti and Tj by taking the weighted average of the class and contextual similarity.
        • BLOOMS+ defaults alpha and beta to 1 to give them equal importance.
        • BLOOMS+ then selects the tree pair (Ti, Tj) ∈ FC × FD with the highest overall similarity score and, if this score is greater than the alignment threshold HA, establishes a link between C and D.
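Step 6 can be sketched as a weighted average followed by an arg-max over tree pairs. The normalization by alpha + beta is an assumption (with the default alpha = beta = 1 it is simply division by 2), and the function names, tree labels, and threshold values are hypothetical.

```python
from typing import Dict, Optional, Tuple

TreePair = Tuple[str, str]


def overall_similarity(class_sim: float, context_sim: float,
                       alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted average of class similarity and contextual similarity."""
    return (alpha * class_sim + beta * context_sim) / (alpha + beta)


def best_pair(scores: Dict[TreePair, Tuple[float, float]],
              threshold: float) -> Optional[Tuple[TreePair, float]]:
    """Pick the tree pair with the highest overall score, if above threshold.

    `scores` maps (Ti, Tj) labels to (class similarity, contextual similarity).
    """
    if not scores:
        return None
    ranked = {pair: overall_similarity(cls, ctx) for pair, (cls, ctx) in scores.items()}
    pair = max(ranked, key=ranked.get)
    # Strictly greater than the threshold, as stated on the slide.
    return (pair, ranked[pair]) if ranked[pair] > threshold else None
```

Returning `None` when no pair clears the threshold mirrors the case where C and D are left unaligned.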
    25. Alignment decision
        • If O(Ti,Tj) = O(Tj,Ti), then BLOOMS+ sets
          – C owl:equivalentClass D.
        • If O(Ti,Tj) < O(Tj,Ti), then BLOOMS+ sets
          – C rdfs:subClassOf D.
        • Otherwise, BLOOMS+ sets D rdfs:subClassOf C.
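The decision rule on this slide maps directly onto a three-way comparison. The sketch below returns the resulting triple; the small tolerance used for the equality test is an assumption added for floating-point scores (the slide states exact equality), and the function name is hypothetical.

```python
from typing import Tuple


def alignment_decision(c: str, d: str, o_ij: float, o_ji: float,
                       tol: float = 1e-9) -> Tuple[str, str, str]:
    """Return the (subject, predicate, object) triple for classes C and D.

    o_ij is O(Ti, Tj) and o_ji is O(Tj, Ti).
    """
    if abs(o_ij - o_ji) <= tol:
        return (c, "owl:equivalentClass", d)
    if o_ij < o_ji:
        return (c, "rdfs:subClassOf", d)
    return (d, "rdfs:subClassOf", c)
```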
    26. Results BLOOMS+
    28. Conclusion
        • We have presented a system called BLOOMS+ for performing ontology alignment using contextual information.
        • BLOOMS+ has been evaluated on the alignment of three different LOD ontologies to PROTON, with reference alignments created manually by human experts for a real-world application called FactForge.
        • To the best of our knowledge, BLOOMS+ is the only system which utilizes contextual information present in the ontology and the Wikipedia category hierarchy for ontology matching.
        • BLOOMS+ significantly outperforms state-of-the-art solutions for the task of ontology alignment.
    29. Future Work
        • Extend BLOOMS to utilize contextual information available in community-generated data.
        • New weighting mechanism for identifying matches between the concepts in the dataset.
        • Develop a polling mechanism for identifying the best source to assist in the process of schema alignment.
        • Allow seamless querying across datasets by utilizing the generated alignments (preliminary work: LOQUS).
    30. References
        • Prateek Jain, Peter Z. Yeh, Kunal Verma, Reymonrod Vasquez, Mariana Damova, Pascal Hitzler and Amit P. Sheth: Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton. Proceedings of the 8th Extended Semantic Web Conference 2011, volume 6643 of Lecture Notes in Computer Science. Springer, Berlin Heidelberg, 2011.
        • Prateek Jain, Pascal Hitzler, Amit P. Sheth, Kunal Verma and Peter Z. Yeh: Ontology Alignment for Linked Open Data. Proceedings of the 9th International Semantic Web Conference 2010, Shanghai, China, November 7th-11th, 2010. Pages 402-417.
        • Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma and Amit P. Sheth: Linked Data Is Merely More Data. In: Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness: Linked Data Meets Artificial Intelligence. Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010, pp. 82-86. ISBN 978-1-57735-461-1.
    31. Thank You! Questions?