Moving beyond sameAs with PLATO: Partonomy detection for Linked Data

The Linked Open Data (LOD) Cloud has gained significant traction over the past few years. With over 275 interlinked datasets across diverse domains such as life science, geography, politics, and more, the LOD Cloud has the potential to support a variety of applications ranging from open domain question answering to drug discovery.

Despite its significant size (approx. 30 billion triples), the data is relatively sparsely interlinked (approx. 400 million links). A semantically richer LOD Cloud is needed to fully realize its potential. Data in the LOD Cloud are currently interlinked mainly via the owl:sameAs property, which is inadequate for many applications. Additional properties capturing relations based on causality or partonomy are needed to enable the answering of complex questions and to support applications.
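
To make the sparsity claim concrete, the short sketch below works out the link density from the two approximate figures quoted above. The constant and function names are illustrative only and do not come from the original work.

```python
# Approximate figures quoted in the abstract.
TOTAL_TRIPLES = 30_000_000_000   # approx. 30 billion triples
TOTAL_LINKS = 400_000_000        # approx. 400 million links


def link_density(links: int, triples: int) -> float:
    """Return the number of links per triple."""
    if triples <= 0:
        raise ValueError("triples must be positive")
    return links / triples


density = link_density(TOTAL_LINKS, TOTAL_TRIPLES)
print(f"Link density: {density:.2%}")                 # about 1.33%
print(f"Triples per link: {round(1 / density)}")      # about 75
```

In other words, with these rounded figures there is roughly one link for every 75 triples.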

In this work, we present a solution to enrich the LOD Cloud by automatically detecting partonomic relationships, which are well-established, fundamental properties grounded in linguistics and philosophy. We empirically evaluate our solution across several domains, and show that our approach performs well on detecting partonomic properties between LOD Cloud data.

Published in: Education, Technology

Moving beyond sameAs with PLATO: Partonomy detection for Linked Data

  1. Moving beyond sameAs with PLATO: Partonomy detection for Linked Data
     Prateek Jain, Pascal Hitzler, Amit Sheth
     Kno.e.sis Center, Wright State University, Dayton, OH
     Peter Z. Yeh, Kunal Verma
     Accenture Technology Labs, San Jose, CA
     May2012 –GE Conference 2012–Prateek Jain 23rd ACM HT Global Research– Prateek Jain
  2. Outline
     • Introduction - Linked Open Data
     • Challenges
     • PLATO – Partonomic Relationship detection
     • Conclusion & Future Work
  3. Tim Berners-Lee 2006
     • from http://www.w3.org/DesignIssues/LinkedData.html
     1. Use URIs as names for things.
     2. Use HTTP URIs so that people can look up those names.
     3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
     4. Include links to other URIs, so that they can discover more things.
  4. Linked Open Data 2011
  5. Linked Open Data
     Number of datasets:
     • 2011-09-19: 295
     • 2010-09-22: 203
     • 2009-07-14: 95
     • 2008-09-18: 45
     • 2007-10-08: 25
     • 2007-05-01: 12
     Number of triples (Sept 2011): 31,634,213,770, with 503,998,829 out-links
     From http://www4.wiwiss.fu-berlin.de/lodcloud/state/
  8. Mainstream Semantic Web?
     May 2012 – IBM TJ Watson Center – Prateek Jain
  9. Is it really mainstream Semantic Web?
     • What is the relationship between the models whose instances are being linked?
     • How to do querying on LOD without knowing individual datasets?
     • How to perform schema level reasoning over LOD cloud?
     • A very fundamental, important and conceptual relationship, namely “PART OF”, has little or no existence in LOD
  10. PLATO Approach
  11. Our Approach
      Use knowledge contributed by users
      • Detection of relationships within and across datasets
      LOD Cloud
  12. PLATO Approach
      • PLATO generates all possible partonomically linked pairs between the entities in the dataset.
        – Utilize “strongly” associated entities
      • Identify the type of each entity in the pair using WordNet.
        – Use Class Names
        – Gives the lexicographer files for the synsets corresponding to these entities
      • Use this information to determine the applicable OWL partonomy properties.
        – Using Winston’s taxonomy
  13. Winston’s Taxonomy
  14. PLATO Approach – Step 2
      • PLATO generates linguistic patterns for each applicable property based on linguistic cues suggested by Winston.
        – Cell Wall is made of Cellulose
        – Cellulose is made of Cell Wall
        – Cell Wall is partly Cellulose
      • Tests the lexical patterns for each entity pair in a corpus-driven manner.
        – Using the Web as a corpus
      • PLATO counts the total number of web pages that contain the pattern.
        – Parse the page and identify the occurrence of the pattern.
  15. PLATO Approach – Step 3
      • Asserts the partonomy property with the strongest supporting evidence
        – Cell Wall is made of Cellulose, 48
        – Cellulose is made of Cell Wall, 10
      • PLATO also enriches the schema by generalizing from the instance level assertions.
  16. PLATO Evaluation
  17. Outreach
      • Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh and Amit P. Sheth, “Moving beyond sameAs with PLATO: Partonomy detection for Linked Data”. In Proceedings of the 23rd ACM Hypertext and Social Media conference (HT 2012), Milwaukee, WI, USA, June 25th-28th, 2012 (To Appear)
      • Tool available for download at http://wiki.knoesis.org/index.php/PLATO
  18. End Product
  19. Conclusions and Future Work
  20. Conclusions
      • PLATO is an approach for partonomic relationship detection
      • Approach works for both instance and schema level relationships
      • Evaluation performed between and within prominent and big LOD datasets
      • Results validate the use of knowledge on the Web to solve tough problems
  21. Future Work
      • Use incomplete knowledge for part-of relationship identification
        – Machine learning based techniques
      • Release the schema mappings in public domain
      • Develop better querying system for LOD using PLATO and BLOOMS
        – Work in progress with ALOQUS (Submitted to ODBASE 2012)
      • Identify and incorporate user preferences
  22. Questions?
      Prateek Jain
      Kno.e.sis Center, Wright State University, Dayton, OH
      http://wiki.knoesis.org/index.php/Prateek
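
The first PLATO stage on slides 12 and 13 (generate entity pairs, look up each entity's WordNet lexicographer file, then pick the applicable partonomy relations from Winston's taxonomy) can be sketched roughly as follows. This is a minimal illustration, not PLATO's actual implementation: the `LEXNAME` table is a hand-written stand-in for a WordNet lookup whose values were not checked against WordNet, and the `RULES` mapping from type pairs to relations is hypothetical. Only the six relation names come from Winston, Chaffin and Herrmann's taxonomy.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Tuple

# The six part-whole relation types in Winston, Chaffin and Herrmann's taxonomy.
WINSTON_RELATIONS = (
    "component-integral object",
    "member-collection",
    "portion-mass",
    "stuff-object",
    "feature-activity",
    "place-area",
)

# ASSUMPTION: stand-in for a WordNet lookup. The lexicographer file names are
# in WordNet's naming style, but these assignments are illustrative only.
LEXNAME: Dict[str, str] = {
    "Cell Wall": "noun.plant",
    "Cellulose": "noun.substance",
}

# ASSUMPTION: hypothetical rules keyed by the unordered pair of types.
# Direction is left open here; it is decided later from corpus evidence.
RULES: Dict[FrozenSet[str], List[str]] = {
    frozenset({"noun.substance", "noun.plant"}): ["stuff-object"],
    frozenset({"noun.location"}): ["place-area"],
    frozenset({"noun.person", "noun.group"}): ["member-collection"],
}


def candidate_pairs(entities: List[str]) -> List[Tuple[str, str]]:
    """Return every ordered pair of distinct entities."""
    return list(permutations(entities, 2))


def candidate_relations(first: str, second: str) -> List[str]:
    """Return the relation types that the two entity types allow."""
    key = frozenset({LEXNAME[first], LEXNAME[second]})
    return RULES.get(key, [])


for first, second in candidate_pairs(["Cell Wall", "Cellulose"]):
    print(first, "/", second, "->", candidate_relations(first, second))
```

Keying the rules on an unordered pair of types keeps both directions as candidates, which matches the slides: both "Cell Wall is made of Cellulose" and "Cellulose is made of Cell Wall" are tested in the next step.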

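Steps 2 and 3 on slides 14 and 15 (instantiate linguistic patterns for an entity pair, count supporting web pages, assert the direction with the strongest evidence) might look like the sketch below. Several things here are assumptions: the `SAMPLE_COUNTS` table replaces a real web search and contains only the two counts quoted on slide 15, summing evidence across patterns is one plausible aggregation rather than PLATO's documented one, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Templates modelled on the slide examples ("is made of", "is partly").
TEMPLATES = ["{whole} is made of {part}", "{whole} is partly {part}"]

# ASSUMPTION: canned page counts standing in for a web search.
# The two figures are the ones shown on slide 15.
SAMPLE_COUNTS: Dict[str, int] = {
    "Cell Wall is made of Cellulose": 48,
    "Cellulose is made of Cell Wall": 10,
}


def count_pages(pattern: str) -> int:
    """Return the number of pages containing the pattern (0 if unknown)."""
    return SAMPLE_COUNTS.get(pattern, 0)


def make_patterns(whole: str, part: str) -> List[str]:
    """Instantiate every template for one (whole, part) direction."""
    return [t.format(whole=whole, part=part) for t in TEMPLATES]


def evidence(whole: str, part: str,
             counter: Callable[[str], int] = count_pages) -> int:
    """Sum the page counts over all patterns for one direction."""
    return sum(counter(p) for p in make_patterns(whole, part))


def strongest_direction(a: str, b: str,
                        counter: Callable[[str], int] = count_pages
                        ) -> Optional[Tuple[str, str]]:
    """Return (part, whole) for the better-supported direction, or None on a tie."""
    a_as_whole = evidence(a, b, counter)
    b_as_whole = evidence(b, a, counter)
    if a_as_whole == b_as_whole:
        return None
    return (b, a) if a_as_whole > b_as_whole else (a, b)


print(strongest_direction("Cell Wall", "Cellulose"))
```

With the sample counts, "Cell Wall" as the whole has 48 supporting pages against 10 for the reverse, so the sketch returns Cellulose as the part and Cell Wall as the whole. Passing the counter as a parameter makes it easy to swap the canned table for a real search backend.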