Zhishi.links - A Distributed Instance Matching System


Zhishi.links Results for OAEI2011 @ 6th International Workshop on Ontology Matching @ ISWC2011

  1. Zhishi.links: A Distributed Instance Matching System. Xing Niu, Shu Rong, Yunlong Zhang and Haofen Wang. 2011.10.24
  2. Agenda
     - Introduction
     - Architecture and Matching Strategies
     - Adaptations Made for the Evaluation
     - Results
     - Comments
  3. Introduction
     - Zhishi.links is a distributed instance matching system.
     - “Zhishi” is the romanization of the Chinese word “知识”, which means “knowledge”.
     - We used it to participate in the Data Interlinking (DI) track of OAEI 2011.
     - It achieved the best results in the DI track.
  4. Introduction (cont'd)
     - Data dumps, rather than online lookup services, are used for interlinking, for several reasons:
       - Zhishi.links originated from our Chinese LOD project, where we used the system to discover links locally.
       - Zhishi.links is designed to be a universal instance matching system, so it should not rely too heavily on the performance of lookup services.
  5. Architecture and Matching Strategies
  6. Architecture and Matching Strategies
  7. Adaptations Made for the Evaluation
     - The New York Times does not provide sufficient structured descriptive data, so we crawled its topic pages.
     - For resources from the other three data sources, Virtual Documents are constructed by splicing together the values of characteristic properties.
       - The similarity between a Virtual Document and a topic page is calculated in the semantic similarity calculation phase.
     [Figure: a Virtual Document (Value_1, Value_2, ..., Value_6, …) is compared with a topic page to produce a Similarity score]
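The Virtual Document idea on this slide can be sketched as follows. This is a minimal illustration, not the Zhishi.links implementation: the slides do not say which semantic similarity measure is used, so a cosine similarity over term-frequency vectors is swapped in as a stand-in. The function names, the property names and the sample resource are all hypothetical.

```python
import math
import re
from collections import Counter


def build_virtual_document(resource, characteristic_properties):
    """Splice the values of the chosen properties into one text string."""
    values = []
    for prop in characteristic_properties:
        values.extend(resource.get(prop, []))
    return " ".join(values)


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between term-frequency vectors (assumed stand-in measure)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical resource and topic-page text, for illustration only.
resource = {
    "label": ["Michael Mann"],
    "occupation": ["director", "producer"],
    "birthPlace": ["Chicago"],
}
virtual_doc = build_virtual_document(resource, ["label", "occupation", "birthPlace"])
topic_page = "Michael Mann is a film director and producer born in Chicago."
score = cosine_similarity(virtual_doc, topic_page)
```

A higher `score` means the crawled topic page shares more vocabulary with the spliced property values; a real system would likely weight terms (for example with TF-IDF) rather than use raw counts.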
  8. Adaptations Made for the Evaluation (cont'd)
     - Default names and aliases in these four data sources are well designed.
     - Many of them carry appended
       - disambiguation information (e.g. “Michael Mann (director)”)
       - or supplements (e.g. “University of California, Los Angeles”)
     - Such appended phrases are isolated because:
       - they can be treated as values of characteristic properties and used to calculate semantic similarities (Virtual Document)
       - they may introduce noise when the complete labels are used for string similarity calculation
         - Michael Mann (director) <> Michael Mann
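One way to isolate such appended phrases is sketched below. The rules are assumptions made for illustration: a trailing parenthesised phrase is treated as disambiguation information, and anything after the first comma as a supplement. The slides do not describe the actual rules, and the comma heuristic would mis-split labels written as "Surname, Given name". The name `split_label` is hypothetical.

```python
import re

# Assumed pattern: a base name followed by one trailing "(...)" phrase.
_TRAILING_PAREN = re.compile(r"^(?P<base>.*?)\s*\((?P<extra>[^()]*)\)\s*$")


def split_label(label):
    """Return (base_name, appended_phrase), with None when nothing is appended."""
    label = " ".join(label.split())  # collapse stray whitespace
    match = _TRAILING_PAREN.match(label)
    if match:
        return match.group("base"), match.group("extra").strip()
    base, sep, extra = label.partition(",")
    if sep:
        return base.strip(), extra.strip()
    return label, None


print(split_label("Michael Mann (director)"))
print(split_label("University of California , Los Angeles"))
print(split_label("Michael Mann"))
```

The base name can then feed string similarity, while the appended phrase is added to the Virtual Document as one more property value.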
  9. Adaptations Made for the Evaluation (cont'd)
     - Several special words in names are extracted to produce unified values of characteristic properties, e.g.
       - Corp, Corp. and Corporation → Corp. (Organization)
       - Florida → Fla. (Location)
       - Jr. (People)
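The unification of special words can be sketched with a small lookup table. The table below contains only the entries shown on the slide. The function name, the per-type layout and the choice to remove the special word from the name are assumptions, and the sample names are hypothetical.

```python
# Hypothetical lookup tables keyed by entity type; entries taken from the slide.
SPECIAL_WORDS = {
    "organization": {"corp": "Corp.", "corp.": "Corp.", "corporation": "Corp."},
    "location": {"florida": "Fla."},
    "people": {"jr.": "Jr."},
}


def extract_special_words(name, entity_type):
    """Return (name without special words, list of unified values)."""
    table = SPECIAL_WORDS.get(entity_type, {})
    kept, unified = [], []
    for token in name.split():
        key = token.strip(",").lower()
        if key in table:
            unified.append(table[key])  # unified form becomes a property value
        else:
            kept.append(token)
    return " ".join(kept).rstrip(","), unified


print(extract_special_words("Acme Corporation", "organization"))
print(extract_special_words("Miami, Florida", "location"))
print(extract_special_words("Martin Luther King, Jr.", "people"))
```

Mapping every variant to one form means that two sources spelling the same word differently still contribute the same value to the characteristic properties.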
  10. Results

      Dataset                Precision  Recall  F-measure  Highest_Recall
      DI-nyt-geonames.       0.938      0.883   0.910      0.989
      DI-nyt-dbpedia-peo.    0.971      0.970   0.970      0.992
      DI-nyt-dbpedia-org.    0.896      0.932   0.913      0.957
      DI-nyt-dbpedia-loc.    0.910      0.914   0.912      0.983
      DI-nyt-freebase-peo.   0.929      0.924   0.926      0.964
      DI-nyt-freebase-org.   0.887      0.853   0.870      0.889
      DI-nyt-freebase-loc.   0.902      0.865   0.883      0.932
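The F-measure column can be sanity-checked against the precision and recall columns. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the slides do not state the formula. Because the inputs are already rounded to three decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure), copied from the results table.
ROWS = {
    "DI-nyt-geonames": (0.938, 0.883, 0.910),
    "DI-nyt-dbpedia-peo": (0.971, 0.970, 0.970),
    "DI-nyt-dbpedia-org": (0.896, 0.932, 0.913),
    "DI-nyt-dbpedia-loc": (0.910, 0.914, 0.912),
    "DI-nyt-freebase-peo": (0.929, 0.924, 0.926),
    "DI-nyt-freebase-org": (0.887, 0.853, 0.870),
    "DI-nyt-freebase-loc": (0.902, 0.865, 0.883),
}

for name, (p, r, reported) in ROWS.items():
    print(f"{name:22s} recomputed={f_measure(p, r):.4f} reported={reported:.3f}")
```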
  11. Comments
      - Currently, Zhishi.links is just a prototype for testing out the distributed algorithm. Deploying the whole system is not easy.
      - We are trying our best to build the final portable matching system and release it at http://