Zhishi.links - A Distributed Instance Matching System

Zhishi.links Results for OAEI2011 @ 6th International Workshop on Ontology Matching @ ISWC2011

Presentation Transcript

  • Zhishi.links: A Distributed Instance Matching System. Xing Niu, Shu Rong, Yunlong Zhang and Haofen Wang. 2011.10.24
  • Agenda
    • Introduction
    • Architecture and Matching Strategies
    • Adaptations Made for the Evaluation
    • Results
    • Comments
  • Introduction
    • Zhishi.links is a distributed Instance Matching system
    • “Zhishi” is the romanized form of the Chinese word “知识”, which means “knowledge”
    • We used it to participate in the Data Interlinking (DI) track of OAEI2011
    • It performed best in the DI track
  • Introduction (cont’d)
    • Data dumps, rather than online lookup services, are used for interlinking, for two reasons:
      • Zhishi.links originated from our Chinese LOD project, where we used the system to discover links locally
      • Zhishi.links is designed to be a universal instance matching system, so it should not depend heavily on the performance of lookup services
  • Architecture and Matching Strategies (two diagram slides; the figures are not captured in this transcript)
  • Adaptations Made for the Evaluation
    • The New York Times data does not provide sufficient structured descriptive data, so we crawled its topic pages
    • For resources from the other three data sources, Virtual Documents are constructed by splicing together the values of characteristic properties
      • Similarity between a Virtual Document and a topic page is calculated in semantic similarity calculation phase
    Page  XXX --------------------------------------------------------------------------------- *** ------------ *** ------------ Virtual Document Value_1 Value_2 Value_3 Value_4 Value_5 Value_6 … Similarity
  • Adaptations Made for the Evaluation (cont’d)
    • Default names and aliases in these four data sources are well designed.
    • Many of them carry an appended phrase:
      • disambiguation information (e.g. “Michael Mann (director)”)
      • or a supplement (e.g. “University of California, Los Angeles”)
    • Such appended phrases are isolated because:
      • they can be treated as values of characteristic properties and used to calculate semantic similarities
        • (Virtual Document)
      • they may introduce noise when the complete labels are used for string similarity calculation
        • Michael Mann (director) <> Michael Mann
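Isolating an appended phrase from a label might look like the sketch below. The slides do not describe the actual splitting rules, so both rules here (a trailing parenthetical, and the text after the last comma) are assumptions, and `split_label` is a hypothetical helper name.

```python
import re

# Trailing parenthetical: "Michael Mann (director)" -> ("Michael Mann", "director")
_PAREN = re.compile(r"^(?P<name>.*?)\s*\((?P<extra>[^()]*)\)\s*$")


def split_label(label):
    """Return (core_name, appended_phrase); appended_phrase is None if absent."""
    label = label.strip()
    match = _PAREN.match(label)
    if match:
        return match.group("name"), match.group("extra").strip()
    # Assumed rule: whatever follows the last comma is a supplement.
    if "," in label:
        name, extra = label.rsplit(",", 1)
        return name.strip(), extra.strip()
    return label, None


print(split_label("Michael Mann (director)"))
print(split_label("University of California , Los Angeles"))
print(split_label("Michael Mann"))
```

The core name can then go to string similarity calculation while the appended phrase is added to the Virtual Document. The comma rule is deliberately naive: it would misfire on labels written as “Surname, Given name”, so a real system would need additional checks.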
  • Adaptations Made for the Evaluation (cont’d)
    • Several special words in names are extracted to produce unified values of characteristic properties
    • e.g.
      • Corp, Corp. and Corporation → Corp. (Organization)
      • Florida → Fla. (Location)
      • Jr. (People)
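A lookup-table sketch of this normalization is shown below. Only the three example mappings come from the slides; the table layout, the extra `fla.` key, the tokenization, and the name `extract_special_words` are assumptions made for illustration.

```python
import re

# Illustrative mapping only: surface form -> (unified value, entity type).
SPECIAL_WORDS = {
    "corp": ("Corp.", "Organization"),
    "corp.": ("Corp.", "Organization"),
    "corporation": ("Corp.", "Organization"),
    "florida": ("Fla.", "Location"),
    "fla.": ("Fla.", "Location"),  # assumed: the abbreviation maps to itself
    "jr.": ("Jr.", "People"),
}


def extract_special_words(name):
    """Return the (unified value, entity type) pairs found in a name."""
    found = []
    for token in re.split(r"[\s,]+", name.strip()):
        entry = SPECIAL_WORDS.get(token.lower())
        if entry and entry not in found:
            found.append(entry)
    return found


print(extract_special_words("Acme Corporation"))
print(extract_special_words("Miami, Florida"))
print(extract_special_words("Martin Luther King Jr."))
```

The unified value can be spliced into the Virtual Document like any other characteristic property value, and the entity type gives a coarse hint about whether two resources could refer to the same kind of thing.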
  • Results Page  Dataset Precision Recall F-measure Highest_Recall DI-nyt-geonames. 0.938 0.883 0.910 0.989 DI-nyt-dbpedia-peo. 0.971 0.970 0.970 0.992 DI-nyt-dbpedia-org. 0.896 0.932 0.913 0.957 DI-nyt-dbpedia-loc. 0.910 0.914 0.912 0.983 DI-nyt-freebase-peo. 0.929 0.924 0.926 0.964 DI-nyt-freebase-org. 0.887 0.853 0.870 0.889 DI-nyt-freebase-loc. 0.902 0.865 0.883 0.932
  • Comments
    • Currently, Zhishi.links is just a prototype for testing out the distributed algorithm. Deploying the whole system is not easy.
    • We are working to build the final, portable matching system and release it at http://apex.sjtu.edu.cn/apex_wiki/Zhishi.links