Assessing Linked Data Mappings using Network Measures

When many Web of Data (WoD) links are generated automatically, data quality becomes a pressing issue. This presentation and the related paper introduce LinkQA, a network-based, node-centric framework for analysing the impact of linkage on the network topology and assessing the quality of these links.

  1. Assessing Linked Data Mappings using Network Measures. Christophe Guéret, Paul Groth, Claus Stadler, Jens Lehmann. 9th Extended Semantic Web Conference (ESWC), May 29, 2012. http://latc-project.eu http://aksw.org http://www.vu.nl ESWC - May 2012, Assessing Linked Data mappings, 1/25
  2. The next 25+5 minutes. The impact of links in the Web of Data. Main questions: What is the impact of link creation? Can we detect “bad” links based on their impact? Is adding links always a good thing? Contributions: a framework to assess the impact of links; results for 5 metrics.
  3. Is this a good or a bad link?
  4. Measuring the Web of Data. Look at the topology using network analysis tools. Getting the complete graph is impossible, so sample the graph by focusing on specific nodes and see the bigger picture through aggregation: build the local network around a resource, and repeat the process a sufficient number of times.
  5. Network sampling process. Use a SPARQL endpoint or dereference the resources to get their descriptions.
  6. Aggregation of local results. Observed, Target …
  7. Metrics. Compute local scores for a resource. Criteria: use only the local network; be representative of a global property; not be sensitive to changes of observation scale. 5 metrics are currently available in LinkQA.
  8. What do we want to see? Increased connectivity within topical groups: better chances of finding related information. More bridges between topical groups: improved browsing capabilities. More connectivity around hubs: decreased dependency upon the hubs.
  9. Metric 1 – Degree. Metric: number of edges around the target node. Target: power-law distribution of values. Intuition: presence of hubs.
  10. Metric 2 – Clustering coefficient. Metric: density of links around the target node. Target: increased clustering around nodes. Intuition: topical clusters.
  11. Metric 3 – Centrality. Metric: ratio between outgoing and incoming links. Target: lower discrepancy between the two values. Intuition: hubs are sensitive.
  12. Metric 4 – SameAs chains. Metric: number of “open” sameAs chains. Target: no open sameAs chains. Intuition: peer agreement.
  13. Metric 5 – Description enrichment. Metric: richness of the resource description. Target: as large an increase as possible. Intuition: “sameAsed” resources are complementary.
  14. Under the hood of LinkQA http://www.flickr.com/photos/cradlehall/5747161514
  15. Workflow of an analysis
  16. Output of an analysis. Results at the node and aggregated scales. Per metric: an indication of change with respect to the target, and a list of outlier nodes sorted by their distance to the target. Plus, a global ranking of nodes. => Input for manual inspection by an expert.
  17. Experimental results
  18. Global impact of links. Observe the distributions to detect bad links.
  19. First evaluation. 160 linking specifications for Silk, developed in the context of LATC. 6 linking specifications with manual verification of results: 50 positive links and 50 negative links. Execute LinkQA with 10 samples of 50 links.
  20. Results of the detection. “C” if a change is detected in > 50% of runs.
  21. Some explanations. Low sensitivity of metrics: lack of data; stable change. 50/50 accuracy of detection: the targets may not be the right ones; the sample may not be big enough. Semantics-agnostic measures perform less well.
  22. A closer look at the outliers. Check whether the outliers are necessarily bad links.
  23. Second evaluation. Linking specifications for Silk, developed in the context of LATC. All linking specifications sampled to have 45 positive links and 5 negative links. Execute LinkQA five times, on five samples.
  24. Rank of positive and negative links
  25. Take home message. LinkQA is a node-centric approach to measuring the impact of links in the WoD network. It is scalable and can be distributed. Current results show that: the 5 metrics defined need to be improved; metrics considering semantics perform better; the network sample seems too small; outlier detection improves with the number of metrics.
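
Slide 5 builds the local network around a resource, either from a SPARQL endpoint or by dereferencing. Below is a minimal Python sketch of that breadth-first expansion; it is not LinkQA's actual code. It makes two assumptions:

- A hypothetical `describe()` function, backed by an in-memory dictionary, stands in for real SPARQL or HTTP calls.
- An assumed `depth` parameter bounds the number of hops.

```python
from collections import deque

# Hypothetical stand-in for a SPARQL endpoint or HTTP dereferencing:
# each resource maps to the (subject, predicate, object) triples describing it.
DESCRIPTIONS = {
    "ex:A": [("ex:A", "owl:sameAs", "ex:B"), ("ex:A", "ex:knows", "ex:C")],
    "ex:B": [("ex:B", "ex:knows", "ex:D")],
    "ex:D": [("ex:D", "ex:knows", "ex:E")],
}


def describe(uri):
    """Return the triples describing `uri` (hypothetical fetch function)."""
    return DESCRIPTIONS.get(uri, [])


def local_network(seed, depth=2):
    """Collect the triples of nodes fewer than `depth` hops from `seed`, breadth first."""
    triples = set()
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand nodes on the sampling boundary
        for s, p, o in describe(node):
            triples.add((s, p, o))
            for neighbour in (s, o):
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append((neighbour, dist + 1))
    return triples


print(sorted(local_network("ex:A", depth=1)))
```

Slide 4 describes the wider strategy: run this from many seed resources and aggregate the local scores.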
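
Slides 6 and 18 aggregate local results and compare an observed distribution against a target. One way to do this is sketched below. The slides do not say which distance LinkQA uses, so the following are assumptions for illustration:

- the sum of absolute differences as the distance;
- the helper names `distribution` and `distance_to_target`.

```python
from collections import Counter


def distribution(scores):
    """Normalise a list of local scores into value -> relative frequency."""
    counts = Counter(scores)
    total = len(scores)
    return {value: n / total for value, n in counts.items()}


def distance_to_target(observed, target):
    """Sum of absolute frequency differences (an assumed distance measure)."""
    keys = set(observed) | set(target)
    return sum(abs(observed.get(k, 0.0) - target.get(k, 0.0)) for k in keys)


# Hypothetical local scores collected from sampled resources, before and after linking.
before = distribution([1, 1, 2, 3])
after = distribution([1, 1, 2, 2])
target = {1: 0.5, 2: 0.5}
print(distance_to_target(before, target), distance_to_target(after, target))  # prints 0.5 0.0
```

Under this measure, a link set that moves the observed distribution closer to the target lowers the distance.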
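
Slide 9's degree metric counts the edges around a node, and its target is a power-law distribution of values. The sketch below works over a hypothetical list of `(subject, object)` edges. Ignoring edge direction and predicates is an assumption of this sketch.

```python
from collections import Counter


def degrees(edges):
    """Degree of every node; each (subject, object) edge counts at both ends."""
    deg = Counter()
    for s, o in edges:
        deg[s] += 1
        deg[o] += 1
    return deg


def degree_distribution(edges):
    """Map each degree value to the number of nodes having it."""
    return Counter(degrees(edges).values())


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(degrees(edges)["A"], dict(degree_distribution(edges)))
```

Under a power law, most nodes would have a low degree and a few hubs a very high one.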
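
Slide 10 measures the density of links around the target node. The standard local clustering coefficient is one way to express that density. Here it is computed on an undirected view of hypothetical `(subject, object)` edges; treating links as undirected is an assumption of this sketch.

```python
from itertools import combinations


def neighbours(node, edges):
    """Undirected neighbourhood of `node`, ignoring self-loops."""
    result = set()
    for s, o in edges:
        if s == node and o != node:
            result.add(o)
        elif o == node and s != node:
            result.add(s)
    return result


def clustering_coefficient(node, edges):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    nbrs = neighbours(node, edges)
    if len(nbrs) < 2:
        return 0.0  # undefined for fewer than two neighbours; 0.0 by convention
    linked = {frozenset(edge) for edge in edges}
    closed = sum(1 for a, b in combinations(nbrs, 2) if frozenset((a, b)) in linked)
    possible = len(nbrs) * (len(nbrs) - 1) / 2
    return closed / possible


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(clustering_coefficient("A", edges))  # one of three neighbour pairs is linked
print(clustering_coefficient("A", edges + [("C", "D")]))  # adding a link raises it
```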
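
Slide 11 defines centrality as the ratio between outgoing and incoming links. A direct sketch follows. The slide does not say how a node with no incoming links is handled, so returning `None` in that case is an assumed convention.

```python
def centrality_ratio(node, edges):
    """Outgoing / incoming link ratio for `node`; None if it has no incoming links."""
    out_degree = sum(1 for s, _ in edges if s == node)
    in_degree = sum(1 for _, o in edges if o == node)
    if in_degree == 0:
        return None  # assumed convention: ratio undefined
    return out_degree / in_degree


edges = [("A", "B"), ("A", "C"), ("D", "A")]
for node in "ABCD":
    print(node, centrality_ratio(node, edges))
```

A ratio of 1 means equal outgoing and incoming links. The further the value is from 1, the larger the discrepancy that slide 11's target asks to reduce.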
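
Slide 12 counts “open” sameAs chains, but the slides do not define the term precisely. This sketch makes the following assumptions:

- An open chain is a pair of `owl:sameAs` links a-b and b-c where a and c are not directly linked.
- sameAs is treated as symmetric.
- The prefixed string `"owl:sameAs"` stands in for the full property URI.

Because owl:sameAs is also transitive, a closed chain reflects agreement between the peers.

```python
from itertools import combinations

SAME_AS = "owl:sameAs"  # assumed compact name for the owl:sameAs property


def open_same_as_chains(triples):
    """Count chains a-b-c of sameAs links whose ends a and c are not directly linked."""
    peers = {}
    linked = set()
    for s, p, o in triples:
        if p != SAME_AS or s == o:
            continue  # only non-reflexive sameAs links matter here
        peers.setdefault(s, set()).add(o)
        peers.setdefault(o, set()).add(s)
        linked.add(frozenset((s, o)))
    count = 0
    for middle, nbrs in peers.items():
        for a, c in combinations(sorted(nbrs), 2):
            if frozenset((a, c)) not in linked:
                count += 1
    return count


chain = [("ex:a", SAME_AS, "ex:b"), ("ex:b", SAME_AS, "ex:c"), ("ex:a", "ex:name", "x")]
print(open_same_as_chains(chain))  # the a-b-c chain is open
print(open_same_as_chains(chain + [("ex:a", SAME_AS, "ex:c")]))  # closing link added
```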
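
For slide 13, this sketch measures description enrichment under two assumptions, neither of which is LinkQA's definition:

- The richness of a description is its set of distinct (predicate, object) pairs.
- Enrichment is the number of pairs contributed by direct sameAs peers that the resource does not already state.

```python
SAME_AS = "owl:sameAs"  # assumed compact name for the owl:sameAs property


def description(resource, triples):
    """(predicate, object) pairs stated about `resource`, excluding sameAs links."""
    return {(p, o) for s, p, o in triples if s == resource and p != SAME_AS}


def enrichment(resource, triples):
    """Number of new (predicate, object) pairs gained from direct sameAs peers."""
    own = description(resource, triples)
    peers = {o for s, p, o in triples if s == resource and p == SAME_AS}
    peers |= {s for s, p, o in triples if o == resource and p == SAME_AS}
    peers.discard(resource)
    gained = set()
    for peer in peers:
        gained |= description(peer, triples)
    return len(gained - own)


triples = [
    ("ex:a", "ex:name", "Paris"),
    ("ex:a", SAME_AS, "ex:b"),
    ("ex:b", "ex:name", "Paris"),
    ("ex:b", "ex:population", "2M"),
]
print(enrichment("ex:a", triples), enrichment("ex:b", triples))
```

In this example, `ex:a` gains the population statement from its peer, while `ex:b` gains nothing new.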
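
Slide 16 lists, per metric, the outlier nodes sorted by their distance to the target, plus a global ranking of nodes. This sketch makes three assumptions, all labelled in the code:

- the target is simplified to a single value per metric;
- a hypothetical `threshold` decides what counts as an outlier;
- nodes are ranked globally by how many metrics flag them.

```python
def outliers(scores, target, threshold):
    """Nodes whose score is more than `threshold` away from `target`,
    sorted by decreasing distance (ties broken by node name)."""
    far = [(abs(score - target), node) for node, score in scores.items()
           if abs(score - target) > threshold]
    far.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in far]


def global_ranking(per_metric):
    """Rank nodes by the number of metrics flagging them (hypothetical rule)."""
    counts = {}
    for flagged in per_metric.values():
        for node in flagged:
            counts[node] = counts.get(node, 0) + 1
    return sorted(counts, key=lambda node: (-counts[node], node))


# Hypothetical per-node scores and single-value targets for three metrics.
per_metric = {
    "degree": outliers({"a": 1, "b": 10, "c": 4}, target=3, threshold=1.5),
    "clustering": outliers({"a": 0.9, "b": 0.2, "c": 0.5}, target=0.5, threshold=0.1),
    "sameas": outliers({"a": 0, "b": 2, "c": 0}, target=0, threshold=0),
}
print(per_metric)
print(global_ranking(per_metric))  # input for manual inspection by an expert
```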
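
The first evaluation (slide 19) runs LinkQA on 10 samples of 50 links. Drawing such samples can be sketched with the standard library's `random.Random.sample`. The function name `draw_samples` and the fixed `seed` (added for reproducibility) are assumptions, as is the idea that samples are drawn from the links one specification produces.

```python
import random


def draw_samples(links, n_samples=10, sample_size=50, seed=0):
    """Draw `n_samples` independent subsets of `sample_size` distinct links."""
    rng = random.Random(seed)
    return [rng.sample(links, sample_size) for _ in range(n_samples)]


# Hypothetical link identifiers standing in for the output of a Silk specification.
links = [f"link{i}" for i in range(100)]
samples = draw_samples(links)
print(len(samples), len(samples[0]))  # prints 10 50
```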
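
Slide 20 marks a result with “C” when a change is detected in more than 50% of the runs. Below is a sketch of that majority rule over a list of per-run booleans. The slide gives no label for the no-change case, so the empty string here is a placeholder.

```python
def detection_label(changed_per_run):
    """Return "C" if a change was detected in more than half of the runs."""
    detected = sum(1 for changed in changed_per_run if changed)
    return "C" if detected > len(changed_per_run) / 2 else ""  # "" is a placeholder


print(repr(detection_label([True] * 6 + [False] * 4)))  # 6 of 10 runs: 'C'
print(repr(detection_label([True] * 5 + [False] * 5)))  # exactly half: ''
```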
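
Slide 24 compares where positive and negative links fall in the ranking. One simple summary is the mean 1-based rank of each group; this is an assumption, not a statistic taken from the slides. If suspicious links are ranked first, a useful ranking should give the negative links the lower (earlier) mean rank.

```python
def mean_rank(ranking, group):
    """Mean 1-based position in `ranking` of the links in `group`; None if none appear."""
    positions = [i for i, link in enumerate(ranking, start=1) if link in group]
    return sum(positions) / len(positions) if positions else None


ranking = ["l3", "l1", "l5", "l2", "l4"]  # hypothetical ranking, most suspicious first
negatives = {"l3", "l5"}
positives = {"l1", "l2", "l4"}
print(mean_rank(ranking, negatives), mean_rank(ranking, positives))
```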
