Joshua Shinavier

The state of the art in Linked Data

Advanced Semantic Web, Spring 2009
Literature Survey
A literature survey on Linked Data for a spring 2009 class at the Tetherless World Constellation.


Outline
•   Linked Data
•   Linking Open Data
•   describing linked datasets
•   growing the data web
•   keeping Linked Data connected
•   indexing and searching
•   applications
•   navigation
•   state of the data web

Linked Data overview
•   resource -- an item of interest
•   URI -- global identifier for a resource
•   representation -- data corresponding to the state of a resource
•   information resource -- a “document” containing information
•   non-information resource -- anything else
•   associated description -- representation describing a Semantic Web resource

The Linking Open Data initiative
•   “bootstrap” the data web with large, interconnected data sets to reach a critical mass of semantics
•   strict adherence to W3C standards
    •   identification and transportation (URI, HTTP) of resource descriptions
    •   interpretation (RDF, RDFS, OWL) of resource descriptions
•   LOD grows as data providers:
    •   publish structured data on the Web
    •   set RDF links between entities in different data sources
•   transition of the Web from a distributed document repository into a universal, ubiquitous database [Erling 09]
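To make “set RDF links between entities in different data sources” concrete, here is a minimal sketch of one such link written as an N-Triples statement. The DBpedia URI follows DBpedia's naming pattern; the example.org URI is hypothetical. The tiny serializer handles IRIs only (no literals, no escaping), and treating “different data source” as “different URI host” is an assumption of this sketch, not a rule from the slides.

```python
from typing import NamedTuple
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # IRI only; literals are out of scope for this sketch


def to_ntriples(triples):
    """Serialize IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples)


def is_cross_dataset_link(t):
    """Naive check: subject and object IRIs live on different hosts."""
    return urlparse(t.subject).netloc != urlparse(t.obj).netloc


link = Triple(
    "http://dbpedia.org/resource/Berlin",
    OWL_SAME_AS,
    "http://example.org/cities/berlin",  # hypothetical second dataset
)
print(to_ntriples([link]))
print(is_cross_dataset_link(link))
```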

The LOD cloud
[figure]

LOD data sets
[figure]

Link sets in LOD
[figure]

Describing linked datasets
•   voiD (Vocabulary of Interlinked Datasets) [Alexander, Cyganiak, Hausenblas, Zhao 09]
    •   describes data sets and the link sets between them
•   DING (Dataset RankING) [Toupikov, Umbrich, Delbru, Hausenblas, Tummarello 09]
    •   ranking of linked datasets using formal descriptions
•   modeling of the Linked Data domain [Halpin, Presutti 09]
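voiD describes data sets and their link sets in RDF. The sketch below emits such a description in Turtle using the voiD terms void:Dataset, void:Linkset, void:subjectsTarget, void:objectsTarget, void:linkPredicate and void:triples. Term names follow the current voiD vocabulary; the 2009 version cited above may differ in detail. All URIs and the triple count in the usage example are hypothetical.

```python
VOID = "http://rdfs.org/ns/void#"


def void_linkset_turtle(linkset, subjects_ds, objects_ds, link_predicate, n_triples):
    """Return a Turtle snippet describing two datasets and the linkset between them."""
    if n_triples < 0:
        raise ValueError("n_triples must be non-negative")
    return "\n".join([
        f"@prefix void: <{VOID}> .",
        "",
        f"<{subjects_ds}> a void:Dataset .",
        f"<{objects_ds}> a void:Dataset .",
        "",
        f"<{linkset}> a void:Linkset ;",
        f"    void:subjectsTarget <{subjects_ds}> ;",
        f"    void:objectsTarget <{objects_ds}> ;",
        f"    void:linkPredicate <{link_predicate}> ;",
        f"    void:triples {n_triples} .",
    ])


print(void_linkset_turtle(
    "http://example.org/void#films2dbpedia",  # hypothetical linkset URI
    "http://example.org/void#films",          # hypothetical dataset URI
    "http://example.org/void#dbpedia",        # hypothetical dataset URI
    "http://www.w3.org/2002/07/owl#sameAs",
    1200,                                     # made-up link count
))
```

A description like this is what dataset-ranking approaches such as DING can consume, since link counts and targets are stated explicitly rather than inferred by crawling.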

Keeping Linked Data connected
•   network-shaped Entity Name System to enable systematic reuse of URIs [Bouquet, Stoermer, Cordioli, Tummarello 08]
    •   analogous to the role of DNS for interlinked hypertext
•   n2Mate framework [Peterson, Cregan, Atkinson, Brisbin 08]
    •   uses social networking principles to facilitate vocabulary and instance reuse
•   graph-based disambiguation of Semantic Web entities with idMesh [Cudré-Mauroux, Haghani, Jost, Aberer, de Meer 09]

Managing co-reference
•   many conflated resources in DBpedia [Jaffri, Glaser, Millard 08]
    •   representative of LOD as a whole
•   Co-Reference Resolution Service [Glaser, Jaffri, Millard 09]
    •   when co-reference is context-specific, owl:sameAs is inappropriate
    •   stores co-reference information as a first-class entity
•   ontology-level alignment should precede data-level alignment [Nikolov, Uren, Motta 09]
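Because owl:sameAs is symmetric and transitive, a set of sameAs links partitions URIs into equivalence classes, which is why one bad link can conflate whole groups of resources. The sketch below computes those classes with a union-find structure and, to reflect the point that co-reference can be context-specific, uses only links tagged with a requested context. The (uri, uri, context) input format and the placeholder URIs are assumptions for illustration; this is not how the Co-Reference Resolution Service is implemented.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coreference_sets(links, context=None):
    """Group URIs that co-refer.

    links: iterable of (uri_a, uri_b, context) tuples (a hypothetical format).
    context: if given, only links asserted in that context are used.
    Returns a sorted list of sorted URI lists.
    """
    ds = DisjointSet()
    for a, b, ctx in links:
        if context is None or ctx == context:
            ds.union(a, b)
    groups = defaultdict(set)
    for uri in list(ds.parent):
        groups[ds.find(uri)].add(uri)
    return sorted(sorted(g) for g in groups.values())


links = [  # placeholder URIs
    ("ex:london_1", "ex:london_2", "geography"),
    ("ex:london_2", "ex:london_3", "geography"),
    ("ex:london_1", "ex:greater_london", "travel"),  # acceptable only loosely
]
print(coreference_sets(links, context="geography"))
print(coreference_sets(links))
```

Ignoring context (the second call) pulls ex:greater_london into the same class as every London URI, which is the kind of conflation a context-blind owl:sameAs produces.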

Growing the data web
•   how to get data out there?
•   challenges of the read-write Semantic Web
    •   user awareness of the social context of data (e.g. licensing, privacy)
    •   view update problem
    •   is the wiki model applicable?
    •   incentives for posting data on the Semantic Web
•   validating existing Linked Data with Vapour [Berrueta, Fernandez, Frade 08]

Examples of LOD data sets
•   DBpedia [Auer, Bizer, Kobilarov, Lehmann, Cyganiak, Ives 07]
    •   extracts structured information from Wikipedia
    •   linking hub for the LOD cloud
•   RDF Book Mashup [Bizer, Cyganiak, Gauss 07]
    •   product metadata from Amazon.com

Music and movies as Linked Data
•   Linked Movie Database [Hassanzadeh, Consens 09]
    •   combines data from IMDb, Freebase, OMDB, DBpedia, RottenTomatoes.com, Stanford Movie Database
•   interlinked music datasets [Raimond, Sutton, Sandler 08]
    •   combines data from Jamendo on DBTune, BBC John Peel sessions, SBSimilarity, MusicBrainz, DBpedia, GeoNames
    •   links artists, albums, tracks, personal music collections
    •   generates links based on the similarity of resources and the similarity of their neighbors

Other sources of data
•   the hypertext Web itself [Li, Zhao 08]
    •   extraction of semantic links from hypertext links and hierarchical relationships among Web documents
    •   RDF representation of the HTML DOM using SparqPlug [Coetzee, Heath, Motta 08]
•   multimedia metadata
    •   interlinking multimedia fragments [Hausenblas, Troncy, Bürger, Raimond 09]

Other sources of data (cont.)
•   eXtensible Business Reporting Language (XBRL) [Garcia, Gil 09]
    •   mapping data to RDF and schemas to OWL facilitates interoperability
•   large thesauri [Neubert 09]
    •   as interlinking hubs for professional communities
•   enterprise data, e.g. technical documentation [Servant 08]
•   MARC21 bibliographic records [Styles, Ayers, Shabir 08]

Mapping tools
•   D2R Server for customizable mappings from relational databases to ontologies [Bizer, Cyganiak 06]
•   browser-based tools for defining RDB-to-RDF mappings [Zhou, Xu, Chen, Idehen 08]
•   Triplify [Auer, Dietzold, Lehmann, Hellmann, Aumueller 09]
•   from generic data silos to Linked Data using OpenLink Data Spaces [Idehen, Erling 08]
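D2R Server and Triplify both expose relational data as RDF by mapping rows to resources and columns to properties. As a rough sketch of that idea, and not the D2RQ mapping language or Triplify's configuration format, the code below mints one subject URI per row from its primary key and turns selected columns into literal-valued triples. The base URI, table, and column-to-property mapping are hypothetical. Table and column names are interpolated into SQL, so they must come from trusted configuration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI for minted resources
FOAF = "http://xmlns.com/foaf/0.1/"


def nt_literal(value):
    """Quote a value as an N-Triples plain literal with minimal escaping."""
    s = str(value)
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    s = s.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{s}"'


def table_to_ntriples(conn, table, key, property_map):
    """Map each row of `table` to N-Triples lines.

    Subject: BASE + table + "/" + key value. property_map: column -> predicate IRI.
    NULL values produce no triple. Identifiers must be trusted (not user input).
    """
    columns = list(property_map)
    sql = f"SELECT {key}, {', '.join(columns)} FROM {table} ORDER BY {key}"
    lines = []
    for key_value, *values in conn.execute(sql):
        subject = f"<{BASE}{table}/{key_value}>"
        for column, value in zip(columns, values):
            if value is not None:
                lines.append(f"{subject} <{property_map[column]}> {nt_literal(value)} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, nick TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Alice", None), (2, "Bob", "bobby")])
for line in table_to_ntriples(conn, "person", "id",
                              {"name": FOAF + "name", "nick": FOAF + "nick"}):
    print(line)
```

Skipping NULLs rather than emitting empty literals keeps the output closer to the open-world reading of RDF, where a missing value is simply not stated.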

Aggregated resources
•   Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
    •   can be made Web-accessible with OAI2LOD Server [Haslhofer, Schandl 08]
•   Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) [Van de Sompel, Lagoze, Nelson, Warner, Sanderson, Johnston 09]
    •   adheres to Web principles

User-driven Linked Data
•   existing Linked Data datasets are more appropriate for machine than for human consumption
•   template-generated interlinks are of limited quality
•   data from existing silos quickly becomes out of date
•   need human involvement to grow the data web organically

User-driven Linked Data (cont.)
•   direct modification using SPARQL/Update
    •   e.g. in Tabulator [Berners-Lee, Hollenbach, Lu, Presbrey, Prud’hommeaux, Schraefel 08]
•   User Contributed Interlinking [Halb, Raimond, Hausenblas]
•   semantic wikis
•   Loomp [Roesch, Heese 09]
    •   semantic annotation of content using a text editor interface

User-driven Linked Data (cont.)
•   public data from existing social networks
    •   wrappers for Web 2.0 services [Passant 08]
    •   unifying personal identity across various networks [Rowe 09]
•   Semantically Interlinked Online Communities (SIOC)
    •   integrating social media sites (forums, blogs, wikis, etc.) with the data web [Bojars, Passant, Cyganiak, Breslin 08]
•   Meaning of a Tag (MOAT) ontology gives meaning to tags on Web 2.0 [Passant, Laublet 08]

Usability and licensing
•   usability (for humans) of Linked Data [Halb, Raimond, Hausenblas 08]
    •   current LOD datasets are primarily for machine consumption
    •   low semantic strength of current LOD link sets
•   provenance information for Linked Data [Hartig 09]
•   Open Data Commons license [Miller, Styles, Heath 08]

Indexing and searching
•   W3C’s TAP semantic search [Guha, McCool 01]
•   Swoogle [Ding, Finin, Joshi, Pan, Cost, Peng, Reddivari, Doshi, Sachs 04]
    •   adapts the PageRank concept to ontologies
•   SWSE [Hogan, Harth, Umbrich, Decker 07]
•   MultiCrawler [Harth, Umbrich, Decker 06]
•   RDF Gateway search
•   Watson document-based search
•   Falcons [Cheng, Ge, Wu, Qu 08]
    •   textual search using class hierarchies for query restriction
•   Sindice Semantic Web index [Tummarello, Delbru, Oren 07]
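Swoogle adapts PageRank to rank Semantic Web documents. Swoogle's own ranking modifies the random-surfer model for Semantic Web links, so the following is only plain PageRank by power iteration, included to illustrate the underlying idea. The damping factor of 0.85 is the conventional default, and the small document graph is made up.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """Plain PageRank by power iteration.

    graph: dict node -> list of nodes it links to. Nodes without outgoing
    links (dangling) spread their rank uniformly over all nodes.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # Teleport share plus redistributed dangling rank.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


docs = {  # hypothetical documents: two instance files that use one ontology
    "a.rdf": ["onto.owl"],
    "b.rdf": ["onto.owl", "a.rdf"],
    "onto.owl": [],
}
ranks = pagerank(docs)
for doc, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{doc}: {score:.3f}")
```

The ontology that both instance documents reference ends up ranked highest, which is the intuition behind using link analysis to surface widely reused vocabularies.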

Link discovery
•   Silk link discovery framework [Volz, Bizer, Gaedke, Kobilarov 09]
    •   finds relationships between entities within different data sources
    •   generates owl:sameAs links
    •   the value of the Web of Data depends on the amount and quality of links between data sources
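Silk compares entities from two data sources using configurable similarity metrics and thresholds, then emits owl:sameAs links for pairs that match. The sketch below shows that pattern in its simplest form: it compares labels with difflib's SequenceMatcher ratio, a stand-in rather than one of Silk's metrics, and links each source entity to its best match when that match clears a threshold. The threshold of 0.9, the label-only comparison, and the example URIs are all assumptions.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lower-case and collapse whitespace before comparing labels."""
    return " ".join(label.lower().split())


def label_similarity(a, b):
    """Similarity in [0, 1]; difflib's ratio is a stand-in for Silk's metrics."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_links(source, target, threshold=0.9):
    """Link each source entity to its most similar target entity, if similar enough.

    source, target: dict of URI -> label. Compares all pairs (O(n*m));
    larger-scale tools typically use blocking or indexing to avoid this.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = label_similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


source = {"http://a.example/film/1": "The Matrix", "http://a.example/film/2": "Alien"}
target = {"http://b.example/m/10": "the  matrix", "http://b.example/m/20": "Heat"}
for s, p, o in discover_links(source, target):
    print(f"<{s}> <{p}> <{o}> .")
```

The threshold trades precision for recall: lowering it would start linking near-miss titles such as “Alien” and “Aliens”, which is exactly the kind of low-quality owl:sameAs link the co-reference work above warns about.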

Navigation
•   as on the early Web, it’s easy to get “Lost in Hyperspace”
•   Tabulator generic Linked Data browser [Berners-Lee, Chen, Chilton, Connolly, Dhanaraj, Hollenbach, Lerer, Sheets 06]
    •   encourage deployment of Linked Data
    •   test, refine and promote Linked Data standards
•   faceted views over large-scale Linked Data with Virtuoso Cluster Edition [Erling 09]
•   Explorator RDF browser [Araujo, Schwabe 09]
    •   exploratory search using direct manipulation

Navigation (cont.)
•   DBpedia Mobile map view and faceted Linked Data browser [Becker, Bizer 08]
    •   explore the geospatial Semantic Web
    •   uses the current GPS position as a starting point
    •   potential for Linked Data publishing

Navigation (cont.)
•   Fenfire generic Linked Data browser [Hastrup, Cyganiak, Bojars 08]
    •   uses graph views rather than tables or outlines
    •   shows graph data as directly as possible
    •   related to Fentwine [Fallenstein, Lukka 04]

Navigation (cont.)
•   Humboldt [Kobilarov, Dickinson 08]
    •   exploratory browsing
    •   faceted views
    •   “resource at a time”
    •   uses a “pivot” operation to refocus the view
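Faceted views summarize the values a property takes over the current set of resources, and Humboldt's “pivot” refocuses the view on a related set. A minimal in-memory sketch of both operations over (subject, predicate, object) tuples follows. The function names and the toy film data are hypothetical, and this does not reproduce Humboldt's implementation.

```python
from collections import Counter


def facet(triples, focus, prop):
    """Facet view: how often each value of `prop` occurs on the focus resources."""
    return Counter(o for s, p, o in triples if s in focus and p == prop)


def restrict(triples, focus, prop, value):
    """Selecting a facet value keeps only focus resources with prop = value."""
    return {s for s, p, o in triples if s in focus and p == prop and o == value}


def pivot(triples, focus, prop):
    """Pivot: refocus the view on the values of `prop` reached from the focus set."""
    return {o for s, p, o in triples if s in focus and p == prop}


triples = [  # toy data
    ("film:1", "genre", "SciFi"), ("film:1", "director", "person:A"),
    ("film:2", "genre", "SciFi"), ("film:2", "director", "person:B"),
    ("film:3", "genre", "Drama"), ("film:3", "director", "person:A"),
]
films = {"film:1", "film:2", "film:3"}
print(facet(triples, films, "genre"))
scifi = restrict(triples, films, "genre", "SciFi")
directors = pivot(triples, scifi, "director")  # new focus: people, not films
print(sorted(directors))
```

After the pivot, the same facet function can be applied to the set of directors, so a user can keep filtering and refocusing without writing a query.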

Navigation (cont.)
•   zLinks plugin [Bergman, Giasson 08]
    •   WordPress plugin with supporting server
    •   relates hypertext links with contextually relevant Linked Data
    •   WOWY (WordNet, OpenCyc, Wikipedia, YAGO)
    •   distinguish between types of resources
    •   disambiguate alternate senses

Navigation (cont.)
•   mapping of Linked Data to a file system model [Schandl 09]
    •   enables use of this data within desktop applications

Other applications
•   how to use the data that is out there?
    •   emerging applications which exploit Linked Data [Hausenblas 09]
•   integrating data sources related to drugs and clinical trials [Jentzsch, Andersson, Hassanzadeh, Stephens, Bizer 09]
•   mashups
    •   MashQL [Jarrar, Dikaiakos 09]
        •   the Internet is a database; a mashup is a query over that database
    •   benefit of specialized, independent Linked Data services acting together [Bojars, Passant, Giasson, Breslin 07]

The gray area
•   U-P2P framework for peer-to-peer Linked Data [Davoust, Esfandiari 09]
    •   data replication provides a measure of popularity
•   Linked Data with Named Graphs
    •   e.g. interlinks with embedded provenance information [Zhao, Klyne, Shotton 08]
•   Ripple scripting language [Shinavier 07]
    •   embeds Turing-complete programs in the Web of Data

State of the data web
•   where are we with the Linked Data graph?
    •   size
    •   number and type of links
    •   usefulness to end users
    •   network characteristics
•   single-point-of-access (e.g. DBpedia, GeoNames) vs. distributed datasets (e.g. FOAF-o-sphere, SIOC-land)
•   syntactic and semantic analysis of the LOD dataset [Hausenblas, Halb, Raimond, Heath 08]

Statistics of the data web
•   today’s Linked Data is very different from the first-generation data web [Halpin 09]
    •   LOD data accounts for the vast majority of data
    •   power-law distributions are emerging
    •   the data web is not growing organically
    •   Web standards are generally adhered to
•   is Linked Data useful to ordinary users?
    •   sampling of Linked Data using Live.com query logs and the FALCON-S semantic search engine

Query popularity follows a power law
•   ...
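Claims such as “query popularity follows a power law” are usually checked by fitting the exponent. The sketch below uses the standard maximum-likelihood estimator for a continuous power law, alpha_hat = 1 + n / sum(ln(x_i / x_min)), over the n values at or above a chosen x_min. It is not necessarily the method behind the figures on these slides, and it does not test goodness of fit. A synthetic sample with a known exponent is used to check the estimator.

```python
import math
import random


def fit_power_law_alpha(values, x_min):
    """Maximum-likelihood exponent of a continuous power law p(x) ~ x**(-alpha), x >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, x_min, n, seed=0):
    """Synthetic data via inverse-transform sampling, for checking the estimator."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


sample = sample_power_law(alpha=2.5, x_min=1.0, n=10_000, seed=42)
print(round(fit_power_law_alpha(sample, x_min=1.0), 2))  # close to 2.5
```

Real query counts are discrete and the choice of x_min matters, so in practice the fit would be paired with a goodness-of-fit test before calling a distribution a power law.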

URI frequency... not so much
•   ...

Data publishing lacks a “long tail”
•   ...

A few dominant ontologies are emerging
[figure: # of URIs by vocabulary]

(DBpedia bias)
[figure: # of URIs by domain name]

Graph analysis for the data web
•   common network analysis techniques can be used to investigate interoperability and structural patterns of the LOD cloud [Rodriguez 09]
•   results based on March 2009 statistics of the LOD data set graph:
    •   the LOD graph is not strongly connected
    •   a diameter of 8 is large given the relatively small size of the cloud
    •   data sets have nearly identical incoming and outgoing link patterns (⇒ a majority of reciprocal owl:sameAs links)
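The three measures above (strong connectivity, diameter, reciprocity) can be computed on a dataset-level graph where each node is a data set and each directed edge is a link set. The sketch below does this with breadth-first search from every node. Because a graph that is not strongly connected has undefined distances for some pairs, the diameter here is taken over reachable ordered pairs only; that definition, and the toy graph, are assumptions rather than the exact setup used in the cited analysis.

```python
from collections import deque


def bfs_distances(graph, start):
    """Hop distances from `start` to every node reachable along directed edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def analyze(graph):
    """Return (strongly_connected, diameter, reciprocity) for a directed graph.

    graph: dict node -> list of nodes it links to.
    diameter: longest shortest path over reachable ordered pairs (an assumption).
    reciprocity: share of edges u->v whose reverse v->u also exists.
    """
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    if not nodes:
        return True, 0, 0.0
    dists = {u: bfs_distances(graph, u) for u in nodes}
    strongly_connected = all(len(d) == len(nodes) for d in dists.values())
    diameter = max(max(d.values()) for d in dists.values())
    edges = {(u, v) for u, vs in graph.items() for v in vs if u != v}
    reciprocal = sum(1 for u, v in edges if (v, u) in edges)
    reciprocity = reciprocal / len(edges) if edges else 0.0
    return strongly_connected, diameter, reciprocity


toy = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": ["A"]}  # hypothetical data sets
print(analyze(toy))
```

In the toy graph, D links into the cloud but nothing links back, which is enough to break strong connectivity even though most edges are reciprocal.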

Ranking and clustering of LOD data sets
[figure]

•   Original slide show:
    •   http://tw.rpi.edu/proj/portal.wiki/images/f/f0/LinkedData.pdf
•   References:
    •   http://tw.rpi.edu/proj/portal.wiki/images/e/e0/LinkedDataSurvey.pdf
•   BibTeX:
    •   http://tw.rpi.edu/proj/portal.wiki/images/3/37/LinkedDataSurvey.bbl