Making Use of the Linked Data Cloud: The Role of Index Structures

The rapid growth of the Linked Open Data (LOD) cloud has spawned a web of data in which a multitude of data sources provide huge amounts of valuable information across different domains. When accessing and using Linked Data today, the challenging question is increasingly not whether relevant data is available, but where it can be found and how it is structured. Index structures therefore play an important role in making use of the information in the LOD cloud. In this talk I will address three aspects of Linked Data index structures: (1) a high-level view and categorization of index structures and how they can be queried and explored, (2) approaches for building index structures and the need to maintain them, and (3) some example applications which greatly benefit from indices over Linked Data.

Published in: Science

  1. Making Use of the Linked Data Cloud: The Role of Index Structures. Thomas Gottron, Institute for Web Science & Technologies (WeST). FGDB Frühjahrstreffen, March 20th, 2014.
  2. Making Use of the Linked Data Cloud ... LOD: a rich, huge, diverse, public and distributed knowledge base on the Web. Pros: rich knowledge base, diverse, public, on the Web. Cons: huge, diverse, distributed. Shall I? (Image: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/. Slide footer: Thomas Gottron, FGDB Frühjahrstreffen, 20.3.2014, Role of Index Structures on LOD.)
  3. Challenges Underlying the "Cons": volume, semi-structured data, no schema, no central access point, multitude of data sources, quality, dynamics, availability.
  4. Making Use of the Linked Data Cloud ... Pros: rich knowledge base, diverse, public, on the Web. Cons: huge, diverse, distributed. Shall I?
  5. 20 years ago ...
  6. Making Use of the World Wide Web ... Pros: rich document collection, diverse, public, on the Internet. Cons: huge, diverse, distributed. Shall I? Technical solutions to the problems. (Image source: Chris 73 / Wikimedia Commons.)
  7. Making Use of the Linked Data Cloud ... Pros: rich knowledge base, diverse, public, on the Web. Cons: huge, diverse, distributed. Shall I? Index structures provide: solutions for the storage, management, organization of, and access to a rich, huge, diverse, distributed knowledge base on the Web.
  8. Overview: Types of Indices, Building Indices, Using Indices. (Figures: a search data structure for efficient storage and retrieval that maps keys k1 ... kn to data items d1,1 ... dn,3; quads aggregated into property sets per subject and then inverted; an example entity graph using rdf:type, dc:creator and dc:title with foaf:Document and swrc:InProceedings.)
  9. Overview (repeated): Types of Indices, Building Indices, Using Indices.
  10. Data Format. Linked Data as N-Quads (s, p, o, c): the triple (s, p, o) answers "what is the information?"; the context URI c answers "where does it come from?".
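The quad format can be illustrated with a small parsing sketch. This is a simplified illustration only: the regular expression below covers IRIs, blank nodes and basic literals, not the full N-Quads grammar, and the names `Quad` and `parse_quad` as well as the example.org IRIs are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    s: str            # subject
    p: str            # predicate
    o: str            # object (IRI, blank node, or literal)
    c: Optional[str]  # context: the source the triple came from (may be absent)


# Simplified term: <IRI>, _:blank, or "literal" with optional @lang or ^^<datatype>.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)'
LINE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}(?:\s+{TERM})?\s*\.\s*$')


def parse_quad(line: str) -> Quad:
    """Split one statement into (s, p, o, c); c is None if no context is given."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not a (simplified) N-Quads statement: {line!r}")
    return Quad(*match.groups())


quad = parse_quad(
    '<http://example.org/s1> <http://example.org/p1> "Bad News" <http://example.org/c1> .'
)
```

A production system would rely on a full RDF parser; the point here is only that every statement carries its provenance in the fourth position.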
  11. Index Models
  12. (Abstract) Index Models. D: data elements to be retrieved (payload). K: key elements to access the data (index elements). σ: selection function, i.e. how to get data for a key, σ: K → ℘(D). (Figure: a search data structure for efficient storage and retrieval, mapping keys k1 ... kn to data items / payload d1,1 ... dn,3.)
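The abstract model (keys K, payload D, selection function σ from a key to a set of data items) can be sketched as a small class. This is an illustrative assumption about how such a model might be coded, not the implementation behind the talk; the class name `Index` and its methods are made up for this example.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Generic, Hashable, Set, TypeVar

K = TypeVar("K", bound=Hashable)  # key elements
D = TypeVar("D", bound=Hashable)  # data elements (payload)


class Index(Generic[K, D]):
    """Maps each key to a set of data items; select() plays the role of sigma."""

    def __init__(self) -> None:
        self._entries: Dict[K, Set[D]] = defaultdict(set)

    def add(self, key: K, item: D) -> None:
        self._entries[key].add(item)

    def select(self, key: K) -> FrozenSet[D]:
        # Unknown keys select the empty set, which is still an element of the power set of D.
        return frozenset(self._entries.get(key, ()))

    def keys(self) -> Set[K]:
        return set(self._entries)


index: Index[str, str] = Index()
index.add("k1", "d1,1")
index.add("k1", "d1,2")
index.add("k2", "d2,1")
```

Returning an immutable set from `select` keeps callers from modifying the index by accident.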
  13. Concrete Example: Subject Based Index Model. Each subject URI is a key; its payload is the set of triples with that subject.
      ukob:Gottron → (ukob:Gottron, rdf:type, foaf:Person), (ukob:Gottron, foaf:knows, ukob:Staab), ...
      ukob:Staab → (ukob:Staab, swrc:institution, ukob:WeST), (ukob:Staab, foaf:name, "Steffen Staab"), ...
      ukob:Schegi → (ukob:Schegi, rdf:type, foaf:Person), (ukob:Schegi, foaf:name, "Stefan Scheglmann")
      ...
      tud:CGottron → (tud:CGottron, swrc:institution, tud:KOM), (tud:CGottron, foaf:knows, ukob:Gottron), ...
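A subject based index over the example triples of this slide could be built as follows. This is a minimal sketch; the function name `build_subject_index` is an assumption, and only the triples visible on the slide are included.

```python
from collections import defaultdict

# Triples taken from the slide; the elided ones ("...") are left out.
triples = [
    ("ukob:Gottron", "rdf:type", "foaf:Person"),
    ("ukob:Gottron", "foaf:knows", "ukob:Staab"),
    ("ukob:Staab", "swrc:institution", "ukob:WeST"),
    ("ukob:Staab", "foaf:name", "Steffen Staab"),
    ("ukob:Schegi", "rdf:type", "foaf:Person"),
    ("ukob:Schegi", "foaf:name", "Stefan Scheglmann"),
    ("tud:CGottron", "swrc:institution", "tud:KOM"),
    ("tud:CGottron", "foaf:knows", "ukob:Gottron"),
]


def build_subject_index(triples):
    """Key: subject URI. Payload: all triples that have this subject."""
    index = defaultdict(list)
    for triple in triples:
        subject = triple[0]
        index[subject].append(triple)
    return dict(index)


subject_index = build_subject_index(triples)
```

Looking up `subject_index["ukob:Staab"]` then returns the two triples describing that entity.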
  14. Schema-level Indices
  15. Schema Information on the LOD Cloud. (No) schema? Guidelines / best practices; automatic tools; social effects. Emerging schema! Induce from data observations.
  16. Examples for Schema Information. Property set: an entity x with properties p1, p2, p3 has the property set {p1, p2, p3}; index entry {p1, p2, p3} → x, ... Type set: an entity y with rdf:type cA and rdf:type cB has the type set {cA, cB}; index entry {cA, cB} → y, ...
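Property sets and type sets can be derived from triples in a single pass. The sketch below is illustrative: the function name `schema_info` is an assumption, and excluding rdf:type from the property set is a choice made for this example, not something the slide states.

```python
from collections import defaultdict


def schema_info(triples):
    """Return (property_sets, type_sets), each mapping an entity to a frozenset."""
    property_sets = defaultdict(set)
    type_sets = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            type_sets[s].add(o)      # classes the entity is an instance of
        else:
            property_sets[s].add(p)  # assumption: rdf:type is not counted as a property here
    freeze = lambda mapping: {k: frozenset(v) for k, v in mapping.items()}
    return freeze(property_sets), freeze(type_sets)


triples = [
    ("x", "p1", "o1"),
    ("x", "p2", "o2"),
    ("x", "p3", "o3"),
    ("y", "rdf:type", "cA"),
    ("y", "rdf:type", "cB"),
]
property_sets, type_sets = schema_info(triples)
```

Using frozensets makes the resulting sets hashable, so they can later serve as index keys.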
  17. Indexing "Styles" for the Payload. What is kept locally, as opposed to left on the Web: full caching (s, p, o, c); triples (s, p, o); entities (s); data sources (c).
  18. Schema-based Access to the LOD cloud. Example query (graph pattern: an unknown entity x of type foaf:Document and swrc:InProceedings, with a dc:creator of type fb:Computer_Scientist):
      SELECT ?x WHERE {
        ?x rdf:type foaf:Document .
        ?x rdf:type swrc:InProceedings .
        ?x dc:creator ?y .
        ?y rdf:type fb:Computer_Scientist
      }
  19. Schema-based Access to the LOD cloud. Where? A schema-level index answers the same query with data sources, e.g. ACM and DBLP.
      SELECT ?x WHERE {
        ?x rdf:type foaf:Document .
        ?x rdf:type swrc:InProceedings .
        ?x dc:creator ?y .
        ?y rdf:type fb:Computer_Scientist
      }
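The "Where?" lookup can be sketched with a type set index whose payload consists of data sources. The index contents below are hypothetical (only the names ACM and DBLP come from the slide), and `sources_for_types` is a name invented for this example; the sketch covers only the rdf:type patterns of the query, not the dc:creator link.

```python
def sources_for_types(type_set_index, required_types):
    """Return all data sources listed under a type set that contains every required type."""
    required = frozenset(required_types)
    sources = set()
    for type_set, listed_sources in type_set_index.items():
        if required <= type_set:  # the type set covers the requested types
            sources |= listed_sources
    return sources


# Hypothetical schema-level index: type set -> data sources (context level payload).
type_set_index = {
    frozenset({"foaf:Document", "swrc:InProceedings"}): {"ACM", "DBLP"},
    frozenset({"foaf:Document"}): {"example.org"},
    frozenset({"foaf:Person"}): {"DBLP"},
}

where = sources_for_types(type_set_index, {"foaf:Document", "swrc:InProceedings"})
```

The query itself would then be sent only to the sources returned by the lookup.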
  20. Overview: Types of Indices, Building Indices, Using Indices.
  21. Index Construction
  22. Building Indices: Operators. Indices are built by combining a few simple operations: Aggregate, Join, Invert. Example: Property Set index.
      Input quads (s, p, o, c): (s1, p1, o1, c1), (s1, p2, o1, c1), (s2, p2, o2, c1), (s3, p1, o3, c1), (s3, p2, o4, c1), (s4, p3, o1, c1).
      Aggregate (properties per subject): s1 → {p1, p2}; s2 → {p2}; s3 → {p1, p2}; s4 → {p3}.
      Invert (subjects per property set): {p1, p2} → {s1, s3}; {p2} → {s2}; {p3} → {s4}.
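The Aggregate and Invert steps of the Property Set example can be reproduced in a few lines. This is a sketch under the assumption that both operators work on simple key/value pairs; the function names mirror the slide, but their signatures are made up here, and the Join operator is not shown.

```python
from collections import defaultdict

# The six quads (s, p, o, c) from the slide's example.
quads = [
    ("s1", "p1", "o1", "c1"),
    ("s1", "p2", "o1", "c1"),
    ("s2", "p2", "o2", "c1"),
    ("s3", "p1", "o3", "c1"),
    ("s3", "p2", "o4", "c1"),
    ("s4", "p3", "o1", "c1"),
]


def aggregate(pairs):
    """Group (key, value) pairs into key -> frozenset of values."""
    grouped = defaultdict(set)
    for key, value in pairs:
        grouped[key].add(value)
    return {key: frozenset(values) for key, values in grouped.items()}


def invert(mapping):
    """Turn key -> value into value -> frozenset of keys."""
    inverted = defaultdict(set)
    for key, value in mapping.items():
        inverted[value].add(key)
    return {value: frozenset(keys) for value, keys in inverted.items()}


property_sets = aggregate((s, p) for s, p, o, c in quads)  # subject -> property set
property_set_index = invert(property_sets)                 # property set -> subjects
```

For the example data, `property_set_index` lists s1 and s3 under {p1, p2}, s2 under {p2} and s4 under {p3}, matching the slide.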
  23. 12 Implemented Index Models (https://github.com/gottron/lod-index-models)
      Triple based: Subject → Triple; Predicate → Triple; Object → Triple.
      Meta data: Keywords → Triple; Context → Triple; PLD (pay-level domain) → Triple.
      Schema-level: RDF Type → Entity; Type set (TS) → Entity; Property set (PS) → Entity; Incoming property set (IPS) → Entity; Type and properties (ECS) → Entity; SchemEX → Entity.
  24. Indices over Evolving Data
  25. Index Maintenance. Not just growth, but also deletion and modification of data. (Figure: snapshots for 2007, 2008, 2009, 2010 and 2011.)
  26. How to Measure Accuracy? Queries (SPARQL)? No established query log for the data set; different key elements require different queries; queries would have to cover all of the index. Distributions! Relevant to several applications; established metrics for comparison.
  27. Quantifying Divergence of Index Accuracy over Time. For each snapshot T0 (base), T1, T2, T3, ..., Tn-1, Tn an index is constructed and its distributions are estimated; the "deviation" of each later snapshot T1, ..., Tn is then measured against T0.
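One way to quantify such a deviation between two snapshots is a cross-entropy based perplexity. The sketch below is a generic textbook formulation under stated assumptions: the key distribution is estimated from payload sizes, unseen keys get a small constant `epsilon`, and the result is not normalised. It does not claim to reproduce the exact measure or normalisation used in the talk.

```python
import math


def key_distribution(index):
    """Estimate a distribution over keys from the number of data items per key."""
    total = sum(len(items) for items in index.values())
    return {key: len(items) / total for key, items in index.items()}


def cross_perplexity(current, base, epsilon=1e-9):
    """2 ** H(current, base): how 'surprised' the base model is by the current data.

    Keys missing from the base distribution get probability epsilon
    (assumption: crude smoothing, the base distribution is not renormalised).
    """
    cross_entropy = 0.0
    for key, p in current.items():
        q = base.get(key, epsilon)
        cross_entropy -= p * math.log2(q)
    return 2 ** cross_entropy


# Hypothetical index snapshots: key -> data items.
index_t0 = {"k1": {"a", "b", "c"}, "k2": {"d"}}
index_t1 = {"k1": {"a", "b"}, "k2": {"d", "e"}}

dist_t0 = key_distribution(index_t0)
dist_t1 = key_distribution(index_t1)
deviation = cross_perplexity(dist_t1, dist_t0)
```

When both snapshots have the same distribution, the value equals the ordinary perplexity of that distribution; any drift away from the base raises it.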
  28. Evolving Data: Normalised Perplexity. (Figure: four plots of normalised perplexity, y-axis from 0.0 to 1.0, against the week of the data snapshot, x-axis from 0 to 70. Panels: Subject, Predicate, Object; Context, Keywords, PLD; RDF Type, TS, PS, IPS; ECS, SchemEX.)
  29. Evolving Data: Normalised Perplexity (Zoom in). (Figure: the same four plots with the y-axis restricted to 0.00 to 0.10.)
  30. Overview: Types of Indices, Building Indices, Using Indices.
  31. Programming Support
  32. LITEQ and NPQL. Support programming with Linked Data sources. NPQL (Node Path Query Language): intensional queries → class descriptions, properties; extensional queries → instance data. LITEQ: implementation of NPQL (F# type provider); autocompletion.
  33. LITEQ and NPQL: RDF type and property navigation (intension). After typing
      dC.``http://example.org/ns#creature``
        .SubTypeNavigation.
      autocompletion offers ``http://example.org/ns#dog``, ``http://example.org/ns#cat``, ``http://example.org/ns#person``, ...
  34. LITEQ and NPQL: RDF type and property navigation (intension).
      dC.``http://example.org/ns#creature``
        .SubTypeNavigation.``http://example.org/ns#dog``
  35. LITEQ and NPQL: RDF type and property navigation (intension). After typing
      dC.``http://example.org/ns#creature``
        .SubTypeNavigation.``http://example.org/ns#dog``
        .PropNavigation.
      autocompletion offers ``http://example.org/ns#hasOwner``, ``http://example.org/ns#hasName``, ``http://example.org/ns#taxNumber``, ...
  36. LITEQ and NPQL: RDF type and property navigation (intension).
      dC.``http://example.org/ns#creature``
        .SubTypeNavigation.``http://example.org/ns#dog``
        .PropNavigation.``http://example.org/ns#hasOwner``
  37. LITEQ and NPQL. Accessing instances (extension):
      let allDogs = dC.``http://example.org/ns#creature``
                      .SubTypeNavigation.``http://example.org/ns#dog``
                      .Extension
      Accessing individuals:
      let bello = dC.``http://example.org/ns#creature``
                    .SubTypeNavigation.``http://example.org/ns#dog``
                    .Individuals.``http://example.org/ns#bello``
                    .getRdfObject
      bello.get_hasName()
      bello.get_taxNumber()
  38. Exploring Entity Descriptions
  39. Schema-based Access to the LOD cloud. Where? A schema-level index answers the query with data sources, e.g. ACM and DBLP.
      SELECT ?x WHERE {
        ?x rdf:type foaf:Document .
        ?x rdf:type swrc:InProceedings .
        ?x dc:creator ?y .
        ?y rdf:type fb:Computer_Scientist
      }
  40. Schema-level Search of Relevant Data Sources
  41. Searching for a Suitable Description. So far: gentle, iterative modification of queries ("Did you mean ...", "Related Queries ..."), for example:
      SELECT ?x WHERE { ?x rdf:type foaf:Document }
      SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type foaf:PersonalProfileDocument }
      SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type sioc:Post . }
  42. Parallel Indices Over the Data. (Figure: two indices over the data, one keyed by type sets ts1 ... tsn with data items d1,1 ... dn,3, and one keyed by property sets psA ... psM with data items dA,1 ... dM,3.)
  43. Parallel Indices Over the Data. (Figure repeated.)
  44. General Idea for Mapping. (Figure: from a description using c1 and c2, derive the entity set; from the entity set, derive an alternative description using p3, p4 and p5; this approximate description selects an approximate entity set. Both steps use the parallel type set and property set indices.)
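The mapping idea can be sketched with two parallel indices over the same entities. This is a deliberately naive illustration: it intersects property sets instead of building the parallel concept lattices named in the talk's references, and all index contents and function names are hypothetical.

```python
def entities_with_all(index, attributes):
    """Entities listed under any key set that contains every requested attribute."""
    attributes = frozenset(attributes)
    return {
        entity
        for key_set, entities in index.items()
        if attributes <= key_set
        for entity in entities
    }


def common_attributes(index, entities):
    """Attributes shared by all given entities that appear in the index."""
    shared = None
    for key_set, members in index.items():
        if members & entities:
            shared = set(key_set) if shared is None else shared & key_set
    return shared or set()


# Hypothetical parallel indices: type set -> entities, property set -> entities.
type_set_index = {
    frozenset({"c1", "c2"}): {"e1", "e2"},
    frozenset({"c1"}): {"e3"},
}
property_set_index = {
    frozenset({"p3", "p4", "p5"}): {"e1"},
    frozenset({"p3", "p4"}): {"e2", "e4"},
    frozenset({"p5"}): {"e3"},
}

entity_set = entities_with_all(type_set_index, {"c1", "c2"})        # derive entity set
alternative = common_attributes(property_set_index, entity_set)     # derive description
approx_entity_set = entities_with_all(property_set_index, alternative)
```

In this toy data the alternative description {p3, p4} also selects e4, which shows why the resulting entity set is only an approximation of the original one.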
  45. Overview: Types of Indices, Building Indices, Using Indices.
  46. Summary. Pros: rich knowledge base, diverse, public, on the Web. Cons: huge, diverse, distributed. Technical solutions to some of the problems. (Figure: index mapping keys k1 ... kn to data items d1,1 ... dn,3.)
  47. Summary. Pros: rich knowledge base, diverse, public, on the Web. Cons: huge, diverse, distributed.
  48. Thank you! Contact: Thomas Gottron, WeST (Institute for Web Science and Technologies), Universität Koblenz-Landau, gottron@uni-koblenz.de
  51. Sources. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/, available under a CC-BY-SA license. WorldWideWeb Around Wikipedia (Wikipedia as part of the world wide web): image by Wikimedia Commons user Chris 73, freely available at //commons.wikimedia.org/wiki/File:WorldWideWebAroundWikipedia.png under the Creative Commons CC-BY-SA 3.0 license.
