ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Information on the Linked Open Data Cloud

  1. Institute for Web Science and Technologies (WeST). A Systematic Investigation of Explicit and Implicit Schema Information on the Linked Open Data Cloud. Thomas Gottron, Malte Knauf, Stefan Scheglmann, Ansgar Scherp. ESWC 2013, Montpellier.
  2. Thomas Gottron, ESWC, 30.5.2013. Schema Information on the LOD Cloud. Explicit and Implicit Schema Information on LOD:
     • Explicit: assigning class types (Entity rdf:type Class), e.g. E1 rdf:type foaf:Document, swrc:InProceedings
     • Implicit: modelling attributes (Entity Property Entity 2), e.g. E1 dc:creator E2; E1 dc:title "Bad News ..."
  3. Main questions:
     a) How much information is encoded in the type set or property set of a resource?
     b) How much information is still contained in the properties, once we know the types of a resource?
     c) How much information is still contained in the types, once we know the properties of a resource?
     d) To what degree can one kind of information (either properties or types) explain the other?
  4. Outline: Questions a)-d); Model Schema; Information Measures; Apply on LOD; Answers!
  5. Probabilistic model of schema information. Joint distribution P(T,R) over type sets t (e.g. foaf:Document, swrc:InProceedings) and property sets r (e.g. dc:creator, dc:title), with the marginal distributions P(T) and P(R):
     P(T,R)   r1    r2    r3    r4    P(T)
     t1       14%    2%    5%    8%   29%
     t2        5%   15%    2%    3%   25%
     t3        7%    3%   30%    5%   45%
     P(R)     26%   20%   37%   17%
  6. Estimating probabilities (TS = type sets, PS = property sets):
     Data set   Triples   TS          PS
     Rest       22.3M     793         7,522
     Datahub    910.1M    28,924      14,712
     DBpedia    198.1M    1,026,272   391,170
     Freebase   101.2M    69,732      162,023
     Timbl      204.8M    4,139       9,619
  7. Question a). Three example distributions P(R):
     P(R)    1%   97%    1%    1%
     P(R)   26%   20%   37%   17%
     P(R)   25%   25%   25%   25%
  8. a) Normalized marginal entropy. Tendencies:
     • Entropy of property sets is higher
     • No very high values
     • No values close to zero
     Data set   H0(T)       H0(R)
     Rest       0.252   <   0.366
     Datahub    0.263   >   0.250
     DBpedia    0.093   <   0.324
     Freebase   0.127   <   0.166
     Timbl      0.214   <   0.276
  9. Question b)-c)
  10. b)-c) Expected conditional entropy, first example:
      P(T,R)   r1    r2    r3    r4    P(T)
      t1       14%    2%    5%    8%   29%
      t2        5%   15%    2%    3%   25%
      t3        7%    3%   30%    5%   45%
      P(R)     26%   20%   37%   17%
      Entropy of the property sets within each row, H(R | T = t): 1.726 (t1), 1.610 (t2), 1.441 (t3).
      The same values weighted by P(T = t): 0.501, 0.403, 0.648.
  11. b)-c) Expected conditional entropy, second example:
      P(T,R)   r1    r2    r3    r4    P(T)
      t1       22%    1%    0%    1%   24%
      t2        1%   23%    1%    0%   26%
      t3        1%    0%   48%    1%   50%
      P(R)     24%   24%   49%    2%
  12. b)-c) Expected conditional entropy. Tendencies:
      • The types of a resource tell little about its properties
      • Properties tell more about types
      • Given information reduces the entropy
      Data set   H(T|R)      H(R|T)   H(T)    H(R)
      Rest       0.289   <   2.568    2.428   4.708
      Datahub    1.319   >   0.876    3.904   3.460
      DBpedia    0.688   <   4.856    1.856   6.027
      Freebase   0.286   <   1.117    2.037   2.868
      Timbl      0.386   <   1.464    2.568   3.646
  13. Question d)
  14. d) Normalized Mutual Information (Redundancy), first example:
      P(T,R)   r1    r2    r3    r4    P(T)
      t1       14%    2%    5%    8%   29%
      t2        5%   15%    2%    3%   25%
      t3        7%    3%   30%    5%   45%
      P(R)     26%   20%   37%   17%
  15. d) Normalized Mutual Information (Redundancy), second example:
      P(T,R)   r1    r2    r3    r4    P(T)
      t1       22%    1%    0%    1%   24%
      t2        1%   23%    1%    0%   26%
      t3        1%    0%   48%    1%   50%
      P(R)     24%   24%   49%    2%
  16. d) Normalized Mutual Information (Redundancy). Tendencies:
      • Relatively high redundancy
      • Freebase: (weakly) pre-defined schema
      • Timbl: narrow domain (FOAF profiles)
      • DBpedia: de-centralized schema
      Data set   H0(T,R)   Rank
      Rest       0.881     1
      Datahub    0.747     4
      DBpedia    0.635     5
      Freebase   0.860     2
      Timbl      0.850     3
  17. Summary:
      • Present a method to analyze redundancy of schema information on LOD
      • Observations:
        • Schema not dominated by few TS or PS combinations
        • Attributes provide more information than types
        • Attributes indicate types better than vice versa
        • High redundancy of 63 to 88% on the analyzed segments of the LOD cloud
      • Future work:
        • Analysis on data provider level
        • Temporal evolution
  18. Thanks! Contact: Thomas Gottron, WeST (Institute for Web Science and Technologies), Universität Koblenz-Landau, gottron@uni-koblenz.de
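
The measures named on the slides (marginal entropy, expected conditional entropy, and redundancy) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, and two normalizations are assumptions. The normalized entropy H0 is taken to be the entropy divided by log2 of the number of distinct sets, and the redundancy is taken to be the mutual information divided by the smaller marginal entropy. Both assumptions do reproduce the figures reported for the Rest data set (2.428 / log2(793) ≈ 0.252 and (2.428 - 0.289) / 2.428 ≈ 0.881). The example table from the slides sums to 99% because of rounding, so the sketch renormalizes it first.

```python
from math import log2

# Joint distribution P(T,R) from the first example table, in percent.
# Rows are type sets t1..t3, columns are property sets r1..r4.
JOINT_PERCENT = [
    [14, 2, 5, 8],
    [5, 15, 2, 3],
    [7, 3, 30, 5],
]


def normalize(table):
    """Rescale the cells so they sum to 1 (the slide values sum to 99%)."""
    total = sum(sum(row) for row in table)
    return [[cell / total for cell in row] for row in table]


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def expected_conditional_entropy(joint):
    """Entropy of the column variable given the row variable.

    For rows = T and columns = R this is H(R|T) = sum_t P(t) * H(R | T = t).
    """
    total = 0.0
    for row in joint:
        p_row = sum(row)
        if p_row > 0:
            total += p_row * entropy([p / p_row for p in row])
    return total


def normalized_entropy(h, n_outcomes):
    """ASSUMPTION: H0 = H / log2(number of distinct sets)."""
    return h / log2(n_outcomes)


def redundancy(h_t, h_r, h_t_given_r):
    """ASSUMPTION: mutual information divided by min(H(T), H(R))."""
    return (h_t - h_t_given_r) / min(h_t, h_r)


def summarize(joint):
    """Compute all measures for a joint table with rows T and columns R."""
    transposed = [list(col) for col in zip(*joint)]
    p_t = [sum(row) for row in joint]
    p_r = [sum(col) for col in transposed]
    h_t, h_r = entropy(p_t), entropy(p_r)
    h_r_given_t = expected_conditional_entropy(joint)
    h_t_given_r = expected_conditional_entropy(transposed)
    return {
        "H(T)": h_t,
        "H(R)": h_r,
        "H0(T)": normalized_entropy(h_t, len(p_t)),
        "H0(R)": normalized_entropy(h_r, len(p_r)),
        "H(R|T)": h_r_given_t,
        "H(T|R)": h_t_given_r,
        "redundancy": redundancy(h_t, h_r, h_t_given_r),
    }


if __name__ == "__main__":
    for name, value in summarize(normalize(JOINT_PERCENT)).items():
        print(f"{name:12s} {value:.3f}")
```

Running the script prints the measures for the example table. Calling `redundancy(3.904, 3.460, 1.319)` with the Datahub entropies from the slides gives approximately 0.747, which matches the reported value under the normalization assumed here.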
