The RDF Report Card: Beyond the Triple Count

My talk from the Semtech Biz conference in London.

I argued that it is time to move beyond discussing the size of datasets and to encourage a more nuanced view of their quality and utility.

The RDF Report Card is offered as one simple, high-level visualization.

Published in: Technology, Education


  1. The RDF Report Card: Beyond the Triple Count. 26th September 2011, SemTechBiz 2011. Leigh Dodds, @ldodds, http://kasabi.com, http://slideshare.net/ldodds
  2. Triple counts tell us nothing
  3. Triple counts are not a quality indicator
  4. http://dbpedia.org/resource/London
  5. 6 triples for Population Density
     Property | Count | Value
     http://dbpedia.org/ontology/PopulatedPlace/populationDensity | 2 | 4807.0, 4806.971873853451
     http://dbpedia.org/ontology/populationDensity | 2 | 4806.971874, 4807.000000
     http://dbpedia.org/property/populationDensityKm | 1 | 4807
     http://dbpedia.org/property/populationDensitySqMi | 1 | 12450
  6. 12 triples for Location (1)
     Property | Count | Value
     georss:point | 1 | 51.507222222222225 -0.1275
     geo:geometry | 1 | POINT(-0.1275 51.5072)
     geo:lat | 1 | 51.507221
     geo:long | 1 | -0.127500
  7. 12 triples for Location (2)
     Property | Count | Value
     dbpprop:latd | 1 | 51
     dbpprop:latm | 1 | 30
     dbpprop:lats | 1 | 26
     dbpprop:latns | 1 | N
     dbpprop:longd | 1 | 0
     dbpprop:longm | 1 | 7
     dbpprop:longs | 1 | 39
     dbpprop:longew | 1 | W
  8. ~4.6m redundant triples
  9. Triple counts don't indicate utility
  10. http://bbc.co.uk/programmes: 2.5 million unique users per week, 60 req/s *
      * http://www.guardian.co.uk/media/pda/2011/apr/06/bbc-yves-raimond
  11. http://bbc.co.uk/programmes: Dataset is less than 50 million triples
  12. Beyond the Triple Count
  13. Dataset Information Spectrum (Low Detail to High Detail)
      ● Low Detail: Summary and overview of dataset content
      ● High Detail: Detailed data model documentation & guides
  14. Dataset Information Spectrum (Low Detail to High Detail)
      ● Low Detail: Summary and overview of dataset content
      ● High Detail: Detailed data model documentation & guides
      ● More Information
  15. Dataset Information Spectrum (Low Detail to High Detail): Metadata
      ● Title, Description
      ● Provenance
      ● Publication dates
      ● Licensing
      ● Usage cues
      ● Related datasets
  16. Dataset Information Spectrum (Low Detail to High Detail): Scope
      ● What types of entity?
      ● How many of each type?
      ● Coverage
        ● Geographic
        ● Events (time)
  17. Dataset Information Spectrum (Low Detail to High Detail): Structure
      ● URI Scheme
      ● Vocabulary meshing
      ● How is a person described?
  18. Dataset Information Spectrum (Low Detail to High Detail): Internals
      ● List of Schemas & RDF terms
      ● Class/property usage counts
      ● Triple counts
      ● Named graph structure
      ● Source files
  19. RDF Report Card Example
  20. Summarising Content of a Dataset
      ● Find all classes in all datasets in Kasabi
      ● Tag each class against a pre-defined set of categories
        ● Customized version of top-level schema.org classes
      ● Generate a report card for each dataset listing types of entity
  21. Report Card Categories
  22. Ordnance Survey: http://beta.kasabi.com/dataset/ordnance-survey-linked-data
  23. BBC Music: http://beta.kasabi.com/dataset/bbc-music
  24. British National Bibliography: http://beta.kasabi.com/dataset/british-national-bibliography-bnb
  25. NHS Performance Data: http://beta.kasabi.com/dataset/nhs-performance-data
  26. Summary
      ● Triple counts tell us nothing
      ● Vital to present the quality & utility of our data
        ● Data publishing platforms should support this
      ● "Progressive disclosure"
        ● Right detail at the right time
      ● Dataset analysis can generate useful summaries
        ● e.g. an RDF report card
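
The redundancy shown on slides 5 to 7 can be checked with a little arithmetic. The sketch below is only an illustration, not the method used in the talk to arrive at the ~4.6m figure: it recomputes the decimal coordinates from the degrees/minutes/seconds properties and the per-square-mile density from the per-square-kilometre value. The helper name `dms_to_decimal` and the conversion constant are assumptions introduced here; the input values are copied from the slides.

```python
from math import isclose

# Standard conversion factor (an assumption added here, not from the slides):
# one square mile is roughly 2.589988 square kilometres.
SQ_KM_PER_SQ_MI = 2.589988


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W into signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# dbpprop:latd/latm/lats/latns and dbpprop:longd/longm/longs/longew (slide 7)
lat = dms_to_decimal(51, 30, 26, "N")
lon = dms_to_decimal(0, 7, 39, "W")

# dbpprop:populationDensityKm (slide 5) converted to people per square mile
density_sq_mi = 4807 * SQ_KM_PER_SQ_MI

# Compare against geo:lat / geo:long (slide 6) and populationDensitySqMi (slide 5)
print("lat matches geo:lat:", isclose(lat, 51.507221, abs_tol=1e-5))
print("long matches geo:long:", isclose(lon, -0.1275, abs_tol=1e-9))
print("density matches SqMi value:", round(density_sq_mi) == 12450)
```

Each derived value agrees with a property that is already stored, which is the sense in which these triples add to the count without adding information.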

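
The report card steps on slide 20 (find classes, tag each class with a category, list the types of entity per dataset) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Kasabi's implementation: the category mapping, the function name `report_card`, and the sample triples are all made up for illustration, and the actual category list from the talk is not in this transcript.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical class-to-category tags, loosely modelled on top-level
# schema.org classes; the talk's real mapping is not available here.
CLASS_CATEGORIES = {
    "http://xmlns.com/foaf/0.1/Person": "Person",
    "http://purl.org/ontology/mo/MusicArtist": "Person",
    "http://purl.org/ontology/mo/Record": "CreativeWork",
}


def report_card(triples, class_categories):
    """Count distinct typed entities per category; untagged classes go to 'Other'."""
    # Build (subject, category) pairs as a set so that an entity typed with
    # two classes from the same category is only counted once.
    entities = {
        (subject, class_categories.get(obj, "Other"))
        for subject, predicate, obj in triples
        if predicate == RDF_TYPE
    }
    return dict(Counter(category for _subject, category in entities))


# Made-up sample data standing in for one dataset
triples = [
    ("http://example.org/artist/1", RDF_TYPE, "http://purl.org/ontology/mo/MusicArtist"),
    ("http://example.org/artist/1", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/artist/1", "http://xmlns.com/foaf/0.1/name", "Example Artist"),
    ("http://example.org/record/1", RDF_TYPE, "http://purl.org/ontology/mo/Record"),
    ("http://example.org/record/2", RDF_TYPE, "http://purl.org/ontology/mo/Record"),
    ("http://example.org/thing/1", RDF_TYPE, "http://example.org/schema/Widget"),
]

card = report_card(triples, CLASS_CATEGORIES)
for category, count in sorted(card.items()):
    print(f"{category}: {count}")
```

Counting entities per category rather than triples is the design choice that matters here: it answers "what types of entity, and how many of each" from the Scope slide instead of reporting raw size.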