The RDF Report Card: Beyond the Triple Count

My talk from the Semtech Biz conference in London.

I argued that it is time to move beyond discussing size of datasets and encourage a more nuanced view to understand quality and utility.

The RDF Report Card is offered as one simple, high-level visualization.

Published in: Technology, Education

Transcript of "The RDF Report Card: Beyond the Triple Count"

  1. The RDF Report Card: Beyond the Triple Count
     26th September 2011, SemTechBiz 2011
     Leigh Dodds, @ldodds
     http://kasabi.com
     http://slideshare.net/ldodds
  2. Triple counts tell us nothing
  3. Triple counts are not a quality indicator
  4. http://dbpedia.org/resource/London
  5. 6 triples for Population Density
     Property                                                       Count  Value
     http://dbpedia.org/ontology/PopulatedPlace/populationDensity   2      4807.0, 4806.971873853451
     http://dbpedia.org/ontology/populationDensity                  2      4806.971874, 4807.000000
     http://dbpedia.org/property/populationDensityKm                1      4807
     http://dbpedia.org/property/populationDensitySqMi              1      12450
  6. 12 triples for Location (1)
     Property       Count  Value
     georss:point   1      51.507222222222225 -0.1275
     geo:geometry   1      POINT(-0.1275 51.5072)
     geo:lat        1      51.507221
     geo:long       1      -0.127500
  7. 12 triples for Location (2)
     Property         Count  Value
     dbpprop:latd     1      51
     dbpprop:latm     1      30
     dbpprop:lats     1      26
     dbpprop:latns    1      N
     dbpprop:longd    1      0
     dbpprop:longm    1      7
     dbpprop:longs    1      39
     dbpprop:longew   1      W
  8. ~4.6m redundant triples
  9. Triple counts don't indicate utility
  10. http://bbc.co.uk/programmes
      2.5 million unique users per week, 60 req/s *
      * http://www.guardian.co.uk/media/pda/2011/apr/06/bbc-yves-raimond
  11. http://bbc.co.uk/programmes
      Dataset is less than 50 million triples
  12. Beyond the Triple Count
  13. Dataset Information Spectrum
      Low Detail: Summary and overview of dataset content
      High Detail: Detailed data model documentation & guides
  14. Dataset Information Spectrum
      Low Detail: Summary and overview of dataset content
      High Detail: Detailed data model documentation & guides
      More Information
  15. Dataset Information Spectrum (Low Detail to High Detail): Metadata
      ● Title, Description
      ● Provenance
      ● Publication dates
      ● Licensing
      ● Usage cues
      ● Related datasets
  16. Dataset Information Spectrum (Low Detail to High Detail): Scope
      ● What types of entity?
      ● How many of each type?
      ● Coverage
      ● Geographic
      ● Events (time)
  17. Dataset Information Spectrum (Low Detail to High Detail): Structure
      ● URI Scheme
      ● Vocabulary meshing
      ● How is a person described?
  18. Dataset Information Spectrum (Low Detail to High Detail): Internals
      ● List of Schemas & RDF terms
      ● Class/property usage counts
      ● Triple counts
      ● Named graph structure
      ● Source files
  19. RDF Report Card Example
  20. Summarising Content of a Dataset
      ● Find all classes in all datasets in Kasabi
      ● Tag each class against a pre-defined set of categories
          ● Customized version of top-level schema.org classes
      ● Generate a report card for each dataset listing types of entity
  21. Report Card Categories
  22. Ordnance Survey
      http://beta.kasabi.com/dataset/ordnance-survey-linked-data
  23. BBC Music
      http://beta.kasabi.com/dataset/bbc-music
  24. British National Bibliography
      http://beta.kasabi.com/dataset/british-national-bibliography-bnb
  25. NHS Performance Data
      http://beta.kasabi.com/dataset/nhs-performance-data
  26. Summary
      ● Triple counts tell us nothing
      ● Vital to present the quality & utility of our data
          ● Data publishing platforms should support this
      ● "Progressive disclosure"
          ● Right detail at the right time
      ● Dataset analysis can generate useful summaries
          ● e.g. an RDF report card
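
The "Summarising Content of a Dataset" steps in the transcript (find the classes, tag each class with a category, list the types of entity per dataset) can be illustrated with a minimal Python sketch. This is not the code used to produce the Kasabi report cards. The category mapping, the function names and the example.org sample data are all hypothetical, and the category labels are only loosely modelled on top-level schema.org classes.

```python
from collections import Counter, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical class-to-category tags; a real mapping would be curated by hand.
CLASS_CATEGORIES = {
    "http://xmlns.com/foaf/0.1/Person": "Person",
    "http://purl.org/ontology/mo/MusicArtist": "Person",
    "http://purl.org/ontology/mo/Record": "CreativeWork",
}

# Made-up sample dataset as (subject, predicate, object) tuples.
SAMPLE_TRIPLES = [
    ("http://example.org/artist/1", RDF_TYPE, "http://purl.org/ontology/mo/MusicArtist"),
    ("http://example.org/artist/1", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/artist/1", "http://xmlns.com/foaf/0.1/name", "Example Artist"),
    ("http://example.org/record/1", RDF_TYPE, "http://purl.org/ontology/mo/Record"),
    ("http://example.org/thing/1", RDF_TYPE, "http://example.org/schema/Unknown"),
]


def class_usage_counts(triples):
    """Step 1: find every class in use and count its rdf:type statements."""
    return Counter(o for _, p, o in triples if p == RDF_TYPE)


def report_card(triples, categories=CLASS_CATEGORIES):
    """Steps 2 and 3: tag each class with a category, then count the
    distinct entities per category (untagged classes fall under "Other")."""
    entities = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            entities[categories.get(o, "Other")].add(s)
    return {category: len(subjects) for category, subjects in entities.items()}


if __name__ == "__main__":
    for category, count in sorted(report_card(SAMPLE_TRIPLES).items()):
        print(f"{category}: {count}")
```

Counting distinct subjects per category, rather than rdf:type statements, means an entity typed as both mo:MusicArtist and foaf:Person is reported once under Person. The raw per-class counts from `class_usage_counts` correspond to the "Class/property usage counts" item at the high-detail end of the spectrum.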