The RDF Report Card: Beyond the Triple Count

My talk from the Semtech Biz conference in London.

I argued that it is time to move beyond discussing the size of datasets and to encourage a more nuanced view of their quality and utility.

The RDF Report Card is offered as one simple, high-level visualization.

Transcript

  • 1. The RDF Report Card: Beyond the Triple Count. 26th September 2011, SemTechBiz 2011. Leigh Dodds, @ldodds, http://kasabi.com, http://slideshare.net/ldodds
  • 2. Triple counts tell us nothing
  • 3. Triple counts are not a quality indicator
  • 4. http://dbpedia.org/resource/London
  • 5. 6 triples for Population Density
      Property | Count | Value
      http://dbpedia.org/ontology/PopulatedPlace/populationDensity | 2 | 4807.0, 4806.971873853451
      http://dbpedia.org/ontology/populationDensity | 2 | 4806.971874, 4807.000000
      http://dbpedia.org/property/populationDensityKm | 1 | 4807
      http://dbpedia.org/property/populationDensitySqMi | 1 | 12450
  • 6. 12 triples for Location (1)
      Property | Count | Value
      georss:point | 1 | 51.507222222222225 -0.1275
      geo:geometry | 1 | POINT(-0.1275 51.5072)
      geo:lat | 1 | 51.507221
      geo:long | 1 | -0.127500
  • 7. 12 triples for Location (2)
      Property | Count | Value
      dbpprop:latd | 1 | 51
      dbpprop:latm | 1 | 30
      dbpprop:lats | 1 | 26
      dbpprop:latns | 1 | N
      dbpprop:longd | 1 | 0
      dbpprop:longm | 1 | 7
      dbpprop:longs | 1 | 39
      dbpprop:longew | 1 | W
  • 8. ~4.6m redundant triples
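The redundancy shown in slides 5 to 7 can be checked with simple arithmetic. The sketch below uses only the values listed on those slides: it converts the degrees/minutes/seconds properties to decimal degrees and restates the per-km² density in square miles. The function name and the conversion constant are illustrative choices, not something taken from the talk.

```python
# Values copied from the slides for http://dbpedia.org/resource/London.
SQ_KM_PER_SQ_MI = 2.589988  # one square mile expressed in square kilometres

def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

lat = dms_to_decimal(51, 30, 26, "N")  # dbpprop:latd, latm, lats, latns
lon = dms_to_decimal(0, 7, 39, "W")    # dbpprop:longd, longm, longs, longew

# Compare with geo:lat 51.507221 and geo:long -0.127500 from slide 6.
print(round(lat, 4), round(lon, 4))

# populationDensityKm (4807) restated per square mile,
# for comparison with populationDensitySqMi (12450) from slide 5.
print(round(4807 * SQ_KM_PER_SQ_MI))
```

Under these assumptions the derived coordinates agree with the geo:lat and geo:long values to about four decimal places, which is the sense in which the extra triples add no new information.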
  • 9. Triple counts don't indicate utility
  • 10. http://bbc.co.uk/programmes: 2.5 million unique users per week, 60 req/s *
      * http://www.guardian.co.uk/media/pda/2011/apr/06/bbc-yves-raimond
  • 11. http://bbc.co.uk/programmes: the dataset is less than 50 million triples
  • 12. Beyond the Triple Count
  • 13. Dataset Information Spectrum
      Low Detail: Summary and overview of dataset content
      High Detail: Detailed data model documentation & guides
  • 14. Dataset Information Spectrum
      Low Detail: Summary and overview of dataset content
      High Detail: Detailed data model documentation & guides
      More Information
  • 15. Dataset Information Spectrum (Low Detail to High Detail): Metadata ● Title, Description ● Provenance ● Publication dates ● Licensing ● Usage cues ● Related datasets
  • 16. Dataset Information Spectrum (Low Detail to High Detail): Scope ● What types of entity? ● How many of each type? ● Coverage ● Geographic ● Events (time)
  • 17. Dataset Information Spectrum (Low Detail to High Detail): Structure ● URI Scheme ● Vocabulary meshing ● How is a person described?
  • 18. Dataset Information Spectrum (Low Detail to High Detail): Internals ● List of Schemas & RDF terms ● Class/property usage counts ● Triple counts ● Named graph structure ● Source files
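As a rough illustration of the "Internals" end of the spectrum, the sketch below computes a triple count together with class and property usage counts. It assumes the dataset is already available in memory as (subject, predicate, object) tuples; the `ex:` identifiers and the function name `dataset_internals` are hypothetical sample values, not data from the talk.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical sample data: (subject, predicate, object) tuples.
triples = [
    ("ex:a", RDF_TYPE, "ex:City"),
    ("ex:a", "ex:label", "A"),
    ("ex:b", RDF_TYPE, "ex:City"),
    ("ex:c", RDF_TYPE, "ex:River"),
    ("ex:c", "ex:flowsThrough", "ex:a"),
]

def dataset_internals(triples):
    """Return the triple count plus class and property usage counts."""
    # A class is counted once for every rdf:type statement that names it.
    class_usage = Counter(o for _, p, o in triples if p == RDF_TYPE)
    # A property is counted once for every triple that uses it as predicate.
    property_usage = Counter(p for _, p, _ in triples)
    return {
        "triples": len(triples),
        "classes": class_usage,
        "properties": property_usage,
    }

stats = dataset_internals(triples)
print(stats["triples"])
print(stats["classes"].most_common())
print(stats["properties"].most_common())
```

The triple count is only one of the figures produced; the per-class counts are the ones that answer the "what types of entity, and how many of each" questions from the Scope slide.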
  • 19. RDF Report Card Example
  • 20. Summarising Content of a Dataset ● Find all classes in all datasets in Kasabi ● Tag each class against a pre-defined set of categories ● Customized version of top-level schema.org classes ● Generate a report card for each dataset listing types of entity
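The steps on this slide could be sketched roughly as follows: take per-class instance counts for a dataset, look each class up in a category mapping, and roll the counts up per category. The mapping below is hypothetical; the talk's actual category list (a customized set of top-level schema.org classes) is not reproduced in the transcript, and `report_card` is an illustrative name rather than Kasabi's implementation.

```python
from collections import Counter

# Hypothetical class-to-category tagging, for illustration only.
CLASS_CATEGORIES = {
    "http://xmlns.com/foaf/0.1/Person": "Person",
    "http://schema.org/Place": "Place",
    "http://purl.org/ontology/mo/MusicArtist": "Person",
    "http://purl.org/ontology/mo/Record": "CreativeWork",
}

def report_card(class_counts, categories=CLASS_CATEGORIES, default="Other"):
    """Roll per-class instance counts up into per-category totals."""
    card = Counter()
    for class_uri, count in class_counts.items():
        # Classes that have not been tagged fall into a catch-all category.
        card[categories.get(class_uri, default)] += count
    return dict(card)

# Hypothetical counts for a music dataset.
example_counts = {
    "http://purl.org/ontology/mo/MusicArtist": 10,
    "http://purl.org/ontology/mo/Record": 25,
    "http://example.org/Unknown": 3,
}
print(report_card(example_counts))
```

Keeping the tagging in a plain lookup table means a class only needs to be categorised once and the result can be reused across every dataset that uses that class.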
  • 21. Report Card Categories
  • 22. Ordnance Survey: http://beta.kasabi.com/dataset/ordnance-survey-linked-data
  • 23. BBC Music: http://beta.kasabi.com/dataset/bbc-music
  • 24. British National Bibliography: http://beta.kasabi.com/dataset/british-national-bibliography-bnb
  • 25. NHS Performance Data: http://beta.kasabi.com/dataset/nhs-performance-data
  • 26. Summary ● Triple counts tell us nothing ● Vital to present the quality & utility of our data ● Data publishing platforms should support this ● "Progressive disclosure" ● Right detail at the right time ● Dataset analysis can generate useful summaries ● e.g. an RDF report card