Using linked data for dataset publication

Presentation to Liber IDCC workshop on metadata for dataset reuse.

  1. Semantic web and linked data for data set publication. Dave Reynolds, Epimorphics Ltd (@der42)
  2. Outline: background on linked data; roles in data set publishing; case study: Environment Agency; lessons.
  3. Linked data background
  4. Linked data: publishing data on the web, to enable integration, linking and reuse across silos.
  5. Linked data: apply the principles of the web to the publication of data. The linked data web is a global network of things: each thing is identified by a URI; fetching a URI gives a set of statements in RDF; things are connected by typed links; and it is open, so anyone can say anything about anything else. Linked data is “data you can click on”.
  6. Example: schools information. http://education.data.gov.uk/id/school/401874
  7. Example: schools information. http://education.data.gov.uk/id/school/401874: a School; label “Cardiff High School”; phase “Secondary”; district “Cardiff”.
  8. Example: schools information. http://education.data.gov.uk/id/school/401874: a school:School; label “Cardiff High School”; phase school:PhaseOfEducation_Secondary; district http://statistics.data.gov.uk/id/local-authority-district/00PT, which has label “Cardiff”.
  9. Example: schools information. http://education.data.gov.uk/id/school/401874: rdf:type school:School; rdfs:label “Cardiff High School”; school:phase school:PhaseOfEducation_Secondary; school:district http://statistics.data.gov.uk/id/local-authority-district/00PT, which has rdfs:label “Cardiff”.
  10. Example: schools information. http://education.data.gov.uk/id/school/401874: rdf:type school:School; rdfs:label “Cardiff High School”; school:phase school:PhaseOfEducation_Secondary; school:district http://statistics.data.gov.uk/id/local-authority-district/00PT, which has label “Cardiff”. Also shown: http://data.ordnancesurvey.co.uk/id/7000000000025484, with admingeo:ward, admingeo:parish and spatial:extent (GML: 310499.4 184176.6 310476.5 ...).
  11. Example: schools information. http://education.data.gov.uk/id/school/401874: rdf:type school:School; rdfs:label “Cardiff High School”; school:phase school:PhaseOfEducation_Secondary; school:district http://statistics.data.gov.uk/id/local-authority-district/00PT, which has label “Cardiff” and is owl:sameAs http://data.ordnancesurvey.co.uk/id/7000000000025484, which in turn has admingeo:ward, admingeo:parish and spatial:extent (GML: 310499.4 184176.6 310476.5 ...).
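To make the build-up in slides 7 to 11 concrete, here is a minimal Python sketch (an illustration, not an RDF toolkit) that stores the example's statements as (subject, predicate, object) tuples and answers questions by following typed links. Prefixed names such as `school:district` are kept as plain strings, literals are unquoted strings, and the helper names `objects` and `follow` are hypothetical.

```python
# Statements from the Cardiff High School example as (subject, predicate, object)
# tuples. Prefixed names stay as plain strings; literals are plain strings too.
SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OS_AREA = "http://data.ordnancesurvey.co.uk/id/7000000000025484"

GRAPH = {
    (SCHOOL, "rdf:type", "school:School"),
    (SCHOOL, "rdfs:label", "Cardiff High School"),
    (SCHOOL, "school:phase", "school:PhaseOfEducation_Secondary"),
    (SCHOOL, "school:district", DISTRICT),
    (DISTRICT, "rdfs:label", "Cardiff"),
    (DISTRICT, "owl:sameAs", OS_AREA),
}


def objects(graph, subject, predicate):
    """Return every o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def follow(graph, start, *path):
    """Follow a chain of typed links from `start`, one predicate per hop."""
    nodes = {start}
    for predicate in path:
        nodes = {o for node in nodes for o in objects(graph, node, predicate)}
    return nodes


print(follow(GRAPH, SCHOOL, "school:district", "rdfs:label"))  # {'Cardiff'}
print(follow(GRAPH, SCHOOL, "school:district", "owl:sameAs"))
```

The second call lands on the Ordnance Survey identifier, which is where a client would continue in order to find the ward, parish and spatial extent.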
  12. Role in data set publication: linked data is well suited to describing things (schools, companies, animal species, music tracks, TV programmes ...), but what about datasets (environmental measurements, experimental results, statistical analyses ...)?
  13. Approach 1: data catalogues. Treat the dataset as a single resource, identify it with a URI, and provide its metadata as linked data: descriptive, categorical, technical and structural. Benefits? Separation of metadata from resource and repository; easy aggregation of metadata into catalogues; schema-less, which enables use-specific annotations and links; use of sharable category schemes and reference data. => Support for discovery.
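Approach 1's claim that metadata aggregates easily can be illustrated with a small Python sketch: because every publisher describes datasets with the same kind of statements, building a catalogue is a set union followed by an index on a category property. The dataset URIs, the `ex:` category scheme and the use of `dct:title` / `dct:subject` (Dublin Core terms) are assumptions chosen for the example.

```python
# Hypothetical metadata from two publishers. Each dataset is one resource with
# a URI; dct:title and dct:subject are Dublin Core terms used as examples.
publisher_a = [
    ("http://example.org/data/rainfall-2011", "dct:title", "Rainfall 2011"),
    ("http://example.org/data/rainfall-2011", "dct:subject", "ex:climate"),
]
publisher_b = [
    ("http://example.com/ds/river-levels", "dct:title", "River levels"),
    ("http://example.com/ds/river-levels", "dct:subject", "ex:climate"),
    ("http://example.com/ds/school-census", "dct:subject", "ex:education"),
]


def build_catalogue(*sources):
    """Merge statements from several publishers, then index datasets by category."""
    merged = set()
    for source in sources:
        merged.update(source)  # aggregation is a union of statements
    catalogue = {}
    for s, p, o in merged:
        if p == "dct:subject":
            catalogue.setdefault(o, set()).add(s)
    return catalogue


catalogue = build_catalogue(publisher_a, publisher_b)
print(sorted(catalogue["ex:climate"]))
# ['http://example.com/ds/river-levels', 'http://example.org/data/rainfall-2011']
```

No shared schema is needed beyond agreeing on the category property and scheme, which is the role the slide gives to sharable category schemes.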
  14. Approach 2: fine-grained publication. Publish the dataset itself as linked data: entities, terms and individual records in the data are identified by URIs; the dataset structure and ontologies are linked from the data; dataset metadata is still included. Benefits? All the benefits of approach 1 for discovery; self-describing; data slices are addressable (trace back, provenance, annotation); integration across datasets through reuse of terms for dimensions, units and values; fine-grained access. => Integration, comparison, context, data as a service.
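Approach 2 gives individual records their own URIs. A minimal Python sketch, assuming a hypothetical base URI and a URI pattern of `base/site/date` (neither comes from the talk), shows minting those URIs and how an annotation can then point at a single record.

```python
from urllib.parse import quote

# Hypothetical dataset URI; every record is given a URI beneath it.
BASE = "http://example.org/data/water-samples"


def record_uri(base, *key_parts):
    """Mint a URI for one record from its key fields, percent-encoding each part."""
    return base + "/" + "/".join(quote(str(part), safe="") for part in key_parts)


records = [
    {"site": "site-1", "date": "2011-05-15", "value": 120},
    {"site": "site-1", "date": "2011-05-22", "value": 80},
]

statements = []
for r in records:
    uri = record_uri(BASE, r["site"], r["date"])
    statements.append((uri, "ex:value", r["value"]))
    # each record links back to the dataset it belongs to
    statements.append((uri, "ex:dataset", BASE))

# The record is addressable, so a third party can annotate it directly.
statements.append((record_uri(BASE, "site-1", "2011-05-22"), "ex:note", "resampled"))
```

Deriving the URI from the record's key fields is one way to keep it stable when the data is republished, so that trace-back and annotations keep pointing at the right record.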
  15. Bathing water quality: what we do, and what information about beaches is relevant to the public. The bathing season starts on 15th May (press interest); 20-22 samples are taken in 22 weeks; the season ends on 30th Sept; the annual report follows (November, December).
  16. How linkable data helps: Tenby Tourist Information Centre, Unit 2, The Gateway Complex, Tenby, Wales, SA70 7LT. Tel: 01834 842 402. Fax: 01834 845 439. Email: tenby.tic@pembrokeshire.gov.uk. Photo by Skellig2008 (flickr).
  17. Publishing the Bathing Water Quality data set. Vocabularies: Bathing Waters, Sampling Points, Zones of Influence, Assessments (e.g. http://location.data.gov.uk/def/ef/SampingPoint). URI set reference data: Bathing Waters, Sampling Points, Zone of Influence, Assessment (e.g. http://location.data.gov.uk/so/ef/SamplingPoint/bwsp.eaew). Datasets: http://environment.data.gov.uk/data/bathing-water-quality (observations), with void:subset links to .../compliance (annual compliance assessment) and .../in-season (in-season weekly assessment).
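The data sets in slide 17 are linked with `void:subset`. A consumer that wants everything under the top-level data set can walk those links transitively; the Python sketch below does that over hypothetical `ex:` identifiers (the slide's subset URIs are abbreviated, so they are not reproduced, and the third level is invented for illustration).

```python
# Hypothetical identifiers; only the void:subset structure follows the slide.
LINKS = [
    ("ex:bwq", "void:subset", "ex:bwq-compliance"),
    ("ex:bwq", "void:subset", "ex:bwq-in-season"),
    ("ex:bwq-in-season", "void:subset", "ex:bwq-in-season-week-1"),
]


def all_subsets(links, dataset):
    """Return every dataset reachable from `dataset` through void:subset links."""
    found, stack = set(), [dataset]
    while stack:
        current = stack.pop()
        for s, p, o in links:
            if s == current and p == "void:subset" and o not in found:
                found.add(o)
                stack.append(o)
    return found


print(sorted(all_subsets(LINKS, "ex:bwq")))
# ['ex:bwq-compliance', 'ex:bwq-in-season', 'ex:bwq-in-season-week-1']
```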
  18. 18. Data cube vocabulary collaborative development sponsored by data.gov.uk simple, flexible vocabulary mirrors core information models from:  SDMX (Statistical Data and Metadata eXchange)  DDI (Data Documentation Initiative) extension to SCOVO vocabularyimage: dullhunk @ flickr
  19. Data cube model: a set of observations, indexed by dimensions (e.g. region, time), describing measures, interpreted according to attributes. Example observation: population = 32,567, with attributes unit of measure = count and status = preliminary ...
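A minimal in-memory reading of the data cube model, in Python: observations are keyed by their dimension values and carry measures and attributes. The dimension values (`region-A`, `2010`) and the attribute name `unitMeasure` are hypothetical; the population, unit and status values are those on the slide. This sketches the model, not the RDF vocabulary itself.

```python
class Cube:
    """Minimal in-memory data cube: observations indexed by dimension values."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)  # ordered dimension names
        self.observations = {}  # dimension key -> (measures, attributes)

    def add(self, dims, measures, attributes=None):
        """Store one observation; each combination of dimension values is unique."""
        key = tuple(dims[d] for d in self.dimensions)
        if key in self.observations:
            raise ValueError(f"duplicate observation for {key}")
        self.observations[key] = (dict(measures), dict(attributes or {}))

    def get(self, **dims):
        """Look up the measures and attributes at the given dimension values."""
        return self.observations[tuple(dims[d] for d in self.dimensions)]


cube = Cube(["region", "time"])
cube.add({"region": "region-A", "time": "2010"},
         {"population": 32567},
         {"unitMeasure": "count", "status": "preliminary"})
measures, attributes = cube.get(region="region-A", time="2010")
print(measures["population"], attributes["status"])  # 32567 preliminary
```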
  20. 20. Data cube vocabulary1. Top level DataSet qb:DataStructureDefinition qb:component  provenance and metadata qb:sliceKey  structure qb:structure qb:DataSet qb:SliceKey qb:slice qb:sliceStructure qb:dataset qb:Slice qb:subSlice qb:observation qb:Observation dimension values measure value(s) attribute values
  21. 21. Data cube vocabulary1. Top level DataSet qb:DataStructureDefinition qb:component  provenance and metadata qb:sliceKey  structure qb:structure Observation qb:DataSet qb:SliceKey  measured values, at dimensions qb:slice qb:sliceStructure qb:dataset with attributes qb:Slice  direct link to DataSet qb:subSlice qb:observation qb:Observation dimension values measure value(s) attribute values
  22. 22. Data cube vocabulary1. Top level DataSet qb:DataStructureDefinition qb:component  provenance and metadata qb:sliceKey  structure qb:structure Observation qb:DataSet qb:SliceKey  measured values, at dimensions qb:slice qb:sliceStructure qb:dataset with attributes qb:Slice  direct link to DataSet qb:subSlice Slice qb:observation qb:Observation  optional grouping by fixing dimensions dimension values measure value(s) attribute values  guide to presentation  allows for abbreviated data
  23. Data cube vocabulary, 2. Data Structure Definition: an explicit definition of the cube structure, inline in the data, which enables validation, visualization, discovery and abbreviation. Diagram: qb:DataSet qb:structure qb:DataStructureDefinition; qb:DataStructureDefinition qb:component qb:ComponentSpecification; each qb:ComponentSpecification has qb:componentRequired, qb:componentAttachment and qb:order, and names a qb:dimension, qb:measure or qb:attribute.
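One use of an explicit Data Structure Definition is validation. The Python sketch below, loosely modelled on `qb:ComponentSpecification` with its `qb:componentRequired` flag, checks an observation for missing required components and for properties the structure does not declare. The property names borrow from the bathing water cubes but are hypothetical identifiers.

```python
# Hypothetical component specifications, loosely modelled on
# qb:ComponentSpecification (kind of component, property, required flag).
DSD = [
    {"kind": "dimension", "property": "samplingPoint", "required": True},
    {"kind": "dimension", "property": "samplingWeek", "required": True},
    {"kind": "measure", "property": "totalColiformCount", "required": True},
    {"kind": "attribute", "property": "abnormalWeather", "required": False},
]


def validate(observation, dsd):
    """List problems: required components that are missing, and undeclared properties."""
    declared = {c["property"] for c in dsd}
    problems = [f"missing {c['kind']} {c['property']}"
                for c in dsd if c["required"] and c["property"] not in observation]
    problems += [f"undeclared property {p}" for p in observation if p not in declared]
    return problems


ok = {"samplingPoint": "sp-1", "samplingWeek": "2011-W21", "totalColiformCount": 120}
print(validate(ok, DSD))  # []
print(validate({"samplingPoint": "sp-1", "colour": "blue"}, DSD))
```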
  24. 24. Bathing Water Quality cubes measures  total coliform count, entero virus count, ...  sample classification dimensions  sampling point  sampling week  sampling year attributes  abnormal weather
  25. Everything has a URI: selected lists and individual bathing waters; lists and individual assessments; in-season or annual compliance; vocabulary terms; datasets (and subsets). Presented as HTML (for people) and as JSON, XML, RDF and CSV (for programs).
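Serving one URI as HTML for people and as JSON, XML, RDF or CSV for programs is usually done by content negotiation. The Python sketch below picks a representation from an explicit suffix or from the HTTP `Accept` header (honouring `q` values), defaulting to HTML. The media-type table, the suffix convention and the example paths are assumptions for illustration, not a description of how the Environment Agency platform is built.

```python
# Assumed mapping from media types to representations, and from URI suffixes
# (".csv", ".json", ...) to the same representations.
MEDIA_TYPES = {
    "text/html": "html",
    "application/json": "json",
    "application/xml": "xml",
    "application/rdf+xml": "rdf",
    "text/csv": "csv",
}
SUFFIXES = {"." + fmt: fmt for fmt in MEDIA_TYPES.values()}


def choose_format(path, accept="text/html"):
    """Pick a representation: an explicit suffix wins; otherwise take the
    highest-q supported type in the Accept header; otherwise fall back to HTML."""
    for suffix, fmt in SUFFIXES.items():
        if path.endswith(suffix):
            return fmt
    ranked = []
    for part in accept.split(","):
        media, _, params = part.strip().partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        ranked.append((q, media.strip()))
    for q, media in sorted(ranked, key=lambda item: -item[0]):
        if q > 0 and media in MEDIA_TYPES:
            return MEDIA_TYPES[media]
    return "html"


print(choose_format("/doc/bathing-water/example.csv"))  # csv
print(choose_format("/doc/bathing-water/example", "application/json"))  # json
```

Letting a suffix override the header gives people a plain link they can click to get, say, the CSV, while programs can still negotiate.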
  26. Data platform and applications: web of linked data. http://environment.data.gov.uk/lab/bwq-os.html
  27. 27. Outcomes bathing water quality information available  as both data set and set of web APIs  updated weekly (in season) third party applications to use and combine the data seed a web of environmental and location data  reference identifiers can be reused for related information  URI patterns designed to be compatible with INSPIRE
  28. Wrapping up. Image: erika g. @ flickr.com
  29. 29. Lessons importance of reference identifiers developer accessibility  linked data API publish once, consume many ways importance of maintenance and QoS expectation reusable patterns:  reusable vocabularies - Data Cube, org ...  URI patterns  provenance – OMPV and specializations incremental approach
  30. 30. Acknowledgements Alex Coley (Environment Agency)  for slides 17, 18, and for sponsoring the bathing water quality data publication Stuart Williams  developer of the bathing water application and slides 19,27,28 John Sheridan (The National Archive)  for sponsoring the development of data cube Richard Cyganiak, Jeni Tennison  co-developers of the data cube vocabulary
  31. fin. Image: Christian Haugen @ flickr.com
  32. Spare
  33. Linked data principles: use URIs as names for things; use HTTP URIs so that people can look up those names; when someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL); include links to other URIs, so that they can discover more things. These principles are a pattern for applying the semantic web stack.
  34. Linked open data cloud: 2007. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  35. Linked open data cloud: 2009. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  36. Linked open data cloud: 2010. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  37. Accessing all this data: link following (HTTP GET, follow links, aggregate relevant statements); query (SPARQL).
  38. 38. SPARQL core idea is pattern matching  graph patterns with variables  any subgraph which matches yields row of bindings ont:districtAdministrative rdfs:label ?school [] “Cardiff” syntax based on Turtle syntax for RDF web API endpoints lots of power  filters  sub-queries  federated query  optionals  property chains  update  named graphs  aggregation  construct
  39. 39. Accessing all this data link following  HTTP GET, follow links, aggregate relevant statements query  SPARQL linked data API  RESTful API onto linked data resources  simple query, usable without RDF stack, web dev friendly  easy to layer visualizations and UIs on top third parties  search engines and aggregators e.g. Sindice, sameAs.org
  40. Semantic web layer cake
  41. Data.gov.uk: visualizations on top of linked data
  42. Data.gov.uk: linked datasets and APIs