Using linked data for dataset publication

Presentation to Liber IDCC workshop on metadata for dataset reuse.


Transcript

  • 1. Semantic web and linked data for data set publication. Dave Reynolds, Epimorphics Ltd, @der42
  • 2. Outline: background on linked data; roles in data set publishing; case study: Environment Agency; lessons
  • 3. Linked data background
  • 4. Linked data ... publishing data on the web ... ... to enable integration, linking and reuse across silos
  • 5. Linked data: apply the principles of the web to publication of data. The linked data web: is a global network of things; each identified by a URI; fetching a URI gives a set of statements in RDF; things connected by typed links; open, anyone can say anything about anything else. Linked data is “data you can click on”
  • 6. Example schools information http://education.data.gov.uk/id/school/401874
  • 7. Example schools information: http://education.data.gov.uk/id/school/401874 is a School; label “Cardiff High School”; phase “Secondary”; district “Cardiff”
  • 8. Example schools information: http://education.data.gov.uk/id/school/401874 a school:School; label “Cardiff High School”; phase school:PhaseOfEducation_Secondary; district http://statistics.data.gov.uk/id/local-authority-district/00PT, which has label “Cardiff”
  • 9. Example schools information: http://education.data.gov.uk/id/school/401874 rdf:type school:School; rdfs:label “Cardiff High School”; school:phase school:PhaseOfEducation_Secondary; school:district http://statistics.data.gov.uk/id/local-authority-district/00PT, which has rdfs:label “Cardiff”
  • 10. Example schools information: http://education.data.gov.uk/id/school/401874 rdf:type school:School; rdfs:label “Cardiff High School”; school:phase school:PhaseOfEducation_Secondary; school:district http://statistics.data.gov.uk/id/local-authority-district/00PT, which has label “Cardiff”. Also shown: http://data.ordnancesurvey.co.uk/id/7000000000025484 with admingeo:ward; spatial:extent; admingeo:parish; GML: 310499.4 184176.6 310476.5 ...
  • 11. Example schools information: http://education.data.gov.uk/id/school/401874 rdf:type school:School; rdfs:label “Cardiff High School”; school:phase school:PhaseOfEducation_Secondary; school:district http://statistics.data.gov.uk/id/local-authority-district/00PT, which has label “Cardiff” and owl:sameAs http://data.ordnancesurvey.co.uk/id/7000000000025484, with admingeo:ward; spatial:extent; admingeo:parish; GML: 310499.4 184176.6 310476.5 ...
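The schools example can be read as a small set of subject, predicate, object statements. The sketch below is one possible reading of the flattened diagram, written with plain Python tuples rather than an RDF library; the helper names `objects`, `district_label` and `same_as` are illustrative, and the direction of the owl:sameAs link (district to Ordnance Survey identifier) is an assumption drawn from the slide layout.

```python
# Statements from the schools example as (subject, predicate, object) tuples.
# Prefixed names such as rdf:type are kept as plain strings for simplicity.
SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OS_AREA = "http://data.ordnancesurvey.co.uk/id/7000000000025484"

TRIPLES = [
    (SCHOOL, "rdf:type", "school:School"),
    (SCHOOL, "rdfs:label", "Cardiff High School"),
    (SCHOOL, "school:phase", "school:PhaseOfEducation_Secondary"),
    (SCHOOL, "school:district", DISTRICT),
    (DISTRICT, "rdfs:label", "Cardiff"),
    # Assumed reading of the diagram: the district is linked to the OS identifier.
    (DISTRICT, "owl:sameAs", OS_AREA),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def district_label(school):
    """Follow the typed link school -> district, then read the district label."""
    labels = []
    for district in objects(school, "school:district"):
        labels.extend(objects(district, "rdfs:label"))
    return labels


def same_as(resource):
    """Identifiers in other datasets declared equivalent to this resource."""
    return objects(resource, "owl:sameAs")


print(district_label(SCHOOL))
print(same_as(DISTRICT))
```

Because the district is a URI rather than the string “Cardiff”, a consumer can keep following links into a second dataset, which is the point the later slides in the sequence add.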
  • 12. Role in data set publication: well suited to describing things (schools, companies, animal species, music tracks, TV programmes ...); what about datasets (environmental measurements, experimental results, statistical analyses ...)?
  • 13. Approach 1: Data catalogues. Treat the dataset as a single resource, identified with a URI; provide metadata as linked data: descriptive, categorical, technical and structural. Benefits? Separation of metadata from resource & repository; easy aggregation of metadata into catalogues; schema-less, which enables use-specific annotations and links; use of sharable category schemes and reference data. => support for discovery
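A rough sketch of why catalogue metadata aggregates easily when every dataset is identified by a URI: statements from different sources about the same URI simply merge. All URIs and property names below (example.org, `ex:usedBy`, and so on) are hypothetical placeholders, not taken from the talk.

```python
from collections import defaultdict

# Hypothetical dataset URI and metadata statements, for illustration only.
DATASET = "http://example.org/dataset/bathing-water"

publisher_catalogue = [
    (DATASET, "dct:title", "Bathing water quality"),
    (DATASET, "dcat:theme", "http://example.org/theme/environment"),
]

# A third party adds its own annotation without changing any schema.
third_party_annotations = [
    (DATASET, "ex:usedBy", "http://example.org/app/beach-finder"),
]


def aggregate(*catalogues):
    """Merge statements from several catalogues, keyed by dataset URI."""
    merged = defaultdict(lambda: defaultdict(set))
    for catalogue in catalogues:
        for dataset_uri, prop, value in catalogue:
            merged[dataset_uri][prop].add(value)
    return merged


merged = aggregate(publisher_catalogue, third_party_annotations)
for prop, values in sorted(merged[DATASET].items()):
    print(prop, sorted(values))
```

The merge needs no agreed record layout; the shared URI is the only join key, which is what makes use-specific annotations cheap.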
  • 14. Approach 2: Fine grain publication. Publish the data set itself as linked data: entities, terms, individual records in data identified by URIs; data set structure and ontologies linked from data; still include dataset metadata. Benefits? All benefits of approach 1 to support discovery; self-describing; data slices addressable (trace back, provenance, annotation); integration across sets, with reuse of terms for dimensions, units, values; fine grained access. => integration, comparison, context, data as a service
  • 15. Bathing water quality: what we do... Timeline: start of season 15th May; bathing season, 20-22 samples in 22 weeks; 30th Sept; November; December. Labels: press interest; what information is relevant to the public about beaches; annual report; what we do
  • 16. How linkable data helps: Tenby Tourist Information Centre, Unit 2, The Gateway Complex, Tenby, Wales, SA70 7LT. Tel: 01834 842 402, Fax: 01834 845 439, Email: tenby.tic@pembrokeshire.gov.uk (photo by Skellig2008, flickr)
  • 17. Publishing the Bathing Water Quality data set. Vocabularies: Bathing Waters, Sampling Points, Zones Of Influence, Assessments, e.g. http://location.data.gov.uk/def/ef/SampingPoint. Reference data (URI set): Bathing Waters, Sampling Points, Zone Of Influence, e.g. http://location.data.gov.uk/so/ef/SamplingPoint/bwsp.eaew. Datasets: Assessment Observation, http://environment.data.gov.uk/data/bathing-water-quality, with void:subset .../compliance (Annual Compliance Assessment) and void:subset .../in-season (In-season Weekly Assessment)
  • 18. Data cube vocabulary: collaborative development sponsored by data.gov.uk; simple, flexible vocabulary; mirrors core information models from SDMX (Statistical Data and Metadata eXchange) and DDI (Data Documentation Initiative); extension to SCOVO vocabulary (image: dullhunk @ flickr)
  • 19. Data cube model: a set of observations, indexed by dimensions, describing measures, interpreted according to attributes. Diagram: dimension (e.g. region); dimension (e.g. time); measure(s): population = 32,567; attributes: unit of measure = count, status = preliminary ...
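As a minimal sketch of the cube model, an observation can be held as three groups of values and the cube as a lookup keyed by dimension values. The population figure, unit and status come from the slide; the dimension values `region-A` and `2010` and the names `Observation` and `cube_index` are made up for illustration.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    dimensions: dict  # where and when the observation applies
    measures: dict    # the observed value(s)
    attributes: dict  # how to interpret the measures


obs = Observation(
    dimensions={"region": "region-A", "time": "2010"},  # hypothetical values
    measures={"population": 32567},
    attributes={"unit of measure": "count", "status": "preliminary"},
)


def cube_index(observations, dimension_order):
    """Index observations by their dimension values, in a fixed order."""
    return {
        tuple(o.dimensions[d] for d in dimension_order): o
        for o in observations
    }


cube = cube_index([obs], ["region", "time"])
print(cube[("region-A", "2010")].measures["population"])
```

Keeping attributes apart from measures mirrors the slide: the number is the measure, while unit and status only say how to read it.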
  • 20. Data cube vocabulary 1. Top level. DataSet: provenance and metadata; structure. Diagram terms: qb:DataSet, qb:structure, qb:DataStructureDefinition, qb:component, qb:sliceKey, qb:SliceKey, qb:slice, qb:Slice, qb:sliceStructure, qb:subSlice, qb:dataset, qb:observation, qb:Observation (dimension values, measure value(s), attribute values)
  • 21. Data cube vocabulary 1. Top level. DataSet: provenance and metadata; structure. Observation: measured values, at dimensions with attributes; direct link to DataSet. Diagram terms: qb:DataSet, qb:structure, qb:DataStructureDefinition, qb:component, qb:sliceKey, qb:SliceKey, qb:slice, qb:Slice, qb:sliceStructure, qb:subSlice, qb:dataset, qb:observation, qb:Observation (dimension values, measure value(s), attribute values)
  • 22. Data cube vocabulary 1. Top level. DataSet: provenance and metadata; structure. Observation: measured values, at dimensions with attributes; direct link to DataSet. Slice: optional grouping by fixing dimensions; guide to presentation; allows for abbreviated data. Diagram terms: qb:DataSet, qb:structure, qb:DataStructureDefinition, qb:component, qb:sliceKey, qb:SliceKey, qb:slice, qb:Slice, qb:sliceStructure, qb:subSlice, qb:dataset, qb:observation, qb:Observation (dimension values, measure value(s), attribute values)
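The idea of a slice as "grouping by fixing dimensions" can be sketched in a few lines. This is an illustration of the concept only, not the Data Cube vocabulary itself: the dimension names are borrowed from the bathing water cube in the talk, while the observation values and the helper `slice_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical observations; all values are invented for illustration.
OBSERVATIONS = [
    {"sampling point": "point-A", "sampling year": 2011, "sampling week": 20,
     "sample classification": "class-1"},
    {"sampling point": "point-A", "sampling year": 2011, "sampling week": 21,
     "sample classification": "class-2"},
    {"sampling point": "point-B", "sampling year": 2011, "sampling week": 20,
     "sample classification": "class-1"},
]


def slice_by(observations, fixed_dimensions):
    """Group observations that share the same values for the fixed dimensions.

    The fixed values are stated once, on the slice key, and dropped from each
    member: this is the "abbreviated data" benefit mentioned on the slide.
    """
    slices = defaultdict(list)
    for obs in observations:
        key = tuple(obs[d] for d in fixed_dimensions)
        rest = {k: v for k, v in obs.items() if k not in fixed_dimensions}
        slices[key].append(rest)
    return dict(slices)


by_point_and_year = slice_by(OBSERVATIONS, ["sampling point", "sampling year"])
for key, members in by_point_and_year.items():
    print(key, members)
```

Fixing sampling point and year leaves a weekly series per point, which is also a natural unit for presentation.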
  • 23. Data cube vocabulary 2. Data Structure Definition: explicit definition of cube structure, inline in the data; enables validation, visualization, discovery, abbreviation. Diagram terms: qb:DataSet, qb:structure, qb:DataStructureDefinition, qb:component, qb:ComponentSpecification, qb:componentRequired, qb:componentAttachment, qb:order, qb:dimension, qb:measure, qb:attribute
  • 24. Bathing Water Quality cubes. Measures: total coliform count, enterovirus count, ...; sample classification. Dimensions: sampling point; sampling week; sampling year. Attributes: abnormal weather
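An explicit structure definition makes validation mechanical. The sketch below checks a flat observation against a hand-written structure that lists some of the bathing water components named above; the dictionary layout, the `validate` helper, the sample values, and the assumption that every listed measure must be present are illustrative choices, not the vocabulary's own rules.

```python
# A simplified stand-in for a data structure definition.
# The measure list is partial (the slide elides some with "...").
STRUCTURE = {
    "dimensions": ["sampling point", "sampling week", "sampling year"],
    "measures": ["total coliform count", "sample classification"],
    # attribute name -> whether it is required on every observation
    "attributes": {"abnormal weather": False},
}


def validate(observation, structure=STRUCTURE):
    """Return a list of problems; an empty list means the observation fits."""
    required = list(structure["dimensions"]) + list(structure["measures"])
    required += [name for name, needed in structure["attributes"].items() if needed]
    problems = [f"missing {c}" for c in required if c not in observation]
    known = set(required) | set(structure["attributes"])
    problems += [f"unknown {c}" for c in observation if c not in known]
    return problems


good = {
    "sampling point": "point-A",        # hypothetical values throughout
    "sampling week": 20,
    "sampling year": 2011,
    "total coliform count": 12,
    "sample classification": "class-1",
}
bad = {k: v for k, v in good.items() if k != "sampling week"}
bad["colour"] = "blue"

print(validate(good))
print(validate(bad))
```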
  • 25. Everything has a URI: selected lists and individual bathing waters; lists and individual assessments; in-season or annual compliance; vocabulary terms; datasets (and subsets). Presented as: HTML (for people); JSON, XML, RDF and CSV (for programs)
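One common way to serve several presentations of the same URI is to pick a format from the HTTP Accept header. The talk does not say how the Environment Agency service chooses a format, so the following is only a sketch of the general technique; the media type table (including treating "RDF" as application/rdf+xml) and the function `choose_format` are assumptions.

```python
# Assumed mapping from media types to the formats listed on the slide.
SUPPORTED = {
    "text/html": "html",
    "application/json": "json",
    "application/xml": "xml",
    "application/rdf+xml": "rdf",
    "text/csv": "csv",
}


def choose_format(accept_header, default="html"):
    """Pick the supported format with the highest q-value in an Accept header.

    Ties are broken by position in the header. Wildcards such as */* are not
    interpreted in this sketch and simply fall back to the default.
    """
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type, quality = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if media_type in SUPPORTED and quality > 0:
            candidates.append((-quality, position, SUPPORTED[media_type]))
    return min(candidates)[2] if candidates else default


print(choose_format("text/csv;q=0.5, application/json"))
print(choose_format("*/*"))
```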
  • 26. Data Platform and Applications Web of Linked Data http://environment.data.gov.uk/lab/bwq-os.html
  • 27. Outcomes: bathing water quality information available as both data set and set of web APIs, updated weekly (in season); third party applications to use and combine the data; seed a web of environmental and location data: reference identifiers can be reused for related information, URI patterns designed to be compatible with INSPIRE
  • 28. Wrapping up (image: erika g. @ flickr.com)
  • 29. Lessons: importance of reference identifiers; developer accessibility (linked data API); publish once, consume many ways; importance of maintenance and QoS expectation; reusable patterns: reusable vocabularies (Data Cube, org ...), URI patterns, provenance (OPMV and specializations); incremental approach
  • 30. Acknowledgements: Alex Coley (Environment Agency), for slides 17, 18, and for sponsoring the bathing water quality data publication; Stuart Williams, developer of the bathing water application and slides 19, 27, 28; John Sheridan (The National Archives), for sponsoring the development of data cube; Richard Cyganiak, Jeni Tennison, co-developers of the data cube vocabulary
  • 31. fin. (image: Christian Haugen @ flickr.com)
  • 32. Spare
  • 33. Linked data principles: (1) Use URIs as names for things. (2) Use HTTP URIs so that people can look up those names. (3) When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). (4) Include links to other URIs, so that they can discover more things. Pattern of application of semantic web stack
  • 34. Linked open data cloud: 2007. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 35. Linked open data cloud: 2009. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 36. Linked open data cloud: 2010. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 37. Accessing all this data. Link following: HTTP GET, follow links, aggregate relevant statements. Query: SPARQL
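Link following can be sketched as a bounded crawl: fetch a URI, keep its statements, and queue any object that is itself a URI. To stay runnable offline, the sketch replaces HTTP GET with a lookup in an in-memory dictionary; `WEB`, `fetch` and `crawl` are illustrative names, and the statements are the ones from the schools example (with the owl:sameAs direction assumed).

```python
from collections import deque

SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OS_AREA = "http://data.ordnancesurvey.co.uk/id/7000000000025484"

# Stand-in for the web: what a GET on each URI would return, as statements.
WEB = {
    SCHOOL: [
        (SCHOOL, "rdfs:label", "Cardiff High School"),
        (SCHOOL, "school:district", DISTRICT),
    ],
    DISTRICT: [
        (DISTRICT, "rdfs:label", "Cardiff"),
        (DISTRICT, "owl:sameAs", OS_AREA),
    ],
}


def fetch(uri):
    """Placeholder for an HTTP GET returning RDF statements about the URI."""
    return WEB.get(uri, [])


def crawl(start, fetch, max_hops=1):
    """Breadth-first link following, aggregating statements up to max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    statements = []
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in fetch(uri):
            statements.append((s, p, o))
            is_link = o.startswith("http://")
            if is_link and hops < max_hops and o not in seen:
                seen.add(o)
                queue.append((o, hops + 1))
    return statements


for statement in crawl(SCHOOL, fetch, max_hops=1):
    print(statement)
```

The hop limit matters in practice, since an open web of links has no natural stopping point.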
  • 38. SPARQL: core idea is pattern matching (graph patterns with variables; any subgraph which matches yields row of bindings), e.g. ?school ont:districtAdministrative [ rdfs:label “Cardiff” ]; syntax based on Turtle syntax for RDF; web API endpoints; lots of power: filters, sub-queries, federated query, optionals, property chains, update, named graphs, aggregation, construct
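The core idea, graph patterns with variables yielding rows of bindings, can be illustrated without a SPARQL engine. The toy matcher below handles only a conjunction of triple patterns and is in no way a SPARQL implementation; the blank node from the slide's pattern is written as the variable `?district`, and the second school and district are invented so that the match has something to filter out.

```python
SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OTHER_SCHOOL = "http://example.org/school/other"      # hypothetical
OTHER_DISTRICT = "http://example.org/district/other"  # hypothetical

TRIPLES = [
    (SCHOOL, "ont:districtAdministrative", DISTRICT),
    (DISTRICT, "rdfs:label", "Cardiff"),
    (OTHER_SCHOOL, "ont:districtAdministrative", OTHER_DISTRICT),
    (OTHER_DISTRICT, "rdfs:label", "Elsewhere"),
]

# Terms starting with "?" are variables; everything else must match exactly.
PATTERNS = [
    ("?school", "ont:districtAdministrative", "?district"),
    ("?district", "rdfs:label", "Cardiff"),
]


def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Return one row of bindings for every subgraph matching all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                result = match(pattern, triple, bindings)
                if result is not None:
                    next_solutions.append(result)
        solutions = next_solutions
    return solutions


for row in query(PATTERNS, TRIPLES):
    print(row["?school"])
```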
  • 39. Accessing all this data. Link following: HTTP GET, follow links, aggregate relevant statements. Query: SPARQL. Linked data API: RESTful API onto linked data resources; simple query, usable without RDF stack, web dev friendly; easy to layer visualizations and UIs on top. Third parties: search engines and aggregators, e.g. Sindice, sameAs.org
  • 40. Semantic web layer cake
  • 41. Data.gov.uk: visualizations on top of linked data
  • 42. Data.gov.uk – linked datasets and APIs