Using linked data for dataset publication
Presentation to Liber IDCC workshop on metadata for dataset reuse.


    1. Semantic web and linked data for data set publication
       Dave Reynolds, Epimorphics Ltd, @der42
    2. Outline
         • Background on linked data
         • Roles in data set publishing
         • Case study: Environment Agency
         • Lessons
    3. Linked data background
    4. Linked data ...
       publishing data on the web ...
       ... to enable integration, linking and reuse across silos
    5. Linked data
       Apply the principles of the web to the publication of data. The linked data web:
         • is a global network of things
         • each identified by a URI
         • fetching a URI gives a set of statements in RDF
         • things connected by typed links
         • open, anyone can say anything about anything else
       Linked data is “data you can click on”
    6. Example schools information
       http://education.data.gov.uk/id/school/401874
    7. Example schools information
       http://education.data.gov.uk/id/school/401874
         a School
         label “Cardiff High School”
         phase “Secondary”
         district “Cardiff”
    8. Example schools information
       http://education.data.gov.uk/id/school/401874
         a school:School
         label “Cardiff High School”
         phase school:PhaseOfEducation_Secondary
         district http://statistics.data.gov.uk/id/local-authority-district/00PT
       http://statistics.data.gov.uk/id/local-authority-district/00PT
         label “Cardiff”
    9. Example schools information
       http://education.data.gov.uk/id/school/401874
         rdf:type school:School
         rdfs:label “Cardiff High School”
         school:phase school:PhaseOfEducation_Secondary
         school:district http://statistics.data.gov.uk/id/local-authority-district/00PT
       http://statistics.data.gov.uk/id/local-authority-district/00PT
         rdfs:label “Cardiff”
    10. Example schools information
       http://education.data.gov.uk/id/school/401874
         rdf:type school:School
         rdfs:label “Cardiff High School”
         school:phase school:PhaseOfEducation_Secondary
         school:district http://statistics.data.gov.uk/id/local-authority-district/00PT
       http://statistics.data.gov.uk/id/local-authority-district/00PT
         label “Cardiff”
       http://data.ordnancesurvey.co.uk/id/7000000000025484
         admingeo:ward
         admingeo:parish
         spatial:extent GML: 310499.4 184176.6 310476.5 ...
    11. Example schools information
       http://education.data.gov.uk/id/school/401874
         rdf:type school:School
         rdfs:label “Cardiff High School”
         school:phase school:PhaseOfEducation_Secondary
         school:district http://statistics.data.gov.uk/id/local-authority-district/00PT
       http://statistics.data.gov.uk/id/local-authority-district/00PT
         label “Cardiff”
         owl:sameAs http://data.ordnancesurvey.co.uk/id/7000000000025484
       http://data.ordnancesurvey.co.uk/id/7000000000025484
         admingeo:ward
         admingeo:parish
         spatial:extent GML: 310499.4 184176.6 310476.5 ...
    12. Role in data set publication
         • well suited to describing things: schools, companies, animal species, music tracks, tv programmes ...
         • what about datasets? environmental measurements, experimental results, statistical analyses ...
    13. Approach 1: Data catalogues
         • treat the dataset as a single resource, identify with a URI
         • provide metadata as linked data: descriptive, categorical, technical and structural
       Benefits?
         • separation of metadata from resource & repository
         • easy aggregation of metadata into catalogues
         • schema-less, which enables use-specific annotations and links
         • use of sharable category schemes and reference data
       => support for discovery
    14. Approach 2: Fine grain publication
         • publish the data set itself as linked data
         • entities, terms, individual records in data identified by URIs
         • data set structure and ontologies linked from data
         • still include dataset metadata
       Benefits?
         • all benefits of approach 1 to support discovery
         • self-describing
         • data slices addressable (trace back, provenance, annotation)
         • integration across sets: reuse of terms for dimensions, units, values
         • fine grained access
       => integration, comparison, context, data as a service
    15. bathing water quality: what we do...
       what information is relevant to the public about beaches
         • 15th May: start of season, press interest
         • bathing season: 20-22 samples in 22 weeks
         • 30th Sept
         • November: annual report
         • December
    16. how linkable data helps
       Tenby Tourist Information Centre
       Unit 2, The Gateway Complex, Tenby, Wales, SA70 7LT
       Tel: 01834 842 402, Fax: 01834 845 439
       Email: tenby.tic@pembrokeshire.gov.uk
       Photo by Skellig2008 (flickr)
    17. Publishing the Bathing Water Quality data set
         • Vocabularies: Bathing Waters, Sampling Points, Zones Of Influence, Assessments
           e.g. http://location.data.gov.uk/def/ef/SampingPoint
         • URI Set Reference Data: Bathing Waters, Sampling Points, Zone Of Influence
           e.g. http://location.data.gov.uk/so/ef/SamplingPoint/bwsp.eaew
         • Assessment Observation Datasets: http://environment.data.gov.uk/data/bathing-water-quality
             void:subset .../in-season (In-season Weekly Assessment)
             void:subset .../compliance (Annual Compliance)
    18. Data cube vocabulary
         • collaborative development sponsored by data.gov.uk
         • simple, flexible vocabulary
         • mirrors core information models from:
             SDMX (Statistical Data and Metadata eXchange)
             DDI (Data Documentation Initiative)
         • extension to SCOVO vocabulary
       image: dullhunk @ flickr
    19. Data cube model
       A set of observations
         • indexed by dimensions
         • describing measures
         • interpreted according to attributes
       Diagram: dimension (e.g. region); dimension (e.g. time); measure(s): population = 32,567; attributes: unit of measure = count, status = preliminary ...
    20. Data cube vocabulary: 1. Top level
       DataSet
         • provenance and metadata
         • structure
       Diagram terms: qb:DataStructureDefinition, qb:component, qb:sliceKey, qb:structure, qb:DataSet, qb:SliceKey, qb:slice, qb:sliceStructure, qb:dataset, qb:Slice, qb:subSlice, qb:observation, qb:Observation (dimension values, measure value(s), attribute values)
    21. Data cube vocabulary: 1. Top level
       DataSet
         • provenance and metadata
         • structure
       Observation
         • measured values, at dimensions with attributes
         • direct link to DataSet
       Diagram terms: as on slide 20
    22. Data cube vocabulary: 1. Top level
       DataSet
         • provenance and metadata
         • structure
       Observation
         • measured values, at dimensions with attributes
         • direct link to DataSet
       Slice
         • optional grouping by fixing dimensions
         • guide to presentation
         • allows for abbreviated data
       Diagram terms: as on slide 20
    23. Data cube vocabulary: 2. Data Structure Definition
       explicit definition of cube structure, inline in the data; enables
         • validation
         • visualization
         • discovery
         • abbreviation
       Diagram terms: qb:DataSet, qb:structure, qb:DataStructureDefinition, qb:component, qb:ComponentSpecification (qb:componentRequired, qb:componentAttachment, qb:order), qb:dimension, qb:measure, qb:attribute
    24. Bathing Water Quality cubes
         • measures: total coliform count, entero virus count, ...; sample classification
         • dimensions: sampling point, sampling week, sampling year
         • attributes: abnormal weather
    25. Everything has a URI
         • Selected Lists and Individual Bathing Waters
         • Lists and Individual Assessments
         • In-Season or Annual Compliance
         • Vocabulary Terms
         • Datasets (and subsets)
       Presented as:
         • HTML (for people)
         • JSON, XML, RDF and CSV (for programs)
    26. Data Platform and Applications
       Web of Linked Data
       http://environment.data.gov.uk/lab/bwq-os.html
    27. Outcomes
         • bathing water quality information available
             as both data set and set of web APIs
             updated weekly (in season)
         • third party applications to use and combine the data
         • seed a web of environmental and location data
             reference identifiers can be reused for related information
             URI patterns designed to be compatible with INSPIRE
    28. Wrapping up
       image: erika g. @ flickr.com
    29. Lessons
         • importance of reference identifiers
         • developer accessibility: linked data API
         • publish once, consume many ways
         • importance of maintenance and QoS expectation
         • reusable patterns:
             reusable vocabularies - Data Cube, org ...
             URI patterns
             provenance – OPMV and specializations
         • incremental approach
    30. Acknowledgements
         • Alex Coley (Environment Agency), for slides 17, 18, and for sponsoring the bathing water quality data publication
         • Stuart Williams, developer of the bathing water application and slides 19, 27, 28
         • John Sheridan (The National Archives), for sponsoring the development of data cube
         • Richard Cyganiak, Jeni Tennison, co-developers of the data cube vocabulary
    31. fin.
       image: Christian Haugen @ flickr.com
    32. Spare
    33. Linked data principles
         • Use URIs as names for things
         • Use HTTP URIs so that people can look up those names
         • When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
         • Include links to other URIs, so that they can discover more things
       Pattern of application of semantic web stack
    34. Linked open data cloud: 2007
       Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
    35. Linked open data cloud: 2009
       Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
    36. Linked open data cloud: 2010
       Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
    37. Accessing all this data
         • link following: HTTP GET, follow links, aggregate relevant statements
         • query: SPARQL
    38. SPARQL
         • core idea is pattern matching
             graph patterns with variables
             any subgraph which matches yields row of bindings
             example pattern: ?school ont:districtAdministrative [] ; the [] node has rdfs:label “Cardiff”
         • syntax based on Turtle syntax for RDF
         • web API endpoints
         • lots of power: filters, sub-queries, federated query, optionals, property chains, update, named graphs, aggregation, construct
    39. Accessing all this data
         • link following: HTTP GET, follow links, aggregate relevant statements
         • query: SPARQL
         • linked data API
             RESTful API onto linked data resources
             simple query, usable without RDF stack, web dev friendly
             easy to layer visualizations and UIs on top
         • third parties: search engines and aggregators, e.g. Sindice, sameAs.org
    40. Semantic web layer cake
    41. Data.gov.uk: visualizations on top of linked data
    42. Data.gov.uk: linked datasets and APIs
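
The core idea on the SPARQL slide (graph patterns with variables, where every matching subgraph yields a row of bindings) can be sketched in plain Python. This is an illustration only, not a SPARQL engine. The triples are one reading of the flattened schools diagram on slides 9 and 11, prefixed names are kept as plain strings, and the helper names `match_pattern` and `query` are hypothetical. The pattern uses `school:district` from the schools example rather than the `ont:districtAdministrative` property shown on the SPARQL slide.

```python
# Triples as read from the schools example (slides 9 and 11). The pairing of
# predicates and objects is an assumption based on the flattened diagram.
SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OS_AREA = "http://data.ordnancesurvey.co.uk/id/7000000000025484"

TRIPLES = [
    (SCHOOL, "rdf:type", "school:School"),
    (SCHOOL, "rdfs:label", "Cardiff High School"),
    (SCHOOL, "school:phase", "school:PhaseOfEducation_Secondary"),
    (SCHOOL, "school:district", DISTRICT),
    (DISTRICT, "rdfs:label", "Cardiff"),
    (DISTRICT, "owl:sameAs", OS_AREA),
]


def is_var(term):
    """Terms starting with '?' are treated as variables (a simplification)."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple; return None on failure."""
    extended = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def query(patterns, triples):
    """Return one bindings dict for each way of matching all the patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# "Which schools are in a district labelled Cardiff?"
rows = query(
    [("?school", "school:district", "?d"), ("?d", "rdfs:label", "Cardiff")],
    TRIPLES,
)
for row in rows:
    print(row["?school"])
```

With the six triples above, the two-pattern query produces a single row that binds `?school` to the Cardiff High School URI. A real deployment would send the equivalent SPARQL query to an endpoint instead of matching in memory.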

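
The data cube model from slides 19 to 23 (observations indexed by dimensions, carrying measures, interpreted through attributes, with a structure definition that enables validation and slices that fix some dimensions) can also be sketched in plain Python. This is a rough model of the idea, not the Data Cube vocabulary itself. The class and function names are hypothetical, the region and time values are placeholders, and only the population figure 32,567 with its `count` and `preliminary` attributes comes from slide 19; the other observations are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Structure:
    """Stand-in for a data structure definition: the declared components."""
    dimensions: tuple
    measures: tuple
    attributes: tuple = ()


@dataclass
class Observation:
    dimensions: dict
    measures: dict
    attributes: dict = field(default_factory=dict)


def is_valid(obs, structure):
    """Check an observation against the declared structure.

    Simplification: every dimension and measure is required, and attributes
    are optional but must be declared.
    """
    return (
        set(obs.dimensions) == set(structure.dimensions)
        and set(obs.measures) == set(structure.measures)
        and set(obs.attributes) <= set(structure.attributes)
    )


def take_slice(observations, **fixed):
    """Group observations by fixing the values of one or more dimensions."""
    return [
        o for o in observations
        if all(o.dimensions.get(name) == value for name, value in fixed.items())
    ]


structure = Structure(
    dimensions=("region", "time"),
    measures=("population",),
    attributes=("unit", "status"),
)

observations = [
    # Measure and attribute values from slide 19; dimension values are placeholders.
    Observation({"region": "region-A", "time": "2010"}, {"population": 32567},
                {"unit": "count", "status": "preliminary"}),
    # Hypothetical observations, invented for illustration only.
    Observation({"region": "region-A", "time": "2011"}, {"population": 33010},
                {"unit": "count"}),
    Observation({"region": "region-B", "time": "2010"}, {"population": 18200},
                {"unit": "count"}),
]

region_a = take_slice(observations, region="region-A")
print(len(region_a), "observations in the region-A slice")
```

Fixing `region` gives a slice that varies only along `time`, which is the kind of grouping the Slice bullet describes as a guide to presentation. In the bathing water cubes the dimensions would instead be sampling point, sampling week and sampling year.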