Industrialized Linked Data

Presentation for SemTechBiz 2012 on the use of Linked Data in the UK public sector, using the Environment Agency as a reference use case.


  1. Industrialized Linked Data. Dave Reynolds, Epimorphics Ltd (@der42)
  2. Context: public sector Linked Data
  3. Linked Data journey ... explore: what is Linked Data? what use is it for us?
  4. (build) Possible answers: self-describing, integration, carries semantics with it, comparable, annotate and explain, slice and dice, data in context, web API, ...
  5. (build) ... and what's involved?
  6. Linked Data journey ... explore, pilot: data, model, convert, publish, apply
     [photo of The Thinker © dSeneste.dk on Flickr, CC BY]
  7. Linked Data journey ... explore, pilot, routine?
     Great pilot, but ...
     - can we reduce the time and cost?
     - how do we handle changes and updates?
     - how can we make the published data easier to use?
     How do we make Linked Data "business as usual"?
  8. Example case study: Environment Agency monitoring of bathing water quality
     - static pilot: historic annual assessments
     - live pilot: weekly assessments
     - operational system: additional data feeds, live update, integrated API, data explorer
  9. From pilot to practice
     - reduce modelling costs: patterns, reuse  [dive 1]
     - handling change and update: patterns, publication process
     - automation: conversion, publication
     - embed in the business process: use internally as well as externally; publish once, use many; data platform
  10. Reduce costs: modelling
      1. Don't do it: map source data into isomorphic RDF, synthesize URIs; loses some of the value proposition
      2. Reuse existing ontologies, intact or mix-and-match: the best solution when available; W3C GLD work on vocabularies (people, organizations, datasets, ...)
      3. Reusable vocabulary patterns: for example, Data Cube plus reference URI sets; adaptable to a broad range of data (environmental, statistical, financial, ...)
  11. Reusable patterns: Data Cube. Much public sector data has regularities:
      - sets of measures: observations, forecasts, budgets, assessments, statistics, ...
      [figure: a grid of example measure values]
  12. (build) ... organized along some dimensions: region, agency, time, category, cost centre, ...
      [figure: a spend measure laid out by cost centre, objective code and time]
  13. (build) ... interpreted according to attributes: units, multipliers, status
      [figure: the same grid with $k units and provisional/final status markers]
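The measure/dimension/attribute structure of a cube cell can be sketched as plain data. This is a minimal, language-neutral illustration; the names `spend`, `costCentre` and the helper functions are hypothetical, not terms from the Data Cube vocabulary.

```python
# A cube observation: one measured value, located by dimension values
# and qualified by attributes (illustrative sketch only).

def make_observation(measure, value, dimensions, attributes):
    """Bundle a single cell of a data cube as a dictionary."""
    return {"measure": measure, "value": value,
            "dimensions": dimensions, "attributes": attributes}

obs = make_observation(
    measure="spend",
    value=12,
    dimensions={"costCentre": "CC1", "objectiveCode": "OBJ7", "time": "2011"},
    attributes={"unit": "GBP", "multiplier": 1000, "status": "provisional"},
)

def cell(observations, **dims):
    """Slice: find observations whose dimensions match the given values."""
    return [o for o in observations
            if all(o["dimensions"].get(k) == v for k, v in dims.items())]
```

The point of the pattern: every value carries its own coordinates (dimensions) and interpretation (attributes), so cubes from different publishers can be sliced and compared uniformly.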
  14. Data Cube vocabulary
  15. Data Cube pattern: a pattern, not a fixed ontology
      - customize by selecting measures, dimensions and attributes
      - originated in the publishing of statistics
      - applied to environmental measurements, weather forecasts, budgets and spend, quality assessments, regional demographics, ...
      Supports reuse:
      - widely reusable URI sets: geography, time periods, agencies, units
      - organization-wide sets
      - modelling often only requires small increments on top of the core pattern and reusable components
      - opens the door for reusable visualization tools
      - standardization through W3C GLD
  16. Application to case study: Data Cubes for water quality measurement
      - in-season weekly assessments
      - end-of-season annual assessments
      Dimensions:
      - time intervals: UK reference time service
      - location: reference URI set for bathing waters and sampling points
      Cubes can reuse these dimensions; we just need to define the specific measures.
  17. From pilot to practice
      - reduce modelling costs: patterns, reuse
      - handling change and update: patterns, publication process  [dive 2]
      - automation: conversion, publication
      - embed in the business process: use internally as well as externally; publish once, use many; data platform
  18. Handling change: a critical challenge
      - most initial pilots choose a snapshot dataset, and go stale, fast
      - understanding the nature of data updates, and how to handle them, is critical to successfully scaling to business as usual
      Types of change:
      - new data relating to a different time period
      - corrections to data
      - entities change: properties, identity
  19. Modelling change (1): individual data items relate to a new time period
      Pattern: n-ary relation
      - an observation resource relates a value to a time period and other context
      - use Data Cube dimensions for this
      [figure: the bathing water ukk1202-36000 (Clevedon Beach) linked via bwq:sampleYear and bwq:classification to yearly assessments: 2009 Higher, 2010 Minimum, 2011 Higher]
      History or latest?
      - "latest" is non-monotonic but helpful for many practical uses
      - materialize it (SPARQL Update), implement it in the query, or implement it in the API
      - separate choice of whether to keep history as well (water quality vs. weather forecasts)
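The "materialize latest" option amounts to a max-over-time computation per entity. A small sketch, using hypothetical data mirroring the Clevedon Beach example (the tuples stand in for the year-dimensioned observations):

```python
# Observations as (bathing_water, year, classification) facts; the
# non-monotonic "latest" link is just the max-year fact per entity.
observations = [
    ("ukk1202-36000", 2009, "Higher"),
    ("ukk1202-36000", 2010, "Minimum"),
    ("ukk1202-36000", 2011, "Higher"),
]

def latest(facts):
    """Materialized latest-classification per entity; the history
    itself is left intact for drill-down."""
    best = {}
    for entity, year, cls in facts:
        if entity not in best or year > best[entity][0]:
            best[entity] = (year, cls)
    return {entity: cls for entity, (year, cls) in best.items()}
```

Whether this runs as a SPARQL Update (materialized), inside the query, or inside the API is the deployment choice the slide describes.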
  20. Modelling change (2): corrections
      Patterns:
      - silent change (!)
      - explicit replacement: the API level hides replaced values, but SPARQL query can retrieve and trace them
      - explicit change event
      [figure: a 2011 assessment of Clevedon Beach (classification Minimum, status replaced) linked by dct:isReplacedBy to its correction (classification Higher); an analysis event (reason: reanalysis) connects the two via ev:before/ev:after, with ev:occurredOn and ev:agent]
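The explicit-replacement pattern can be illustrated with two views over the same records: the API view hides superseded values, while raw access keeps the audit trail. The `replaced_by` field is a stand-in for a dct:isReplacedBy link, and all the data is hypothetical:

```python
# Each assessment may be superseded; the API view filters superseded
# records out, while the full list remains queryable for tracing.
assessments = [
    {"year": 2011, "classification": "Minimum", "replaced_by": "v2"},
    {"id": "v2", "year": 2011, "classification": "Higher", "replaced_by": None},
]

def api_view(records):
    """Current values only: drop anything that has been replaced."""
    return [r for r in records if r.get("replaced_by") is None]

def trace(records, record_id):
    """Raw access: find what a given correction replaced."""
    return [r for r in records if r.get("replaced_by") == record_id]
```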
  21. Modelling change (3): mutation
      Infrequent change of properties while the essential identity remains
      - e.g. renaming a school, adding another building
      - routine accesses see the property value, not a function of time
      Patterns:
      - in-place update
      - named graphs: current graph + graphs for each previous state + a meta-graph
      - explicit versioning with open periods
  22. Modelling change (3): mutation. Explicit versioning with open periods
      [figure: an endurant resource with two dct:hasVersion links: "Clevedon Beach", dct:valid interval starting 2003 and finishing 2011, and "Clevedon Sands", dct:valid interval starting 2011 with an open end]
      - find the right version by querying on the validity interval
      - simplify use through a non-monotonic "latest value" link and an API that applies the query filters automatically
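Resolving "which version was valid in year Y" against open-ended intervals is a simple interval check. A sketch, with an open finish modelled as `None` and the Clevedon data used as hypothetical example records:

```python
# Versions with validity intervals; an open (None) finish means "current".
versions = [
    {"label": "Clevedon Beach", "start": 2003, "finish": 2011},
    {"label": "Clevedon Sands", "start": 2011, "finish": None},
]

def version_at(vs, year):
    """Pick the version whose validity interval covers the given year.
    Intervals are treated as half-open: [start, finish)."""
    for v in vs:
        if v["start"] <= year and (v["finish"] is None or year < v["finish"]):
            return v["label"]
    return None
```

An API can apply exactly this filter automatically, so routine callers never see the interval machinery.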
  23. Application to case study
      - weekly and annual samples: use the Data Cube pattern (n-ary relation)
      - withdrawn samples: replacement pattern (no explicit change event); a Data Cube slice for "latest valid assessment", generated by a SPARQL Update query; the API gives easy access to the latest valid values; linked data following or raw SPARQL query allows drilling into changes
      - changes to a bathing water profile: versioning pattern; the bathing water entity points to the latest profile (SPARQL Update again)
  24. From pilot to practice
      - reduce modelling costs: patterns, reuse
      - handling change and update: patterns, publication process
      - automation: conversion, publication  [dive 3]
      - embed in the business process: use internally as well as externally; publish once, use many; data platform
  25. Automation: transform and publish data feed increments
      - transformation engine service: reusable mappings, low cost to adapt to new feeds, linking to reference data
      - publication service that supports non-monotonic changes
      [figure: data increments (CSV) flow through a transform service (xform specs, reconciliation against reference data) into a publication service with replicated publication servers]
  26. Transformation service: a declarative specification of the transform
      - a single service supports a range of transformations
      - easy to adapt a transformation to new feeds and modelling changes
      R2RML, the RDB-to-RDF Mapping Language:
      - specifies a mapping from database tables to RDF triples
      - W3C Candidate Recommendation
      D2RML: an R2RML extension that treats a CSV feed as a database table
  27. Small D2RML example:

      :dataSource a dr:CSVDataSource ;
        rdfs:label "dataSource" .

      :bathingWaterTermMap a dr:SubjectMap ;
        dr:template "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}" ;
        dr:class def-bw:BathingWater .

      :bathingWaterMap dr:logicalTable :dataSource ;
        dr:subjectMap :bathingWaterTermMap ;
        dr:predicateObjectMap [
          dr:predicate rdfs:label ;
          dr:objectMap [ dr:column "description_english" ; dr:language "en" ]
        ] ;
        dr:predicateObjectMap [
          dr:predicate def-bw:eubwidNotation ;
          dr:objectMap [ dr:column "EUBWID2" ; dr:datatype def-bw:eubwid ]
        ] .
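The effect of a subject-map template like the one above can be sketched as a small row-to-triples function. The column names and properties follow the example; the `expand` helper and the tuple representation of triples are illustrative, not part of D2RML:

```python
import re

def expand(template, row):
    """Fill {COLUMN} placeholders in a URI template from a CSV row."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)

def map_row(row):
    """Emit (subject, predicate, object) triples for one CSV row,
    mimicking the subject map and predicate-object maps above."""
    subject = expand(
        "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}", row)
    return [
        (subject, "rdf:type", "def-bw:BathingWater"),
        (subject, "rdfs:label", (row["description_english"], "@en")),
        (subject, "def-bw:eubwidNotation", row["EUBWID2"]),
    ]

row = {"EUBWID2": "ukk1202-36000", "description_english": "Clevedon Beach"}
triples = map_row(row)
```

Because the mapping is declarative data, one transform engine can serve every feed; only the mapping document changes.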
  28. Using patterns
      - problem: verbosity increases reuse costs
      - so extend the mapping language to support modelling patterns
      - Data Cube: specify the mapping to an observation with its measures and dimensions; the engine generates the Data Set and Data Structure Definition automatically
  29. D2RML cube map example:

      :dataCubeMap a dr:DataCubeMap ;
        rr:logicalTable "dataSource" ;
        dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
        dr:dsdIRI "http://example.org/myDsd"^^xsd:anyURI ;
        dr:observationMap [
          # instances will automatically link to the base Data Set
          rr:subjectMap [
            rr:termType rr:IRI ;
            rr:template "http://example.org/observation/{PLACE}/{DATE}"
          ] ;
          # implies an auto-generated entry in the Data Structure Definition
          rr:componentMap [
            dr:componentType qb:measure ;
            rr:predicate aq:concentration ;
            # defines how the measure value is to be represented
            rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ]
          ]
        ] ;
        ...
  30. But what about linking? Connecting observations to reference data is a core value of linked data.
      - R2RML has Term Maps to create values: constants and templates
      - extend these to allow maps based on other data sources:
        - Lookup map: look up a resource in a store, fetch a predicate
        - Reconcile: specify a lookup in a remote service, using the Google Refine reconciliation API
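A lookup map is essentially a keyed search against reference data. A minimal in-memory sketch: the reference URI follows the case study, but the dictionary store and the normalization logic are illustrative stand-ins for a triple store or remote reconciliation service:

```python
# Reference data: notation -> URI. A lookup map resolves a source value
# to an existing reference resource instead of minting a new URI.
reference = {
    "ukk1202-36000":
        "http://environment.data.gov.uk/id/bathing-water/ukk1202-36000",
}

def lookup(notation):
    """Resolve a notation against the reference set; None means no
    match, which a real pipeline might route to a reconciliation
    service for fuzzy matching."""
    return reference.get(notation.strip().lower())
```

This is what turns an isolated cube of observations into linked data: the observation's dimension values point at shared reference resources rather than private strings.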
  31. Automation (recap): transform and publish data feed increments
      - transformation engine service: reusable mappings and reference-data linking now covered
      - publication service that supports non-monotonic changes: next
      [figure: pipeline architecture as on slide 25]
  32. Publication service
      Goals:
      - cope with the non-monotonic effects of the change representation
      - so that replication is robust and cheap (=> make it idempotent)
      Solution: SPARQL Update
      - publish the transformed increment as a simple INSERT DATA
      - then run a SPARQL Update script for the non-monotonic links: dct:isReplacedBy links, latest-value slices
  33. Sample update script:

      DELETE {
        ?bw bwq:latestComplianceAssessment ?o .
      } WHERE {
        ?bw bwq:latestComplianceAssessment ?o .
      } ;
      INSERT {
        ?bw bwq:latestComplianceAssessment ?o .
      } WHERE {
        {
          ?slice a bwq:ComplianceByYearSlice ;
                 bwq:sampleYear [ interval:ordinalYear ?year ] .
          OPTIONAL {
            ?slice2 a bwq:ComplianceByYearSlice ;
                    bwq:sampleYear [ interval:ordinalYear ?year2 ] .
            FILTER ( ?year2 > ?year )
          }
          FILTER ( !bound(?slice2) )
        }
        ?slice qb:observation ?o .
        ?o bwq:bathingWater ?bw .
      }
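The DELETE-then-INSERT shape above is what makes the script idempotent: old links are dropped and re-derived from the data, so running it twice leaves the store unchanged and replication stays cheap. A sketch over a set of triples (the `bwq:assessment` property and the (year, observation) pairing are illustrative simplifications):

```python
def apply_latest_link(store):
    """Idempotent rewrite of the latest-assessment link, mirroring the
    DELETE/INSERT script: drop any old link, re-derive from the data."""
    # DELETE step: remove every existing latest link
    store = {t for t in store if t[1] != "bwq:latestComplianceAssessment"}
    # INSERT step: for each bathing water, link to the newest observation
    best = {}
    for s, p, o in store:
        if p == "bwq:assessment":      # object is a (year, observation) pair
            year, obs = o
            if s not in best or year > best[s][0]:
                best[s] = (year, obs)
    for bw, (_, obs) in best.items():
        store.add((bw, "bwq:latestComplianceAssessment", obs))
    return store

store = {
    ("bw:clevedon", "bwq:assessment", (2010, "obs:2010")),
    ("bw:clevedon", "bwq:assessment", (2011, "obs:2011")),
}
once = apply_latest_link(store)
twice = apply_latest_link(once)
```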
  34. Automation (recap): transformation engine service and publication service both in place
      [figure: pipeline architecture as on slide 25]
  35. Application to case study: update server
      - transforms based on scripts (an earlier scripting utility)
      - linking to reference data
      - distributed publication via SPARQL Update
      - an extensible range of datasets: annual assessments, in-season assessments, bathing water profiles, features (e.g. pollution sources), reference data
  36. From pilot to practice
      - reduce modelling costs: patterns, reuse
      - handling change and update: patterns, publication process
      - automation: conversion, publication
      - embed in the business process: use internally as well as externally; publish once, use many; data platform  [dive 4]
  37. Embed in the business process
      - embedding is critical to ensure the data is kept up to date
      - which in turn needs usage => lower the barrier to use
      [figure: two loops: invest -> rich, up-to-date data -> internal and external use; vs. data not used -> hard to justify -> data goes stale]
  38. Lowering the barrier to use: simple REST APIs
      - use the Linked Data API specification
      - rich query without learning SPARQL
      - easy consumption as JSON, XML
      - gets developers used to the data and the data model
      [figure: LD API layered over the publication service, fed by the transform service]
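The Linked Data API idea of turning URL query parameters into a SPARQL pattern can be sketched as a tiny query builder. The parameter-to-property convention and the `bwq` prefix are illustrative assumptions; real LDA configurations also handle paging, views and datatype coercion:

```python
def build_select(params, prefix="bwq"):
    """Turn flat ?property=value request parameters into a SPARQL
    SELECT, roughly as a Linked Data API front end would (minimal
    sketch; parameters map one-to-one onto triple patterns)."""
    patterns = [
        '?item %s:%s "%s" .' % (prefix, key, value)
        for key, value in sorted(params.items())
    ]
    return "SELECT ?item WHERE {\n  %s\n}" % "\n  ".join(patterns)

query = build_select({"sampleYear": "2011", "classification": "Higher"})
```

Developers get a familiar REST-and-JSON surface while the SPARQL endpoint stays available underneath for richer queries.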
  39. Application to case study
      - embedded in the process for weekly/daily updates
      - infrastructure to automate conversion and publishing
      - API plus extensive developer documentation
      - third-party and in-house applications built over the API
      - publish once, use many: information products as applications over a data platform, usable externally as well as internally
  40. The next stage: grow the range of data publications and uses
      The growing range of reference data and datasets brings new challenges:
      - discover reference terms and models to reuse
      - discover datasets to use for an application
      - discover models and links between sets
      This needs a coordination or registry service; a story for another day ...
  41. Conclusions: we have illustrated how public sector users of linked data are moving from static pilots to operational systems. The keys are:
      - reduce modelling costs through patterns and reuse
      - design for continuous update
      - automate publication using declarative mappings and SPARQL Update
      - lower the barrier to use through API design and documentation
      - embed in the organization's processes so the data is used, and useful
      Acknowledgements: only possible thanks to many smart colleagues: Stuart Williams, Andy Seaborne, Ian Dickinson, Brian McBride, Chris Dollin, plus Alex Coley and team from the Environment Agency.