
Industrialized Linked Data

1. Industrialized Linked Data
   Dave Reynolds, Epimorphics Ltd (@der42)
2. Context: public sector Linked Data
3. Linked Data journey ... explore
   - what is Linked Data?
   - what use is it for us?
4. Linked Data journey ... explore
   - what is Linked Data?
   - what use is it for us?
     - self-describing: carries semantics with it, annotates and explains, data in context, ...
     - integration: comparable, slice and dice, web API, ...
5. Linked Data journey ... explore
   - what is Linked Data?
   - what use is it for us?
     - self-describing: carries semantics with it, annotates and explains, data in context, ...
     - integration: comparable, slice and dice, web API, ...
   - what's involved?
6. Linked Data journey ... explore -> pilot
   - data model -> convert -> publish -> apply
   (Photo of The Thinker © dSeneste.dk on Flickr, CC BY)
7. Linked Data journey ... explore -> pilot -> routine?
   Great pilot, but ...
   - can we reduce the time and cost?
   - how do we handle changes and updates?
   - how can we make the published data easier to use?
   How do we make Linked Data "business as usual"?
8. Example case study: Environment Agency
   - monitoring of bathing water quality
   - static pilot: historic annual assessments
   - live pilot: weekly assessments
   - operational system: additional data feeds, live update, integrated API, data explorer
9. From pilot to practice
   - reduce modelling costs (dive 1)
     - patterns
     - reuse
   - handling change and update
     - patterns
   - publication process
     - automation: conversion, publication
   - embed in the business process
     - use internally as well as externally
     - publish once, use many
     - data platform
10. Reduce costs - modelling
    1. Don't do it
       - map source data into isomorphic RDF, synthesize URIs
       - loses some of the value proposition
    2. Reuse existing ontologies, intact or mix-and-match
       - best solution when available
       - W3C GLD work on vocabularies: people, organizations, datasets ...
    3. Reusable vocabulary patterns
       - example: Data Cube plus reference URI sets
       - adaptable to a broad range of data: environmental, statistical, financial ...
11. Reusable patterns: Data Cube
    - much public sector data has regularities
      - sets of measures: observations, forecasts, budgets, assessments, statistics ...
    (Figure: a grid of example measure values, e.g. ">0.1", "34", "good", "excellent")
12. Reusable patterns: Data Cube
    - much public sector data has regularities
      - sets of measures: observations, forecasts, budgets, assessments, estimates ...
      - organized along some dimensions: region, agency, time, category, cost centre ...
    (Figure: spend measures laid out along time, cost centre and objective code dimensions)
13. Reusable patterns: Data Cube
    - much public sector data has regularities
      - sets of measures: observations, forecasts, budgets, assessments, estimates ...
      - organized along some dimensions: region, agency, time, category, cost centre ...
      - interpreted according to attributes: units, multipliers, status
    (Figure: the same spend cube, with values qualified by attributes: $k units and
    provisional vs. final status)
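   As a concrete sketch of the cube pattern, here is one observation in Turtle,
   corresponding to a single cell of the spend table above; the ex: dataset and
   property names are illustrative, not from the case study:

      @prefix qb:  <http://purl.org/linked-data/cube#> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
      @prefix ex:  <http://example.org/def/> .

      # One cell of the cube: provisional spend of $12k for one
      # cost centre, objective code and time period.
      ex:obs1 a qb:Observation ;
          qb:dataSet       ex:spendDataset ;
          ex:refPeriod     <http://reference.data.gov.uk/id/year/2011> ;  # dimension
          ex:costCentre    ex:costCentreA ;                               # dimension
          ex:objectiveCode ex:objective12 ;                               # dimension
          ex:spend         "12000"^^xsd:decimal ;                         # measure
          ex:status        ex:provisional .                               # attribute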
14. Data Cube vocabulary
15. Data Cube pattern
    - a pattern, not a fixed ontology
      - customize by selecting measures, dimensions and attributes
      - originated in the publishing of statistics
      - applied to environmental measurements, weather forecasts, budgets and spend,
        quality assessments, regional demographics ...
    - supports reuse
      - widely reusable URI sets: geography, time periods, agencies, units
      - organization-wide sets
      - modelling often requires only small increments on top of the core pattern and
        reusable components
      - opens the door to reusable visualization tools
      - standardization through W3C GLD
16. Application to case study
    - Data Cubes for water quality measurement
      - in-season weekly assessments
      - end-of-season annual assessments
    - dimensions:
      - time intervals: the UK reference time service
      - location: reference URI sets for bathing waters and sample points
    - cubes can reuse these dimensions; we only need to define the specific measures
17. From pilot to practice
    - reduce modelling costs
      - patterns
      - reuse
    - handling change and update (dive 2)
      - patterns
    - publication process
      - automation: conversion, publication
    - embed in the business process
      - use internally as well as externally
      - publish once, use many
      - data platform
18. Handling change
    - critical challenge
      - most initial pilots choose a snapshot dataset, and go stale, fast
      - understanding the nature of data updates and how to handle them is critical to
        scaling successfully to business as usual
    - types of change
      - new data relating to a different time period
      - corrections to data
      - entities change: properties, identity
19. Modelling change 1: individual data items relate to a new time period
    Pattern: n-ary relation
    - an observation resource relates a value to its time period and other context
    - use Data Cube dimensions for this
    (Figure: Clevedon Beach, <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000>,
    with bwq:classification Higher for sample year 2009, Minimum for 2010 and Higher for
    2011, each year a URI under http://reference.data.gov.uk/id/year/)
    History or latest?
    - latest is non-monotonic but helpful for many practical uses
    - materialize (SPARQL Update), implement in query, or implement in the API
    - choice whether to keep history as well: water quality vs. weather forecasts
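   As a Turtle sketch of the figure's n-ary pattern (the bwq: property names follow
   the slide; the assessment URI and the bwq: namespace are assumptions):

      @prefix bwq: <http://environment.data.gov.uk/def/bathing-water-quality/> .

      # The 2010 assessment is its own resource, tying the classification
      # to both the bathing water and the sample year (cube dimensions).
      <http://example.org/assessment/ukk1202-36000/2010>
          bwq:bathingWater   <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
          bwq:sampleYear     <http://reference.data.gov.uk/id/year/2010> ;
          bwq:classification "Minimum" .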
20. Modelling change 2: corrections
    - patterns
      - silent change (!)
      - explicit replacement
        - the API level hides replaced values, but a SPARQL query can retrieve and trace them
      - explicit change event
    (Figure: Clevedon Beach's 2011 classification "Minimum", now status: replaced, linked
    via dct:isReplacedBy/dct:replaces to the corrected classification "Higher"; an analysis
    event records the before and after values, the agent, when it occurred, and the
    reason: reanalysis)
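   A minimal Turtle sketch of the explicit-replacement pattern in the figure; dct: is
   Dublin Core terms, while the ex: resource names and the ev: event namespace are
   illustrative assumptions:

      @prefix dct: <http://purl.org/dc/terms/> .
      @prefix ex:  <http://example.org/> .
      @prefix ev:  <http://example.org/def/event/> .   # event namespace assumed

      # The withdrawn value stays queryable but is marked as replaced.
      ex:assessment-v1 ex:classification "Minimum" ;
          ex:status        "replaced" ;
          dct:isReplacedBy ex:assessment-v2 .

      ex:assessment-v2 ex:classification "Higher" ;
          dct:replaces ex:assessment-v1 .

      # Explicit change event recording what changed and why.
      ex:reanalysisEvent
          ev:before ex:assessment-v1 ;
          ev:after  ex:assessment-v2 ;
          ev:agent  ex:analysisTeam ;
          ex:reason "reanalysis" .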
21. Modelling change 3: mutation
    - infrequent change of properties; the essential identity remains
      - e.g. renaming a school, adding another building
    - routine accesses see the property value, not a function of time
    - patterns
      - in-place update
      - named graphs: current graph + graphs for each previous state + a meta-graph
      - explicit versioning with open periods
22. Modelling change 3: mutation - explicit versioning with open periods
    (Figure: an endurant with dct:hasVersion links to two versions, labelled "Clevedon
    Beach" and "Clevedon Sands"; one version's dct:valid interval starts in 2003 and
    finishes in 2011, the other's starts in 2011 and remains open)
    - find the right version by querying on the validity interval
    - simplify use through
      - a non-monotonic "latest value" link
      - an API that implements the query filters automatically
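   A Turtle sketch of the open-period versioning above, using the properties named on
   the slide; the ex: URIs are illustrative, and which name is the current one is an
   assumption (the extraction is ambiguous):

      @prefix dct:  <http://purl.org/dc/terms/> .
      @prefix time: <http://www.w3.org/2006/time#> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix ex:   <http://example.org/> .

      # The enduring entity points at each of its versions.
      ex:bathingWater dct:hasVersion ex:version1, ex:version2 .

      # Old name: closed validity interval, 2003 to 2011.
      ex:version1 rdfs:label "Clevedon Beach" ;
          dct:valid [ time:intervalStarts   <http://reference.data.gov.uk/id/year/2003> ;
                      time:intervalFinishes <http://reference.data.gov.uk/id/year/2011> ] .

      # Current name: open interval, started 2011, no finish yet.
      ex:version2 rdfs:label "Clevedon Sands" ;
          dct:valid [ time:intervalStarts <http://reference.data.gov.uk/id/year/2011> ] .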
23. Application to case study
    - weekly and annual samples
      - use the Data Cube pattern (n-ary relation)
    - withdrawn samples
      - replacement pattern (no explicit change event)
      - a Data Cube slice for the "latest valid assessment", generated by a SPARQL
        Update query
      - the API gives easy access to the latest valid values
      - linked data following or a raw SPARQL query allows drilling into the changes
    - changes to a bathing water profile
      - versioning pattern
      - the bathing water entity points to the latest profile (SPARQL Update again)
24. From pilot to practice
    - reduce modelling costs
      - patterns
      - reuse
    - handling change and update
      - patterns
    - publication process (dive 3)
      - automation: conversion, publication
    - embed in the business process
      - use internally as well as externally
      - publish once, use many
      - data platform
25. Automation
    Transform and publish data feed increments
    - transformation engine service
      - reusable mappings, low cost to adapt to new feeds
      - linking to reference data
    - publication service that supports non-monotonic changes
    (Figure: data increments (CSV) flow into a transform service driven by xform specs
    and a reconciliation service backed by reference data, then into a publication
    service that feeds replicated servers)
26. Transformation service
    - declarative specification of the transform
      - a single service supports a range of transformations
      - easy to adapt a transformation to new feeds and modelling changes
    - R2RML: the RDB to RDF Mapping Language
      - specifies a mapping from database tables to RDF triples
      - a W3C Candidate Recommendation
    - D2RML
      - an R2RML extension that treats a CSV feed as a database table
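   For reference, a minimal standard R2RML mapping has the same shape as the D2RML
   example on the next slide; the table, column and def-bw: vocabulary names here are
   illustrative, not from the case study:

      @prefix rr:     <http://www.w3.org/ns/r2rml#> .
      @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix def-bw: <http://environment.data.gov.uk/def/bathing-water/> .

      <#BathingWaterMap> a rr:TriplesMap ;
          rr:logicalTable [ rr:tableName "BATHING_WATER" ] ;   # source table
          rr:subjectMap [
              rr:template "http://environment.data.gov.uk/id/bathing-water/{EUBWID}" ;
              rr:class    def-bw:BathingWater
          ] ;
          rr:predicateObjectMap [
              rr:predicate rdfs:label ;
              rr:objectMap [ rr:column "NAME" ; rr:language "en" ]
          ] .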
27. Small D2RML example

      :dataSource a dr:CSVDataSource ;
          rdfs:label "dataSource" .

      :bathingWaterTermMap a dr:SubjectMap ;
          dr:template "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}" ;
          dr:class def-bw:BathingWater .

      :bathingWaterMap
          dr:logicalTable :dataSource ;
          dr:subjectMap :bathingWaterTermMap ;
          dr:predicateObjectMap [
              dr:predicate rdfs:label ;
              dr:objectMap [ dr:column "description_english" ; dr:language "en" ]
          ] ;
          dr:predicateObjectMap [
              dr:predicate def-bw:eubwidNotation ;
              dr:objectMap [ dr:column "EUBWID2" ; dr:datatype def-bw:eubwid ]
          ] .
28. Using patterns
    - problem: verbosity increases reuse costs
    - so extend D2RML to support modelling patterns
    - Data Cube
      - specify the mapping to an observation, with its measures and dimensions
      - the engine generates the Data Set and Data Structure Definition automatically
29. D2RML cube map example

      :dataCubeMap a dr:DataCubeMap ;
          rr:logicalTable :dataSource ;
          dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
          dr:dsdIRI     "http://example.org/myDsd"^^xsd:anyURI ;
          dr:observationMap [
              # Instances will automatically link to the base Data Set
              rr:subjectMap [
                  rr:termType rr:IRI ;
                  rr:template "http://example.org/observation/{PLACE}/{DATE}"
              ] ;
              # Implies an entry in the auto-generated Data Structure Definition
              rr:componentMap [
                  dr:componentType qb:measure ;
                  rr:predicate     aq:concentration ;
                  # Defines how the measure value is to be represented
                  rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ]
              ]
          ] ;
          ...
30. But what about linking?
    - connect observations to reference data: a core value of linked data
    - R2RML has Term Maps to create values: constants and templates
    - extend to allow maps based on other data sources
      - Lookup map: look up a resource in a store and fetch a predicate
      - Reconcile: delegate the lookup to a remote service, using the Google Refine
        reconciliation API (hypothetical sketch below)
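   The D2RML syntax for these extensions isn't shown in the talk; purely as a
   hypothetical sketch, a lookup-based term map might look something like this (the
   dr: property names below are invented for illustration, not the actual D2RML
   vocabulary):

      # Hypothetical: resolve a CSV column to a reference-data URI by
      # looking up its notation among the published sampling points.
      :samplePointTermMap a dr:LookupMap ;
          dr:lookupColumn   "SAMPLE_PT_ID" ;
          dr:lookupProperty def-bw:samplingPointNotation ;
          dr:lookupSource   <http://example.org/reference-data/sparql> .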
31. Automation
    Transform and publish data feed increments
    - transformation engine service ✓
      - reusable mappings, low cost to adapt to new feeds ✓
      - linking to reference data ✓
    - publication service that supports non-monotonic changes
    (Figure: the same pipeline diagram as before)
32. Publication service
    - goals
      - cope with the non-monotonic effects of representing change
      - make replication robust and cheap (=> make it idempotent)
    - solution: SPARQL Update
      - publish the transformed increment as a simple INSERT DATA (sketch below)
      - then run a SPARQL Update script for the non-monotonic links
        - dct:isReplacedBy links
        - latest-value slices
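   A sketch of the idempotent increment step; the graph name, assessment URI and bwq:
   namespace are illustrative assumptions. Re-running the same INSERT DATA leaves the
   store unchanged, which is what makes replication cheap:

      PREFIX bwq: <http://environment.data.gov.uk/def/bathing-water-quality/>

      # Publish one transformed increment into a named graph.
      INSERT DATA {
        GRAPH <http://example.org/graph/bwq/in-season> {
          <http://example.org/assessment/ukk1202-36000/2011-W23>
              bwq:bathingWater   <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
              bwq:classification "Higher" .
        }
      }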
33. Sample update script

      # Rebuild the non-monotonic "latest" link: point each bathing water
      # at its assessment in the most recent compliance-by-year slice.
      DELETE { ?bw bwq:latestComplianceAssessment ?o . }
      WHERE  { ?bw bwq:latestComplianceAssessment ?o . } ;

      INSERT { ?bw bwq:latestComplianceAssessment ?o . }
      WHERE {
        {
          ?slice a bwq:ComplianceByYearSlice ;
                 bwq:sampleYear [ interval:ordinalYear ?year ] .
          OPTIONAL {
            ?slice2 a bwq:ComplianceByYearSlice ;
                    bwq:sampleYear [ interval:ordinalYear ?year2 ] .
            FILTER (?year2 > ?year)
          }
          FILTER ( !bound(?slice2) )
        }
        ?slice qb:observation ?o .
        ?o bwq:bathingWater ?bw .
      }
34. Automation
    Transform and publish data feed increments
    - transformation engine service ✓
      - reusable mappings, low cost to adapt to new feeds ✓
      - linking to reference data ✓
    - publication service that supports non-monotonic changes ✓
    (Figure: the same pipeline diagram as before)
35. Application to case study
    - update server
      - transforms based on scripts (an earlier scripting utility)
      - linking to reference data
      - distributed publication via SPARQL Update
    - extensible range of datasets
      - annual assessments
      - in-season assessments
      - bathing water profiles
      - features (e.g. pollution sources)
      - reference data
36. From pilot to practice
    - reduce modelling costs
      - patterns
      - reuse
    - handling change and update
      - patterns
    - publication process
      - automation: conversion, publication
    - embed in the business process (dive 4)
      - use internally as well as externally
      - publish once, use many
      - data platform
37. Embed in business process
    - embedding is critical to ensure the data is kept up to date
    - that in turn needs usage => lower the barrier to use
    (Figure: two feedback loops: investment keeps the data rich and up to date, which
    drives external and internal use, which justifies further investment; whereas
    unused data goes stale and further investment becomes hard to justify)
38. Lowering the barrier to use
    - simple REST APIs
    - use the Linked Data API specification
      - rich query without learning SPARQL
      - easy consumption as JSON or XML
      - gets developers used to the data and the data model (example requests below)
    (Figure: transform service -> publication service -> LD API)
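   For example, requests in the style of the Linked Data API; the paths here are
   illustrative, while _pageSize and _sort are standard LDA query parameters:

      # One bathing water, rendered as JSON
      GET /doc/bathing-water/ukk1202-36000.json

      # List endpoint: paging and sorting without writing SPARQL
      GET /doc/bathing-water.json?_pageSize=10&_sort=name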
39. Application to case study
    - embedded in the process for weekly/daily updates
    - infrastructure to automate conversion and publishing
    - API plus extensive developer documentation
    - third-party and in-house applications built over the API
    - publish once, use many
      - information products as applications over a data platform, usable externally
        as well as internally
40. The next stage
    - grow the range of data publications and uses
    - a growing range of reference data and URI sets brings new challenges
      - discover reference terms and models to reuse
      - discover datasets to use for an application
      - discover models and the links between sets
    - needs a coordination or registry service: a story for another day ...
41. Conclusions
    - illustrated how public sector users of linked data are moving from static pilots
      to operational systems
    - keys are:
      - reduce modelling costs through patterns and reuse
      - design for continuous update
      - automate publication using declarative mappings and SPARQL Update
      - lower the barrier to use through API design and documentation
      - embed in the organization's process so the data is used and useful
    Acknowledgements
    Only possible thanks to many smart colleagues: Stuart Williams, Andy Seaborne,
    Ian Dickinson, Brian McBride, Chris Dollin, plus Alex Coley and team from the
    Environment Agency.