
Industrialized Linked Data

Presentation for SemTechBiz 2012 on the use of Linked Data in the UK public sector, using the Environment Agency as a reference use case.

  1. Industrialized Linked Data. Dave Reynolds, Epimorphics Ltd (@der42)
  2. Context: public sector Linked Data
  3. Linked Data journey: explore. What is linked data? What use is it for us?
  4. Linked Data journey: explore (continued). What use is it for us?
     • self-describing
     • integration
     • carries semantics with it
     • comparable
     • annotate and explain
     • slice and dice
     • data in context
     • web API
     • ...
  5. Linked Data journey: explore (continued). As slide 4, adding the next question: what's involved?
  6. Linked Data journey: explore, then pilot (data model, convert, publish, apply)
     (Photo of The Thinker © dSeneste.dk on Flickr, CC BY)
  7. Linked Data journey: explore, pilot, routine?
     Great pilot, but...
     • can we reduce the time and cost?
     • how do we handle changes and updates?
     • how can we make the published data easier to use?
     How do we make Linked Data “business as usual”?
  8. Example case study: Environment Agency monitoring of bathing water quality
     • static pilot: historic annual assessments
     • live pilot: weekly assessments
     • operational system: additional data feeds, live update, integrated API, data explorer
  9. From pilot to practice
     • reduce modelling costs: patterns (deep dive 1), reuse
     • handling change and update: patterns
     • publication process automation: conversion, publication
     • embed in the business process: use internally as well as externally; publish once, use many; data platform
  10. Reduce costs: modelling
     1. Don't do it: map the source data into isomorphic RDF and synthesize URIs. This loses some of the value proposition.
     2. Reuse existing ontologies, intact or mix-and-match: the best solution when available (see the W3C GLD work on vocabularies for people, organizations, datasets ...).
     3. Reusable vocabulary patterns: for example, Data Cube plus reference URI sets, adaptable to a broad range of data (environmental, statistical, financial ...).
  11. Reusable patterns: Data Cube. Much public sector data has regularities:
     • sets of measures: observations, forecasts, budgets, assessments, statistics ...
     [Figure: example measure values, numeric and ratings such as good, excellent, poor]
  12. Reusable patterns: Data Cube (continued)
     • sets of measures: observations, forecasts, budgets, assessments, estimates ...
     • organized along some dimensions: region, agency, time, category, cost centre ...
     [Figure: a "spend" measure arranged along objective code, cost centre and time dimensions]
  13. Reusable patterns: Data Cube (continued)
     • as slide 12, plus: interpreted according to attributes such as units, multipliers and status
     [Figure: the same spend cube with values in $k and cells annotated as provisional or final]
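The measures/dimensions/attributes split can be made concrete with a small in-memory model. This is a hypothetical sketch, not the RDF Data Cube vocabulary itself: the field names, the cost-centre codes and the spend figures are all invented for illustration. Dimensions locate a value, measures carry it, and attributes (here unit, multiplier and status) say how to read it.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One cell of a data cube (hypothetical in-memory model, not the qb vocabulary)."""
    dimensions: dict  # where the value sits, e.g. {"cost_centre": "CC1", "time": "Q1"}
    measures: dict    # what was measured, e.g. {"spend": 8}
    attributes: dict  # how to interpret it, e.g. {"unit": "USD", "multiplier": 1000}


def slice_cube(cube, **fixed):
    """Keep observations whose dimension values match every fixed dimension."""
    return [o for o in cube
            if all(o.dimensions.get(dim) == val for dim, val in fixed.items())]


def scaled(obs, measure):
    """Apply the multiplier attribute (default 1) to a measure value."""
    return obs.measures[measure] * obs.attributes.get("multiplier", 1)


# Hypothetical spend figures, for illustration only.
cube = [
    Observation({"cost_centre": "CC1", "time": "Q1"}, {"spend": 8},
                {"unit": "USD", "multiplier": 1000, "status": "final"}),
    Observation({"cost_centre": "CC1", "time": "Q2"}, {"spend": 12},
                {"unit": "USD", "multiplier": 1000, "status": "provisional"}),
    Observation({"cost_centre": "CC2", "time": "Q1"}, {"spend": 9},
                {"unit": "USD", "multiplier": 1000, "status": "final"}),
]

q1 = slice_cube(cube, time="Q1")
q1_total = sum(scaled(o, "spend") for o in q1)  # 8000 + 9000
final_only = [o for o in cube if o.attributes["status"] == "final"]
```

Because every observation has the same shape, generic operations such as slicing and totalling work for any measure, which is what makes the pattern reusable across spend, environmental and statistical data.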
  14. Data Cube vocabulary
  15. Data Cube pattern
     • a pattern, not a fixed ontology: customize by selecting measures, dimensions and attributes
     • originated in publishing statistics; applied to environmental measurements, weather forecasts, budgets and spend, quality assessments, regional demographics ...
     • supports reuse: widely reusable URI sets (geography, time periods, agencies, units) and organization-wide sets; modelling often requires only small increments on top of the core pattern and reusable components
     • opens the door to reusable visualization tools
     • standardization through W3C GLD
  16. Application to case study: Data Cubes for water quality measurement
     • in-season weekly assessments
     • end-of-season annual assessments
     • dimensions: time intervals (UK reference time service); location (reference URI set for bathing waters and sampling points)
     • cubes can reuse these dimensions and need only define their specific measures
  17. From pilot to practice (roadmap of slide 9, now at deep dive 2: patterns for handling change and update)
  18. Handling change
     • a critical challenge: most initial pilots choose a snapshot dataset, and it goes stale, fast
     • understanding the nature of data updates, and how to handle them, is critical to scaling successfully to business as usual
     • types of change: new data relating to a different time period; corrections to data; changes to entities (their properties or their identity)
  19. Modelling change 1: individual data items relate to a new time period
     • pattern: n-ary relation; an observation resource relates the value to its time period and other context (use Data Cube dimensions for this)
     [Diagram: three observation resources, each linking via bwq:bathingWater to Clevedon Beach (http://environment.data.gov.uk/id/bathing-water/ukk1202-36000) and carrying a bwq:sampleYear and a bwq:classification: 2009 (http://reference.data.gov.uk/id/year/2009) Higher; 2010 Minimum; 2011 Higher]
     • history or latest? “Latest” is non-monotonic but helpful for many practical uses: materialize it (SPARQL Update), implement it in the query, or implement it in the API
     • choose whether to keep history as well (e.g. water quality v. weather forecasts)
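A minimal sketch of the n-ary relation, assuming a plain Python record per observation rather than RDF. The Clevedon Beach URI, the year URI pattern and the 2009 to 2011 classifications come from the slide; the class and function names are hypothetical. It shows that history and "latest" are both answerable from the same data, with "latest" computed at query time.

```python
from dataclasses import dataclass
from typing import Optional

# Identifiers from the slide; the Python model itself is a hypothetical sketch.
CLEVEDON = "http://environment.data.gov.uk/id/bathing-water/ukk1202-36000"


def year_uri(year: int) -> str:
    """Reference URI for a calendar year, following the pattern on the slide."""
    return f"http://reference.data.gov.uk/id/year/{year}"


@dataclass(frozen=True)
class Assessment:
    """n-ary relation: one resource ties a classification to a bathing water and a year."""
    bathing_water: str
    sample_year: int
    classification: str


store = [
    Assessment(CLEVEDON, 2009, "Higher"),
    Assessment(CLEVEDON, 2010, "Minimum"),
    Assessment(CLEVEDON, 2011, "Higher"),
]


def history(assessments, bw):
    """All assessments for one bathing water, oldest first."""
    return sorted((a for a in assessments if a.bathing_water == bw),
                  key=lambda a: a.sample_year)


def latest(assessments, bw) -> Optional[Assessment]:
    """'Latest' computed at query time; adding a newer year changes the answer."""
    return max((a for a in assessments if a.bathing_water == bw),
               key=lambda a: a.sample_year, default=None)
```

Computing "latest" in the query keeps the store monotonic; materializing it instead means storing a link that must be rewritten whenever a new year arrives, which is the non-monotonic step the slide mentions.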
  20. Modelling change 2: corrections
     patterns:
     • silent change (!)
     • explicit replacement: the API level hides replaced values, but a SPARQL query can retrieve and trace them
     • explicit change event
     [Diagram: for Clevedon Beach (http://environment.data.gov.uk/id/bathing-water/ukk1202-36000) and bwq:sampleYear http://reference.data.gov.uk/id/year/2011, an observation with classification Minimum (status: replaced) and one with classification Higher are linked by dct:replaces / dct:isReplacedBy; an analysis event (reason: reanalysis) points to them via ev:before and ev:after and records ev:occuredOn and ev:agent]
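The explicit-replacement pattern can be sketched with a `replaces` link per observation. This assumes a hypothetical in-memory model with invented ids and a hypothetical 2011 correction; it is not the Environment Agency schema. The default view hides anything that has been replaced, while a trace follows `isReplacedBy` links the way a SPARQL query could.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obs:
    id: str                         # hypothetical local identifier
    classification: str
    replaces: Optional[str] = None  # dct:replaces: the id this observation supersedes


def replaced_by(observations):
    """Invert dct:replaces into a dct:isReplacedBy lookup: old id -> new id."""
    return {o.replaces: o.id for o in observations if o.replaces is not None}


def current_view(observations):
    """API-level view: hide every observation that has been replaced."""
    superseded = replaced_by(observations)
    return [o for o in observations if o.id not in superseded]


def trace(observations, start_id):
    """Follow isReplacedBy links to the current version (assumes no cycles)."""
    forward = replaced_by(observations)
    chain = [start_id]
    while chain[-1] in forward:
        chain.append(forward[chain[-1]])
    return chain


# Hypothetical 2011 assessment corrected after reanalysis.
obs = [
    Obs("obs-2011-a", "Minimum"),
    Obs("obs-2011-b", "Higher", replaces="obs-2011-a"),
]
```

Keeping the replaced observation rather than deleting it is what lets consumers drill into the change later.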
  21. Modelling change 3: mutation
     • infrequent change of properties while the essential identity remains, e.g. renaming a school or adding another building
     • routine accesses see the property value, not a function of time
     patterns:
     • in-place update
     • named graphs: current graph + a graph for each previous state + a meta-graph
     • explicit versioning with open periods
  22. Modelling change 3: mutation, explicit versioning with open periods
     [Diagram: an endurant resource has two dct:hasVersion links: version “Clevedon Beach”, whose dct:valid interval starts (time:intervalStarts) in 2003 and finishes (time:intervalFinishes) in 2011, and version “Clevedon Sands”, whose dct:valid interval starts in 2011 and is still open]
     • find the right version by querying on the validity interval
     • simplify use through a non-monotonic “latest value” link, and an API that implements the query filters automatically
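Finding the right version by its validity interval can be sketched as below, using the two names and years from the slide. The half-open interval semantics (a version stops being valid in the year its successor starts) is an assumption made here to avoid overlaps, and the names `Version`, `version_at` and `latest` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    label: str
    starts: int                     # year from time:intervalStarts
    finishes: Optional[int] = None  # year from time:intervalFinishes; None = open period

    def valid_in(self, year: int) -> bool:
        # Assumed half-open interval [starts, finishes).
        return self.starts <= year and (self.finishes is None or year < self.finishes)


# Versions of one endurant, as on the slide.
versions = [Version("Clevedon Beach", 2003, 2011), Version("Clevedon Sands", 2011)]


def version_at(versions, year):
    """Find the right version by filtering on the validity interval."""
    return next((v for v in versions if v.valid_in(year)), None)


def latest(versions):
    """The 'latest value' shortcut: the version whose period is still open."""
    return next((v for v in versions if v.finishes is None), None)
```

An API can apply `version_at` with the current date by default, so routine users see a plain property value while the full history stays queryable.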
  23. Application to case study
     • weekly and annual samples: Data Cube pattern (n-ary relation)
     • withdrawn samples: replacement pattern (no explicit change event); a Data Cube slice for the “latest valid assessment”, generated by a SPARQL Update query; the API gives easy access to the latest valid values, while following linked data or a raw SPARQL query allows drilling into the changes
     • changes to the bathing water profile: versioning pattern; the bathing water entity points to the latest profile (SPARQL Update again)
  24. From pilot to practice (roadmap of slide 9, now at deep dive 3: conversion, within publication process automation)
  25. Automation: transform and publish data feed increments
     • transformation engine service
     • reusable mappings, low cost to adapt to new feeds
     • linking to reference data
     • publication service that supports non-monotonic changes
     [Architecture diagram: CSV data increments; a transform service driven by transformation (xform) specs; a reconciliation service backed by reference data; a publication service; replicated publication servers]
  26. Transformation service
     • declarative specification of the transform: a single service supports a range of transformations, and it is easy to adapt a transformation to new feeds and modelling changes
     • R2RML (RDB to RDF Mapping Language): specifies a mapping from database tables to RDF triples; a W3C Candidate Recommendation at the time of this talk
     • D2RML: an R2RML extension that treats a CSV feed as a database table
  27. Small D2RML example
        :dataSource a dr:CSVDataSource ;
            rdfs:label "dataSource" .

        :bathingWaterTermMap a dr:SubjectMap ;
            dr:template "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}" ;
            dr:class def-bw:BathingWater .

        :bathingWaterMap
            dr:logicalTable :dataSource ;
            dr:subjectMap :bathingWaterTermMap ;
            dr:predicateObjectMap [
                dr:predicate rdfs:label ;
                dr:objectMap [ dr:column "description_english" ; dr:language "en" ]
            ] ;
            dr:predicateObjectMap [
                dr:predicate def-bw:eubwidNotation ;
                dr:objectMap [ dr:column "EUBWID2" ; dr:datatype def-bw:eubwid ]
            ] .
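To show what a declarative mapping buys, here is a hypothetical engine in Python that applies a spec mirroring the slide's D2RML example to a CSV feed. It is not the D2RML implementation: the dictionary spec format, the `Literal` type and the function names are invented, prefixes are left as unexpanded CURIEs, and `urllib.parse.quote` only approximates R2RML's IRI-safe encoding of template values. The one-row feed is hypothetical, reusing identifiers from the slides.

```python
import csv
import io
import re
from typing import NamedTuple, Optional
from urllib.parse import quote


class Literal(NamedTuple):
    value: str
    language: Optional[str] = None
    datatype: Optional[str] = None


# Hypothetical spec mirroring the slide's D2RML mapping (CURIEs not expanded).
MAPPING = {
    "subject_template": "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}",
    "class": "def-bw:BathingWater",
    "predicate_object_maps": [
        {"predicate": "rdfs:label", "column": "description_english", "language": "en"},
        {"predicate": "def-bw:eubwidNotation", "column": "EUBWID2", "datatype": "def-bw:eubwid"},
    ],
}


def expand_template(template, row):
    """Fill {COLUMN} placeholders, percent-encoding each value."""
    return re.sub(r"\{(\w+)\}", lambda m: quote(row[m.group(1)], safe=""), template)


def run_mapping(mapping, csv_text):
    """Generic engine: the same code runs any mapping of this shape."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = expand_template(mapping["subject_template"], row)
        triples.add((subject, "rdf:type", mapping["class"]))
        for pom in mapping["predicate_object_maps"]:
            obj = Literal(row[pom["column"]], pom.get("language"), pom.get("datatype"))
            triples.add((subject, pom["predicate"], obj))
    return triples


# Hypothetical one-row feed.
feed = "EUBWID2,description_english\nukk1202-36000,Clevedon Beach\n"
triples = run_mapping(MAPPING, feed)
```

Adapting to a new feed then means writing a new spec, not new code, which is the cost argument made on the previous slide.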
  28. Using patterns
     • plain R2RML mappings are verbose, which increases reuse costs
     • extend the mapping language to support modelling patterns such as Data Cube: specify the mapping to an observation with its measures and dimensions, and the engine generates the Data Set and Data Structure Definition automatically
  29. D2RML cube map example
        :dataCubeMap a dr:DataCubeMap ;
            rr:logicalTable :dataSource ;
            dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
            dr:dsdIRI "http://example.org/myDsd"^^xsd:anyURI ;
            dr:observationMap [
                # Instances will automatically link to the base Data Set
                rr:subjectMap [
                    rr:termType rr:IRI ;
                    rr:template "http://example.org/observation/{PLACE}/{DATE}"
                ] ;
                rr:componentMap [
                    # Implies an entry in the Data Structure Definition, which is auto-generated
                    dr:componentType qb:measure ;
                    rr:predicate aq:concentration ;
                    # Defines how the measure value is to be represented
                    rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ]
                ] ;
                ...
            ] .
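The callouts above say the engine links every observation to the Data Set and derives the Data Structure Definition from the component maps. A hypothetical Python sketch of that behaviour follows; it is not the D2RML engine. The two dimension components (`ex:place`, `ex:date`), the blank-node labels and the two-row feed are assumptions, and dimension values are kept as plain literals where a real mapping would link them to reference URIs.

```python
import csv
import io
import re
from typing import NamedTuple, Optional
from urllib.parse import quote


class Literal(NamedTuple):
    value: str
    datatype: Optional[str] = None


CUBE_MAP = {
    "dataset": "http://example.org/datacube1",
    "dsd": "http://example.org/myDsd",
    "observation_template": "http://example.org/observation/{PLACE}/{DATE}",
    "components": [
        # The two dimension entries and their ex: predicates are hypothetical.
        {"type": "qb:dimension", "predicate": "ex:place", "column": "PLACE"},
        {"type": "qb:dimension", "predicate": "ex:date", "column": "DATE"},
        {"type": "qb:measure", "predicate": "aq:concentration",
         "column": "NO2", "datatype": "xsd:decimal"},
    ],
}


def expand(template, row):
    return re.sub(r"\{(\w+)\}", lambda m: quote(row[m.group(1)], safe=""), template)


def run_cube_map(cube_map, csv_text):
    ds, dsd = cube_map["dataset"], cube_map["dsd"]
    # Data Set and Data Structure Definition are generated, not hand-written.
    triples = {
        (ds, "rdf:type", "qb:DataSet"),
        (ds, "qb:structure", dsd),
        (dsd, "rdf:type", "qb:DataStructureDefinition"),
    }
    for i, comp in enumerate(cube_map["components"]):
        spec = f"_:component{i}"  # blank-node label for the component specification
        triples.add((dsd, "qb:component", spec))
        triples.add((spec, comp["type"], comp["predicate"]))
    for row in csv.DictReader(io.StringIO(csv_text)):
        obs = expand(cube_map["observation_template"], row)
        triples.add((obs, "rdf:type", "qb:Observation"))
        triples.add((obs, "qb:dataSet", ds))  # every observation links to the Data Set
        for comp in cube_map["components"]:
            triples.add((obs, comp["predicate"],
                         Literal(row[comp["column"]], comp.get("datatype"))))
    return triples


# Hypothetical two-row feed.
feed = "PLACE,DATE,NO2\nsiteA,2012-06-01,31.5\nsiteA,2012-06-02,28.0\n"
triples = run_cube_map(CUBE_MAP, feed)
```

The mapping author states each component once; the structure definition and the per-observation Data Set links fall out of that, which is where the verbosity saving comes from.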
  30. But what about linking?
     • connecting observations to reference data is a core value of linked data
     • R2RML has Term Maps to create values, from constants and templates
     • extend these to allow maps based on other data sources:
        - Lookup map: look up a resource in a store and fetch a predicate
        - Reconcile: specify a lookup in a remote service, using the Google Refine reconciliation API
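The two linking maps can be approximated locally as follows. This is a hypothetical sketch: the reference triples are an invented local copy, `lookup_map` and `reconcile` are illustrative names, and the fuzzy match uses Python's `difflib` as a stand-in for calling a remote reconciliation service (it does not implement the Google Refine reconciliation API).

```python
import difflib

BW = "http://environment.data.gov.uk/id/bathing-water/ukk1202-36000"

# Hypothetical local copy of reference data as (subject, predicate, object) triples.
REFERENCE = {
    (BW, "rdfs:label", "Clevedon Beach"),
    (BW, "def-bw:eubwidNotation", "ukk1202-36000"),
}


def lookup_map(store, match_predicate, value, fetch_predicate):
    """Find the resource whose match_predicate equals value, then fetch fetch_predicate."""
    for s, p, o in store:
        if p == match_predicate and o == value:
            for s2, p2, o2 in store:
                if s2 == s and p2 == fetch_predicate:
                    return o2
    return None


def _norm(text):
    """Collapse whitespace and ignore case before comparing labels."""
    return " ".join(text.split()).casefold()


def reconcile(store, label, cutoff=0.8):
    """Local stand-in for a reconciliation service: free-text label -> resource IRI."""
    by_label = {_norm(o): s for s, p, o in store if p == "rdfs:label"}
    key = _norm(label)
    if key in by_label:
        return by_label[key]
    close = difflib.get_close_matches(key, list(by_label), n=1, cutoff=cutoff)
    return by_label[close[0]] if close else None
```

Returning `None` rather than guessing when nothing is close enough leaves unmatched rows visible for review instead of silently linking them to the wrong reference resource.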
  31. Automation: transform and publish data feed increments (recap of slide 25)
  32. Publication service
     • goals: cope with the non-monotonic effects of the change representation, so that replication is robust and cheap (=> make it idempotent)
     • solution: SPARQL Update; publish the transformed increment as a simple INSERT DATA, then run a SPARQL Update script for the non-monotonic links (dct:isReplacedBy links, latest-value slices)
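Why a plain INSERT DATA makes replication cheap can be seen with set semantics: inserting a triple that is already present changes nothing, so redelivering an increment is harmless. A minimal sketch, assuming frozensets of IRI-only triples as the store and hypothetical placeholder IRIs; `insert_data` and `to_insert_data` are illustrative names, not part of any SPARQL library.

```python
def insert_data(store: frozenset, increment) -> frozenset:
    """Set semantics, like SPARQL INSERT DATA: re-adding a triple is a no-op."""
    return store | frozenset(increment)


def to_insert_data(increment) -> str:
    """Render an increment as a SPARQL INSERT DATA request (IRIs only, for brevity)."""
    lines = [f"  <{s}> <{p}> <{o}> ." for s, p, o in sorted(increment)]
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"


# Hypothetical increment with placeholder IRIs.
increment = [
    ("http://example.org/obs/1", "http://example.org/def/bathingWater",
     "http://environment.data.gov.uk/id/bathing-water/ukk1202-36000"),
]

replica_a = insert_data(frozenset(), increment)
# A retry delivers the same increment twice, yet the store ends up identical.
replica_b = insert_data(insert_data(frozenset(), increment), increment)
```

The non-monotonic links are kept out of the increment and recomputed by a separate script, so the insert step itself never needs to delete anything.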
  33. Sample update script
        # 1. Remove every existing "latest" link
        DELETE { ?bw bwq:latestComplianceAssessment ?o . }
        WHERE  { ?bw bwq:latestComplianceAssessment ?o . } ;

        # 2. Link each bathing water to its observation in the newest slice
        #    (a slice for which no slice with a later year exists)
        INSERT { ?bw bwq:latestComplianceAssessment ?o . }
        WHERE {
          {
            ?slice a bwq:ComplianceByYearSlice ;
                   bwq:sampleYear [ interval:ordinalYear ?year ] .
            OPTIONAL {
              ?slice2 a bwq:ComplianceByYearSlice ;
                      bwq:sampleYear [ interval:ordinalYear ?year2 ] .
              FILTER (?year2 > ?year)
            }
            FILTER (!bound(?slice2))
          }
          ?slice qb:observation ?o .
          ?o bwq:bathingWater ?bw .
        }
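The same delete-then-insert logic can be emulated over a set of triples to check that rerunning it is safe. This is a hypothetical sketch: the blank-node year (`bwq:sampleYear [ interval:ordinalYear ?year ]`) is simplified to a direct `ordinalYear` triple, and the slice, observation and bathing water ids are invented. As in the script, "newest" is decided globally across slices, not per bathing water.

```python
RDF_TYPE = "rdf:type"
SLICE = "bwq:ComplianceByYearSlice"
YEAR = "ordinalYear"  # simplification of bwq:sampleYear [ interval:ordinalYear ?year ]
OBSERVATION = "qb:observation"
BATHING_WATER = "bwq:bathingWater"
LATEST = "bwq:latestComplianceAssessment"


def update_latest(triples):
    # DELETE step: remove every existing latest link.
    kept = {t for t in triples if t[1] != LATEST}
    # Slices with a known year (the SPARQL pattern requires one).
    slices = {s for s, p, o in kept if p == RDF_TYPE and o == SLICE}
    years = {s: o for s, p, o in kept if p == YEAR and s in slices}
    if not years:
        return kept
    # Newest = no other slice has a later year (the OPTIONAL / !bound idiom).
    newest = max(years.values())
    newest_slices = {s for s, y in years.items() if y == newest}
    observations = {o for s, p, o in kept if p == OBSERVATION and s in newest_slices}
    # INSERT step: link each bathing water to its observation in the newest slice.
    links = {(bw, LATEST, obs) for obs, p, bw in kept
             if p == BATHING_WATER and obs in observations}
    return kept | links


BW = "bw:clevedon"  # hypothetical short ids throughout
data = {
    ("slice:2010", RDF_TYPE, SLICE), ("slice:2010", YEAR, 2010),
    ("slice:2010", OBSERVATION, "obs:2010"), ("obs:2010", BATHING_WATER, BW),
    ("slice:2011", RDF_TYPE, SLICE), ("slice:2011", YEAR, 2011),
    ("slice:2011", OBSERVATION, "obs:2011"), ("obs:2011", BATHING_WATER, BW),
}
once = update_latest(data)
twice = update_latest(once)
```

Because the script first deletes all old links and then recomputes them from the data, running it twice gives the same result, which is the idempotence the publication service relies on.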
  34. Automation: transform and publish data feed increments (recap of slide 25)
  35. Application to case study: update server
     • transforms based on scripts (an earlier scripting utility)
     • linking to reference data
     • distributed publication via SPARQL Update
     • extensible range of data sets: annual assessments, in-season assessments, bathing water profile, features (e.g. pollution sources), reference data
  36. From pilot to practice (roadmap of slide 9, now at deep dive 4: embedding in the business process)
  37. Embed in business process
     • embedding is critical to ensure the data is kept up to date
     • that in turn needs usage => lower the barrier to use
     [Diagram: external and internal use justify investment in rich, up-to-date data; if the data is not used, investment is hard to justify and the data goes stale]
  38. Lowering the barrier to use: simple REST APIs
     • use the Linked Data API specification
     • rich query without learning SPARQL
     • easy consumption as JSON, XML
     • gets developers used to the data and the data model
     [Diagram: transform service, publication service, LD API service]
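From a developer's side, a Linked Data API endpoint is queried with ordinary URL parameters and returns JSON. The sketch below assumes a hypothetical base URL, endpoint path and `name` property filter, uses the `_pageSize` and `_page` paging parameters and `.json` format suffix of the Linked Data API style, and assumes a response whose `result.items` holds the list; no network call is made.

```python
import json
from urllib.parse import urlencode


def lda_url(base, endpoint, filters=None, page_size=None, page=None, fmt="json"):
    """Build a Linked Data API style request URL: property filters as plain
    parameters, paging via the reserved _pageSize and _page parameters."""
    params = dict(filters or {})
    if page_size is not None:
        params["_pageSize"] = page_size
    if page is not None:
        params["_page"] = page
    query = urlencode(params)
    return f"{base}{endpoint}.{fmt}" + (f"?{query}" if query else "")


def items(response_text):
    """Pull the list of items out of a JSON list response (assumed shape)."""
    return json.loads(response_text)["result"]["items"]


# Hypothetical endpoint and response, for illustration only.
url = lda_url("http://example.org", "/doc/bathing-water",
              {"name": "Clevedon Beach"}, page_size=10)
sample = ('{"format": "linked-data-api", "result": {"items": [{"_about": '
          '"http://environment.data.gov.uk/id/bathing-water/ukk1202-36000", '
          '"name": "Clevedon Beach"}]}}')
```

Nothing here requires knowing SPARQL or RDF, which is the point of putting an API of this kind in front of the triple store.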
  39. Application to case study
     • embedded in the process for weekly/daily updates
     • infrastructure to automate conversion and publishing
     • API plus extensive developer documentation
     • third-party and in-house applications built over the API
     • publish once, use many: information products as applications over a data platform, usable externally as well as internally
  40. The next stage
     • a growing range of data publications and uses, and of reference data and sets, brings new challenges: discovering reference terms and models to reuse, discovering datasets to use for an application, discovering models and links between sets
     • this needs a coordination or registry service: a story for another day ...
  41. Conclusions
     • illustrated how public sector users of linked data are moving from static pilots to operational systems
     • the keys are: reduce modelling costs through patterns and reuse; design for continuous update; automate publication using declarative mappings and SPARQL Update; lower the barrier to use through API design and documentation; embed in the organization's process so the data is used and useful
     Acknowledgements: only possible thanks to many smart colleagues: Stuart Williams, Andy Seaborne, Ian Dickinson, Brian McBride, Chris Dollin, plus Alex Coley and team from the Environment Agency
