Industrialized Linked Data

     Dave Reynolds, Epimorphics Ltd
                            @der42
Context: public sector Linked Data
Linked Data journey ...

    explore
   what is linked data?
   what use is it for us?

   self-describing: carries semantics with it; annotate and explain; data in context ...
   Integration: comparable; slice and dice; web API ...

   what’s involved?
Linked Data journey ...

      explore → pilot

     data → model → convert → publish → apply

Photo of The Thinker © dSeneste.dk@flickr CC BY
Linked Data journey ...

  explore                 pilot              routine?
Great pilot but ...
 can we reduce the time and cost?
 how do we handle changes and updates?
 how can we make the published data easier to use?


How do we make Linked Data “business as usual”?
Example case study: Environment Agency
   monitoring of bathing
    water quality
   static pilot
   live pilot
       historic annual
        assessments
       weekly assessments
   operational system
       additional data feeds
       live update
       integrated API
       data explorer
From pilot to practice
   reduce modelling costs
       patterns                  dive 1
       reuse
   handling change and update
       patterns
       publication process
   automation
       conversion
       publication
   embed in the business process
       use internally as well as externally
       publish once, use many
       data platform
Reduce costs - modelling
1. Don’t do it
     map source data into isomorphic RDF, synthesize URIs
     loses some of the value proposition
2. Reuse existing ontologies intact or mix-and-match
     best solution when available
     W3C GLD work on vocabularies – people, organizations,
      datasets ...
3. Reusable vocabulary patterns
     example:
         Data cube plus reference URI sets
         adaptable to broad range of data – environmental, statistical,
          financial ...
Reusable patterns: Data cube
   Much public sector data has regularities
       sets of measures
           observations, forecasts, budgets, assessments, estimates ...
       organized along some dimensions
           region, agency, time, category, cost centre ...
       interpreted according to attributes
           units, multipliers, status

   [Diagram: a cube of spend measures ($8k to $180k) laid out along objective code, cost centre and time dimensions, with attributes marking values as provisional or final]
Data cube vocabulary
Data cube pattern
   Pattern, not a fixed ontology
       customize by selecting measures, dimensions and attributes
       originated in publishing of statistics
       applied to environment measurements, weather forecasts, budgets
        and spend, quality assessments, regional demographics ...
   Supports reuse
       widely reusable URI sets – geography, time periods, agencies, units
       organization-wide sets
       modelling often only requires small increments on top of core
        pattern and reusable components
   opens door for reusable visualization tools
   standardization through W3C GLD
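The cube shape described above can be sketched in plain Python (an illustrative model only, not the Data Cube vocabulary itself; the dimension and measure names are hypothetical): observations carry measures and attributes, are keyed by dimension values, and fixing one or more dimensions yields a slice.

```python
# Illustrative Data Cube sketch: observations as plain dicts.
# Dimension/measure names (bathingWater, sampleYear, classification)
# are made up for this example.
observations = [
    {"dims": {"bathingWater": "ukk1202-36000", "sampleYear": 2010},
     "measure": {"classification": "Minimum"},
     "attrs": {"status": "final"}},
    {"dims": {"bathingWater": "ukk1202-36000", "sampleYear": 2011},
     "measure": {"classification": "Higher"},
     "attrs": {"status": "final"}},
    {"dims": {"bathingWater": "ukk1203-36100", "sampleYear": 2011},
     "measure": {"classification": "Good"},
     "attrs": {"status": "provisional"}},
]

def slice_cube(obs, **fixed_dims):
    """Fix some dimensions and return the matching observations
    (the 'slice and dice' operation a cube structure enables)."""
    return [o for o in obs
            if all(o["dims"].get(d) == v for d, v in fixed_dims.items())]

year_2011 = slice_cube(observations, sampleYear=2011)
```

Because every dataset shares this observation/dimension/measure shape, the same slicing code (and, at larger scale, the same visualization tooling) works across statistics, budgets and environmental measurements alike.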
Application to case study
   Data Cubes for water quality measurement
       in-season weekly assessments
       end of season annual assessments
   dimensions:
       time intervals – UK reference time service
       location - reference URI set for bathing waters and sample points
   cubes can reuse these dimensions
       just need to define specific measures
From pilot to practice
   reduce modelling costs
       patterns
       reuse
   handling change and update
       patterns                               dive 2
       publication process
   automation
       conversion
       publication
   embed in the business process
       use internally as well as externally
       publish once, use many
       data platform
Handling change
   critical challenge
       most initial pilots choose a snapshot dataset
           and go stale, fast
       understanding the nature of data updates and how to handle
        them is critical to successful scaling to business as usual
   types of change
       new data related to different time period
       corrections to data
       entities change
           properties
           identity
Modelling change
1. Individual data items relate to new time period
Pattern: n-ary relation
        observation resource relates value to time period and other context
        use Data Cube dimensions for this
        [Diagram: Clevedon Beach (http://environment.data.gov.uk/id/bathing-water/ukk1202-36000) linked via bwq:bathingWater to three observations, each carrying a bwq:sampleYear (http://reference.data.gov.uk/id/year/2009, 2010, 2011) and a bwq:classification (Higher, Minimum, Higher)]
History or latest?
        latest is non-monotonic but helpful for many practical uses
             materialize (SPARQL Update), implement in query, implement in API
        choice whether to keep history as well
             water quality v. weather forecasts
Modelling change
2. Corrections
   patterns
        silent change (!)
        explicit replacement
             API level hides replaced values but SPARQL query can retrieve & trace
        explicit change event

        [Diagram: for Clevedon Beach in sample year 2011, a withdrawn observation (classification: Minimum; status: replaced; reason: reanalysis) links to its replacement (classification: Higher) via dct:isReplacedBy / dct:replaces; an analysis event records ev:before, ev:after, ev:occuredOn and ev:agent]
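The explicit-replacement pattern can be sketched in plain Python (an illustration, not the Environment Agency implementation; identifiers and field names are hypothetical): superseded observations carry a replaced-by pointer, the API-level view follows the chain to the current value, and the full history stays available for anyone who queries it directly.

```python
# Sketch of the explicit-replacement pattern: "replacedBy" stands in
# for a dct:isReplacedBy link; observation ids are made up.
observations = {
    "obs1": {"classification": "Minimum", "replacedBy": "obs2",
             "reason": "reanalysis"},
    "obs2": {"classification": "Higher", "replacedBy": None},
}

def current(obs_id, observations):
    """Follow replacement links to the latest observation, the way an
    API layer would hide replaced values from routine callers."""
    while observations[obs_id]["replacedBy"] is not None:
        obs_id = observations[obs_id]["replacedBy"]
    return obs_id

resolved = current("obs1", observations)
```

Nothing is deleted: a SPARQL query (or raw link following) can still start from "obs1" and trace why and how the value changed.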
Modelling change
3. Mutation
   Infrequent change of properties, essential identity remains
     e.g. renaming a school, adding another building
     routine accesses see property value, not function of time
   patterns
     in place update
     named graphs
           current graph + graphs for each previous state + meta-graph
       explicit versioning with open periods
Modelling change
3. Mutation
explicit versioning with open periods
                 [Diagram: an endurant bathing-water resource has dct:hasVersion links to two versions: “Clevedon Beach”, whose dct:valid interval starts 2003 and finishes 2011, and “Clevedon Sands”, whose dct:valid interval starts 2011 and is open-ended]
     find right version by query on validity interval
     simplify use through
         non-monotonic “latest value” link
         API to implement query filters automatically
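The validity-interval query can be sketched in plain Python (illustrative field names, assuming start years are inclusive and end years exclusive, matching the finishes-2011 / starts-2011 handover above):

```python
# Sketch of "find the right version by query on validity interval".
# An end of None models an open period (the current version).
versions = [
    {"label": "Clevedon Beach", "start": 2003, "end": 2011},
    {"label": "Clevedon Sands", "start": 2011, "end": None},  # open period
]

def version_at(versions, year):
    """Return the label of the version whose validity interval covers
    the given year, or None if no version was valid then."""
    for v in versions:
        if v["start"] <= year and (v["end"] is None or year < v["end"]):
            return v["label"]
    return None
```

This is the filter an API layer would apply automatically, so routine callers just ask for "the name in 2005" without writing interval logic themselves.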
Application to case study
   weekly and annual samples
       use Data Cube pattern (n-ary relation)
   withdrawn samples
       replacement pattern (no explicit change event)
       Data Cube slice for “latest valid assessment”
           generated by a SPARQL Update query
       API gives easy access to the latest valid values
       linked data link-following or raw SPARQL queries allow drilling into changes
   changes to bathing water profile
       versioning pattern
       bathing water entity points to latest profile (SPARQL Update again)
From pilot to practice
   reduce modelling costs
       patterns
       reuse
   handling change and update
       patterns
       publication process
   automation
       conversion                             dive 3
       publication
   embed in the business process
       use internally as well as externally
       publish once, use many
       data platform
Automation
Transform and publish data feed increments
    transformation engine service
    reusable mappings, low cost to adapt to new feeds
    linking to reference data
    publication service that supports non-monotonic changes




     [Diagram: CSV data increments enter a transform service driven by a library of xform specs; a reconciliation service links values against reference data; the result flows to a publication service and on to replicated publication servers]
Transformation service
   declarative specification of transform
       a single service supports a range of transformations
       easy to adapt transformation to new feeds and modelling
        changes
   R2RML – RDB to RDF Mapping Language
       specify mapping from database tables to RDF triples
       W3C candidate recommendation
   D2RML
       R2RML extension to treat CSV feed as a database table
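The core template idea behind R2RML term maps, as D2RML applies it to CSV rows, can be sketched in plain Python (an illustrative toy engine, not the Epimorphics implementation; the column names and subject template are taken from the example on the next slide):

```python
import csv
import io

# Toy sketch of an R2RML-style template term map applied to CSV rows:
# {COLUMN} placeholders in the template are substituted from each row.
csv_text = "EUBWID2,description_english\nukk1202-36000,Clevedon Beach\n"

subject_template = "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}"

def map_rows(csv_text, subject_template, column, predicate):
    """Generate one (subject, predicate, object) triple per CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = subject_template.format(**row)  # template term map
        triples.append((subject, predicate, row[column]))
    return triples

triples = map_rows(csv_text, subject_template,
                   "description_english", "rdfs:label")
```

Because the mapping is declarative data rather than code, adapting to a new feed means editing a spec, not redeploying the engine, which is where the "low cost to adapt" claim comes from.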
Small D2RML example
:dataSource a dr:CSVDataSource ;
  rdfs:label "dataSource" .

:bathingWaterTermMap a dr:SubjectMap;
  dr:template "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}" ;
  dr:class def-bw:BathingWater .

:bathingWaterMap
  dr:logicalTable :dataSource ;
  dr:subjectMap   :bathingWaterTermMap ;

  dr:predicateObjectMap [
    dr:predicate rdfs:label ;
    dr:objectMap [ dr:column "description_english" ; dr:language "en" ] ] ;

  dr:predicateObjectMap [
    dr:predicate def-bw:eubwidNotation ;
    dr:objectMap [ dr:column "EUBWID2" ; dr:datatype def-bw:eubwid ] ] .
Using patterns
   raw mappings are verbose, which increases reuse costs
   so extend the language to support modelling patterns
   Data Cube
       specify mapping to observation with measures and dimensions
       engine generates Data Set and Data Structure Definition
        automatically
D2RML cube map example
:dataCubeMap a dr:DataCubeMap ;
    rr:logicalTable "dataSource" ;
    dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
    dr:dsdIRI "http://example.org/myDsd"^^xsd:anyURI ;

    # instances will automatically link to the base Data Set
    dr:observationMap [
      rr:subjectMap [
        rr:termType rr:IRI ;
        rr:template "http://example.org/observation/{PLACE}/{DATE}" ] ;
      rr:componentMap [
        # implies an entry in the auto-generated Data Structure Definition
        dr:componentType qb:measure ;
        rr:predicate aq:concentration ;
        # defines how the measure value is to be represented
        rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ] ] ;
      ...
But what about linking?
   connect observations to reference data
       a core value of linked data
   R2RML has Term Maps to create values
       constants and templates
   extend to allow maps based on other data sources
       Lookup map
           lookup resource in a store, fetch predicate
       Reconcile
           specify lookup in a remote service
           use Google Refine reconciliation API
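A reconcile-style lookup can be sketched in plain Python (the request payload follows the Google Refine reconciliation API's named-query shape as I understand it; the response here is a mock, not a live service call, and the ids, type and threshold are hypothetical):

```python
import json

# Sketch of reconciliation against a Refine-style service.
# Request: a JSON object of named queries; response (mocked below):
# scored candidates per query key.
payload = json.dumps({"q0": {"query": "Clevedon Beach",
                             "type": "def-bw:BathingWater"}})

mock_response = {"q0": {"result": [
    {"id": "ukk1202-36000", "name": "Clevedon Beach", "score": 98},
    {"id": "ukk1202-36100", "name": "Clevedon Sands", "score": 41},
]}}

def best_candidate(response, key, min_score=50):
    """Pick the highest-scoring candidate above a confidence threshold,
    or None so that low-confidence matches can be routed for review."""
    candidates = [c for c in response[key]["result"]
                  if c["score"] >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["score"])["id"]

match = best_candidate(mock_response, "q0")
```

The transform engine would use the returned id as the object of a link triple, connecting the converted observation to the reference URI set.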
Publication service
   goals
       cope with non-monotonic effects of change representation
       so replication is robust and cheap (=> make it idempotent)
   solution
       SPARQL Update
       publish transformed increment as a simple INSERT DATA
       then run SPARQL Update script for non-monotonic links
           dct:replacedBy links
           latest value slices
Sample update script
DELETE {
  ?bw bwq:latestComplianceAssessment ?o .
} WHERE {
  ?bw bwq:latestComplianceAssessment ?o .
}

INSERT {
  ?bw bwq:latestComplianceAssessment ?o .
} WHERE {
  {
    ?slice a bwq:ComplianceByYearSlice ;
           bwq:sampleYear [ interval:ordinalYear ?year ] .
    OPTIONAL {
      ?slice2 a bwq:ComplianceByYearSlice ;
              bwq:sampleYear [ interval:ordinalYear ?year2 ] .
      FILTER (?year2 > ?year)
    }
    FILTER ( !bound(?slice2) )
  }
  ?slice qb:observation ?o .
  ?o bwq:bathingWater ?bw .
}
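What the script computes can be restated in plain Python (an illustrative model with made-up slice and observation ids, not the production code): find the yearly slice that no later slice outranks, then re-point each bathing water's "latest" link at its observation in that slice. Re-running it after each increment is what keeps the operation idempotent.

```python
# Sketch of the latest-value recomputation the SPARQL Update performs:
# slices keyed by year, each mapping bathing water -> observation id.
slices = {
    2009: {"ukk1202-36000": "obs-2009"},
    2010: {"ukk1202-36000": "obs-2010"},
    2011: {"ukk1202-36000": "obs-2011"},
}

def latest_links(slices):
    """Drop any existing latest links (the DELETE step) and rebuild them
    from the slice with the greatest year (the FILTER/!bound step)."""
    latest_year = max(slices)  # the slice no later slice outranks
    return dict(slices[latest_year])

links = latest_links(slices)
```

Note the result depends only on the current state of the slices, not on any previously materialized links, which is why replaying the script on a replica is safe.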
Application to case study
   Update server
       transforms based on scripts (earlier scripting utility)
       linking to reference data
       distributed publication via
        SPARQL Update
       extensible range of data sets
             annual assessments
             in-season assessments
             bathing water profile
             features (e.g. pollution sources)
             reference data
From pilot to practice
   reduce modelling costs
       patterns
       reuse
   handling change and update
       patterns
       publication process
   automation
       conversion
       publication
   embed in the business process              dive 4
       use internally as well as externally
       publish once, use many
       data platform
Embed in business process
 embedding is critical to ensure data kept up to date
 in turn needs usage
=> lower barrier to use

   [Diagram: two cycles: a vicious one, in which data is not used, goes stale, and further investment is hard to justify; and a virtuous one, in which rich, up-to-date data drives internal and external use, which justifies investment]
Lowering barrier to use
   simple REST APIs
       use Linked Data API specification
       rich query without learning SPARQL
       easy consumption as JSON, XML
       gets developers used to data and data model
   [Diagram: the transform service feeds a publication service, which is fronted by an LD API]
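The parameter-to-query translation a Linked Data API layer performs can be sketched in plain Python (an illustrative mapping; the parameter conventions and bwq: properties here are assumptions, not the LDA specification itself): simple URL query parameters are converted mechanically into SPARQL patterns, so callers get rich filtering without writing SPARQL.

```python
# Sketch of REST-parameter to SPARQL translation. A "min-" prefix maps
# to a >= FILTER; plain parameters map to exact-match triple patterns.
def params_to_sparql(params):
    patterns, filters = [], []
    for name, value in params.items():
        if name.startswith("min-"):
            prop = name[4:]
            patterns.append(f"?item bwq:{prop} ?{prop} .")
            filters.append(f"FILTER (?{prop} >= {value})")
        else:
            patterns.append(f'?item bwq:{name} "{value}" .')
    body = " ".join(patterns + filters)
    return f"SELECT ?item WHERE {{ {body} }}"

q = params_to_sparql({"min-sampleYear": "2011"})
```

A call like `/bathing-waters?min-sampleYear=2011` thus becomes an ordinary SELECT, and the results come back as JSON or XML for easy consumption.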
Application to case study
   embedded in process for weekly/daily updates
   infrastructure to automate conversion and publishing
   API plus extensive developer documentation
   third party and in-house applications built over API




   publish once, use many
   information products as applications over a data platform,
    usable externally as well as internally
The next stage
   grow range of data publications and uses
   a growing range of reference data and URI sets brings new challenges
       discover reference terms and models to reuse
       discover datasets to use for application
       discover models and links between sets
   needs a coordination or registry service
   story for another day ...
Conclusions
   illustrated how public sector users of linked data are moving
    from static pilots to operational systems
   keys are:
       reduce modelling costs through patterns and reuse
       design for continuous update
       automation of publication using declarative mappings and
        SPARQL Update
       lower barrier to use through API design and documentation
       embed in organization’s process so the data is used and useful
Acknowledgements
Only possible thanks to many smart colleagues: Stuart
Williams, Andy Seaborne, Ian Dickinson, Brian McBride,
Chris Dollin
plus Alex Coley and team from the Environment Agency

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 

Recently uploaded (20)

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 

Industrialized Linked Data

  • 1. Industrialized Linked Data. Dave Reynolds, Epimorphics Ltd, @der42
  • 2. Context: public sector Linked Data
  • 3. Linked Data journey ... explore
      - what is linked data?
      - what use is it for us?
  • 4. Linked Data journey ... explore
      - what is linked data? what use is it for us?
      - self-describing: carries semantics with it; annotate and explain; data in context ...
      - integration: comparable; slice and dice; web API ...
  • 5. Linked Data journey ... explore (as slide 4, plus)
      - what's involved?
  • 6. Linked Data journey ... explore, pilot
      - pipeline: data → model → convert → publish → apply
      [Photo of The Thinker © dSeneste.dk@flickr, CC BY]
  • 7. Linked Data journey ... explore, pilot, routine?
      Great pilot, but ...
      - can we reduce the time and cost?
      - how do we handle changes and updates?
      - how can we make the published data easier to use?
      How do we make Linked Data "business as usual"?
  • 8. Example case study: Environment Agency
      - monitoring of bathing water quality
      - static pilot
      - live pilot: historic annual assessments, weekly assessments
      - operational system: additional data feeds, live update, integrated API, data explorer
  • 9. From pilot to practice
      - reduce modelling costs: patterns, reuse (dive 1)
      - handling change and update: patterns
      - publication process: automation (conversion, publication)
      - embed in the business process: use internally as well as externally; publish once, use many; data platform
  • 10. Reduce costs - modelling
      1. Don't do it
          - map source data into isomorphic RDF, synthesize URIs
          - loses some of the value proposition
      2. Reuse existing ontologies intact, or mix-and-match
          - the best solution when available
          - W3C GLD work on vocabularies – people, organizations, datasets ...
      3. Reusable vocabulary patterns
          - example: Data Cube plus reference URI sets
          - adaptable to a broad range of data – environmental, statistical, financial ...
  • 11. Reusable patterns: Data Cube
      - much public sector data has regularities
      - sets of measures: observations, forecasts, budgets, assessments, statistics ...
      [table: example measure values such as ">0.1", "34", "27", "good", "excellent", "poor", "125"]
  • 12. Reusable patterns: Data Cube (continued)
      - ... organized along some dimensions: region, agency, time, category, cost centre ...
      [diagram: a spend measure (values 8 ... 180) laid out along time, cost centre and objective-code dimensions]
  • 13. Reusable patterns: Data Cube (continued)
      - ... interpreted according to attributes: units, multipliers, status
      [diagram: the same cube with attribute-qualified values, e.g. "$12k provisional", "$120k final"]
  • 15. Data cube pattern
      - a pattern, not a fixed ontology
          - customize by selecting measures, dimensions and attributes
          - originated in the publishing of statistics
          - applied to environment measurements, weather forecasts, budgets and spend, quality assessments, regional demographics ...
      - supports reuse
          - widely reusable URI sets – geography, time periods, agencies, units
          - organization-wide sets
          - modelling often only requires small increments on top of the core pattern and reusable components
          - opens the door for reusable visualization tools
          - standardization through W3C GLD
  • 16. Application to case study
      - Data Cubes for water quality measurement
          - in-season weekly assessments
          - end-of-season annual assessments
      - dimensions:
          - time intervals – UK reference time service
          - location – reference URI set for bathing waters and sample points
      - cubes can reuse these dimensions; just need to define the specific measures
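A minimal sketch of one such observation in Turtle. The bwq: property names follow the deck's later examples, but the bwq: namespace URI, the week dimension and the observation/dataset URIs here are illustrative assumptions, not the Agency's actual terms:

```turtle
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix bwq: <http://environment.data.gov.uk/def/bathing-water-quality/> .  # assumed namespace

# one weekly assessment: the dimensions come from reusable reference URI sets,
# only the measure (bwq:classification) is specific to this cube
<http://example.org/data/bathing-water-quality/obs-1>        # illustrative URI
    a qb:Observation ;
    qb:dataSet <http://example.org/data/bathing-water-quality/in-season> ;  # illustrative
    bwq:bathingWater <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
    bwq:sampleWeek <http://reference.data.gov.uk/id/week/2011-W23> ;        # hypothetical week dimension
    bwq:classification bwq:Higher .
```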
  • 17. From pilot to practice
      - reduce modelling costs: patterns, reuse
      - handling change and update: patterns (dive 2)
      - publication process: automation (conversion, publication)
      - embed in the business process: use internally as well as externally; publish once, use many; data platform
  • 18. Handling change
      - a critical challenge
          - most initial pilots choose a snapshot dataset, and go stale, fast
          - understanding the nature of data updates, and how to handle them, is critical to scaling successfully to business as usual
      - types of change
          - new data relating to a different time period
          - corrections to data
          - entities change: properties, identity
  • 19. Modelling change 1. Individual data items relate to a new time period
      Pattern: n-ary relation
      - an observation resource relates a value to a time period and other context
      - use Data Cube dimensions for this
      [diagram: http://environment.data.gov.uk/id/bathing-water/ukk1202-36000 (Clevedon Beach) with bwq:sampleYear / bwq:classification pairs – year/2009: Higher, year/2010: Minimum, year/2011: Higher]
      History or latest?
      - "latest" is non-monotonic, but helpful for many practical uses
      - materialize (SPARQL Update), implement in query, or implement in the API
      - choice of whether to keep history as well (water quality vs. weather forecasts)
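Where "latest" is implemented in the query rather than materialized, a sketch might look like this (the bwq: and interval: terms follow the deck's own examples; the namespace URIs are assumptions):

```sparql
PREFIX bwq:      <http://environment.data.gov.uk/def/bathing-water-quality/>
PREFIX interval: <http://reference.data.gov.uk/def/intervals/>

# fetch the most recent classification for one bathing water
SELECT ?classification ?year WHERE {
  ?obs bwq:bathingWater <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
       bwq:sampleYear [ interval:ordinalYear ?year ] ;
       bwq:classification ?classification .
}
ORDER BY DESC(?year)
LIMIT 1
```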
  • 20. Modelling change 2. Corrections
      - patterns
          - silent change (!)
          - explicit replacement
              - the API level hides replaced values, but a SPARQL query can retrieve and trace them
          - explicit change event
      [diagram: a replaced assessment (classification: Minimum, status: replaced) linked by dct:isReplacedBy / dct:replaces to its successor (classification: Higher), with an analysis event carrying ev:before, ev:after, ev:occuredOn, ev:agent and reason: reanalysis]
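The explicit-replacement pattern from the slide, sketched in Turtle. The dct: and ev: property names are as on the slide; the resource URIs, the ev: namespace URI and the reason property are illustrative assumptions:

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ev:  <http://example.org/def/event/> .   # assumed namespace for the ev: terms
@prefix bwq: <http://environment.data.gov.uk/def/bathing-water-quality/> .  # assumed
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/assessment/1>                 # the withdrawn value
    bwq:classification bwq:Minimum ;
    dct:isReplacedBy <http://example.org/assessment/2> .

<http://example.org/assessment/2>                 # the corrected value
    bwq:classification bwq:Higher ;
    dct:replaces <http://example.org/assessment/1> .

<http://example.org/event/reanalysis-1>           # optional explicit change event
    ev:before <http://example.org/assessment/1> ;
    ev:after  <http://example.org/assessment/2> ;
    ev:occuredOn "2011-07-14"^^xsd:date ;         # keeps the slide's spelling of the property
    ev:reason "reanalysis" .                      # hypothetical property
```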
  • 21. Modelling change 3. Mutation
      - infrequent change of properties; the essential identity remains
          - e.g. renaming a school, adding another building
          - routine accesses see the property value, not a function of time
      - patterns
          - in-place update
          - named graphs: current graph + graphs for each previous state + a meta-graph
          - explicit versioning with open periods
  • 22. Modelling change 3. Mutation – explicit versioning with open periods
      [diagram: an endurant resource with dct:hasVersion links to two versions – "Clevedon Beach" (dct:valid interval: time:intervalStarts 2003, time:intervalFinishes 2011) and "Clevedon Sands" (dct:valid interval: time:intervalStarts 2011, still open)]
      - find the right version by querying on the validity interval
      - simplify use through
          - a non-monotonic "latest value" link
          - an API that implements the query filters automatically
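The open-period versioning diagram translates roughly to Turtle as below. The dct: and time: terms are those named on the slide; the endurant and version URIs are illustrative:

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.org/id/bathing-water/ukk1202-36000>          # the endurant
    dct:hasVersion <http://example.org/version/1>, <http://example.org/version/2> .

<http://example.org/version/1>
    rdfs:label "Clevedon Beach" ;
    dct:valid [ time:intervalStarts   <http://reference.data.gov.uk/id/year/2003> ;
                time:intervalFinishes <http://reference.data.gov.uk/id/year/2011> ] .

<http://example.org/version/2>
    rdfs:label "Clevedon Sands" ;
    # open period: a start but, as yet, no finish
    dct:valid [ time:intervalStarts <http://reference.data.gov.uk/id/year/2011> ] .
```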
  • 23. Application to case study
      - weekly and annual samples
          - use the Data Cube pattern (n-ary relation)
      - withdrawn samples
          - replacement pattern (no explicit change event)
          - Data Cube slice for "latest valid assessment", generated by a SPARQL Update query
          - the API gives easy access to the latest valid values
          - linked data following, or a raw SPARQL query, allows drilling into changes
      - changes to a bathing water profile
          - versioning pattern
          - the bathing water entity points to the latest profile (SPARQL Update again)
  • 24. From pilot to practice
      - reduce modelling costs: patterns, reuse
      - handling change and update: patterns
      - publication process: automation – conversion (dive 3), publication
      - embed in the business process: use internally as well as externally; publish once, use many; data platform
  • 25. Automation – transform and publish data feed increments
      - transformation engine service
          - reusable mappings, low cost to adapt to new feeds
          - linking to reference data
      - publication service that supports non-monotonic changes
      [diagram: data increments (csv) → transform service (xform specs, reconciliation against reference data) → publication service (publication spec) → replicated servers]
  • 26. Transformation service
      - declarative specification of the transform
          - a single service supports a range of transformations
          - easy to adapt a transformation to new feeds and modelling changes
      - R2RML – RDB to RDF Mapping Language
          - specifies a mapping from database tables to RDF triples
          - W3C Candidate Recommendation
      - D2RML
          - R2RML extension treating a CSV feed as a database table
  • 27. Small D2RML example

        :dataSource a dr:CSVDataSource ;
            rdfs:label "dataSource" .

        :bathingWaterTermMap a dr:SubjectMap ;
            dr:template "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}" ;
            dr:class def-bw:BathingWater .

        :bathingWaterMap
            dr:logicalTable :dataSource ;
            dr:subjectMap :bathingWaterTermMap ;
            dr:predicateObjectMap [
                dr:predicate rdfs:label ;
                dr:objectMap [ dr:column "description_english" ; dr:language "en" ]
            ] ;
            dr:predicateObjectMap [
                dr:predicate def-bw:eubwidNotation ;
                dr:objectMap [ dr:column "EUBWID2" ; dr:datatype def-bw:eubwid ]
            ] .
  • 28. Using patterns
      - verbosity is a problem and increases reuse costs
      - extend D2RML to support modelling patterns
      - Data Cube
          - specify the mapping to an observation with its measures and dimensions
          - the engine generates the Data Set and Data Structure Definition automatically
  • 29. D2RML cube map example

        :dataCubeMap a dr:DataCubeMap ;
            rr:logicalTable "dataSource" ;
            dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
            dr:dsdIRI "http://example.org/myDsd"^^xsd:anyURI ;
            dr:observationMap [
                # instances will automatically link to the base Data Set
                rr:subjectMap [
                    rr:termType rr:IRI ;
                    rr:template "http://example.org/observation/{PLACE}/{DATE}"
                ] ;
                # implies an entry in the Data Structure Definition, which is auto-generated
                rr:componentMap [
                    dr:componentType qb:measure ;
                    rr:predicate aq:concentration ;
                    # defines how the measure value is to be represented
                    rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ]
                ]
            ] ;
            ...
  • 30. But what about linking?
      - connect observations to reference data – a core value of linked data
      - R2RML has Term Maps to create values: constants and templates
      - extend to allow maps based on other data sources
          - Lookup map: look up a resource in a store, fetch a predicate
          - Reconcile: specify a lookup in a remote service, using the Google Refine reconciliation API
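A sketch of what such an extension might look like in the D2RML style. The dr:LookupMap vocabulary below is hypothetical – it illustrates the idea of a term map driven by another data source, not the actual extension:

```turtle
:samplePointMap dr:predicateObjectMap [
    dr:predicate def-bw:samplePoint ;
    dr:objectMap [
        a dr:LookupMap ;                       # hypothetical term-map type
        dr:lookupSource :referenceDataStore ;  # store holding the reference URI set
        dr:lookupKey "SAMPLE_PT_ID" ;          # CSV column supplying the key
        dr:lookupProperty skos:notation        # match the column value against this predicate
    ]
] .
```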
  • 31. Automation – transform and publish data feed increments
      - transformation engine service ✓
          - reusable mappings, low cost to adapt to new feeds ✓
          - linking to reference data ✓
      - publication service that supports non-monotonic changes
      [diagram as before: data increments (csv) → transform service → publication service → replicated servers]
  • 32. Publication service
      - goals
          - cope with the non-monotonic effects of change representation
          - so replication is robust and cheap (=> make it idempotent)
      - solution: SPARQL Update
          - publish the transformed increment as a simple DATA INSERT
          - then run a SPARQL Update script for the non-monotonic links
              - dct:replacedBy links
              - latest-value slices
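A transformed increment published as a plain data insert might look like this (URIs are illustrative; because INSERT DATA has set semantics, re-running the same insert is harmless, which is what makes replication idempotent):

```sparql
PREFIX bwq: <http://environment.data.gov.uk/def/bathing-water-quality/>

INSERT DATA {
  <http://example.org/data/assessment/ukk1202-36000/2011>
      bwq:bathingWater <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
      bwq:sampleYear <http://reference.data.gov.uk/id/year/2011> ;
      bwq:classification bwq:Higher .
}
```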
  • 33. Sample update script

        DELETE { ?bw bwq:latestComplianceAssessment ?o . }
        WHERE  { ?bw bwq:latestComplianceAssessment ?o . } ;

        INSERT { ?bw bwq:latestComplianceAssessment ?o . }
        WHERE {
          {
            ?slice a bwq:ComplianceByYearSlice ;
                bwq:sampleYear [ interval:ordinalYear ?year ] .
            OPTIONAL {
              ?slice2 a bwq:ComplianceByYearSlice ;
                  bwq:sampleYear [ interval:ordinalYear ?year2 ] .
              FILTER (?year2 > ?year)
            }
            FILTER ( !bound(?slice2) )
          }
          ?slice qb:observation ?o .
          ?o bwq:bathingWater ?bw .
        }
  • 34. Automation – transform and publish data feed increments
      - transformation engine service ✓
          - reusable mappings, low cost to adapt to new feeds ✓
          - linking to reference data ✓
      - publication service that supports non-monotonic changes ✓
      [diagram as before: data increments (csv) → transform service → publication service → replicated servers]
  • 35. Application to case study
      - update server
          - transforms based on scripts (an earlier scripting utility)
          - linking to reference data
          - distributed publication via SPARQL Update
      - extensible range of data sets
          - annual assessments
          - in-season assessments
          - bathing water profile
          - features (e.g. pollution sources)
          - reference data
  • 36. From pilot to practice
      - reduce modelling costs: patterns, reuse
      - handling change and update: patterns
      - publication process: automation (conversion, publication)
      - embed in the business process (dive 4): use internally as well as externally; publish once, use many; data platform
  • 37. Embed in business process
      - embedding is critical to ensure the data is kept up to date
      - that in turn needs usage => lower the barrier to use
      [diagram: virtuous cycle – invest → rich, up-to-date data → external and internal use; vicious cycle – data not used → hard to justify → data goes stale]
  • 38. Lowering the barrier to use
      - simple REST APIs
          - use the Linked Data API specification
          - rich query without learning SPARQL
          - easy consumption as JSON, XML
          - gets developers used to the data and the data model
      [diagram: transform service → publication service → LD API]
  • 39. Application to case study
      - embedded in the process for weekly/daily updates
      - infrastructure to automate conversion and publishing
      - API plus extensive developer documentation
      - third-party and in-house applications built over the API
      - publish once, use many
          - information products as applications over a data platform, usable externally as well as internally
  • 40. The next stage
      - grow the range of data publications and uses
      - a range of reference data and sets brings new challenges
          - discover reference terms and models to reuse
          - discover datasets to use for an application
          - discover models and links between sets
      - needs a coordination or registry service – a story for another day ...
  • 41. Conclusions
      - illustrated how public sector users of linked data are moving from static pilots to operational systems
      - the keys are:
          - reduce modelling costs through patterns and reuse
          - design for continuous update
          - automate publication using declarative mappings and SPARQL Update
          - lower the barrier to use through API design and documentation
          - embed in the organization's processes so the data is used, and useful
      Acknowledgements: only possible thanks to many smart colleagues – Stuart Williams, Andy Seaborne, Ian Dickinson, Brian McBride, Chris Dollin – plus Alex Coley and team from the Environment Agency