readying the public sector for web-scale data challenges
Government ICT 2.0 Data Stream
Alexander Coley
Data Business Strategist
epimorphics.com/live-projects
experts in data of the web
data integration that scales
Open Data: 1995
Linked Data: 2006
Power of Info Taskforce: 2009
data locked away and rarely shared (even internally)
data that is open(ed) is considered separate
data treated as discrete objects that you withdraw
data not treated as critical infrastructure (only considered in isolation)
new data sources slowly considered, especially if real-time
data integration built upon lakes within one organisation
costs are not fairly shared
lots of progress and examples of best practice
still relevant, just not about ticking a box (and for more than open data)
URIs for identifying things
reference data for re-use
designing data with domain experts, with data modelers and ideally with users
building connected data ecosystems
metadata needs to stay with the data (context and provenance)
data across multiple organisational boundaries
supporting analytics
evolving connected data ecosystems
NOW: we publish over 1bn triples for the UK public-sector
SOON: rising to 10bn triples
with near real-time, weekly, monthly and other update cycles
including some high use data services
epimorphics.com/live-projects
treasure-troves of data in multiple legacy systems
connecting to and unlocking this data without needing to build from scratch
not much use if not maintained
recognising commitment: change needs control
Thank you
alex.coley@epimorphics.com
@alexrcoley
Note: in this presentation all slide background images have been obtained with a CC0 license
what is linked data: bit.ly/2bY6mNI
live projects: bit.ly/2coGfil
our technology: bit.ly/2chze5G
case studies: bit.ly/2djvQnw

Editor's Notes

  • #2 Thank you. At Epimorphics we use linked-data technology and are deeply involved in supporting companies to make the most of their data through web-scale data integration. We’ve heard from Heather and Charlie that there is huge potential for better use of data in delivering public services (and great examples of that happening).
  • #3 I’m hoping to give some insight into how we are moving to a vision of connected data ecosystems across the public sector that can support meeting that data potential: from scattered data (scattered enterprise data) to connected data ecosystems that power new, more effective public services. There are challenges in realising web-scale data, but we have learnt a lot in the nearly 12 years of data on the web.
  • #4 From what were quite humble beginnings we have seen growing accessibility of data ALONGSIDE vast increases in the range and volume of data from all sectors. Open data as a term first appeared some 22 years ago in relation to environmental data in the US. Linked data and the semantic web grew from the initial proposals some 11 years ago, and a real impetus in the UK kicked off around the time of the Power of Information Taskforce report 8 years ago. Who remembers that far back? Not that long OR ages, depending on your perspective, BUT all the components are there and some of the original visions of data on the web that Tim Berners-Lee suggested are beginning to be realised.
  • #5 HUGE strides have been made through working through challenges… Since the push for more accessible data we have seen data that was locked away and rarely shared beginning to be accessible. Sharing internally in some organisations is still problematic (and I don’t mean for protected data).
  • #6 Even data that is opened up is considered a separate thing, not “the real data”: a special case, not part of business-as-usual operations.
  • #7 Data is still commonly treated as something you withdraw in one set, rather than using the components you need and recombining them as those needs change.
  • #8 Fundamentally, not enough organisations have treated data as critical infrastructure, and they certainly haven’t considered the consequences that change to the data or the service has for downstream use. For example: what happens when I turn off this dataset?
  • #9 Even the vast potential of newly available data sources, which there undoubtedly are, is not used or not used quickly, being considered only very slowly. Real-time data has been especially problematic for organisations to incorporate into their operations.
  • #10 Those organisations that have moved towards data integration have tended to focus on integration within a smaller subset of data. Sometimes organisations are thrilled with the benefits that come just with the first small step of simple integrations, such as through business intelligence tools, and with building data lakes within the one organisation, but they struggle with the barriers between organisations when the challenge to solve crosses boundaries.
  • #11 Commonly, reference data is owned and published by one organisation and used by many. Even in the public sector the costs are not always fairly shared. There are examples of datasets where much of the benefit of use sits outside the organisation that “owns” the data, putting the cost pressure of increased use on the organisation that doesn’t see the benefits. You can imagine the behavioural pressures that this can impose.
  • #12 We should be really upbeat BECAUSE, in spite of those challenges (some technical but many cultural), huge progress has been made. There are great examples of best practice here in the UK and a growing, loosely connected data ecosystem where there is huge potential for more benefits to be seen quickly.
  • #13 Rarely is there a talk mentioning linked data that doesn’t have an image of this 5-star mug… And the Open Data White Paper of 2012 (5 years ago) used this rating to suggest the direction of travel. Unfortunately it was often used as a tick-box exercise, with little real notice given to the advantages of 4- and 5-star data. The 5 stars are still relevant BUT only when taken in the broader context of publishing data for re-use, as in maintaining, updating, supporting users, etc.
  • #14 Underpinning 4- and 5-star data are URIs. Persistent resolvable identifiers, as URIs, to identify things (at all levels within data) are a mandatory standard on the government open standards hub.
  • #15 For example, we see reference data and code-lists using these approaches, as registers and as more complex code-lists and vocabularies from across the public sector. A register is an authoritative list of information you can trust: a canonical source of truth. Registers are important, and there are already many of them from across government. Previously, services had no standard way of accessing the data in these lists, so they needed to develop bespoke software to do it. This limited CONSISTENCY and DATA INTEGRITY and resulted in lots of duplication.
  • #16 There is an increasing recognition that data needs to be developed and designed using good design practices learned from digital service design: being designed for multiple re-use, using domain experts, good data modelers (people who are experienced in designing data and working with standards) and actual users. So less of taking a table exported from a relational database and just publishing that.
  • #17 What we are working TOWARDS, and are beginning to see, are real CONNECTED DATA ECOSYSTEMS. The aim is to use the technology of the web for web-scale data integration: to integrate and analyze scattered enterprise data and, in many cases, data across more than one organisation. One of the problems with having all this data is that it’s scattered across so many different places. Business systems store data in silos, so your line-of-business application has a database, your HR department has a database and your planning systems have a database. All separate, all distinct. Yet organisational challenges rarely fit those same silos. Rare is the organization that has built the glue in between these services and products in a way that meaningful data can be gleaned after the fact. A number are beginning the task and moving beyond the simple integrations.
  • #18 What we are seeing are growing connected data ecosystems, built on the open standards of the web, that use more agile data integration and governance solutions: building flexibility, maintaining integrity and supporting robust use of data for multiple applications. For example, we and other suppliers have been supporting a number of parts of the public sector in publishing data that interconnects. These connections within the data support different uses depending on a specific need, question or service challenge. We also see the utility of each individual dataset increase as more interconnections are made.
  • #19 Just to give you an idea of the scale: we publish over 1 billion triples for the UK public sector (including some high-use data services) with near real-time, weekly, monthly and other update cycles, and this will grow to over 10 billion triples with some new datasets in the coming months.
  • #20 A key factor is being able to build connectors into existing legacy systems rather than rebuild them: systems that have previously been considered too expensive to change, with data not considered high enough value to invest in. Yet there are real treasure troves of this siloed, isolated data out there.
  • #21 Some examples include the Food Standards Agency’s food codes and Food Safety Alerts (coming soon); the Environment Agency’s Public Registers, real-time river levels, tides, rainfall and flood data, water quality and so on; and HM Land Registry’s UK House Price Index. All of these…
  • #22 All of these… There is a need to (and they do) recognise the commitment that comes with this data. Users need to have trust. Someone making the effort to develop something new on top of the data needs to TRUST that the investment is worth it, in part including that it will still be there in the future.
  • #23 Thank you
  • #24 Contact details and links to examples
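
Notes #14, #18 and #19 talk about URIs as identifiers, about datasets that interconnect, and about "triples". As a rough illustration only (not how Epimorphics or any named agency actually publishes data), the sketch below models triples as plain Python tuples and shows two separately published datasets becoming joinable because they reuse the same URI. All example.org URIs, property names and values are made up for the example.

```python
# Hypothetical illustration: the example.org URIs and all values are invented.
from collections import defaultdict

# One shared, persistent identifier for one real-world thing.
AREA = "http://example.org/id/area/A1"

# Dataset published by one organisation as (subject, predicate, object) triples.
flood_data = {
    (AREA, "http://example.org/def/riverLevelMetres", 2.4),
}

# Dataset published by a different organisation, reusing the same URI.
price_data = {
    (AREA, "http://example.org/def/averageHousePrice", 250000),
}


def describe(subject, *datasets):
    """Collect every predicate/value pair known about a subject across datasets."""
    facts = defaultdict(set)
    for dataset in datasets:
        for s, p, o in dataset:
            if s == subject:
                facts[p].add(o)
    return dict(facts)


# Because both publishers used the same URI, no bespoke matching step is needed.
combined = describe(AREA, flood_data, price_data)
triple_count = len(flood_data | price_data)
```

In this toy form, the shared identifier is what lets the two datasets answer a question neither could answer alone, which is the sense in which each dataset's utility grows as more interconnections are made.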