
Linked Open Government Data in UK



Linked Data
John Sheridan,
National Archives UK

REEEP Open Data Workshop, Abu Dhabi, UAE
18 Jan 2012



  1. Linked Data: John Sheridan (@johnlsheridan), 18 January 2012
  2. “We shape our tools and they in turn shape us” – Marshall McLuhan
  3. The Wealth of Networks: “Different technologies make different kinds of human action and interaction easier or harder to perform. All other things being equal, things that are easier to do are more likely to be done and things that are harder to do are less likely to be done. All other things are *never* equal. That is why technological determinism in the strict sense–if you have technology “t” you should expect social structure or relation “s” to emerge–is false… Neither deterministic nor wholly malleable, technology sets some parameters of individual and social action. It can make some actions, relationships, organizations and institutions easier to pursue, and others harder… The same technologies of networked computers can be adopted in very different patterns. There is no guarantee that networked information technology will lead to the improvements in innovation, freedom and justice that I suggest are possible… The way we develop will, in significant measure, depend on choices we make in the next decade or so.” – Yochai Benkler, The Wealth of Networks
  4. Information economics and data
      • Better informed markets operate more efficiently
      • Governments are making more data available on the web
      • We are at the beginning of an age of data abundance
      • Large-scale data aggregation is now possible
  5. Interoperability with the world?
  7. Transparency and
  8. Commitments
  9. Which says: “16. GOVERNMENT TRANSPARENCY. The Government believes that we need to throw open the doors of public bodies, to enable the public to hold politicians and public bodies to account. We also recognise that this will help to deliver better value for money in public spending, and help us achieve our aim of cutting the record deficit. Setting government data free will bring significant economic benefits by enabling businesses and non-profit organisations to build innovative applications and websites. We will ensure that all data published by public bodies is published in an open and standardised format, so that it can be used easily and with minimal cost by third parties.”
  10. Open Data Policy in the UK
      • Open by default
      • Open Government Licence
      • Seeking to address substantial policy issues through the use of open data
      • Health and transport data are at the forefront of this drive
      • Consultation in autumn 2011, White Paper early this year
  11. CHOICES
  12. Choosing formats for data
      • Formats for people: focused on presentation or typographic layout; they look good, but the underlying data is hard to access
      • Formats for machines: focused on data interchange between computers; they look dreadful and are hard for people to understand, but are easy to import into other systems and use
  13. A false dichotomy
      • Formats for people: focused on presentation or typographic layout
      • Single source of data
      • Formats for machines: focused on data interchange between computers
  14. Download or programmatic access?
      • Download
        o Good for static information
        o Small files
        o Used for export/import
        o Easy for publishers
        o Most of the data registered on
      • Programmatic access
        o Good for dynamic or real-time information or very large datasets
        o Lets developers select and use just the information they need
        o Retains more control for the publisher
        o More complicated to implement but much more powerful
        o Vital for many useful datasets
  15. STANDARDS
  16. Henry Maudslay (1771–1831): He also developed the first industrially practical screw-cutting lathe in 1800, allowing standardisation of screw thread sizes for the first time. This allowed the concept of interchangeability (an idea that was already taking hold) to be practically applied to nuts and bolts. Before this, all nuts and bolts had to be made as matching pairs only. This meant that when machines were disassembled, careful account had to be kept of the matching nuts and bolts ready for when reassembly took place.
  17. Joseph Whitworth (1804–1887): In 1841, Joseph Whitworth created a design that, through its adoption by many British railroad companies, became a national standard for the United Kingdom called British Standard Whitworth. During the 1840s through 1860s, this standard was often used in the United States and Canada as well, in addition to myriad intra- and inter-company standards.
  18. Tim Berners-Lee's five stars
      * make your stuff available on the Web (whatever format) under an open licence
      ** make it available as structured data (e.g., Excel instead of an image scan of a table)
      *** use non-proprietary formats (e.g., CSV instead of Excel)
      **** use URIs to identify things, so that people can point at your stuff
      ***** link your data to other data to provide context
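The five stars are cumulative: each level assumes the ones before it. A minimal sketch of that reading, assuming a hypothetical `Dataset` record with one boolean flag per criterion (the field names are illustrative and do not come from any real tool):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags describing how a dataset is published.
    open_licence: bool = False     # on the Web under an open licence
    structured: bool = False       # structured data, e.g. Excel rather than a scan
    non_proprietary: bool = False  # open format, e.g. CSV rather than Excel
    uses_uris: bool = False        # things are identified with URIs
    linked: bool = False           # links out to other data for context


def star_rating(ds: Dataset) -> int:
    """Count stars in order and stop at the first criterion that is not met."""
    criteria = [ds.open_licence, ds.structured, ds.non_proprietary,
                ds.uses_uris, ds.linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


csv_file = Dataset(open_licence=True, structured=True, non_proprietary=True)
print(star_rating(csv_file))  # 3
```

Under this reading a CSV with URIs but no open licence scores zero, since the licence is the first star; that strict ordering is a design choice of the sketch.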
  19. LINKED DATA
  20. Linked Data
      • Give names, or web identifiers (URIs), to things
      • Publish information about them as Web resources
      • Use RDF triples (subject, property, value)
      • Link to other data about those things
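To make the four principles concrete, the sketch below stores (subject, property, value) triples as plain Python tuples and "follows its nose" from one URI to the facts about whatever it links to. All URIs and values are made-up `example.org` identifiers, and a real deployment would use an RDF library and store rather than a list of tuples.

```python
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Each triple is (subject, property, value); subjects and properties are URIs.
triples = [
    (EX + "id/department/dept-a", EX + "def/name", "Department A"),
    (EX + "id/department/dept-a", EX + "def/headOf", EX + "id/person/p1"),
    (EX + "id/person/p1", EX + "def/name", "Example Person"),
]


def describe(subject, graph):
    """Return {property: [values]} for every triple about `subject`."""
    result = {}
    for s, p, o in graph:
        if s == subject:
            result.setdefault(p, []).append(o)
    return result


def follow(subject, prop, graph):
    """Follow a link: describe each resource that `subject` points to via `prop`."""
    return [describe(target, graph)
            for target in describe(subject, graph).get(prop, [])]


print(follow(EX + "id/department/dept-a", EX + "def/headOf", triples))
# [{'http://example.org/def/name': ['Example Person']}]
```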
  21. Benefits
      • Enables web-scale data publishing: distributed publication with web-based discovery mechanisms
      • Everything is a resource: follow your nose to discover more about properties, classes, or codes within a code list
      • Everything can be annotated: make comments about observations, data series, points on a map
      • Easy to extend: create new properties as required, with no need to plan everything up front
      • Easy to merge: slot together RDF graphs, with no need to worry about name clashes
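The "easy to merge" point follows from using URIs as names: two graphs combine by set union, the same URI from two publishers denotes the same thing, and different URIs cannot clash by accident. A minimal sketch with two hypothetical graphs (all URIs and the `approvedBy` property are invented for illustration):

```python
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Two independently published graphs, each a set of (subject, property, value).
org_graph = {
    (EX + "id/person/p1", EX + "def/post", EX + "id/post/director-ops"),
}
spend_graph = {
    (EX + "id/payment/A-1263", EX + "def/approvedBy", EX + "id/post/director-ops"),
    (EX + "id/person/p1", EX + "def/post", EX + "id/post/director-ops"),  # same fact
}

# Merging is set union: duplicate triples collapse and nothing needs renaming.
merged = org_graph | spend_graph
print(len(merged))  # 2

# The shared post URI now joins spending data to organisation data.
approvals = [s for s, p, o in merged
             if p == EX + "def/approvedBy" and o == EX + "id/post/director-ops"]
print(approvals)  # ['http://example.org/id/payment/A-1263']
```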
  22. You can do more with Linked Data
  23. UK Government has been:
      • developing standards for responsible publishing of key types of data (financial data, organisation data, aggregate statistics, location data)
      • developing guidance, practices and tools that make it easy to publish data in Linked Data form, at low cost
      • making it easy for people to consume data in a programmatic way
  24. Types of data
      Aggregate statistics:
                2008    2009    2010
          A    1,345   1,456   2,301
          B    2,112   3,543   2,111
          C    2,345   2,987   2,455
          D    6,342   6,256   6,123
          E    7,435   7,432   8,102
      Organisation structure: Director General; Director (Operations); Director (Strategy); Deputy Director; Deputy Director (A)
      Spending transactions:
          Transaction   Date         Supplier            Amount
          A-1263        09/09/2010   Spottiswoode & Co   £2,345
          A-1264        09/09/2010   JSB & Sons          £2,111
          A-1265        09/09/2010   BLG Ltd             £2,455
          A-1266        09/09/2010   Spottiswoode & Co   £6,123
          A-1267        09/09/2010   BLG Ltd             £8,102
  25. Naming things with URIs
      • URI = Uniform Resource Identifier
      • Everything starts with HTTP, which gives us actionable names
      • There is a choice in how to construct URIs
      • We are using {sector}{something}
  26. Location URIs for INSPIRE
  27. Naming things in legislation
  28. Naming things in legislation
      • If you visit the site, you will see we have taken great care with naming things
      • One URI returns an HTML document for United Kingdom Public General Act (ukpga), 2005, Chapter 14, Section 1
      • Another returns an HTML document listing items from all legislation types whose title contains “wildlife”
  29. Some names are quite sophisticated…
      • UK Public General Act (ukpga)
      • 1981
      • Chapter 69
      • Section 5
      • As it extends to England
      • As it stood on 30 January 2001
      • Displayed as an HTML document with the timeline on
      • Although URIs are opaque, having this type of design changes how people use the service
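The components listed above map naturally onto path segments. The sketch below composes them in the order given on the slide; the segment layout and the `legislation_path` function are assumptions for illustration, since the actual URIs are not reproduced here, and display options such as the HTML timeline view are left out.

```python
from datetime import date
from typing import Optional


def legislation_path(doc_type: str, year: int, chapter: int,
                     section: Optional[int] = None,
                     extent: Optional[str] = None,
                     point_in_time: Optional[date] = None) -> str:
    """Build a hypothetical path: type/year/chapter[/section/n][/extent][/date]."""
    parts = [doc_type, str(year), str(chapter)]
    if section is not None:
        parts += ["section", str(section)]
    if extent:
        parts.append(extent.lower())           # geographic extent, e.g. England
    if point_in_time:
        parts.append(point_in_time.isoformat())  # version as it stood on this date
    return "/" + "/".join(parts)


path = legislation_path("ukpga", 1981, 69, section=5,
                        extent="England", point_in_time=date(2001, 1, 30))
print(path)  # /ukpga/1981/69/section/5/england/2001-01-30
```

Each optional segment narrows the name further, which is what lets a reader hack the URI by hand even though it is formally opaque.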
  30. Legislation as Open Data
      • Everything on the site is available as open data under the terms of our Open Government Licence
      • To access the data, visit any page and add:
        o /data.xml
        o /data.rdf
        o /data.xht
      • For lists:
        o /data.feed
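Because the data views are reached by appending a fixed suffix to a page address, a client can derive them mechanically. A small sketch; the `data_urls` helper is hypothetical and the `example.org` addresses are placeholders, not real pages:

```python
from urllib.parse import urlsplit, urlunsplit

DATA_SUFFIXES = ("/data.xml", "/data.rdf", "/data.xht")
LIST_SUFFIX = "/data.feed"


def data_urls(page_url: str, is_list: bool = False) -> list[str]:
    """Append each data suffix to the page path, keeping any query string."""
    scheme, netloc, path, query, _fragment = urlsplit(page_url)
    path = path.rstrip("/")
    suffixes = DATA_SUFFIXES + ((LIST_SUFFIX,) if is_list else ())
    # The fragment is dropped: it only addresses part of the HTML page.
    return [urlunsplit((scheme, netloc, path + suffix, query, ""))
            for suffix in suffixes]


print(data_urls("http://example.org/ukpga/2005/14/section/1")[0])
# http://example.org/ukpga/2005/14/section/1/data.xml
print(data_urls("http://example.org/all?title=wildlife", is_list=True)[-1])
# http://example.org/all/data.feed?title=wildlife
```

Keeping the query string means a search result page and its feed stay aligned, which is the point of "visit any page and add".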
  31. Linked Data Standards
      • Re-use where we can, create where we must
      • Small, high-level, lightweight vocabularies
        o Examples include datacube, organization, provenance
      • Create local specialisations
        o Examples include payments, central-government
      • Post hoc linking
  32. Data cube vocabulary [diagram of the vocabulary's classes and properties, including qb:DataSet, qb:DataStructureDefinition, qb:ComponentSpecification, qb:Observation, qb:Slice, qb:SliceKey, qb:ComponentProperty, qb:DimensionProperty, qb:MeasureProperty, qb:AttributeProperty and qb:CodedProperty, with links to skos:Concept, skos:ConceptScheme and SDMX concept roles]
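The cube vocabulary describes each cell of a statistical table as a `qb:Observation` belonging to a `qb:DataSet`, carrying dimension values and a measure value. The sketch below turns the aggregate statistics table from slide 24 into such observations as plain tuples. It assumes the published `qb:` namespace and the `qb:dataSet` property name of the W3C version of the vocabulary; the dataset URI and the category, year and value properties are hypothetical, and a complete cube would also declare a `qb:DataStructureDefinition`.

```python
QB = "http://purl.org/linked-data/cube#"  # RDF Data Cube namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"                # hypothetical namespace for this sketch

# Aggregate statistics table from slide 24: row label -> {year: value}.
TABLE = {
    "A": {2008: 1345, 2009: 1456, 2010: 2301},
    "B": {2008: 2112, 2009: 3543, 2010: 2111},
    "C": {2008: 2345, 2009: 2987, 2010: 2455},
    "D": {2008: 6342, 2009: 6256, 2010: 6123},
    "E": {2008: 7435, 2009: 7432, 2010: 8102},
}


def to_observations(table, dataset):
    """Emit one qb:Observation per cell, with two dimensions and one measure."""
    triples = [(dataset, RDF_TYPE, QB + "DataSet")]
    for row, cells in table.items():
        for year, value in cells.items():
            obs = f"{dataset}/obs/{row}/{year}"
            triples += [
                (obs, RDF_TYPE, QB + "Observation"),
                (obs, QB + "dataSet", dataset),
                (obs, EX + "def/category", row),  # dimension (hypothetical)
                (obs, EX + "def/year", year),     # dimension (hypothetical)
                (obs, EX + "def/value", value),   # measure (hypothetical)
            ]
    return triples


triples = to_observations(TABLE, EX + "id/dataset/example-statistics")
print(len(triples))  # 76: one dataset triple plus 5 for each of the 15 cells
```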
  33. Payments (a cube specialisation) [diagram: a PaymentDataset (qb:DataSet, with qb:structure and qb:slice) containing Payment observations; each Payment links to payer and payee (foaf:Agent), unit (org:OrganizationalUnit), date (interval:Interval) and one or more ExpenditureLines; properties shown include transactionReference, paymentReference, totalAmountIncludingVAT, totalAmountExcludingVAT, expenditureCode, narrative, purchase order, invoice, contract, procurementCategory, amountIncludingVAT, amountExcludingVAT, vatCategory and vatRate, with Item, ItemCategory, redacted, revenue and capital; coded values are skos:Concepts]
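The payments specialisation gives names to the columns of an ordinary spending file. The sketch below converts the transaction table from slide 24 into triples using four property names shown in the diagram (`transactionReference`, `payer`, `payee`, `date`) plus `amountExcludingVAT`. The namespace and URIs are hypothetical, treating the Amount column as excluding VAT is an assumption, and the date is simplified to a literal rather than an interval URI.

```python
import csv
import io
import re
from datetime import datetime
from decimal import Decimal

PAY = "http://example.org/def/payment#"  # hypothetical namespace for payment terms
BASE = "http://example.org/id/"          # hypothetical base for payment/supplier URIs

# Transaction table from slide 24, as it might arrive in a CSV file.
SPEND_CSV = """Transaction,Date,Supplier,Amount
A-1263,09/09/2010,Spottiswoode & Co,"£ 2,345"
A-1264,09/09/2010,JSB & Sons,"£ 2,111"
A-1265,09/09/2010,BLG Ltd,"£ 2,455"
A-1266,09/09/2010,Spottiswoode & Co,"£ 6,123"
A-1267,09/09/2010,BLG Ltd,"£ 8,102"
"""


def slug(name: str) -> str:
    """Make a URI-safe key from a supplier name, e.g. 'BLG Ltd' -> 'blg-ltd'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def payment_triples(csv_text: str, payer: str) -> list[tuple]:
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        payment = BASE + "payment/" + row["Transaction"]
        paid_on = datetime.strptime(row["Date"], "%d/%m/%Y").date()  # UK day/month order
        amount = Decimal(row["Amount"].replace("£", "").replace(",", "").strip())
        triples += [
            (payment, PAY + "transactionReference", row["Transaction"]),
            (payment, PAY + "payer", payer),
            (payment, PAY + "payee", BASE + "supplier/" + slug(row["Supplier"])),
            # Simplification: an ISO date literal instead of an interval URI.
            (payment, PAY + "date", paid_on.isoformat()),
            # Assumption: the Amount column is treated as excluding VAT.
            (payment, PAY + "amountExcludingVAT", amount),
        ]
    return triples


triples = payment_triples(SPEND_CSV, BASE + "department/example")
total = sum(o for s, p, o in triples if p == PAY + "amountExcludingVAT")
print(total)  # 21136
```

Because each supplier becomes one URI, the two payments to BLG Ltd point at the same payee and can be aggregated without string matching.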
  34. DATA
  35. Reference data
  36. British time intervals
      • There are similar URIs for seconds, minutes, hours, weeks, months, quarters and years
      • We were a bit slow (170 years) to move from the Julian to the Gregorian calendar (see the Calendar Act, 1750)
      • To make the transition, we lost 11 days in 1752
      • There is a convoluted explanation of why the tax year in the UK starts on 6 April
      • Our URIs for time intervals work this way too, and the British time intervals URI set is linked to the legislation
  37. PRODUCTION
  38. Chop-O-Matic
      • Malcolm Gladwell's 2000 New Yorker article on Ron Popeil:
      • “And how do you persuade people to disrupt their lives? Not merely by ingratiation or sincerity, and not by being famous or beautiful. You have to explain the invention to consumers - not once or twice but three or four times, with a different twist each time. You have to show them exactly how it works and why it works, and make them follow your hands as you chop liver with it, and then tell them precisely how it fits into their routine, and, finally, sell them on the paradoxical fact that, revolutionary as the gadget is, it's not at all hard to use.”
  39. Google Refine (formerly Gridworks)
  40. Use Refine to map and export Linked Data
  41. PUBLISHING
  43. Linked Data API
      • Open standard
      • Generic approach for creating APIs from Linked Data
      • Sits on top of a Linked Data store
      • Several implementations; the most mature is Puelia
  46. CASE STUDIES
  47. Back to those commitments
  48. Publishing Organisation Data
      • We will require public bodies to publish online the job titles of every member of staff and the salaries and expenses of senior officials paid more than the lowest salary permissible in Pay Band 1 of the Senior Civil Service pay scale, and organograms that include all positions in those bodies.
  49. Our first go…
      • October 2010
      • CSV template and PDFs of organograms, typically authored using PowerPoint
      • Emphasis on visual appearance led to inconsistent datasets which are very hard to re-use
      • No relationship between the organogram and the data
      • Not using web standards
  50. Press Release: “The Government has published the most comprehensive organisational charts of the UK Civil Service ever released online, taking another step towards its goal of being the most transparent government in the world and opening up the structure of the Civil Service to public scrutiny”
  51. It’s *all* Linked Data
      • Hundreds of UK Government organisations published their organisation data as Linked Data
      • Distributed data publishing
      • The data is deeply linked (departments, grades, professions, date of the snapshot)
      • Cross-dataset queries are perhaps the most interesting
      • Proves Linked Data is moving from research topic to commodity publishing
      • We can now extend this approach to other types of dataset and link our transparency data
  52. Our aims with Organogram Data
      • Make it as simple as possible for people in Departments to create Linked Data
      • Create high-quality, consistent data that matches the policy intent and guidance
      • Distributed capture and publishing
      • Create open data in open standards using open source tools
      • Human-readable and machine-readable from a single source
      • Provide download and API access in different formats (CSV, XML, JSON, RDF, HTML)
      • Evolutionary route to create longitudinal datasets, reconciling against previous data
      • Enable everyone to publish 5 Star Linked Data
  53. The process
      • Capture organisation data using a spreadsheet, which verifies policy rules and datatypes
      • Upload spreadsheet
      • Preview organogram
      • Download RDF and two CSVs
      • Publish on your website and register with
  54. The Excel bit…
      • It’s the tool most civil servants have
      • This *does* also work in LibreOffice / OpenOffice etc.
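The capture step in slide 53 (a spreadsheet that verifies policy rules and datatypes, then yields two CSVs) can be sketched as row validation followed by a senior/junior split. Everything specific here is hypothetical: the column names, the grade codes, and the rule that senior posts carry a salary while junior posts carry an FTE figure stand in for the real template and guidance.

```python
import csv
import io

# Hypothetical rules standing in for the real template's policy checks.
SENIOR_GRADES = {"SCS1", "SCS2", "SCS3", "SCS4"}
SENIOR_FIELDS = ["post_id", "job_title", "grade", "salary"]
JUNIOR_FIELDS = ["job_title", "grade", "fte", "reports_to"]


def validate(row: dict) -> list[str]:
    """Return the problems found in one spreadsheet row (empty list if valid)."""
    problems = []
    if not row.get("job_title"):
        problems.append("job title is required")
    if row.get("grade") in SENIOR_GRADES:
        if not row.get("salary", "").isdigit():
            problems.append("senior posts need a whole-number salary")
    elif not row.get("fte", "").replace(".", "", 1).isdigit():
        problems.append("junior posts need a numeric FTE")
    return problems


def split_to_csvs(rows: list[dict]) -> tuple[str, str, dict]:
    """Validate rows, then write valid ones to separate senior and junior CSV text."""
    senior, junior = io.StringIO(), io.StringIO()
    senior_out = csv.DictWriter(senior, fieldnames=SENIOR_FIELDS, extrasaction="ignore")
    junior_out = csv.DictWriter(junior, fieldnames=JUNIOR_FIELDS, extrasaction="ignore")
    senior_out.writeheader()
    junior_out.writeheader()
    errors = {}
    for number, row in enumerate(rows, start=1):
        problems = validate(row)
        if problems:
            errors[number] = problems  # reported by spreadsheet row number
        elif row["grade"] in SENIOR_GRADES:
            senior_out.writerow(row)
        else:
            junior_out.writerow(row)
    return senior.getvalue(), junior.getvalue(), errors


rows = [
    {"post_id": "1", "job_title": "Director General", "grade": "SCS3", "salary": "100000"},
    {"job_title": "Policy Adviser", "grade": "G7", "fte": "12.5", "reports_to": "1"},
    {"post_id": "2", "job_title": "", "grade": "SCS1", "salary": "n/a"},
]
senior_csv, junior_csv, errors = split_to_csvs(rows)
print(errors)  # {3: ['job title is required', 'senior posts need a whole-number salary']}
```

Rejecting a row before anything is written keeps the two CSVs consistent with each other, which matters because both are generated from the same upload.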
  58. Linked Data Publishing Infrastructure [diagram of a seven-step pipeline: 1. Upload Excel; 2. Create mapping; 3. Create RDF (XLWrap); 4. Query (SPARQL); 5. Create CSVs (senior and junior); 6. Load RDF (TriG file) into the RDF store (Sesame, TDB); 7. Query (SPARQL) for reconciliation. Other components: Organogram (HTML, CSS & JavaScript), Organogram (PHP), and the Linked Data API with its config, serving HTML, XML and JSON]
  59. Linked Data adds value
      • Implicit properties are made explicit (person, role, person in a role)
      • Reconciliation adds value by automatically linking to other data
      • Provenance
      • Example data
      • Explicit open licence
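Making "person in a role" explicit is an n-ary relation pattern: instead of attaching a post's details directly to a person, a separate node records that this person holds this post, so facts about the post and about the holding each have a home. A sketch with hypothetical URIs, property names and sample row (it is not the organisation vocabulary's actual terms):

```python
EX = "http://example.org/"  # hypothetical namespace and property names


def explicit_roles(row: dict) -> list[tuple]:
    """Turn one flat organogram row into person, post and person-in-post triples."""
    person = EX + "id/person/" + row["person_id"]
    post = EX + "id/post/" + row["post_id"]
    holding = f"{post}/holder/{row['person_id']}"  # node for 'this person in this post'
    return [
        (person, EX + "def/name", row["name"]),
        (post, EX + "def/jobTitle", row["job_title"]),
        (post, EX + "def/grade", row["grade"]),
        (holding, EX + "def/person", person),
        (holding, EX + "def/post", post),
        (holding, EX + "def/snapshotDate", row["snapshot_date"]),
    ]


def holders(post: str, triples: list[tuple]) -> list[str]:
    """Find who holds a post by going through the person-in-post nodes."""
    holdings = {s for s, p, o in triples if p == EX + "def/post" and o == post}
    return [o for s, p, o in triples if s in holdings and p == EX + "def/person"]


row = {"person_id": "p1", "post_id": "101", "name": "Example Person",
       "job_title": "Director (Strategy)", "grade": "SCS2", "snapshot_date": "2011-03-31"}
triples = explicit_roles(row)
print(holders(EX + "id/post/101", triples))  # ['http://example.org/id/person/p1']
```

When the same person later moves post, only a new holding node is added; the post's grade and title stay attached to the post across snapshots.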
  61. On the web, everything is a claim
      • How did you come by this information?
      • What did you do with it?
      • When, who and how?
  62. An opportunity
      • We are developing a new system for publishing legislation, operating inside the government secure intranet / extranet
      • We want to provide evidence that supports the data we are publishing
  63. Legislation workflows
      • Complicated, and vary by jurisdiction and content type
      • We take documents in different formats (Word, FrameMaker) and convert them to a single format (XML)
      • We store XML documents in an XML database
      • We take documents from a single format (XML) and transform them to different formats (HTML and PDF)
      • Complex processes for handling images etc.
      • Sometimes mistakes are made, which can be corrected through a “Correction Slip”
  64. Objectives for provenance with legislation
      • Transparency and public trust: we substantiate our claim that this web page is what the legislation says
      • The audit trail is repeatable
      • Perform automatic checks along the way and record evidence of that checking
      • Use digital signatures, rather than relying on the immutability of paper, to ensure authenticity
      • Create a data source we can use to resolve any disputes (where did that footnote go?)
      • Create a data source we can use to measure contractual performance (how long did it take to publish that document?)
  65. Our technology choices
      • We use both XML and RDF
      • XML is brilliant for single-source publishing solutions: one source, many outputs
      • RDF provides a flexible data model for other types of information (bibliographic metadata, but also things like which item of legislation has changed what)
      • We are recording provenance in RDF using the Open Provenance Model Vocabulary
  66. Open Provenance Model Vocabulary [diagram: an opmv:Process used Document(k-1), an opmv:Artifact, and was controlled by an opmv:Agent; Document(k), an opmv:Artifact, wasGeneratedBy that process; properties shown: opmv:used, opmv:wasGeneratedBy, opmv:wasControlledBy, opmv:wasPerformedBy]
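The model in the diagram can be recorded as triples and then walked backwards to answer "how did you come by this information?". A sketch using the OPMV terms shown on the slide; the namespace URI is written here as an assumption, and the process, document and agent identifiers are hypothetical.

```python
OPMV = "http://purl.org/net/opmv/ns#"  # assumed namespace URI for OPMV
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def record_step(graph: set, process: str, inputs: list, output: str, agent: str) -> None:
    """Record that `process`, controlled by `agent`, used `inputs` and generated `output`."""
    graph.add((process, RDF_TYPE, OPMV + "Process"))
    graph.add((process, OPMV + "wasControlledBy", agent))
    for artifact in inputs:
        graph.add((process, OPMV + "used", artifact))
    graph.add((output, OPMV + "wasGeneratedBy", process))


def lineage(graph: set, artifact: str) -> list:
    """Walk back through wasGeneratedBy and used links; return processes, newest first."""
    processes, frontier, seen = [], [artifact], set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in graph:
            if s == current and p == OPMV + "wasGeneratedBy":
                processes.append(o)
                frontier.extend(a for (ps, pp, a) in graph
                                if ps == o and pp == OPMV + "used")
    return processes


g = set()
record_step(g, "urn:example:word-to-xml", ["urn:example:data.doc"],
            "urn:example:data.xml", "urn:example:converter")
record_step(g, "urn:example:xml-to-html", ["urn:example:data.xml"],
            "urn:example:page.html", "urn:example:renderer")
print(lineage(g, "urn:example:page.html"))
# ['urn:example:xml-to-html', 'urn:example:word-to-xml']
```

Recording one `record_step` per workflow stage (Word to XML, XML to HTML and so on) is what makes the audit trail repeatable: the chain can be replayed from the stored triples alone.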
  67. Provenance chain audit trail (TriG; on the slide the named graphs and signatures are annotated Container1 to Container3 and Signature(c1) to Signature(c3))
      <urn:uuid:6F677120-152C-11E1-8715-95963F5713B6> {
        <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/task/word-export-wml/1>
            a ns0:Process ;
            rdfs: "Word Export to WML1 Process" ;
            ns0:wasControlledBy <> , <> ;
            ns1:hasParentProcess <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/task/word-to-xml> ;
            ns2:source <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/data.doc> .
      }
      <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> {
        <urn:uuid:6F677120-152C-11E1-8715-95963F5713B6>
            swp:assertedBy <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> ;
            swp:digest "N2U1ZGZhMzI3M2IzNmFjNDNlMmZkZTkyZTkwY2RlYWY4NmU5MDJiYw=="^^<> ;
            swp:digestMethod swp:JjcRdfC14N-sha1 .
        <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB>
            swp:assertedBy <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> ;
            swp:authority <> ;
            swp:signature "kWcf…6g=="^^<> ;
            swp:signatureMethod swp:JjcRdfC14N-rsa-sha1 .
        <> swp:X509Certificate "MIIG …." .
      }
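The swp:digest literal above decodes (base64) to a 40-character hexadecimal string, the length of a SHA-1 digest, computed over a canonical serialisation of a named graph. The sketch shows only the general idea: it writes triples as sorted N-Triples-style lines, hashes them with SHA-1 and base64-encodes the hex digest. It is a simplified stand-in, not the canonicalisation algorithm named by `swp:JjcRdfC14N-sha1`; it ignores blank nodes and datatypes, and it recognises URIs by a few schemes only. The example URIs are hypothetical.

```python
import base64
import hashlib


def term(value: str) -> str:
    """Write a term: URIs (recognised here by scheme) in <...>, anything else as a literal."""
    if value.startswith(("http://", "https://", "urn:")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def graph_digest(triples) -> str:
    """SHA-1 over sorted lines, so the result does not depend on triple order."""
    lines = sorted(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
    hex_digest = hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()
    # Base64 of the hex string, mirroring the shape of the literal on the slide.
    return base64.b64encode(hex_digest.encode("ascii")).decode("ascii")


step = [
    ("urn:example:task/word-export", "http://example.org/def/label",
     "Word Export to WML1 Process"),
    ("urn:example:task/word-export", "http://example.org/def/source",
     "urn:example:data.doc"),
]
digest = graph_digest(step)
print(digest == graph_digest(list(reversed(step))))  # True
```

Sorting before hashing is the key design point: two systems holding the same graph in different orders must produce the same digest, or the signature that covers it cannot be checked.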
  68. Publishing provenance
      • Provenance information may be associated with a page by including a <link> element in the HTML <head> section:
        <html xmlns="">
          <head>
            <link rel="provenance" href="provenance-URI">
            <link rel="anchor" href="entity-URI">
            <title>Welcome to</title>
          </head>
          <body> ... </body>
        </html>
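A client can discover the provenance URI by reading the `<link>` elements in the page head. A sketch using Python's standard `html.parser`; it looks for the `provenance` and `anchor` relation names exactly as written on the slide, and the sample page is a placeholder.

```python
from html.parser import HTMLParser


class ProvenanceLinks(HTMLParser):
    """Collect href values of <link> elements whose rel is 'provenance' or 'anchor'."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        for rel in (attributes.get("rel") or "").lower().split():
            if rel in ("provenance", "anchor") and attributes.get("href"):
                self.links.setdefault(rel, attributes["href"])


page = """<html><head>
<link rel="provenance" href="provenance-URI">
<link rel="anchor" href="entity-URI">
<title>Example</title></head><body>...</body></html>"""

parser = ProvenanceLinks()
parser.feed(page)
print(parser.links)  # {'provenance': 'provenance-URI', 'anchor': 'entity-URI'}
```

The anchor link matters because the same page can describe several things; it tells the client which entity the provenance record is about.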
  69. Summary
      • Linked Data is essential to realising the promise of Open Government Data
      • Using Linked Data means working on
        o Standards
        o Reference Data
        o Production
        o Publishing
      • The benefits grow the more data you want to combine
      • Lots of opportunities for international collaboration
      • Best advice: just start
  70. Questions?