Linking UK Government Data, John Sheridan


Keynote presentation by John Sheridan at the OGD2011 conference on 16 June 2011 in Vienna: Linking UK Government Data (in English).

Published in: Technology
  • Slide 15 is a little incorrect - Linked Data does not have to be RDF. RDF is just one of the many frameworks that can establish Linked Data; granted, it is one of the most 'networkable' frameworks, but Linked Data does not have to be explicitly RDF. Just thought I'd clear that up.

  1. 1. John Sheridan, Linked Data lead for of Legislation Services at The [UK] National Archives
  5. 5. 16. GOVERNMENT TRANSPARENCY The Government believes that we need to throw open the doors of public bodies, to enable the public to hold politicians and public bodies to account. We also recognise that this will help to deliver better value for money in public spending, and help us achieve our aim of cutting the record deficit. Setting government data free will bring significant economic benefits by enabling businesses and non-profit organisations to build innovative applications and websites. We will require public bodies to publish online the job titles of every member of staff and the salaries and expenses of senior officials paid more than the lowest salary permissible in Pay Band 1 of the Senior Civil Service pay scale, and organograms that include all positions in those bodies. We will ensure that all data published by public bodies is published in an open and standardised format, so that it can be used easily and with minimal cost by third parties.
  7. 7. Formats for people: focused on presentation or typographic layout; look good, but hard to access the underlying data. Formats for machines: focused on data interchange between computers; look dreadful, hard for people to understand, but easy to import into other systems and use.
  8. 8. Single source of data. Formats for people: focused on presentation or typographic layout. Formats for machines: focused on data interchange between computers.
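The single-source idea on this slide can be sketched as follows. This is a minimal illustration, not anything shown in the talk: the post names are borrowed from the organogram on slide 18, while the `reports_to` field and the three function names are assumptions made up for the example.

```python
import csv
import io
import json

# One source of truth. Post names come from slide 18; the reporting
# relationships are invented for illustration.
RECORDS = [
    {"post": "Director (Operations)", "reports_to": "Director General"},
    {"post": "Director (Strategy)", "reports_to": "Director General"},
]


def for_people(records):
    """Render the records as readable sentences (a 'format for people')."""
    return "\n".join(f"{r['post']} reports to {r['reports_to']}" for r in records)


def for_machines_json(records):
    """Render the same records as JSON (a 'format for machines')."""
    return json.dumps(records, indent=2)


def for_machines_csv(records):
    """Render the same records as CSV, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(for_people(RECORDS))
    print(for_machines_json(RECORDS))
    print(for_machines_csv(RECORDS))
```

Because every output is derived from `RECORDS`, the human-readable and machine-readable versions cannot drift apart.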
  9. 9. Download: good for static information; small files; used for export/import; easy for publishers; most of the data registered on. Programmatic access: good for dynamic or real-time information or very large datasets; lets developers select and use just the information they need; retains more control for the publisher; more complicated to implement but much more powerful; vital for many useful datasets.
  11. 11. He also developed the first industrially practical screw-cutting lathe in 1800, allowing standardisation of screw thread sizes for the first time. This allowed the concept of interchangeability (an idea that was already taking hold) to be practically applied to nuts and bolts. Before this, all nuts and bolts had to be made as matching pairs only. This meant that when machines were disassembled, careful account had to be kept of the matching nuts and bolts ready for when reassembly took place.
  12. 12. In 1841, Joseph Whitworth created a design that, through its adoption by many British railroad companies, became a national standard for the United Kingdom called British Standard Whitworth. During the 1840s through 1860s, this standard was often used in the United States and Canada as well, in addition to myriad intra- and inter-company standards.
  13. 13. * make your stuff available on the Web (whatever format) under an open licence
      ** make it available as structured data (e.g., Excel instead of image scan of a table)
      *** use non-proprietary formats (e.g., CSV instead of Excel)
      **** use URIs to identify things, so that people can point at your stuff
      ***** link your data to other data to provide context
  15. 15. Give names, or web identifiers (URIs), to things. Publish information about them as Web Resources. Use RDF triples (subject, property, value). Link to other data about those things.
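The four principles on this slide can be sketched with plain Python tuples, with no RDF library involved. Everything under `example.org` is a hypothetical identifier invented for the sketch; only the `rdfs:label` and `owl:sameAs` property URIs are real, widely used terms.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: (subject, property, value)."""
    subject: str
    prop: str
    value: str


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs: a name for a thing, and a link to another dataset.
DEPT = "http://example.org/id/department/example-dept"
OTHER = "http://example.org/other-dataset/dept/42"

GRAPH: List[Triple] = [
    Triple(DEPT, RDFS_LABEL, "Example Department"),  # information about the thing
    Triple(DEPT, OWL_SAME_AS, OTHER),                # link to other data about it
]


def describe(subject: str, graph: List[Triple]) -> dict:
    """Collect every property/value pair stated about one subject."""
    return {t.prop: t.value for t in graph if t.subject == subject}


def outbound_links(subject: str, graph: List[Triple]) -> List[str]:
    """Values that are themselves HTTP URIs, i.e. things a client could follow."""
    return [t.value for t in graph if t.subject == subject and t.value.startswith("http")]


if __name__ == "__main__":
    print(describe(DEPT, GRAPH))
    print(outbound_links(DEPT, GRAPH))
```

A real publisher would serialise these statements in an RDF syntax; the tuples only show the shape of the data.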
  16. 16. Enables web-scale data publishing: distributed publication with web-based discovery mechanisms. Everything is a resource: follow your nose to discover more about properties, classes, or codes within a code list. Everything can be annotated: make comments about observations, data series, points on a map. Easy to extend: create new properties as required, no need to plan everything up-front. Easy to merge: slot together RDF graphs, no need to worry about name clashes.
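The "easy to merge" point can be illustrated by treating each graph as a set of (subject, property, value) statements, so that merging is a set union. This is a simplified sketch: the `example.org` URIs and the figures are invented, and real RDF merging also has to handle blank nodes, which are ignored here.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical identifiers, invented for the example.
ORG = "http://example.org/id/org/example-dept"
HEADCOUNT = "http://example.org/def/headcount"
BUDGET = "http://example.org/def/budget"

# Two graphs published independently about the same organisation.
graph_a = {
    (ORG, RDFS_LABEL, "Example Department"),
    (ORG, HEADCOUNT, "120"),
}
graph_b = {
    (ORG, RDFS_LABEL, "Example Department"),  # same statement as in graph_a
    (ORG, BUDGET, "5000000"),
}

# Properties are full URIs, so two publishers cannot accidentally reuse the
# same short column name for different things; identical statements collapse.
merged = graph_a | graph_b

if __name__ == "__main__":
    for statement in sorted(merged):
        print(statement)
```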
  17. 17. Developing standards for responsible publishing of key types of data (financial data, organisation data, aggregate statistics, location data). Developing guidance, practices and tools that make it easy to publish data in Linked Data form, at low cost. Making it easy for people to consume data in a programmatic way.
  18. 18. Organogram: Director General; Director (Operations); Director (Strategy); Deputy Director (A); Deputy Director (A)

          2008     2009     2010
      A   1,345    1,456    2,301
      B   2,112    3,543    2,111
      C   2,345    2,987    2,455
      D   6,342    6,256    6,123
      E   7,435    7,432    8,102

      Transaction   Date         Supplier            Amount
      A-1263        09/09/2010   Spottiswoode & Co   £2,345
      A-1264        09/09/2010   JSB & Sons          £2,111
      A-1265        09/09/2010   BLG Ltd             £2,455
      A-1266        09/09/2010   Spottiswoode & Co   £6,123
      A-1267        09/09/2010   BLG Ltd             £8,102
  19. 19. URI = uniform resource identifier. Everything starts HTTP, which gives us actionable names. There is choice about how to make URIs. We are using {sector}{something}
  20. 20. If you visit you will see we have taken great care with naming things. Returns an HTML document for United Kingdom Public General Act (ukpga), 2005, Chapter 14, Section 1. Returns an HTML document with a list from all legislation types where the title contains “wildlife”.
  21. 21. UK Public General Act (ukpga) 1981 Chapter 69 Section 5. As it extends to England. As it stood on 30th January 2001. Displayed as an HTML document with the timeline on. Although URIs are opaque, having this type of design changes how people use the service.
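The components listed on this slide (type, year, chapter, section, extent, point in time) suggest a path built by joining them in order. The sketch below assumes that ordering and the literal `section` segment for illustration; the base address is a placeholder, since the transcript does not preserve the real one, and `legislation_uri` is a made-up helper name.

```python
from datetime import date
from typing import Optional

# Placeholder base address; the real site address is missing from the transcript.
BASE = "https://example.org"


def legislation_uri(
    leg_type: str,
    year: int,
    chapter: int,
    section: Optional[int] = None,
    extent: Optional[str] = None,
    as_at: Optional[date] = None,
) -> str:
    """Join the components in the order the slide lists them (assumed layout)."""
    parts = [leg_type, str(year), str(chapter)]
    if section is not None:
        parts += ["section", str(section)]
    if extent:
        parts.append(extent)
    if as_at:
        parts.append(as_at.isoformat())  # point-in-time view, e.g. 2001-01-30
    return BASE + "/" + "/".join(parts)


if __name__ == "__main__":
    # Section 5 of ukpga 1981 c. 69, as it extends to England,
    # as it stood on 30 January 2001.
    print(legislation_uri("ukpga", 1981, 69, section=5,
                          extent="england", as_at=date(2001, 1, 30)))
```

The design point is that each segment narrows the previous one, so a person can shorten the address by hand and still land on something meaningful.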
  23. 23. Everything on is available as open data under the terms of our Open Government Licence. To access the data, visit any page and add: /data.xml, /data.rdf or /data.xht. For lists: /data.feed
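The suffix convention on this slide is easy to apply programmatically. The sketch below only builds the address; it does not fetch anything. The page address is a placeholder and `data_url` is a hypothetical helper, while the four suffixes are the ones listed on the slide.

```python
# Suffixes as listed on the slide; the dictionary keys are arbitrary labels.
SUFFIXES = {
    "xml": "/data.xml",
    "rdf": "/data.rdf",
    "xhtml": "/data.xht",
    "feed": "/data.feed",  # for list pages
}


def data_url(page_url: str, fmt: str) -> str:
    """Return the data address for a page by appending the format suffix."""
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(SUFFIXES)}")
    return page_url.rstrip("/") + SUFFIXES[fmt]


if __name__ == "__main__":
    page = "https://example.org/ukpga/2005/14/section/1"  # placeholder address
    for fmt in SUFFIXES:
        print(data_url(page, fmt))
```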
  24. 24. Re-use where we can, create where we must. Small, high level, light weight vocabularies (examples include datacube, organization, provenance). Create local specialisations (examples include payments, central-government). Post hoc linking.
  25. 25. [Diagram: the Data Cube vocabulary (qb:). Classes shown: qb:DataSet, qb:DataStructureDefinition, qb:ComponentSpecification, qb:ComponentProperty, qb:DimensionProperty, qb:AttributeProperty, qb:MeasureProperty, qb:CodedProperty, qb:Slice, qb:SliceKey, qb:Observation, skos:Concept, skos:ConceptScheme, sdmx:Concept, sdmx:ConceptRole, sdmx:FrequencyRole, sdmx:CountRole, sdmx:EntityRole, sdmx:TimeRole, sdmx:CodeList. Properties shown: qb:structure, qb:slice, qb:subSlice, qb:sliceKey, qb:sliceStructure, qb:dataset, qb:observation, qb:componentProperty, qb:dimension, qb:attribute, qb:measure, qb:measureType, qb:concept, qb:codeList, qb:componentRequired (boolean), qb:componentAttachment (rdfs:Class), qb:order (xsd:int).]
  26. 26. [Diagram: the payments vocabulary. Classes shown: PaymentDataset, Payment, ExpenditureLine, Purchase, Item, ItemCategory, foaf:Agent, org:OrganizationalUnit, interval:Interval, skos:Concept. Properties shown: qb:structure, qb:dataset, qb:slice, payer, payee, unit, date, payment, expenditureLine, purchase, order, invoice, contract, narrative, item, amountIncludingVAT, amountExcludingVAT, totalAmountIncludingVAT, totalAmountExcludingVAT, procurementCategory, transactionReference, paymentReference, expenditureCode, vatCategory, vatRate, redacted, revenue, capital.]
  27. 27. *new* Government Linked Data Working Group. Provenance Working Group.
  30. 30. There are similar URIs for seconds, minutes, hours, weeks, months, quarters, years. We were a bit slow (170 years) to move from the Julian to the Gregorian Calendar (see the Calendar Act, 1750). To transition, we lost 11 days in 1752. Convoluted explanation of why the tax year in the UK starts on the 6th April. Our URIs for time intervals work this way too, and the British time intervals URI Set is linked to the legislation.
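The slide does not spell out the explanation, so the arithmetic below follows the commonly told account, stated here as an assumption: the old financial year began on Lady Day (25 March), the start was pushed back by the 11 dropped days, and by one further day in 1800, a leap year under the Julian calendar but not under the Gregorian. Python's `date` uses the Gregorian calendar throughout, which is sufficient for counting days.

```python
from datetime import date, timedelta

# In Britain, Wednesday 2 September 1752 was followed by Thursday 14 September.
last_old_style_day = date(1752, 9, 2)
first_new_style_day = date(1752, 9, 14)
days_dropped = (first_new_style_day - last_old_style_day).days - 1

# Assumed account: the year used to start on Lady Day, 25 March.
old_year_start = date(1753, 3, 25)
after_calendar_change = old_year_start + timedelta(days=days_dropped)

# Assumed account: one more day was added in 1800.
tax_year_start = date(1800, after_calendar_change.month,
                      after_calendar_change.day) + timedelta(days=1)

if __name__ == "__main__":
    print("days dropped:", days_dropped)
    print("start after 1752:", after_calendar_change.strftime("%d %B"))
    print("start after 1800:", tax_year_start.strftime("%d %B"))
```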
  32. 32. Malcolm Gladwell article on Ron Popeil from 2000 in the New Yorker: “And how do you persuade people to disrupt their lives? Not merely by ingratiation or sincerity, and not by being famous or beautiful. You have to explain the invention to consumers - not once or twice but three or four times, with a different twist each time. You have to show them exactly how it works and why it works, and make them follow your hands as you chop liver with it, and then tell them precisely how it fits into their routine, and, finally, sell them on the paradoxical fact that, revolutionary as the gadget is, it's not at all hard to use.”
  37. 37. Open Standard. Generic approach for creating APIs from Linked Data. Sits on top of a Linked Data store. Several implementations; the most mature is Puelia.
  42. 42.  We will require public bodies to publish online the job titles of every member of staff and the salaries and expenses of senior officials paid more than the lowest salary permissible in Pay Band 1 of the Senior Civil Service pay scale, and organograms that include all positions in those bodies.
  43. 43. October 2010. CSV template and PDFs of organograms, typically authored using PowerPoint. Emphasis on visual appearance led to inconsistent datasets which are very hard to re-use. No relationship between the organogram and data. Not using web standards.
  44. 44. “The Government has published the most comprehensive organisational charts of the UK Civil Service ever released online, taking another step towards its goal of being the most transparent government in the world and opening up the structure of the Civil Service to public scrutiny”
  45. 45. 100s of UK Government Organisations have published their organisation data as Linked Data. Distributed data publishing. It is the largest number of organisations joining the Web of Linked Data in a single day! The data is deeply linked (Departments, Grades, Professions, date of the snapshot). Cross dataset queries are perhaps the most interesting. Proves Linked Data is moving from research topic to commodity publishing. We can now extend this approach to other types of dataset and link our transparency data.
  46. 46. Make it as simple as possible for people in Departments to create Linked Data. Create high quality, consistent data that matches the policy intent and guidance. Distributed capture and publishing. Create open data in open standards using open source tools. Human readable and machine readable from single source. Provide download and API access in different formats (CSV, XML, JSON, RDF, HTML). Evolutionary route to create longitudinal datasets, reconciling against previous data. Enable everyone to publish 5 Star Linked Data.
  47. 47. Capture organisation data using a spreadsheet, which verifies policy rules and datatypes. Upload spreadsheet. Preview organogram. Download RDF and two CSVs. Publish on your website and register with
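The kind of check the spreadsheet performs can be sketched as row-level validation. The talk does not show the actual rules, so the column names, the two rules and the sample values below are all invented for illustration, and CSV text stands in for the spreadsheet.

```python
import csv
import io

# Invented sample data; the second data row breaks both rules on purpose.
SAMPLE = """job_title,grade,salary
Director General,SCS3,140000
,SCS1,abc
"""


def validate_row(row: dict) -> list:
    """Return a list of problems for one row (hypothetical rules)."""
    errors = []
    if not row["job_title"].strip():
        errors.append("job_title is required")       # stand-in for a policy rule
    if not row["salary"].isdigit():
        errors.append("salary must be a whole number")  # stand-in for a datatype check
    return errors


def validate(text: str) -> dict:
    """Map line number to problems; the header is line 1, data starts at line 2."""
    reader = csv.DictReader(io.StringIO(text))
    problems = {}
    for line_no, row in enumerate(reader, start=2):
        errors = validate_row(row)
        if errors:
            problems[line_no] = errors
    return problems


if __name__ == "__main__":
    for line_no, errors in validate(SAMPLE).items():
        print(f"line {line_no}: {'; '.join(errors)}")
```

Catching these problems at capture time is what keeps the later RDF and CSV outputs consistent across departments.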
  48. 48. It’s the tool most Civil Servants have. This *does* also work in LibreOffice / OpenOffice etc.
  52. 52. [Architecture diagram. Steps: 1. Upload Excel file; 2. Create mapping; 3. Create RDF; 4. Query (SPARQL); 5. Create CSVs (Senior and Junior); 6. Load (TriG RDF); 7. Query (SPARQL). Components: Organogram (HTML, CSS & JavaScript); Organogram (PHP); Linked Data API (HTML, XML, JSON) with API Config; XLWrap with Mapping; Sesame; TDB RDF Store; Reconciliation.]
  53. 53. Implicit properties are made explicit (person, role, person in a role). Reconciliation adds value by automatic linking to other data. Provenance. Example data. Explicit open licence.
  55. 55. Linked Data is essential to realising the promise of Open Government Data. Using Linked Data means working on: Standards, Reference Data, Production, Publishing. Lots of opportunities for international collaboration. Best advice: just start.
  56. 56. email: john@johnlsheridan; Twitter: @johnlsheridan; Skype: johnlsheridan