
How to Publish Open Data

A practical guide to publishing open data, presented at the Galway event of Irish Open Data Week 2011. Introducing the “five-shamrock scheme”!

Published in: Technology, Education

  1. How to publish Open Data
     Richard Cyganiak
     Opening Up Government Data – Galway, 8 Nov 2011
     Digital Enterprise Research Institute, www.deri.ie
     Stefan.Decker@deri.org http://www.StefanDecker.org/
     Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
  2. TimBL’s 5-star plan for open data
     ★ Make your stuff available on the Web
     ★★ Make it available as structured data (e.g., an Excel sheet instead of image scan of a table)
     ★★★ Use a non-proprietary format (e.g., a CSV file instead of an Excel sheet)
     ★★★★ Use linked data format (i.e., URIs to identify things, and RDF to represent data)
     ★★★★★ Link your data to other people’s data to provide context
     Source: http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/
  3. Five-shamrock scheme
  4. Five-shamrock scheme
     1. Publish data on the web
  5. Five-shamrock scheme
     1. Publish data on the web
     2. Publish data in a machine-processable format
  6. Five-shamrock scheme
     1. Publish data on the web
     2. Publish data in a machine-processable format
     3. Use an open standard format
  7. Five-shamrock scheme
     1. Publish data on the web
     2. Publish data in a machine-processable format
     3. Use an open standard format
     4. Publish under an open license
  8. Five-shamrock scheme
     1. Publish data on the web
     2. Publish data in a machine-processable format
     3. Use an open standard format
     4. Publish under an open license
     5. List your data in a data catalog
  9. 1. Publish data on the web
  10. Why?
     - The web is where people look for it first
     - Google can index it
     - Fewer phone calls and emails (and FoI requests) to answer
  11. Lots of data is already there
     - Databases
     - Reports
     - Spreadsheets
     - Maps
  12. 2. Publish data in a machine-processable format
  13. Why?
     - Allow others to do their own processing, analysis and visualisation of your data
     - New services, new ideas
  14. Examples
     - CSO Quarterly National Household Survey
       http://cso.ie/qnhs/calendar_quarters_qnhs.htm
     - EPA enforcement files and ScraperWiki
       http://www.epa.ie/whatwedo/enforce/lic/info/
       https://views.scraperwiki.com/run/irish-epa-visuals/
     - Galway and Fingal planning applications
       http://lab.linkeddata.deri.ie/2010/planning-apps/
     - Getting the data: 210 lines of code vs. 30 lines of code
  15. Symptom: screenscraping
     - People use tools like ScraperWiki to get at data that isn’t machine-readable
       https://scraperwiki.com/tags/ireland
     - Scraping is not the right way of doing this
       - Expensive
       - Brittle
       - Strain on computing resources
  16. Formats
     - Good: MS Excel, CSV, XML, JSON, Microdata
     - Not so good: Pure websites, MS Word
     - Bad: PDF
     - Really bad: Only charts/maps without numbers
  17. Good practices
     - Publish in multiple formats, at least one machine-readable
     - Publish Excel files alongside large PDF reports
     - Publish CSV alongside database-backed web applications
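
The last practice on the slide above, publishing CSV alongside a database-backed web application, can be sketched in a few lines. This is only an illustration using Python's standard library: the `applications` table, its columns, and the sample rows are hypothetical stand-ins for whatever database sits behind the real application.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query):
    """Run a read-only query and return the result as CSV text with a header row."""
    cursor = conn.execute(query)
    header = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor)  # each row is a tuple of column values
    return buffer.getvalue()


# Hypothetical in-memory database standing in for the web application's backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applications (ref TEXT, authority TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO applications VALUES (?, ?, ?)",
    [("11/0001", "Galway", "granted"), ("11/0002", "Fingal", "pending")],
)

csv_text = export_query_to_csv(
    conn, "SELECT ref, authority, status FROM applications ORDER BY ref"
)
print(csv_text)
```

Serving the returned text at a stable URL next to the web application would give re-users a machine-readable download without any scraping.
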
  18. 3. Use an open standard format
  19. Why?
     - Not all formats are created equal
     - Some formats bring many tools and applications that people can already use
  20. Quick tour of formats
     - CSV (Comma-Separated Values)
       - More open (and simpler) alternative to Excel format
       - Can be opened in and exported from Excel, Google Spreadsheets, Google Refine, …
     - KML (Keyhole Markup Language)
       - Simple format for presenting geographic data
       - Can be opened in Google Maps
     - RSS (Really Simple Syndication)
       - Notifications of updates of any kind
       - Can be opened in RSS readers and many email clients
  21. Developer-oriented formats
     - XML (Extensible Markup Language)
       - W3C (World Wide Web Consortium) standard, 1998
       - established, reliable, ubiquitous
     - JSON (JavaScript Object Notation)
       - IETF (Internet Engineering Task Force) RFC 4627, 2006
       - great for web APIs
       - very simple; very fashionable right now
     - RDF (Resource Description Framework)
       - W3C standard, 2004
       - great for data integration
       - steeper learning curve
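
To make the difference between two of these developer-oriented formats concrete, the sketch below serialises one record as both JSON and XML using only Python's standard library. The record, its field names, and the `school` root element are hypothetical and chosen purely for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative only.
record = {
    "roll_number": "00000X",
    "name": "Example National School",
    "county": "Galway",
}


def to_json(rec):
    """Serialise the record as a JSON object with keys in a stable order."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec, root_tag="school"):
    """Serialise the record as a flat XML element with one child per field."""
    root = ET.Element(root_tag)
    for key in sorted(rec):
        child = ET.SubElement(root, key)
        child.text = str(rec[key])
    return ET.tostring(root, encoding="unicode")


print(to_json(record))
print(to_xml(record))
```

Both outputs carry the same information; the choice mostly depends on which tools the expected audience already uses.
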
  22. Also: standard classifications
     - Within your data, use the same categories as everybody else
     - CSO
       http://www.cso.ie/surveysandmethodologies/classifications_stan.htm
     - StatCentral list of classifications
       http://www.statcentral.ie/classifications.asp
  23. Also: standard identifiers
     - Example: School roll numbers
       - Department of Education publishes an Excel file with all school roll numbers
       - Can be used to Google the same school on other websites, school evaluation reports etc.
     - Example: Ordnance Survey UK geo identifiers
       - Uses URIs (web addresses) as identifiers
       - http://data.ordnancesurvey.co.uk/doc/7000000000037256
       - Great for use in RDF
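
As a rough illustration of why URI identifiers work well in RDF, the sketch below writes two statements about a school in N-Triples, the simplest line-based RDF syntax. Only the Ordnance Survey URI comes from the slide above and `rdfs:label` is a standard W3C property; the `example.org` school URI and the `locatedIn` property are hypothetical placeholders, and the escaping covers only the most common cases.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical property, used here only for illustration.
LOCATED_IN = "http://example.org/vocab/locatedIn"
# Identifier quoted on the slide above.
OS_AREA = "http://data.ordnancesurvey.co.uk/doc/7000000000037256"


def iri(value):
    """Wrap a URI in angle brackets, rejecting characters N-Triples does not allow."""
    if any(ch in value for ch in '<>" {}|\\^`'):
        raise ValueError(f"not a valid IRI for N-Triples: {value!r}")
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def school_triples(school_uri, name, area_uri):
    """Return N-Triples lines naming a school and linking it to an area URI."""
    return [
        f"{iri(school_uri)} {iri(RDFS_LABEL)} {literal(name)} .",
        f"{iri(school_uri)} {iri(LOCATED_IN)} {iri(area_uri)} .",
    ]


lines = school_triples("http://example.org/school/12345", "Example School", OS_AREA)
print("\n".join(lines))
```

Because the second statement points at a URI that someone else maintains, anyone who also uses that URI can join their data with this record without agreeing on anything else.
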
  24. Linked Open Data Cloud
  25. Summary
     - Prefer open, widely used standards
     - But: also prefer what you know best
     - Support multiple formats for different audiences where it makes sense
     - Great: CSV, KML, RSS, XML, JSON
  26. 4. Publish under an open license
  27. Why?
     - Regulates what others can and cannot do with the data
     - For re-users, uncertainty about rights is a major concern
     - A good way to ensure that your organisation gets acknowledged
     - You need some non-discriminatory policy for giving rights to the data anyway (PSI directive)
  28. Complex topic
     - Destroying a potential income stream?
     - Content licenses vs database licenses
     - Mixing and compatibility of licenses
     - Wikipedia, OpenStreetMap
  29. Irish PSI License
     - Created in response to PSI Directive
     - Available at http://psi.gov.ie/
     - Problems:
       - Documents may not be used “for the principal purpose of advertising or promoting a particular product or service”
       - Can’t be combined with Wikipedia or OpenStreetMap
       - Not an open license according to Open Definition
         http://opendefinition.org/
  30. Open database licenses
     http://opendefinition.org/licenses/
  31. License features
     - You’re allowed to do pretty much anything, provided you…
     - Attribution (“By”): give credit
     - ShareAlike (“SA”): adapted data must be published in the same way
  32. Does Open Data have to be free?
     - Many would say yes
     - A matter of terminology and definitions
     - Either way there is nothing wrong with charging for certain data
  33. Data protection
     - Personal information is not open data
     - Freedom of Information legislation
       http://foi.gov.ie/
  34. Summary
     - Stating an explicit license is important
     - Irish PSI License: It’s readily available, but not “open enough” for some applications
     - Open Data Commons licenses with various constraints
  35. 5. List your data in a data catalog
  36. Why?
     - So that people know it exists
     - This is how the world learns about available data
     - This is how you learn what they do and need
  37. Some key information about a dataset
     - What data is being published?
     - What’s the license?
     - When was the data collected?
     - When will it be updated, if at all?
     - How was/is this data collected?
     - What was/is the data used for?
     - Contact person?
     - Where to give feedback?
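
The questions on the slide above map naturally onto a small metadata record that can be published next to the data. The sketch below is one possible shape, not a catalog standard: the field names, the sample values, and the `missing_fields` helper are all hypothetical.

```python
import json

# One field per question on the slide; the names are illustrative only.
REQUIRED_FIELDS = [
    "description",        # What data is being published?
    "license",            # What's the license?
    "collected",          # When was the data collected?
    "update_schedule",    # When will it be updated, if at all?
    "collection_method",  # How was/is this data collected?
    "purpose",            # What was/is the data used for?
    "contact",            # Contact person?
    "feedback",           # Where to give feedback?
]


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical example values.
dataset = {
    "description": "Planning applications received by an example council",
    "license": "Example open license",
    "collected": "2011",
    "update_schedule": "monthly",
    "collection_method": "Exported from the planning register",
    "purpose": "Processing and tracking planning applications",
    "contact": "data@example.org",
    "feedback": "http://example.org/feedback",
}

print(json.dumps(dataset, indent=2))
print("Missing:", missing_fields(dataset))
```

A check like `missing_fields` could run before a dataset is listed, so that incomplete descriptions are caught early.
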
  38. How to do this in practice?
     - Have a simple page on your website
     - Use an open community data catalog
     - Set up your own catalog
     - Use a national Irish data catalog???
  39. Open community catalogs
     - The Data Hub
       http://thedatahub.org
     - Irish CKAN
       http://ie.ckan.net
  40. Set up your own catalog
     - Requires a budget
     - Roll your own software?
       - data.fingal.ie
     - Use open source, e.g., CKAN?
       - data.gov.uk
       - Berlin Open Data
       - …
  41. National Irish data catalog?
     - CSO’s StatCentral?
     - Marine Institute’s ISDE?
     - Who publishes the catalog in other countries?
       - UK: Cabinet Office
       - US: White House
       - Australia: Dept of Finance and Deregulation
       - New Zealand: Dept of Internal Affairs
  42. Summary
     - Data catalogs make it easy to find data
     - Basic metadata, how to give feedback etc.
     - Important: How often are datasets accessed?
     - “Request a dataset” feature
     - Also: Open Data Ireland Google Group
       http://groups.google.com/group/open-data-ireland
  43. Five-shamrock scheme
     1. Publish data on the web
     2. Publish data in a machine-processable format
     3. Use an open standard format
     4. Publish under an open license
     5. List your data in a data catalog