Open Data and CKAN Data Catalogues


Given to EuroPyCon 2010, Birmingham

Published in: Technology, Education

  • Also Haiti, iPhone cycle map
  • Along with JobCentresPlus, this was highlighted as an innovative use of government data
  • But linking data is even more powerful: health and economic data. Dr. Hans Rosling, Professor of Global Health, Karolinska Institute, Sweden (TED talk)
  • Geonames – lat/long of place names. DBpedia – munge of Wikipedia content, e.g. where do footballers in the Premiership come from?
  • Note: Google Maps here – Google have built their business on being very good not only at search, but at linking data too. The map has restaurants, travel directions, traffic, related ads. This profits them, but what about the rest of society?
  • One way to achieve what hundreds of organised and motivated Google programmers do?
  • Installing Linux packages – a really sophisticated system that downloads lots of modules which then work together. Someone might combine a couple of datasets, may well do some cleaning, and produce a graph, but doesn't give back the data. Also: ScraperWiki
  • Core metadata based on the Debian package format. No dependencies shown here, but we do have those too.
  • Can also update via the API. Also have Python, PHP, Drupal, WordPress and other clients to help access the API.
  • Lobbying governments, or just to collect known datasets. Groups like ownership and personalisation of the site.
  • clone/push/pull/merge/reject changes
  • Open Data and CKAN Data Catalogues

    1. 1. Open Data & coding. David Read, Open Knowledge Foundation [email_address]
    2. 2. Contents: The context: Linked Open Data
    3. 3. Our data catalogue: CKAN
    4. 4. Using CKAN
    5. 5. Discussion
    6. 6. Open Data: "Data is expensive to create." "But think of the mutual benefits of it being open." Accessible; allowed to use and republish; without restriction
    7. 7. Science: UEA criticised for a "culture of withholding information." CC-BY-SA
    8. 8. Geographic data
    9. 9. Public data
    10. 10. Linking data Dr. Hans Rosling, Professor of Global Health, Karolinska Institute, Sweden (TED talk)
    11. 11. Linking data 2
    12. 12. Linked data
    13. 13. Opening government data: Transparency --> effectiveness
    14. 14. Labour and Conservatives agree (!) with Cambridge economists: making government datasets public will bring a £6bn boost to the UK economy. (We have paid for it...)
    15. 15. Open Data and Open Software: Zero cost
    16. 16. Good performance
    17. 17. Principles: Many hands make light work / natural selection / wisdom of crowds / on the shoulders of giants
    18. 18. Not a proprietary format
    19. 19. No supplier lock-in
    20. 20. Infrastructure (Software / Data). Licence: GPL / PDDL, ODbL, ODC-By (OKF 2007-) (OKF 2009-). Modules/Linking: Lib, egg / Spreadsheet, database, RDF/OWL. Human Discovery: CKAN (OKF 2008-). Automatic Distribution: Apt-get, CPAN, easy_install / CKAN datapkg (OKF 2008-). Hosting: Sourceforge, PyPI, bitbucket / Community freshmeat email list closest?
    21. 21. Open Knowledge Foundation: Aim: promote Open Knowledge
    22. 22. Founded 2004 as a 'not for profit' organisation
    23. 23. Strong connections with Cambridge University
    24. 24. A key director: Rufus Pollock
    25. 25. Volunteer driven
    26. 26. Create software tools (CKAN, KnowledgeForge), organise conferences, licenses, create visuals & mash-ups (Where Does My Money Go, Open Shakespeare), campaigns (Panton Principles)
    27. 27. Introducing... CKAN: "Comprehensive Knowledge Archive Network" ...well... a fancy Data Catalog
    28. 28. "CKAN is a registry or catalogue system for datasets or other 'knowledge' resources. CKAN aims to make it easy to find, share and reuse open content and data, especially in ways that are machine automatable."
    29. 32. CKAN data model (diagram labels; asterisks mark the many-valued ends of the relationships):
        Data Package: name, title, version, url, author, licence, notes, extras
        Tag: name
        Resource: url, format, description, hash
        Group: name, title, description
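The entities on this slide could be sketched as plain Python dataclasses. This is only an illustration of the fields listed on the slide, not CKAN's actual model code; the class layout, the list-based relationships, and the example tag, resource URL and group name are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    """One file or endpoint attached to a package (fields from the slide)."""
    url: str
    format: str = ""
    description: str = ""
    hash: str = ""


@dataclass
class Package:
    """A catalogue entry (fields from the slide)."""
    name: str
    title: str = ""
    version: Optional[str] = None
    url: str = ""
    author: str = ""
    licence: str = ""
    notes: str = ""
    extras: Dict[str, str] = field(default_factory=dict)
    # A Tag carries only a name on the slide, so a list of strings is enough here.
    tags: List[str] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Group:
    """A named collection of packages (fields from the slide)."""
    name: str
    title: str = ""
    description: str = ""
    packages: List[Package] = field(default_factory=list)


# Name, title and author come from the API slide; the tag, the resource URL
# and the group are hypothetical values used only to exercise the classes.
coins = Package(
    name="coins-data",
    title="COINS data",
    author="HM Treasury (UK Government)",
    tags=["spending"],
    resources=[Resource(url="http://example.org/coins.csv", format="CSV")],
)
example_group = Group(name="example-group", packages=[coins])
```

Keeping tags as bare strings mirrors the single `name` field on the slide; a real catalogue would more likely store them as rows in their own table.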
    30. 33. Wiki
    31. 36. API
        REST
        $ curl
        ["2000-us-census-rdf", "32000-naples-florida-businesses-kml", "aaoe-87", "acawiki", "adb-sdbs", "addgene", "adopt-a-roadside", …
        Search
        $ curl ";
        {"count": 6, "results": [{"id": "819c811c-7afc-4d4f-a7f8-aca0b2a84df5", "package_id": "0ad0dbb9-e1b7-43d6-9fae-ca92a889e871", "url": ";, "format": "Spreadsheet", "description": "Future funding (FRST): Spreadsheet", "hash": "", "position": 0}, {"id": ...
        $ curl
        {"id": "78eccf9d-d5b3-4dbd-8ada-6801cfd7e4c8", "name": "coins-data", "title": "COINS data", "version": null, "url": ";, "author": "HM Treasury (UK Government)", "author_email": null, "maintainer": null, "maintainer_email": null, "notes": "### About\r\n\r\nThe UK Government's HM Trea...
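A client would typically decode these JSON responses before using them. The sketch below works on payloads abbreviated from the slide rather than on live requests, because the request URLs did not survive in this transcript; the helper names `package_names`, `summarise_search` and `package_summary` are made up for illustration and are not part of any CKAN client library.

```python
import json

# Payloads abbreviated from the slide; fields truncated there are left out.
PACKAGE_LIST = '["2000-us-census-rdf", "32000-naples-florida-businesses-kml", "aaoe-87"]'

SEARCH_RESULT = """
{"count": 6,
 "results": [{"id": "819c811c-7afc-4d4f-a7f8-aca0b2a84df5",
              "package_id": "0ad0dbb9-e1b7-43d6-9fae-ca92a889e871",
              "format": "Spreadsheet",
              "description": "Future funding (FRST): Spreadsheet",
              "hash": "",
              "position": 0}]}
"""

PACKAGE = """
{"id": "78eccf9d-d5b3-4dbd-8ada-6801cfd7e4c8",
 "name": "coins-data",
 "title": "COINS data",
 "version": null,
 "author": "HM Treasury (UK Government)",
 "author_email": null}
"""


def package_names(body):
    """The package listing is a plain JSON array of package names."""
    return json.loads(body)


def summarise_search(body):
    """Return the total hit count and (format, description) for each listed result."""
    data = json.loads(body)
    return data["count"], [(r["format"], r["description"]) for r in data["results"]]


def package_summary(body):
    """One-line description of a single package record."""
    pkg = json.loads(body)
    return "{} ({}) by {}".format(pkg["title"], pkg["name"], pkg["author"])


print(package_names(PACKAGE_LIST))
print(summarise_search(SEARCH_RESULT))
print(package_summary(PACKAGE))
```

Note that `count` reports the total number of matches (6 on the slide), while the sample here carries only the first result, as on the slide.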
    32. 37. datapkg
        Getting a data package
        $ datapkg index-add file:///....
        $ datapkg update
        $ datapkg search "military spending"
        military: Military Spending 1890-1914
        military-norm: Military Spending 1890-1914 (normalized)
        $ datapkg install military-norm
        Downloading military-norm and dependencies.
        $ datapkg plot military
        $ datapkg create military-uk-usa table.csv "Military spending UK vs USA"
        $ datapkg register military-uk-usa
        Upload derivative data
    33. 38. CKAN communities: Europe: Austria, Hungary, Germany, Italy, Finland, Netherlands, France, Norway
    34. 39. North America: Colorado, Canada
    35. 40. Australasia: New Zealand
    36. 41. Sharing metadata
    37. 42. Architecture (diagram labels): view, controller, model; Drupal front-end; Pylons front-end (genshi, routes, repoze.who); Vdm - Versioned Domain Model; Postgres; REST & Search APIs; sqlalchemy; Formalchemy; Data import scripts; Search; Export scripts; repoze.who; Atom feeds; carrot; pyamqp; sqlalchemy-migrate; blinker
    38. 43. Gordon Brown invited Tim Berners-Lee for exciting digital plans
    39. 44. David Cameron supportive
    40. 45. Run by Cabinet Office, aided by The National Archives
    41. 46. Raw Data Now, then improve and link
    42. 47. COI team produce Drupal front-end, with OKFN producing CKAN back-end
    43. 55. Measuring success: Stats: users, number of datasets, per department; big wins: Ordnance Survey, COINS, top public salaries
    44. 56. Creation of visualisations, apps, linked data, news stories, companies - £6bn
    45. 57. CKAN – similar goals
    46. 58. What do you think?
    47. 59. Software Learnings: Pylons – flexible, organised, powerful to customise
    48. 60. Formalchemy – tough to get beyond basics (had to read lots of code), but really neat, flexible & powerful system
    49. 61. Pip, virtualenv, nose – use happily
    50. 62. Drupal interfacing – Drupal modules rely on internal model
    51. 63. CKAN futures: More metadata fields and guidance / control
    52. 64. INSPIRE geographic bounding boxes
    53. 65. Improve navigating datasets – to help linking data
    54. 66. Improving RDF catalog
    55. 67. Keep goal of supporting automated linking data
    56. 68. Suggestions please!
    57. 69. Project learnings: Open source, trac, email discussion: good for getting feedback and people involved
    58. 70. Slightly worrying
    59. 71. Easy to get flooded with requests. Easy to criticise – high load on launch
    60. 72. Civil servants surprisingly happy to open data
    61. 73. Questions