Open Data & coding data.gov.uk David Read Open Knowledge Foundation [email_address]
Contents The context: Linked Open Data
Our data catalogue: CKAN
data.gov.uk using CKAN
Discussion
Open Data ”Data is expensive to create” ”But think of the mutual benefits of it being open” Accessible Allowed to use and republish Without restriction
Science UEA criticised for a "culture of withholding information." CC-BY-SA http://commons.wikimedia.org/wiki/User:ChrisO
Geographic data
Public data
Linking data Dr. Hans Rosling, Professor of Global Health, Karolinska Institute, Sweden (TED talk)
Linking data 2
Linked data
Opening government data Transparency --> effectiveness
Labour and Conservatives agree (!) with Cambridge economists: Making government datasets public will bring a £6bn boost to the UK economy (We have paid for it...)
Open Data and Open Software Zero cost
Good performance
Principles: Many hands make light work / natural selection / wisdom of crowd / on shoulders of giants
Not a proprietary format
No supplier lock-in
Infrastructure: Software compared with Data
Licence. Software: GPL. Data: PDDL, ODbL, ODC-By (OKF 2007-), isitopendata.org (OKF 2009-)
Modules/Linking. Software: Lib, egg. Data: Spreadsheet, database, RDF/OWL
Human Discovery. Data: CKAN (OKF 2008-)
Automatic Distribution. Software: Apt-get, CPAN, easy_install. Data: CKAN datapkg (OKF 2008-)
Hosting. Software: Sourceforge, PyPI, bitbucket. Data: archive.org / knowledgeforge.net
Community. Software: freshmeat. Data: data.gov.uk email list closest?
Open Knowledge Foundation Aim: promote Open Knowledge
Founded 2004 as a 'not for profit' organisation
Strong connections with Cambridge University
A key director: Rufus Pollock
Volunteer driven
Create software tools (CKAN, KnowledgeForge), organise conferences, licenses, create visuals & mash-ups (Where Does My Money Go, Open Shakespeare), campaigns (Panton Principles)
Introducing... CKAN ”Comprehensive Knowledge Archive Network” ...well... a fancy Data Catalog
”CKAN is a registry or catalogue system for datasets or other "knowledge" resources. CKAN aims to make it easy to find, share and reuse open content and data, especially in ways that are machine automatable.”
 
 
 
CKAN data model
Dataset: name, title, version, url, author, licence, notes, extras
Tag: name
Resource: url, format, description, hash
Group: name, title, description
The "*" marks on the diagram denote to-many relationships between these entities.
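The data model on this slide can be sketched as plain Python dataclasses. This is only an illustration of the fields listed on the slide, not CKAN's actual model code. The direction of the "*" relationships (a dataset holding many tags and resources, a group holding many datasets) is an assumption, and the tag and group names in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tag:
    name: str


@dataclass
class Resource:
    url: str
    format: str = ""
    description: str = ""
    hash: str = ""


@dataclass
class Dataset:
    name: str
    title: str = ""
    version: Optional[str] = None
    url: str = ""
    author: str = ""
    licence: str = ""
    notes: str = ""
    # Free-form key/value metadata beyond the core fields.
    extras: Dict[str, str] = field(default_factory=dict)
    # Assumed to-many links, following the "*" marks on the slide.
    tags: List[Tag] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Group:
    name: str
    title: str = ""
    description: str = ""
    datasets: List[Dataset] = field(default_factory=list)


# Field values taken from the "coins-data" record shown later in the deck.
coins = Dataset(
    name="coins-data",
    title="COINS data",
    url="http://data.gov.uk/dataset/coins",
    author="HM Treasury (UK Government)",
)
coins.tags.append(Tag(name="example-tag"))  # hypothetical tag
example_group = Group(name="example-group", datasets=[coins])  # hypothetical group
```

Keeping "extras" as a dictionary mirrors the idea of a small fixed core of metadata plus arbitrary additional keys.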
Wiki
 
 

Open Data and CKAN Data Catalogues

    API
    REST
    $ curl http://ckan.net/api/rest/package
    ["2000-us-census-rdf", "32000-naples-florida-businesses-kml", "aaoe-87", "acawiki", "adb-sdbs", "addgene", "adopt-a-roadside", …
    Search
    $ curl "http://ckan.net/api/search/resource?url=.fr&all_fields=1"
    {"count": 6, "results": [{"id": "819c811c-7afc-4d4f-a7f8-aca0b2a84df5", "package_id": "0ad0dbb9-e1b7-43d6-9fae-ca92a889e871", "url": "http://www.frst.govt.nz/funding/futurefunding", "format": "Spreadsheet", "description": "Future funding (FRST): Spreadsheet", "hash": "", "position": 0}, {"id": ...
    $ curl http://ckan.net/api/rest/package/coins-data
    {"id": "78eccf9d-d5b3-4dbd-8ada-6801cfd7e4c8", "name": "coins-data", "title": "COINS data", "version": null, "url": "http://data.gov.uk/dataset/coins", "author": "HM Treasury (UK Government)", "author_email": null, "maintainer": null, "maintainer_email": null, "notes": "### About\r\n\r\nThe UK Government's HM Trea...
  • 37.
    Datapkg (dpm)
    Getting a data package
    $ datapkg index-add file:///....
    $ datapkg update
    $ datapkg search "military spending"
    military: Military Spending 1890-1914
    military-norm: Military Spending 1890-1914 (normalized)
    $ datapkg install military-norm
    Downloading military-norm and dependencies.
    $ datapkg plot military
    Upload derivative data
    $ datapkg create military-uk-usa table.csv "Military spending UK vs USA"
    $ datapkg register military-uk-usa
  • 38.
    CKAN communities Europe: Austria, Hungary, Germany, Italy, Finland, Netherlands, France, Norway
  • 39.
  • 40.
  • 41.
    Sharing metadata ckan.net canada.ckan.net it.ckan.net no.ckan.net data.gov.no data.gov.it
  • 42.
    Architecture: view, controller, model. Drupal front-end; Pylons front-end (genshi, routes, repoze.who); Vdm - Versioned Domain Model; Postgres; REST & Search APIs; sqlalchemy; Logic layer; Data import scripts; SOLR Search; Export scripts; repoze.who; Atom feeds; sqlalchemy-migrate
  • 43.
    data.gov.uk Gordon Brown invited Tim Berners-Lee for exciting digital plans
  • 44.
  • 45.
    Run by Antonio Acuna at the Cabinet Office, aided by The National Archives
  • 46.
    ”Raw Data Now”, then improve and link
  • 47.
    COI team produce the bulk of data.gov.uk pages in Drupal, with OKFN producing the 'data' page in CKAN
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
    Measuring success Stats: users, number of datasets, per department, big wins: Ordnance Survey, Coins, top public salaries
  • 56.
    Creation of visualisations, apps, linked data, news stories, companies - £6bn
  • 57.
  • 58.
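
The REST and search calls shown with curl on the API slide (slide 36) could be made from Python roughly as follows. This is a minimal sketch using only the standard library. The base URL and paths are copied from the slide and may no longer be served. The helper names are hypothetical and are not part of any CKAN client library. The sample response is trimmed from the search output on the slide.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL as it appears on the slide; an assumption that it is still reachable.
BASE = "http://ckan.net/api"


def package_list_url(base=BASE):
    """URL that lists all package (dataset) names."""
    return f"{base}/rest/package"


def package_url(name, base=BASE):
    """URL for the full record of one package."""
    return f"{base}/rest/package/{quote(name)}"


def resource_search_url(base=BASE, **params):
    """URL for a resource search, e.g. url=".fr", all_fields=1."""
    return f"{base}/search/resource?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def resource_urls(search_result):
    """Pull the resource URLs out of a search response."""
    return [item["url"] for item in search_result.get("results", [])]


# Sample response trimmed from the slide, so the parsing can be tried offline.
sample = json.loads(
    '{"count": 6, "results": [{"id": "819c811c-7afc-4d4f-a7f8-aca0b2a84df5", '
    '"url": "http://www.frst.govt.nz/funding/futurefunding", '
    '"format": "Spreadsheet"}]}'
)
found = resource_urls(sample)
```

With a reachable server, `fetch_json(package_url("coins-data"))` would return the package record as a dictionary. The URL builders are kept separate from the network call so they can be checked without one.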

Editor's Notes

  • #6 Also Haiti, iPhone cycle map
  • #7 Allowing with JobCentresPlus, was highlighted as an innovative use of government data
  • #8 But linking data is even more powerful Health and economic data Dr. Hans Rosling, Professor of Global Health, Karolinska Institute, Sweden (TED talk)
  • #9 Geonames – lat/long of place names Dbpedia – munge of Wikipedia content e.g. Where do footballers in the premiership come from?
  • #10 Note: google maps here – Google have built their business on being very good at not only search, but linking data too. Map has restaurants, travel directions, traffic, related ads. This profits them, but what about the rest of society?
  • #12 One way to achieve what hundreds of organised and motivated Google programmers do?
  • #13 Installing linux packages – really sophisticated system of downloading lots of modules and they work together Someone might combine a couple of datasets, may well do some cleaning, produce a graph, but doesn't give back the data. Also: Scraperwiki
  • #19 Core metadata based on debian package. No dependencies shown here, but we do have that too.
  • #23 Can also update via API. Also have python, php, Drupal, Wordpress and other clients to help access API.
  • #25 Lobbying governments, or just to collect known datasets. Groups like ownership and personalisation of the site.
  • #26 clone/push/pull/merge/reject changes