CKAN as an open-source data management solution for open data

CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.

Published in: Technology

  1. CKAN: an open-source data management solution for open data. Ivan Ermilov
  2. AKSW Research Group: http://aksw.org
  3. My experience with CKAN
     ● PublicData.eu portal
       o Crowd-sourcing CSV2RDF mappings
     ● LODStats
       o Version 1: crawling datahub.io (CKAN)
       o Version 2: CKAN aggregator for data.gov, publicdata.eu and datahub.io
       o Version 2: Crawled all three portals and published the data on datahub.io
  4. CKAN IS NOT a file storage!
  5. Why CKAN?
     ● An open source platform
       o Relatively easy to deploy
       o Provides a rich set of features for free
     ● Data management
     ● Community involvement
  6. Who uses CKAN?
     ● All major open governments
       o Canada (open.canada.ca): 244,238 datasets
       o The U.S. (data.gov): 131,348 datasets
       o Europe (publicdata.eu): 47,863 datasets
     ● And some other communities:
       o Semantic Web community (datahub.io): 9,509 datasets
  7. CKAN architecture
  8. CKAN Pros/Cons
     ● Pros
       o Organizes your data in a structured way
       o Has an extension to support DCAT (only for datasets)
       o Provides an API to digest your data
     ● Cons
       o The data model does not work for all use cases (DBpedia)
       o No strict guidelines for dataset publishing
  9. CKAN functionality
     ● Publishing metadata
     ● Exposing metadata (API/front-end)
     ● Access control for users/organizations
     ● Additional functionality via plugins
  10. CKAN extensions/plugins
      ● Data preview and visualization
      ● CKAN + DCAT
      ● Extension that adds the Disqus commenting system to CKAN
      ● Simple API dataset hits counter
      Full list is available at: http://extensions.ckan.org/
  11. CKAN deployment
      ● From source
      ● OS package (e.g. a Debian package)
      ● Docker image
      Official guide: http://docs.ckan.org/en/latest/maintaining/installing/index.html
  12. CKAN Multi-Tier Deployment
  13. CKAN API
      ● Well documented
      ● Covers everything you can do with the web interface
        o You can write your own web interface
      ● Various API clients
        o ckanclient (Python), official
        o Ruby, PHP, Java, Node.js, Perl, R
      https://github.com/ckan/ckan/wiki/CKAN-API-Clients
  14. CKAN API methods
      ● Retrieving data
      ● Creating new data
      ● Updating existing data
      ● Deleting existing data
      ● Data is: packages, resources, groups, tags, users etc.
      http://docs.ckan.org/en/latest/api/index.html
  15. CKAN API: Examples
      ● Get package list
        o http://demo.ckan.org/api/3/action/package_list
        o Disabled for data.gov
      ● Get one package
        o http://demo.ckan.org/api/3/action/package_show?id=adur_district_spending
      ● ckan.logic.action.get.organization_show
        o api/3/action/organization_show?id=...
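
The example URLs on slide 15 follow one pattern: the portal address, then `/api/3/action/`, then the action name and optional query parameters. A minimal sketch of calling such actions from Python with only the standard library is shown here. The helper names `action_url`, `unwrap` and `call_action` are illustrative and not part of CKAN or of the official client; the sketch assumes the portal answers with the usual JSON envelope containing `success` and `result`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def action_url(base_url, action, **params):
    """Build an Action API (v3) URL, e.g. .../api/3/action/package_show?id=..."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def unwrap(payload):
    """Return the 'result' field of a decoded response, or raise if the call failed."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API call failed: {payload.get('error')}")
    return payload["result"]


def call_action(base_url, action, **params):
    """Perform a GET request against one action and return its result (hypothetical helper)."""
    with urlopen(action_url(base_url, action, **params), timeout=30) as response:
        return unwrap(json.load(response))
```

With these helpers, `call_action("http://demo.ckan.org", "package_list")` would request the package list from the slide, and `call_action("http://demo.ckan.org", "package_show", id="adur_district_spending")` would request a single package, provided the demo portal is reachable and still hosts that dataset.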
  16. Use Case: LODStats
      ● Aggregate CKAN instances via API
      ● Filter out only related datasets
      ● Build an application on top of it
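
The first two steps of slide 16 (aggregate several CKAN instances, then keep only the relevant datasets) can be sketched as follows. This is not the LODStats code. It assumes paging through the `package_search` action with `start` and `rows`, and it assumes "related" means "has at least one resource in a wanted format", which is a criterion chosen here for illustration. All function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, start, rows):
    """Fetch one page of dataset records from a portal via package_search."""
    query = urlencode({"start": start, "rows": rows})
    url = f"{base_url.rstrip('/')}/api/3/action/package_search?{query}"
    with urlopen(url, timeout=30) as response:
        return json.load(response)["result"]["results"]


def iter_datasets(base_url, fetch=fetch_page, rows=100):
    """Yield every dataset of one portal, page by page, until a page comes back empty."""
    start = 0
    while True:
        page = fetch(base_url, start, rows)
        if not page:
            return
        yield from page
        start += len(page)  # advance by what was actually returned


def has_format(dataset, wanted):
    """True if any resource of the dataset declares one of the wanted formats."""
    return any(
        (resource.get("format") or "").strip().lower() in wanted
        for resource in dataset.get("resources", [])
    )


def aggregate(portals, wanted_formats, fetch=fetch_page):
    """Collect (portal, dataset name) records for matching datasets across portals."""
    wanted = {fmt.lower() for fmt in wanted_formats}
    selected = []
    for portal in portals:
        for dataset in iter_datasets(portal, fetch=fetch):
            if has_format(dataset, wanted):
                selected.append({"portal": portal, "name": dataset.get("name")})
    return selected
```

The `fetch` parameter is injectable so that the paging and filtering logic can be exercised without network access; an application such as the one named in the third bullet would then be built on the list that `aggregate` returns.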
  17. Use Case: CSV2RDF
      ● Integrated with a particular CKAN instance
      ● Aggregates all CSV files from the instance
      ● Provides an interface for CSV2RDF conversion
  18. Thank you for your attention!
      Presented by Ivan Ermilov.
      LinkedIn: https://www.linkedin.com/in/iermilov
      Email: iermilov@informatik.uni-leipzig.de
      Skype: earthquakesan