An introduction to CKAN (Comprehensive Knowledge Archive Network), the free and open source software for data catalogs. Presented at the IV Moscow Urban Forum, Russia, in December 2014. http://mosurbanforum.com/forum2014/
Open source data catalog
An overview of CKAN
Augusto Herrmann
Open Knowledge Brazil
IV Moscow Urban Forum
CKAN Overview | Augusto Herrmann
Topics covered in this presentation
• Introduction
○ what is CKAN
○ who uses it
○ feature tour
• Features of CKAN
• Data publishing
• Under the hood
○ installation and maintenance
• Site administration
• Directions (where to find documentation and help)
Time constraints
• pick and choose topics accordingly
• I’ll be quick, but will address questions
First, a quick poll
• who is familiar with:
○ the concepts of open data
○ browsing open data catalogs
○ including data in CKAN catalogs
○ installing CKAN
○ developing / theming CKAN
What is it?
Comprehensive
Knowledge
Archive
Network
Open source software for open data catalogs
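● licence: GNU Affero GPL v3 (AGPL-3.0)
○ if you modify it and offer it as a service over a network (software-as-a-service, SaaS), you must also make your modified source code available to its users
● source code: https://github.com/ckan/ckan
● more than 7 years old
● more than 80 developers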
● stores metadata, not the data itself (in principle)
● makes it easy to find data
● keeps documentation about the data at hand
● data must be available on the internet at a permanent URL
○ directly linkable
○ no captcha!
● structured data
○ no tables inside PDF or DOC files
■ common offenders: statistical bulletins, official gazettes
○ no tables as images
● open formats
○ common formats: CSV, JSON, XML, RDF
● open licences
○ “Open data and content can be freely used, modified, and shared by anyone for any purpose” (opendefinition.org)
○ examples: CC BY 4.0, CC BY-SA 4.0, ODbL, OGL
Who makes it?
● Open Knowledge
http://okfn.org
http://br.okfn.org
● Community of developers
http://github.com/ckan/ckan
● Governance: CKAN Association
http://ckan.org/about/association
Who uses it?
● national governments
● local and regional
governments
● parliaments
● civil society
(e.g. community instances)
● research institutions
(open research data)
more at: http://ckan.org/instances
data.gov.uk (United Kingdom)
Source code: https://github.com/datagovuk
data.gov (USA)
dados.gov.br (Brazil)
Source code: http://dev.dados.gov.br/codigo/dev/tema-ckan
and many other countries
● Argentina
● Australia
● Austria
● Canada
● Germany
● Iceland
● Ireland
● Italy
● Japan
● Mexico
● Netherlands
● Norway
● Romania
● Slovakia
● Sweden
● Switzerland
● Uruguay
dados.recife.pe.gov.br (Recife, PE, Brazil)
Source code: http://dados.recife.pe.gov.br/source/ckan_dados_recife_20140828.zip
data.rio.rj.gov.br (Rio de Janeiro, RJ, Brazil)
datapoa.com.br (Porto Alegre, RS, Brazil)
data.buenosaires.gob.ar (Buenos Aires, Argentina)
opendata.caceres.es (Cáceres, Spain)
data.kk.dk (Copenhagen, Denmark)
datahub.io (Open Knowledge)
hubofdata.ru (OpenGovData.ru)
Internationalization (i18n)
● available in 53 languages
● languages at least 99% complete in version 2.2:
○ Bulgarian
○ Catalan
○ Czech
○ Dutch
○ French
○ Finnish
○ German
○ Italian
○ Japanese
○ Norwegian
○ Portuguese (Brazil)
○ Spanish
○ Swedish
Russian localization
● 92% completed for version 2.2
● translation of version 2.3 will soon begin
● join the localization team:
○ collaborative translation platform - Transifex
○ https://www.transifex.com/projects/p/ckan/language/ru/
Catalog and search data
● catalog data through the web interface, the API or harvesting tools
● search all metadata fields
● faceted search
○ by organization, tag, format and license
● data is organized into “datasets” and “resources”
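The faceted search described above is also available through the action API. A minimal sketch, assuming the API v3 endpoint `/api/3/action/package_search` with its `q`, `rows` and `facet.field` parameters, and assuming the facet field names `organization`, `tags`, `res_format` and `license_id`; the helper names are hypothetical:

```python
import json
from urllib.parse import urlencode

# Facet fields matching the slide: organization, tag, format, license
DEFAULT_FACETS = ("organization", "tags", "res_format", "license_id")


def facet_search_url(base_url, query, facets=DEFAULT_FACETS, rows=10):
    """Build a package_search URL that also requests facet counts."""
    params = {
        "q": query,                               # free-text search over metadata fields
        "facet.field": json.dumps(list(facets)),  # facet fields are passed as a JSON list
        "rows": rows,                             # how many datasets to return
    }
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


def facet_counts(response, field):
    """Turn one facet of a package_search response into a {value: count} dict."""
    items = response["result"]["search_facets"].get(field, {}).get("items", [])
    return {item["name"]: item["count"] for item in items}
```

To use it against a live portal, open the URL with `urllib.request.urlopen`, decode the body with `json.load`, and pass the resulting dict to `facet_counts`.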
Find related data
● related or similar resources are registered in the same dataset (e.g. the same data in different formats, or the same data for different time periods)
Find relevant metadata
● title
● description
● unique identifier
● author and maintainer
● license
● website or source page for the data
● groups, tags, organizations
● format (of each resource)
● other fields (including custom ones)
Preview data
● preview a sample of the resource
as a table, chart, map, etc.
● interactive: e.g. tables can be sorted by column, and chart axes can be mapped to any column
● uses the Recline.js data visualization library
Handle geospatial data
● through the ckanext-spatial extension
● visualize geodata on a map (e.g. outlines of plazas and parks)
● search for data inside a bounding box selected by the user as part of a search query
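With ckanext-spatial's spatial search enabled, the bounding-box search can also be run through the API. This sketch assumes the extension's `ext_bbox` parameter, given as `minx,miny,maxx,maxy` in longitude/latitude; the function name is hypothetical:

```python
from urllib.parse import urlencode


def bbox_search_url(base_url, min_lon, min_lat, max_lon, max_lat, query=""):
    """Build a package_search URL restricted to a bounding box.

    Assumes ckanext-spatial's spatial query plugin, which reads the box
    from the ext_bbox parameter as "minx,miny,maxx,maxy" (lon/lat).
    """
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("expected min_lon < max_lon and min_lat < max_lat")
    params = {
        "q": query,                                            # optional text query
        "ext_bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",  # spatial filter
    }
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"
```

Checking the corner order before building the URL catches swapped coordinates early, which are easy to mix up when copying them from a map.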
See a dataset’s change history
● track changes to a dataset
● see who did what and when
Classify datasets by organization
● each organization can manage its own data in the catalog and authorize the users who can edit it
● each organization gets its own page in the catalog, giving visibility to the data it publishes
● organization is also available as a search facet
Classify datasets into groups
● another way to link related datasets
● useful for thematic classification
● groups are also available as a search facet
Classify datasets with tags
● free-form tags defined by users (editors)
● tags are also available as a search facet
Custom themes
● simple customization (colors, layout of main page, portal title, etc.) can be made
through the user interface by the site administrator
● for deeper customization, use the extension programming interface (Python) and
develop custom templates (Jinja2)
Extensible
● programming interface
for creating extensions
● extension registry: extensions.ckan.org
● many extensions are available, with varying degrees of maturity
FileStore and DataStore
● built-in extensions
● FileStore: allows uploading files and storing them in CKAN, instead of just linking to a URL
● DataStore: allows querying data through an API, even “joining” data from different resources
○ comes with the DataPusher service, which loads each newly registered data file into the DataStore
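Querying the DataStore through the API might look like the sketch below. It assumes the `datastore_search` and `datastore_search_sql` actions, that each resource's data lives in a PostgreSQL table named after its resource id, and that the instance allows SQL queries; the helper names, resource ids and column names are placeholders:

```python
from urllib.parse import urlencode


def quote_ident(name):
    """Quote a PostgreSQL identifier (DataStore tables are named after resource ids)."""
    return '"' + name.replace('"', '""') + '"'


def datastore_search_url(base_url, resource_id, q=None, limit=100):
    """Full-text search inside one resource's DataStore table."""
    params = {"resource_id": resource_id, "limit": limit}
    if q is not None:
        params["q"] = q
    return f"{base_url.rstrip('/')}/api/3/action/datastore_search?{urlencode(params)}"


def datastore_join_url(base_url, left_id, right_id, key, columns):
    """'Join' two resources on a shared column via datastore_search_sql."""
    picked = ", ".join(f"b.{quote_ident(c)}" for c in columns)
    sql = (
        f"SELECT a.{quote_ident(key)}, {picked} "
        f"FROM {quote_ident(left_id)} a "
        f"JOIN {quote_ident(right_id)} b ON a.{quote_ident(key)} = b.{quote_ident(key)}"
    )
    return f"{base_url.rstrip('/')}/api/3/action/datastore_search_sql?{urlencode({'sql': sql})}"
```

The join is plain SQL because each resource is a table; quoting identifiers matters since resource ids contain hyphens.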
Harvesting
● metadata can be harvested from another portal using the ckanext-harvest extension
● at a configurable interval, datasets newly catalogued or modified in the source show up in the harvesting portal
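ckanext-harvest has its own machinery (harvest sources and gather, fetch and import stages), which this sketch does not reproduce. It only illustrates the underlying idea of periodically asking a source CKAN portal what changed since the last run, assuming `package_search` accepts a Solr filter query (`fq`) on `metadata_modified` and that this timestamp is stored in UTC; the function name is hypothetical:

```python
from datetime import timezone
from urllib.parse import urlencode


def modified_since_url(source_url, since, rows=100, start=0):
    """package_search URL for datasets whose metadata changed at or after `since`."""
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    # Normalize to UTC (assumed to be how metadata_modified is indexed)
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        "fq": f"metadata_modified:[{stamp} TO *]",  # inclusive, open-ended Solr range
        "sort": "metadata_modified asc",           # oldest changes first
        "rows": rows,
        "start": start,                            # offset for paging through results
    }
    return f"{source_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"
```

A harvester run would remember the newest `metadata_modified` it saw and use it as `since` next time, paging with `start` until no more results come back.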
Feedback
● there are extensions that let users comment on a specific dataset
● this stimulates discussion about the data and its improvement
Access by API
● uses HTTP requests (pseudo-RESTful)
● consumes and returns metadata in JSON format
● any operation you can do in the UI can also be done programmatically (e.g. searching)
● by sending an API key with your requests, you can overcome access throttling limitations and perform any read or write operation your user is allowed to do in the UI
● useful for processing and cataloguing data in large volumes (e.g. applying a fix to many datasets in a batch, or adding many similar resources to a dataset)
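A batch fix of the kind mentioned above could be written as a read-modify-write loop. This is a sketch, assuming the `package_show` and `package_update` actions and an API key sent in the `Authorization` header; `fix_license` and the license ids it swaps are hypothetical examples of a fix:

```python
import copy
import json
from urllib.request import Request, urlopen


def action_request(base_url, action, payload, api_key=None):
    """Build (but do not send) a POST request for a CKAN action API call."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = api_key  # the user's API key
    return Request(
        f"{base_url.rstrip('/')}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def call(request):
    """Send a request and return the 'result' part of CKAN's JSON reply."""
    with urlopen(request) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]


def fix_license(dataset, old="other-open", new="cc-by"):
    """Hypothetical batch fix: return a copy with one license id replaced."""
    fixed = copy.deepcopy(dataset)
    if fixed.get("license_id") == old:
        fixed["license_id"] = new
    return fixed


def batch_fix(base_url, api_key, dataset_names):
    """Fetch each dataset, fix a copy, and send it back with package_update."""
    for name in dataset_names:
        current = call(action_request(base_url, "package_show", {"id": name}, api_key))
        call(action_request(base_url, "package_update", fix_license(current), api_key))
```

`batch_fix("http://<my-ckan-url>", api_key, ["dataset-a", "dataset-b"])` would update each listed dataset in turn. `package_update` replaces the whole dataset, which is why the sketch fetches the current version first and edits a copy of it.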
Datasets and resources
● resources can be data files, API entry points, query examples, extended data
documentation, etc.
● a resource has exactly one format and URL
● datasets can have one or more resources
● as a general guideline, the following can be catalogued under the same dataset:
○ resources that represent the same data in different formats
○ resources with the same data for different time periods
○ resources with the same data for different geographic regions
Datasets and resources
● a dataset has
○ a single source (URL for a source page of the data)
○ a single license
○ a single author
○ a single maintainer
○ at most one organization
○ a set of groups that applies to the whole dataset
○ a set of tags that applies to the whole dataset
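The cardinality rules above can be written down as a small data model. This is an illustrative sketch, not CKAN's internal schema: field names loosely follow CKAN's (`license_id`, `owner_org`), while `source_url` is a hypothetical name for the dataset's source page:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Resource:
    name: str
    url: str      # exactly one URL per resource
    format: str   # exactly one format per resource


@dataclass
class Dataset:
    name: str                         # unique identifier
    title: str
    license_id: str                   # a single license
    author: str                       # a single author
    maintainer: str                   # a single maintainer
    source_url: str = ""              # a single source page for the data
    owner_org: Optional[str] = None   # at most one organization
    groups: Set[str] = field(default_factory=set)  # apply to the whole dataset
    tags: Set[str] = field(default_factory=set)    # apply to the whole dataset
    resources: List[Resource] = field(default_factory=list)

    def formats(self):
        """Distinct formats offered by this dataset's resources."""
        return sorted({r.format.upper() for r in self.resources})
```

Because license, author and organization sit on `Dataset` rather than `Resource`, files that differ in any of these belong in separate datasets.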
Organizations
● only an organization's editors (or admins) can create datasets in it
● users can create datasets in any organization for which they are editors
● organization admins can invite existing or new users to the organization and assign them a role (member, editor or administrator)
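The organization roles can be summarized as a capability table. This is a simplified, hypothetical model for illustration (it ignores sysadmins and CKAN's actual authorization functions), assuming members can read the organization's private datasets, editors can also create and edit datasets, and admins can also manage members:

```python
from enum import Enum


class Role(Enum):
    MEMBER = "member"
    EDITOR = "editor"
    ADMIN = "admin"


# Each role includes the capabilities of the role below it
CAPABILITIES = {
    Role.MEMBER: {"read_private"},
    Role.EDITOR: {"read_private", "create_dataset", "edit_dataset"},
    Role.ADMIN: {"read_private", "create_dataset", "edit_dataset", "manage_members"},
}


def can(memberships, org, action):
    """memberships maps organization name -> Role for one user."""
    role = memberships.get(org)
    return role is not None and action in CAPABILITIES[role]


def orgs_where_user_can_create(memberships):
    """Organizations in which this user could create datasets."""
    return sorted(org for org, role in memberships.items()
                  if "create_dataset" in CAPABILITIES[role])
```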
Creating a new dataset
● Click “add a new dataset”
○ on the dataset search screen; or
○ on the organization screen for an organization for which you are an editor
or admin
Creating a new dataset
● CKAN will ask for the following basic metadata:
○ title
○ description
○ tags
○ license
○ organization (if you’re an editor in more than one organization)
● when finished, click “Next: add data”
to include resources
Including resources
● select “link to file”, “link to an API” or “upload a file” (if FileStore is enabled)
● type in name, description and format
● if you have other resources to include,
select “save & add another”
● after including all resources, click
“next: additional info”
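Adding further resources to an existing dataset can also be scripted. A sketch, assuming the `resource_create` action and the `Authorization` header for the API key; it covers linked resources only, since uploading a file would need a multipart request, and the function name is hypothetical:

```python
import json
from urllib.request import Request


def resource_create_request(base_url, api_key, dataset_id, name, url, fmt, description=""):
    """Build a resource_create call that links a new resource to an existing dataset."""
    payload = {
        "package_id": dataset_id,   # name or id of the dataset receiving the resource
        "name": name,
        "url": url,                 # "link to file" or "link to an API"
        "format": fmt,
        "description": description,
    }
    return Request(
        f"{base_url.rstrip('/')}/api/3/action/resource_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
```

Calling it once per file, and sending each request with `urllib.request.urlopen`, is the scripted equivalent of clicking “save & add another” repeatedly.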
Additional dataset information
● “visibility”: “public” datasets can be seen by any site visitor; “private” datasets are visible only to members of the organization
● “author” / “author e-mail”: person or organization responsible for producing
the data
● “maintainer” / “maintainer e-mail”: person or organization technically
responsible for keeping data available
● optional custom fields
● press “finish” to create the dataset
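The three steps of the web form correspond to a single API call. A sketch, assuming the `package_create` action and its standard fields (`notes` holds the description, `private` the visibility, `extras` the custom fields); the helper names are hypothetical:

```python
import json
from urllib.request import Request


def package_create_payload(name, title, notes, license_id, owner_org, tags,
                           resources=(), private=False, author="", author_email="",
                           maintainer="", maintainer_email="", custom_fields=None):
    """Collect the metadata entered in the three web-form steps into one dict."""
    return {
        "name": name,                  # URL-friendly identifier
        "title": title,
        "notes": notes,                # the description
        "license_id": license_id,
        "owner_org": owner_org,
        "tags": [{"name": t} for t in tags],
        "resources": list(resources),  # each: {"name": ..., "url": ..., "format": ...}
        "private": private,            # the "visibility" choice
        "author": author,
        "author_email": author_email,
        "maintainer": maintainer,
        "maintainer_email": maintainer_email,
        # optional custom fields are sent as key/value "extras"
        "extras": [{"key": k, "value": v} for k, v in (custom_fields or {}).items()],
    }


def package_create_request(base_url, api_key, payload):
    """Build (but do not send) the package_create POST request."""
    return Request(
        f"{base_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
```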
System Architecture
• Usually sits alongside a CMS (e.g. Drupal or WordPress)
• WSGI application, deployable with Apache (mod_wsgi), nginx, etc.
• PostgreSQL database (metadata, access control, etc.)
• Apache Solr (for indexing and searching)
• Other components (depending on the installed and in-use extensions)
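"WSGI application" means CKAN exposes the standard Python web server gateway interface, so any WSGI-capable server can host it. The toy application below is not CKAN code; it only illustrates the calling convention that a server such as Apache with mod_wsgi uses:

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A toy WSGI application: echoes the requested path as plain text."""
    path = environ.get("PATH_INFO", "/")
    body = f"You requested {path}\n".encode("utf-8")
    # The app reports status and headers through the server-provided callable...
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # ...and returns an iterable of byte strings that the server sends as the body
    return [body]


def serve(port=8000):
    """Serve the app with the standard library's reference WSGI server."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

Because the interface is just a callable, swapping the reference server for Apache or another WSGI server requires no change to the application itself.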
Installing CKAN
• Supported operating system: Ubuntu 12.04 64-bit
• Other possible operating systems:
○ Debian
○ CentOS
○ Red Hat
○ Windows (version 1.8 of CKAN)
http://www.hackneyworkshop.com/2012/03/30/ckan-on-windows/
○ OS X
Installing CKAN
• Types of installation
○ Ubuntu 12.04 64-bit server package
○ source code
○ using Docker
Package install
● Requirements: Ubuntu 12.04 64-bit server
● installs CKAN and DataPusher (for DataStore)
● Steps:
1. Install the CKAN package and its dependencies
sudo apt-get update
sudo apt-get install -y nginx apache2 libapache2-mod-wsgi libpq5
wget http://packaging.ckan.org/python-ckan_2.2_amd64.deb
sudo dpkg -i python-ckan_2.2_amd64.deb
2. Install PostgreSQL and Solr
sudo apt-get install -y postgresql solr-jetty
3. Restart Apache and Nginx
sudo service apache2 restart
sudo service nginx restart
Source code install
● the sequence of commands depends on the operating system
○ detailed instructions for each are available in:
https://github.com/ckan/ckan/wiki/How-to-Install-CKAN
1. install dependency packages
2. install CKAN packages into a Python virtualenv
3. configure Postgres database
4. create a CKAN configuration file (production.ini)
5. configure Solr
6. create database tables
7. configure DataStore (optional)
8. link to who.ini (Repoze.who configuration file)
Docker install
● Requirement: have Docker installed and
configured
● set of 3 commands
● Docker downloads images automatically (can
take a long time)
$ docker run -d --name db ckan/postgresql
$ docker run -d --name solr ckan/solr
$ docker run -d -p 80:80 --link db:db --link solr:solr ckan/ckan
Initial configuration
• Create a site administrator user
paster sysadmin add seanh -c /etc/ckan/default/production.ini
• Create other users if necessary
• Edit production.ini (for instance, to configure the site title)
ckan.site_title = Open data portal
Other maintenance commands
• Rebuild search index
paster --plugin=ckan search-index rebuild --config=/etc/ckan/default/production.ini
• Create and remove users
paster --plugin=ckan user add exampleuser --config=/etc/ckan/default/production.ini
paster --plugin=ckan user remove exampleuser --config=/etc/ckan/default/production.ini
Simple customization
http://<my-ckan-url>/ckan-admin/config/
● some simple customization changes
can be made through the UI
by the site administrator
○ site title and description
○ color scheme
○ intro text, about text and others
○ custom CSS
User registration
● by default, user self-registration
is enabled
● to disable it (e.g. to avoid spam), set this option in the .ini file
ckan.auth.create_user_via_web = False
Registering new groups and organizations
● by default, creating new organizations is enabled for all editors
● to disable it, set this option in the .ini file
ckan.auth.user_create_organizations = False
● likewise for groups:
ckan.auth.user_create_groups = False
● note: site administrators can always create groups and organizations, regardless of these settings
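The .ini flags above can be checked with a short script. A sketch, assuming the settings live in the `[app:main]` section of production.ini and that all three default to enabled, as described above; the truthy-string parsing is an assumption, not CKAN's own parser, and the function names are hypothetical:

```python
import configparser

TRUTHY = {"true", "yes", "on", "y", "t", "1"}

# Assumed defaults: all three are enabled unless switched off in the .ini file
AUTH_DEFAULTS = {
    "ckan.auth.create_user_via_web": True,
    "ckan.auth.user_create_organizations": True,
    "ckan.auth.user_create_groups": True,
}


def as_bool(value):
    return value.strip().lower() in TRUTHY


def auth_settings(ini_text, section="app:main"):
    """Return the effective value of each auth flag in an .ini file's text."""
    parser = configparser.ConfigParser(interpolation=None)  # files may contain %(here)s
    parser.read_string(ini_text)
    options = parser[section] if parser.has_section(section) else {}
    return {
        key: as_bool(options[key]) if key in options else default
        for key, default in AUTH_DEFAULTS.items()
    }
```

For a real installation, pass it the contents of /etc/ckan/default/production.ini, e.g. `auth_settings(open("/etc/ckan/default/production.ini").read())`.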
Manage users
● look up the user at
http://<my-ckan-url>/user/
● when logged in as admin, you
see a “manage” button under
the user profile
● admin can edit profile, change
passwords or delete the user
Documentation
http://docs.ckan.org
There are separate manuals
for different audiences:
● End user (editor)
● Site administrator
● Maintainer
Documentation
There are also guides on specific subjects:
● API guide
● Extending guide
● Theming guide
● Contributing guide
Where to get help
On mailing lists:
● CKAN Global User Group
https://groups.google.com/forum/#!forum/ckan-global-user-group
● ckan-dev
https://lists.okfn.org/mailman/listinfo/ckan-dev
Where to get help
On IRC chat:
server: irc.freenode.net
channel: #ckan
Where to get help
Paid support:
● hosting with an SLA
● deployment and maintenance
● support, consultancy,
training
Where to try CKAN
demo.ckan.org
● free for experimentation, cataloguing data
and getting to know CKAN
● content is periodically wiped out
Where to register datasets
datahub.io
● community instance
● as an individual, if you don’t have your own CKAN instance, this is an option
● e.g. data that has been cleaned up as a result of a hackathon
Questions?
thank you
спасибо
augusto@okfn.org.br
augusto.herrmann@planejamento.gov.br