ckan 2.0: Harvesting from other sources


  1. 1. ckan 2.0: Harvesting from other sources
     Internship @ Academia Sinica, Report #3
     Presenter: Cheng-Jen Lee (Sol)
     Email: cjlee AT iis.sinica.edu.tw
     This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Taiwan License.
  2. 2. Agenda
     ● Harvesters
       – Usage: manually and automatically
       – Custom harvester
       – Some issues
     ● Linked Data and RDF
     Oct 13, 2014
  3. 3. Harvesters
     ● ckanext-harvest
       – Remote harvesting extension
     ● Source Type
       – CSW
       – csv/xls
       – WAF
       – custom
  4. 4. Harvesters
  5. 5. Harvesters
     ● Usage (manually)
       – (pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer -c /etc/ckan/default/production.ini
       – (pyenv) $ paster --plugin=ckanext-harvest harvester fetch_consumer -c /etc/ckan/default/production.ini
       – (pyenv) $ paster --plugin=ckanext-harvest harvester run -c /etc/ckan/default/production.ini
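
The three commands on slide 5 differ only in the subcommand, so a script that drives them can build them from one helper. The sketch below only constructs the argument lists and does not run anything; the helper names `harvester_command` and `as_shell` are made up for illustration, and the config path is the one shown on the slide.

```python
import shlex

# Config path as shown on the slide; adjust for your installation.
CONFIG = "/etc/ckan/default/production.ini"


def harvester_command(subcommand, config=CONFIG):
    """Return the argv list for one ckanext-harvest paster subcommand.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return ["paster", "--plugin=ckanext-harvest", "harvester",
            subcommand, "-c", config]


def as_shell(argv):
    """Render an argv list as a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


# gather_consumer and fetch_consumer keep running and wait for work on
# their queues; run is a one-shot command that pushes pending jobs.
commands = [harvester_command(name)
            for name in ("gather_consumer", "fetch_consumer", "run")]

for argv in commands:
    print(as_shell(argv))
```

Passing one of these lists to `subprocess.run` inside the activated virtualenv would execute it; the two consumers would each need their own process because they do not return.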
  6. 6. Harvesters
     ● Usage (automatically)
       – Supervisor (for the gather & fetch consumers)
       – Cron (for run)
     ● Supervisor (with profile)
       – $ sudo supervisorctl reread
       – $ sudo supervisorctl add ckan_gather_consumer
       – $ sudo supervisorctl add ckan_fetch_consumer
       – $ sudo supervisorctl start ckan_gather_consumer
       – $ sudo supervisorctl start ckan_fetch_consumer
  7. 7. Harvesters
     ● Custom harvester
       – We can implement the harvester interface to perform harvesting operations
       – The process takes place in three steps:
         ● gather: get the identifiers of the remote records
         ● fetch: fetch the contents
         ● import: create a CKAN package (dataset)
       – Implementation
         ● https://github.com/u10313335/ckanext-harvest/blob/master/ckanext/harvest/harvesters/srda harvester.py
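
The gather, fetch, and import steps on slide 7 can be pictured with a small stand-alone sketch. It is not the ckanext-harvest implementation: the in-memory `REMOTE` dictionary stands in for a real remote source, and every function name here is hypothetical. It only shows how the three stages hand data to each other.

```python
# Hypothetical in-memory "remote source": identifier -> raw record.
REMOTE = {
    "rec-1": {"title": "Rainfall 2013", "notes": "Monthly totals"},
    "rec-2": {"title": "Census 2010", "notes": "Population by county"},
}


def gather(source):
    """Gather: collect only the identifiers of the remote records."""
    return sorted(source)


def fetch(source, identifier):
    """Fetch: retrieve the full content of one identified record."""
    return source[identifier]


def import_record(identifier, content):
    """Import: map the fetched content to a CKAN-style package dict."""
    return {
        "name": identifier,
        "title": content["title"],
        "notes": content["notes"],
    }


def harvest(source):
    """Run the three stages in order for every record in the source."""
    return [import_record(identifier, fetch(source, identifier))
            for identifier in gather(source)]


packages = harvest(REMOTE)
for package in packages:
    print(package["name"], "->", package["title"])
```

In the real extension the stages are decoupled by queues, which is why separate gather and fetch consumers have to be running; here they are chained directly to keep the sketch short.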
  8. 8. Harvesters
     ● Harvesting Interface

         from ckan.plugins.core import SingletonPlugin, implements
         from ckanext.harvest.interfaces import IHarvester


         class MyHarvester(SingletonPlugin):
             implements(IHarvester)

             def get_original_url(self, harvest_object_id):
                 """
                 :param harvest_object_id: HarvestObject id
                 :returns: A string with the URL to the original document
                 """

             def gather_stage(self, harvest_job):
                 """
                 :param harvest_job: HarvestJob object
                 :returns: A list of HarvestObject ids
                 """

             def fetch_stage(self, harvest_object):
                 """
                 :param harvest_object: HarvestObject object
                 :returns: True if everything went right, False if errors were found
                 """

             def import_stage(self, harvest_object):
                 """
                 :param harvest_object: HarvestObject object
                 :returns: True if everything went right, False if errors were found
                 """
  9. 9. Harvesters
     ● Some issues
       – Titles with non-ASCII characters
       – Useless update check
       – TGOS CSW: fails in the gather stage
         ● Caused by OWSLib
       – Harvest sources vary
         ● We should modify the extension to harvest them properly
     ● Modified version available
       – On GitHub: https://github.com/u10313335/ckanext-harvest
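
The slide does not say how the non-ASCII title issue shows up. One plausible cause, assumed here, is deriving the dataset name (the URL slug) from the title: CKAN dataset names are limited to lowercase ASCII letters, digits, `-` and `_`, with a length of 2 to 100 characters, so a title written entirely in Chinese leaves nothing to build a name from. The helper `name_from_title` below is hypothetical and is not taken from the modified extension; it shows one possible fallback.

```python
import hashlib
import re


def name_from_title(title, max_length=100):
    """Derive a CKAN-safe dataset name from an arbitrary title.

    Hypothetical helper: ASCII letters and digits are kept, every other
    run of characters becomes one hyphen, and a short hash of the title
    is used when too little ASCII text is left.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if len(slug) < 2:
        # Nothing usable survived (e.g. an all-CJK title): fall back to
        # a deterministic name derived from the UTF-8 bytes of the title.
        digest = hashlib.sha1(title.encode("utf-8")).hexdigest()[:8]
        slug = "dataset-" + digest
    return slug[:max_length].strip("-")


print(name_from_title("Gold Prices"))
print(name_from_title("臺灣地圖"))
```

A title that mixes scripts keeps only its ASCII part under this scheme, so two different titles can map to the same name; a real harvester would still need a uniqueness check before creating the package.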
  10. 10. Linked Data and RDF
      ● Resource Description Framework
        – a family of W3C specifications
        – a metadata data model
        – based on XML, URI
      Source: http://techserviceslibrary.blogspot.tw/2011/04/rdf-resource-description.html
  11. 11. Linked Data and RDF
      ● Vocabularies
        – DCAT and Dublin Core
      ● Two ways to get RDF metadata
        – curl -L -H "Accept:application/rdf+xml" http://thedatahub.org/dataset/gold-prices
        – curl -L http://thedatahub.org/dataset/gold-prices.rdf
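
The two curl calls on slide 11 translate directly to Python's standard library. The sketch below only builds the two requests and makes no network call, since the host from the slide is not guaranteed to be reachable; the function names are made up for illustration.

```python
from urllib.request import Request

# Dataset URL as shown on the slide.
DATASET_URL = "http://thedatahub.org/dataset/gold-prices"


def rdf_via_accept_header(url):
    """Content negotiation: ask for RDF/XML through the Accept header."""
    return Request(url, headers={"Accept": "application/rdf+xml"})


def rdf_via_suffix(url):
    """Alternative: request the same dataset URL with a .rdf suffix."""
    return Request(url + ".rdf")


by_header = rdf_via_accept_header(DATASET_URL)
by_suffix = rdf_via_suffix(DATASET_URL)

print(by_header.full_url, by_header.get_header("Accept"))
print(by_suffix.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the fetch; `urlopen` follows redirects by default, which corresponds to the `-L` flag in the curl commands.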
  12. 12. Documents
      ● Read the Docs:
        – https://readthedocs.org/projects/ckan-docs-tw/
  13. 13. Thanks for your attention! Any questions?
