Are we there yet?


An Open Data Metadata quality checker

Published in: Education, Technology
2 Comments
  • Hi rossdjones, I chose a brute-force approach for now: if the API takes too long and times out, I retry each fetch up to three times. Some queries seem to take a while and need the database / engine to warm up, whatever the reason, but this way I could reliably get all the data. Thank you for your advice!
  • I don't know if this would help you work around the API timing out, but at data.gov.uk we provide a data dump in JSON every week: http://data.gov.uk/data/dumps/. Perhaps this would make the analysis easier?

  1. Are we there yet?
  2. What?
     An Open (Govt.) Data Monitoring Tool
     – Metadata quality and consistency
     – Benchmarking: Who fixed what, and how fast?
     – Is the data still there?
  3. Why?
     ● Dangling URLs into Nirvana
       – Data is meant to stay
     ● (Meta-)data must be consistent in order to be useful
     ● Tendency to publish data without monitoring it
       – Metadata is decoupled from the data
       – Question of responsibility
  4. How?
     ● Watcher
       – Get all metadata from the CKAN data portal (legacy API calls)
       – Analyse metadata and URLs
       – Write results into a staging database (SQL)
       – Watch for new / changed datasets
     ● Analyser
       – Perform analysis on the staging area (partly long-running and tedious), write results into Redis
         ● Who has released the most data? EASY!
         ● Who uploaded which datasets, and when?
         ● Who fixed the most mistakes during the last week?
         ● Who has the longest outstanding bugs?
         ● Which datasets are no longer available?
  5. How? (ctd.)
     ● Presentation
       – Make some fancy display from the Redis results
       – Data drill-down
       – What else?
  6. Architecture
     ● Heroku PaaS
     ● PostgreSQL data store
     ● Redis for ephemeral data
     ● Application logic in Go
     ● Front end using Bootstrap & AngularJS
  7. What's there
     ● Machine-readable metadata specification: http://htmlpreview.github.io/?https://github.com/the42/ogdat/blob/master/ppogdatspec/ogdat_s (automated conversion process from PDF [sic!])
     ● Watcher: stable
     ● Analyser: work in progress
     ● Presentation layer: HELP
  8. Show me and I believe
     ● Uhm … nothing fancy yet
     ● Business logic & server processes
     ● Source: https://github.com/the42/ogdat/
  9. Lessons learned
     ● There are many (minor) issues with metadata
     ● Heroku is easy to get going
     ● Go, as a novel language, is easy to develop in
       – Built-in concurrency features come in handy when checking e.g. URLs in parallel
     ● The CKAN API at data.gv.at is not that fast and times out
  10. Contact
      Johann Höchtl
      johann.hoechtl@gmail.com
      @myprivate42
      http://www.slideshare.net/jhoechtl/
      https://www.facebook.com/myprivate42
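For the metadata side of the Watcher's analysis (slide 4), and the point on slide 3 that metadata must be consistent to be useful, a basic check decodes each record and flags required fields that are missing or blank. The field list here is purely illustrative: these are common CKAN package fields, but which fields are mandatory would come from the machine-readable metadata specification on slide 7. `checkRecord` and `requiredFields` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// requiredFields is an illustrative stand-in; the real list of mandatory fields
// would come from the machine-readable metadata specification.
var requiredFields = []string{"title", "notes", "maintainer", "license_id"}

// checkRecord decodes one JSON metadata record and lists required fields
// that are missing, null, or contain only whitespace.
func checkRecord(raw []byte) ([]string, error) {
	var rec map[string]interface{}
	if err := json.Unmarshal(raw, &rec); err != nil {
		return nil, err
	}
	var problems []string
	for _, field := range requiredFields {
		v, ok := rec[field]
		if !ok || v == nil {
			problems = append(problems, field+": missing")
			continue
		}
		if s, isString := v.(string); isString && strings.TrimSpace(s) == "" {
			problems = append(problems, field+": empty")
		}
	}
	return problems, nil
}

func main() {
	// A made-up record: "notes" is blank, "maintainer" is absent, "license_id" is null.
	rec := []byte(`{"title": "Example dataset", "notes": " ", "license_id": null}`)
	problems, err := checkRecord(rec)
	if err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	for _, p := range problems {
		fmt.Println(p)
	}
}
```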
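The Analyser's "EASY!" question on slide 4, "Who has released the most data?", comes down to counting datasets per publisher and sorting. The sketch below does this in memory; the `Dataset` type is a hypothetical, reduced view of a staging-database row and the sample publishers are made up. In the described architecture the ranking would then be written to Redis (a sorted set would be one natural fit), but how ogdat actually stores it is not stated.

```go
package main

import (
	"fmt"
	"sort"
)

// Dataset is a hypothetical, minimal view of one metadata record in the staging database.
type Dataset struct {
	ID        string
	Publisher string
}

// PublisherCount pairs a publisher with the number of datasets it released.
type PublisherCount struct {
	Publisher string
	Count     int
}

// rankPublishers counts datasets per publisher and sorts by count, descending,
// breaking ties alphabetically so the ranking is deterministic.
func rankPublishers(ds []Dataset) []PublisherCount {
	counts := make(map[string]int)
	for _, d := range ds {
		counts[d.Publisher]++
	}
	ranking := make([]PublisherCount, 0, len(counts))
	for p, c := range counts {
		ranking = append(ranking, PublisherCount{Publisher: p, Count: c})
	}
	sort.Slice(ranking, func(i, j int) bool {
		if ranking[i].Count != ranking[j].Count {
			return ranking[i].Count > ranking[j].Count
		}
		return ranking[i].Publisher < ranking[j].Publisher
	})
	return ranking
}

func main() {
	// Made-up sample rows.
	datasets := []Dataset{
		{ID: "d1", Publisher: "org-a"},
		{ID: "d2", Publisher: "org-b"},
		{ID: "d3", Publisher: "org-a"},
	}
	for _, pc := range rankPublishers(datasets) {
		fmt.Printf("%s: %d\n", pc.Publisher, pc.Count)
	}
}
```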
