Are we there yet?
An Open Data Metadata quality checker

Usage Rights

CC Attribution License

  • Hi rossdjones, I chose a brute-force approach for now: if the API takes too long and times out, I retry, with a maximum of three attempts per fetch. Some queries seem to take a while and require the database or engine to warm up, but this way I could reliably get all the data. Thank you for your advice!
  • I don't know if this would help you work around issues with the API timing out, but at data.gov.uk we provide a weekly data dump in JSON: http://data.gov.uk/data/dumps/. Perhaps this might help make the analysis easier?

Are we there yet? Presentation Transcript

  • 1. Are we there yet?
  • 2. What?
    An Open (Govt.) Data Monitoring Tool
      – Metadata Quality and Consistency
      – Benchmarking: Who fixed what and how fast?
      – Is the data still there?
  • 3. Why?
      ● Dangling URLs into Nirvana
          – Data is meant to stay
      ● (Meta-)Data is required to be consistent in order to be useful
      ● Tendency to give without monitoring
          – Decoupled Metadata from Data
          – Question of responsibility
  • 4. How?
      ● Watcher
          – Get all metadata from CKAN data portal (legacy API calls)
          – Analyse metadata and URLs
          – Write result into staging database (SQL)
          – Watch for new / changed datasets
      ● Analyser
          – Perform analysis on staging area (partly long-running and tedious), write result into Redis
              ● Who has released the most data? EASY!
              ● Who uploaded which datasets, and when?
              ● Who fixed the most mistakes during the last week?
              ● Who has the longest outstanding bugs?
              ● Which datasets are no longer available?
  • 5. How? ctd.
      ● Presentation
          – Make some fancy display from the Redis results
          – Data drill-down
          – What else?
  • 6. Architecture
      ● Heroku PaaS
      ● PostgreSQL data store
      ● Redis for ephemeral data
      ● Application logic in Go
      ● Front-end using Bootstrap & AngularJS
  • 7. What's there
      ● Metadata spec machine readable http://htmlpreview.github.io/?https://github.com/the42/ogdat/blob/master/ppogdatspec/ogdat_s (automated conversion process from PDF [sic!])
      ● Watcher stable
      ● Analyser work in progress
      ● Presentation layer: HELP
  • 8. Show me and I believe
      ● Uhm … nothing fancy yet
      ● Business logic & server processes
      ● Source: https://github.com/the42/ogdat/
  • 9. Lessons learned
      ● There are many (minor) issues with metadata
      ● Heroku is easy to get going
      ● Go as a novel language is easy to develop in
          – Built-in concurrency features come in handy when checking e.g. URLs in parallel
      ● CKAN API@data.gv.at is not that fast and times out
  • 10. Contact
    Johann Höchtl
    johann.hoechtl@gmail.com
    @myprivate42
    http://www.slideshare.net/jhoechtl/
    https://www.facebook.com/myprivate42