What?
An Open (Govt.) Data Monitoring Tool
  – Metadata Quality and Consistency
  – Benchmarking: Who fixed what and how fast?
  – Is the data still there?

Why?
● Dangling URLs pointing into Nirvana
  – Data is meant to stay
● (Meta-)Data must be consistent in order to be useful
● Tendency to publish without monitoring
  – Metadata decoupled from data
  – Question of responsibility

How?
● Watcher
  – Get all metadata from the CKAN data portal (legacy API calls)
  – Analyse metadata and URLs
  – Write results into the staging database (SQL)
  – Watch for new / changed datasets
● Analyser
  – Perform analysis on the staging area (partly long-running and tedious), write results into Redis
    ● Who has released the most data? EASY!
    ● Who uploaded which datasets, and when?
    ● Who fixed the most mistakes during the last week?
    ● Who has the longest outstanding bugs?
    ● Which datasets are no longer available?

How? ctd.
● Presentation
  – Make some fancy display from the Redis results
  – Data drill-down
  – What else?

Architecture
● Heroku PaaS
● PostgreSQL data store
● Redis for ephemeral data
● Application logic in Go
● Front-end using Bootstrap & AngularJS

What's there
● Machine-readable metadata spec: http://htmlpreview.github.io/?https://github.com/the42/ogdat/blob/master/ppogdatspec/ogdat_s (automated conversion process from PDF [sic!])
● Watcher: stable
● Analyser: work in progress
● Presentation layer: HELP

Show me and I believe
● Uhm … nothing fancy yet
● Business logic & server processes
● Source: https://github.com/the42/ogdat/

Lessons learned
● There are many (minor) issues with metadata
● Heroku is easy to get going
● Go, although a young language, is easy to develop in
  – Built-in concurrency features come in handy when checking e.g. URLs in parallel
● The CKAN API at data.gv.at is not that fast and times out

Contact
Johann Höchtl
email@example.com
@myprivate42
http://www.slideshare.net/jhoechtl/
https://www.facebook.com/myprivate42