Are we there yet?
What?
An Open (Govt.) Data Monitoring Tool


–   Metadata Quality and Consistency
–   Benchmarking: Who fixed what and how fast?
–   Is the data still there?
Why?
●
    Dangling URLs that lead nowhere
    –   Data is meant to stay
●
    (Meta-)data must be consistent to be useful
●
    Tendency to publish without monitoring
    –   Metadata is decoupled from the data
    –   Question of responsibility
How?
●
    Watcher
    –   Get all metadata from CKAN data portal (legacy API calls)
    –   Analyse metadata and URLs
    –   Write result into staging database (SQL)
    –   Watch for new / changed datasets
●
    Analyser
    –   Analyse the staging area (some analyses are long-running and tedious), write the results into Redis
        ●
            Who has released the most data? EASY!
        ●
            Who uploaded which datasets, and when?
        ●
            Who fixed the most mistakes during the last week?
        ●
            Who has the longest-outstanding bugs?
        ●
            Which datasets are no longer available?
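Two of the analyser questions above can be sketched in a few lines of Go. This is only an illustration, not the code from the ogdat repository: the CheckResult type, its fields, and the sample rows are hypothetical stand-ins for whatever the staging database actually holds.

```go
package main

import (
	"fmt"
	"sort"
)

// CheckResult is a hypothetical staging row: one checked resource of a dataset.
type CheckResult struct {
	Publisher string
	DatasetID string
	Reachable bool
}

// PublisherCount pairs a publisher with its number of distinct datasets.
type PublisherCount struct {
	Publisher string
	Datasets  int
}

// RankByDatasets answers "who has released the most data?" by counting
// distinct datasets per publisher, largest first (ties sorted by name).
func RankByDatasets(rows []CheckResult) []PublisherCount {
	seen := make(map[string]map[string]bool)
	for _, r := range rows {
		if seen[r.Publisher] == nil {
			seen[r.Publisher] = make(map[string]bool)
		}
		seen[r.Publisher][r.DatasetID] = true
	}
	ranking := make([]PublisherCount, 0, len(seen))
	for p, ds := range seen {
		ranking = append(ranking, PublisherCount{Publisher: p, Datasets: len(ds)})
	}
	sort.Slice(ranking, func(i, j int) bool {
		if ranking[i].Datasets != ranking[j].Datasets {
			return ranking[i].Datasets > ranking[j].Datasets
		}
		return ranking[i].Publisher < ranking[j].Publisher
	})
	return ranking
}

// Unavailable answers "which datasets are no longer available?" by returning
// the sorted IDs of datasets with at least one unreachable resource.
func Unavailable(rows []CheckResult) []string {
	missing := make(map[string]bool)
	for _, r := range rows {
		if !r.Reachable {
			missing[r.DatasetID] = true
		}
	}
	ids := make([]string, 0, len(missing))
	for id := range missing {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Made-up sample rows, purely for illustration.
	rows := []CheckResult{
		{Publisher: "publisher-a", DatasetID: "ds-1", Reachable: true},
		{Publisher: "publisher-a", DatasetID: "ds-2", Reachable: false},
		{Publisher: "publisher-b", DatasetID: "ds-3", Reachable: true},
	}
	fmt.Println(RankByDatasets(rows))
	fmt.Println(Unavailable(rows))
}
```

In the architecture described here, aggregates like these would presumably be computed in SQL on the staging database and cached in Redis; the in-memory version only shows the shape of the questions.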
How? ctd.
●
    Presentation
    –   Make some fancy display from the Redis results
    –   Data drill-down
    –   What else?
Architecture
●
    Heroku PaaS
●
    PostgreSQL data store
●
    Redis for ephemeral data
●
    Application logic in Go
●
    Front-end using Bootstrap & AngularJS
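On Heroku, a Go process typically learns about its port and attached services through environment variables. The following is a minimal sketch of that wiring, not taken from the project: PORT and DATABASE_URL are the variables Heroku sets for web dynos and Heroku Postgres, while REDIS_URL is an assumption that depends on which Redis add-on is attached. The Config type and LoadConfig function are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the connection settings the services need (hypothetical type).
type Config struct {
	Port        string
	DatabaseURL string
	RedisURL    string
}

// LoadConfig reads settings through getenv so it can be tested without
// touching the real process environment.
func LoadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		Port:        getenv("PORT"),
		DatabaseURL: getenv("DATABASE_URL"),
		RedisURL:    getenv("REDIS_URL"), // assumption: name varies by add-on
	}
	if cfg.Port == "" {
		cfg.Port = "8080" // local development fallback
	}
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is not set")
	}
	if cfg.RedisURL == "" {
		return Config{}, errors.New("REDIS_URL is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("would listen on port", cfg.Port)
}
```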
What's there
●
    Machine-readable metadata spec
    http://htmlpreview.github.io/?https://github.com/the42/ogdat/blob/master/ppogdatspec/ogdat_s

    (automated conversion process from PDF [sic!])



●
    Watcher: stable
●
    Analyser: work in progress
●
    Presentation layer: HELP
Show me and I believe
●
    Uhm … nothing fancy yet
●
    Business logic & server processes


●
    Source: https://github.com/the42/ogdat/
Lessons learned




●
    There are many (minor) issues with metadata
●
    Heroku is easy to get going
●
    Go, although a new language, is easy to develop in
    –   Built-in concurrency features come in handy, e.g. when
        checking URLs in parallel
●
    The CKAN API at data.gv.at is not that fast and times out
Contact
    Johann Höchtl
    johann.hoechtl@gmail.com
    @myprivate42
    http://www.slideshare.net/jhoechtl/
    https://www.facebook.com/myprivate42