Hidden data treasures. How can we make more government data available for re-use without sacrificing the privacy of citizens? How to contact me: Harald Groven, web developer (actually a database guy), [email_address], twitter.com/kongharald
Mashup of the household registers of 1865, 1886 and 2005. Red dots = 2005; map = Friis 1861. New technology makes possible visualizations and analyses that were unimaginable when the data was collected.
Recommended reading on the cultural and democratic impact of statistics on the public sphere: Sarah Igo, The Averaged American: Surveys, Citizens, and the Making of a Mass Public. Harvard University Press, 2007. Usefulness or uselessness of aggregates? In the rest of the talk I will argue that aggregates are mostly useless. In fact they are not: they are a good starting point.
Usefulness of disaggregation. Key concepts of data warehousing: Rollup = aggregate up one level. Drill down = disaggregate down one level. Slice = fix one variable (dimension) at a single value, e.g. one year. Visual example (thanks, Obama & Vivek!): public spending in the US. Text example (thanks to NSD, Bergen): % of students not passing exams, 2005-09.
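To make rollup, drill down and slice concrete, here is a minimal sketch in plain Python. It assumes a tiny, entirely made-up table of exam results in the spirit of the NSD example (the institutions "Inst A" and "Inst B" and all numbers are hypothetical); the function name `aggregate` is also invented for illustration. The same grouping function serves all three operations: fewer dimensions means rolling up, more dimensions means drilling down, and a fixed value means slicing.

```python
from collections import defaultdict

# Hypothetical, invented records: (year, institution, course, students, failed).
RECORDS = [
    (2005, "Inst A", "Math", 120, 30),
    (2005, "Inst A", "History", 80, 8),
    (2005, "Inst B", "Math", 200, 40),
    (2006, "Inst A", "Math", 110, 22),
    (2006, "Inst B", "History", 90, 9),
]
DIMENSIONS = ("year", "institution", "course")


def aggregate(records, dims, where=None):
    """Group records by the named dimensions and return % of students failing.

    `where` is an optional {dimension: value} dict that slices the data first.
    """
    where = where or {}
    totals = defaultdict(lambda: [0, 0])  # key -> [students, failed]
    for rec in records:
        row = dict(zip(DIMENSIONS, rec[:3]))
        # Slice: skip records that do not match the fixed dimension values.
        if any(row[d] != v for d, v in where.items()):
            continue
        key = tuple(row[d] for d in dims)
        totals[key][0] += rec[3]
        totals[key][1] += rec[4]
    return {k: round(100 * f / s, 1) for k, (s, f) in totals.items()}


# Roll up to one row per year, drill down to year x institution,
# and slice on year 2005 to compare institutions and courses.
by_year = aggregate(RECORDS, ("year",))
by_year_and_inst = aggregate(RECORDS, ("year", "institution"))
slice_2005 = aggregate(RECORDS, ("institution", "course"), where={"year": 2005})
```

With these invented numbers, `by_year` gives 19.5 % for 2005, while drilling down shows that one institution in 2006 sits at 10.0 % and the other at 20.0 %: a difference the yearly aggregate hides.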
Practical reason for not publishing disaggregated source data in the pre-computer age: space and cost! Image: CC Harald Groven
Finding the needle in the haystack of unaggregated data: the data warehouse. A data cube, visualized in 3D.
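A data cube can be thought of as every possible aggregate precomputed at once, similar to SQL's `GROUP BY CUBE`. The sketch below assumes three hypothetical dimensions (municipality, sex, age band) and a handful of invented person records; `data_cube` and the `ALL` marker are illustrative names, not part of any particular warehouse product. Each cell key uses `ALL` for dimensions that have been rolled up.

```python
from collections import Counter
from itertools import combinations

ALL = "*"  # marks a dimension that has been aggregated away in a cell key

# Hypothetical, invented records for illustration only.
DIMS = ("municipality", "sex", "age")
ROWS = [
    {"municipality": "Bergen", "sex": "F", "age": "20-29"},
    {"municipality": "Bergen", "sex": "M", "age": "20-29"},
    {"municipality": "Oslo", "sex": "F", "age": "30-39"},
    {"municipality": "Oslo", "sex": "F", "age": "20-29"},
]


def data_cube(rows, dims):
    """Count rows for every subset of dimensions (all 2**len(dims) group-bys)."""
    cube = Counter()
    for row in rows:
        for r in range(len(dims) + 1):
            for kept in combinations(dims, r):
                # Keep the row's value for kept dimensions, ALL for the others.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                cube[key] += 1
    return cube


cube = data_cube(ROWS, DIMS)
```

Looking up `cube[(ALL, ALL, ALL)]` gives the grand total (4), while `cube[("Oslo", "F", ALL)]` drills into one municipality and sex (2). Every row contributes to 2³ = 8 cells, which is why full cubes grow quickly as dimensions are added.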
What kinds of statistical data are published? High-level aggregates: accessible. Medium-level aggregates (e.g. municipality level): sometimes accessible. Low-level aggregates, untraceable to identifiable persons: a "grey zone", accessible but largely unknown. Anonymized raw data: inaccessible for 100 years, but in some cases available for research. Raw data, not anonymized: inaccessible for 100 years!
Creating a data.gov.no? Require government agencies to publish anonymized data: - Statistician's method: 1% or 10% samples? - CS method: randomize variables so that the data set has the same statistical aggregates. - Easy method: publish aggregated data for each value, with a threshold (e.g. 3 persons) so that individuals cannot be traced. - Use categories, not uncoded values.
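The four methods above can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not an agency's actual disclosure-control procedure: the person records and the `income` field are invented, all function names are hypothetical, the "CS method" is read as simple column swapping (shuffling one variable across records), and the threshold is assumed to suppress cells with fewer than 3 persons.

```python
import random
from collections import Counter

# Hypothetical, invented person records for illustration only.
PEOPLE = [
    {"age": 23, "municipality": "Bergen", "income": 300},
    {"age": 27, "municipality": "Bergen", "income": 450},
    {"age": 25, "municipality": "Bergen", "income": 380},
    {"age": 41, "municipality": "Oslo", "income": 600},
]


def categorize_age(age):
    """Use categories, not uncoded values: replace exact age with a 10-year band."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def sample(rows, fraction, seed=0):
    """Statistician's method: keep each record with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def swap_column(rows, column, seed=0):
    """One reading of the CS method (assumption): shuffle one column across records.

    Single-variable aggregates of the column (sum, mean, frequencies) are unchanged,
    but its link to the rest of each record is broken. Cross-tabulations with other
    columns are NOT preserved. Returns new dicts; the input is not modified.
    """
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: v} for row, v in zip(rows, values)]


def threshold_counts(rows, columns, threshold=3):
    """Easy method: count each combination of values; None marks a suppressed cell."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {key: (n if n >= threshold else None) for key, n in counts.items()}


coded = [{**p, "age": categorize_age(p["age"])} for p in PEOPLE]
published = threshold_counts(coded, ("age", "municipality"))
# published == {("20-29", "Bergen"): 3, ("40-49", "Oslo"): None}
```

Note that suppressing only the small cells is not always enough: if totals are also published, a suppressed cell can sometimes be recovered by subtraction, so real statistical agencies typically add further (complementary) suppression.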
Image: CC Macwagon. Government resources for making use of public data?