  1. Hidden data treasures. How can we make more government data available for re-use without sacrificing the privacy of citizens? How to contact me: Harald Groven, web developer (actually a database guy), [email_address]
  3. Public data: 150 years of no development? This is how public data was presented more than a century ago. Source: Eilert Sundt, Om dødeligheden i Norge (On Mortality in Norway), 1855
  4. How the same kind of data set is presented 150 years later... Any progress (apart from a change of font)?
  6. Mashup of the household registers of 1865, 1886 and 2005 (red dots = 2005; map = Friis, 1861). New technology makes possible visualizations and analyses that were unimaginable when the data was collected.
  8. Recommended reading on the cultural and democratic impact of statistics on the public sphere: Sarah Igo, The Averaged American: Surveys, Citizens, and the Making of a Mass Public (Harvard UP, 2007). Usefulness or uselessness of aggregates? In the rest of the talk I will argue that aggregates are mostly useless, but in fact they are not: they are a good starting point.
  9. Usefulness of disaggregation. Key concepts of data warehousing: Rollup = aggregate up one level; Drill down = disaggregate down one level; Slice = pick out one value of a variable (fix one dimension). Visual example (thanks, Obama & Vivek!): public spending in the US. Text example (thanks to NSD, Bergen): % of students not passing exams, 2005-09.
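The three operations can be sketched in Python on a toy table. This is a minimal illustration, not how any particular data warehouse works: the rows in `FACTS` are invented (they are not NSD's figures), and the helper names `aggregate`, `roll_up`, `drill_down` and `slice_rows` are hypothetical. In a real warehouse the same ideas are usually expressed as GROUP BY queries over a cube.

```python
from collections import defaultdict

# HYPOTHETICAL fact table, invented for illustration only (not NSD data):
# one row per (year, municipality, faculty) with student and failure counts.
FACTS = [
    {"year": 2005, "municipality": "Bergen", "faculty": "Law", "students": 120, "failed": 12},
    {"year": 2005, "municipality": "Bergen", "faculty": "Medicine", "students": 80, "failed": 4},
    {"year": 2005, "municipality": "Oslo", "faculty": "Law", "students": 200, "failed": 30},
    {"year": 2006, "municipality": "Bergen", "faculty": "Law", "students": 110, "failed": 11},
    {"year": 2006, "municipality": "Oslo", "faculty": "Law", "students": 190, "failed": 19},
]


def aggregate(rows, dims):
    """Group rows by `dims` and return % of students not passing per group."""
    totals = defaultdict(lambda: [0, 0])  # key -> [students, failed]
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key][0] += row["students"]
        totals[key][1] += row["failed"]
    return {key: round(100 * failed / students, 1)
            for key, (students, failed) in totals.items()}


def roll_up(rows, dims):
    """Rollup: drop the most detailed dimension, i.e. aggregate one level up."""
    return aggregate(rows, dims[:-1])


def drill_down(rows, dims, extra_dim):
    """Drill down: add one more dimension, i.e. disaggregate one level down."""
    return aggregate(rows, dims + [extra_dim])


def slice_rows(rows, dim, value):
    """Slice: keep only rows where one dimension has a fixed value."""
    return [row for row in rows if row[dim] == value]


by_year = aggregate(FACTS, ["year"])                  # {(2005,): 11.5, (2006,): 10.0}
overall = roll_up(FACTS, ["year"])                    # {(): 10.9}
detail = drill_down(FACTS, ["year"], "municipality")  # (2005, 'Oslo') -> 15.0
oslo = aggregate(slice_rows(FACTS, "municipality", "Oslo"), ["year"])  # {(2005,): 15.0, (2006,): 10.0}
```

In this toy data, drilling down shows that the 11.5% for 2005 hides 8.0% in Bergen and 15.0% in Oslo, which is the kind of variation a single aggregate conceals.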
  15. What kinds of statistical data are published?
- High-level aggregates: accessible
- Medium-level aggregates (e.g. municipality level): sometimes accessible
- Low-level aggregates, untraceable to identifiable persons: a "grey zone", accessible but largely unknown
- Anonymized raw data: inaccessible for 100 years, though in some cases available for research
- Raw data, not anonymized: inaccessible for 100 years!
  17. Creating a ? Require government agencies to publish anonymized data:
- Statistician's method: 1% or 10% samples?
- CS method: randomize variables so that the data set keeps the same statistical aggregates
- Easy method: publish data sets for each value, with a threshold (e.g. 3 persons) below which counts are withheld, so that individuals cannot be identified
- Use categories, not uncoded values
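The approaches on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how any statistical agency actually does it: the records in `PEOPLE` are invented, and `sample`, `shuffle_column`, `age_band` and `threshold_counts` are hypothetical helper names. The "CS method" is interpreted here as shuffling one column across records, which is only one of several ways to randomize data while keeping per-variable aggregates.

```python
import random
from collections import Counter

# HYPOTHETICAL microdata, invented for illustration: one record per person.
PEOPLE = [
    {"age": 23, "municipality": "Bergen", "income": 310_000},
    {"age": 27, "municipality": "Bergen", "income": 350_000},
    {"age": 25, "municipality": "Bergen", "income": 290_000},
    {"age": 41, "municipality": "Bergen", "income": 520_000},
    {"age": 34, "municipality": "Oslo", "income": 450_000},
    {"age": 38, "municipality": "Oslo", "income": 480_000},
    {"age": 31, "municipality": "Oslo", "income": 400_000},
    {"age": 67, "municipality": "Oslo", "income": 380_000},
]


def sample(rows, fraction, seed=0):
    """Statistician's method: publish a random sample (e.g. fraction=0.01 or 0.1)."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def shuffle_column(rows, column, seed=0):
    """Assumed reading of the 'CS method': permute one variable across records.
    Totals and means of that variable are unchanged, but its link to the
    other variables is broken, so cross-tabulations are NOT preserved."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def age_band(age, width=10):
    """Use categories, not uncoded values: 23 -> '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def threshold_counts(rows, keys, min_count=3):
    """Easy method: count persons per combination of `keys` and withhold
    (None) any cell with fewer than `min_count` persons."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {cell: (n if n >= min_count else None) for cell, n in counts.items()}


coded = [{**p, "age": age_band(p["age"])} for p in PEOPLE]
published = threshold_counts(coded, ["municipality", "age"])
# {('Bergen', '20-29'): 3, ('Bergen', '40-49'): None,
#  ('Oslo', '30-39'): 3, ('Oslo', '60-69'): None}
```

A threshold on its own is not always enough: if row or column totals are also published, a withheld cell can sometimes be recovered by subtraction, which is why additional cells are often suppressed as well.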
