Public Data.

  1. Hidden data treasures. How can we make more government data available for re-use without sacrificing the privacy of citizens? How to contact me: Harald Groven, web developer (actually database guy) [email_address]
  2. CC Suzannelong
  3. Public data: 150 years of no development? This is how public data was presented more than a century ago. Source: Eilert Sundt, Om dødeligheden i Norge, 1855
  4. How the same kind of data set is presented 150 years later... Any progress (except a change of font)?
  5. Government data published after 175 years! Census of 1801
  6. Mashup of household registers of 1865, 1886 and 2005. Red dots = 2005; map = Friis 1861. New technology makes it possible to do visualizations and analyses not imaginable when the data was collected.
  8. Recommended reading on the cultural and democratic impact of statistics on the public sphere: Sarah Igo, The Averaged American: Surveys, Citizens, and the Making of a Mass Public. Harvard UP, 2007. Usefulness or uselessness of aggregates? In the rest of the talk I will argue that aggregates are mostly useless, but in fact they are not: they are a good starting point.
  9. Usefulness of disaggregation. Key concepts of data warehousing: Rollup = aggregate up one level. Drill down = disaggregate down one level. Slice = change variable. Visual example (thanks Obama & Vivek!): public spending in the US. Text example (thanks to NSD, Bergen): % of students not passing exam, 2005-09.
  10. Practical reason for not publishing disaggregated source data in the pre-computer age: space and cost! Image: CC Harald Groven
  11. Finding the needle in the haystack of unaggregated data: the data warehouse. Data cube, visualized in 3D
  13. Raw data from Statistics Norway
  15. What kind of statistical data are published? High-level aggregates: accessible. Medium-level aggregates (e.g. municipality level): sometimes accessible. Low-level aggregates, untraceable to identifiable persons: a "grey zone", accessible but largely unknown. Anonymized raw data: inaccessible for 100 years, in some cases available for research. Raw data, unanonymized: inaccessible for 100 years!
  16. Project: Display wage statistics on 500 of our web pages
  17. Creating a ? Require government agencies to publish anonymised data. Statistician's method: 1% or 10% samples? CS method: randomize variables so that the data set has the same statistical aggregates. Easy method: publish data sets for each value, with a threshold value (e.g. 3 persons) to avoid tracing identities. Use categories, not uncoded values.
  21. CC Macwagon. Government resources for making use of public data?
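
The data warehousing terms on slide 9 (rollup, drill down, slice) can be illustrated with a few lines of Python. This is only a sketch: the records, the field names and the helper `aggregate` are invented for illustration and are not taken from the talk or from NSD data. "Slice" follows the slide's wording (change the variable you group by); in OLAP tools the term more often means fixing one value of a dimension.

```python
from collections import defaultdict

# Invented example records (not real data):
# (county, municipality, year, students_failed, students_total)
RECORDS = [
    ("North", "A", 2005, 5, 50),
    ("North", "A", 2006, 4, 40),
    ("North", "B", 2005, 10, 50),
    ("South", "C", 2005, 3, 60),
    ("South", "C", 2006, 6, 60),
]
FIELDS = ("county", "municipality", "year")


def aggregate(records, dims):
    """Sum failed/total counts, grouped by the named dimensions."""
    positions = [FIELDS.index(d) for d in dims]
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        key = tuple(rec[p] for p in positions)
        totals[key][0] += rec[3]  # students_failed
        totals[key][1] += rec[4]  # students_total
    return {key: tuple(pair) for key, pair in totals.items()}


# Drill down: disaggregate one level, from county to municipality.
by_municipality = aggregate(RECORDS, ["county", "municipality"])

# Rollup: aggregate one level up, from municipality to county.
by_county = aggregate(RECORDS, ["county"])

# Slice (in the slide's sense): group the same records by another variable.
by_year = aggregate(RECORDS, ["year"])

for (county,), (failed, total) in sorted(by_county.items()):
    print(f"{county}: {100 * failed / total:.1f}% not passing")
```

The point of keeping the disaggregated records is that every higher-level table can be recomputed from them, while the reverse is not possible.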
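
The "easy method" on slide 17 (publish counts per category, with a threshold, and use categories rather than uncoded values) might look roughly like the sketch below. Everything here is an assumption made for illustration: the sample persons, the 100,000-wide wage bands, the helper names `wage_band` and `publish_counts`, and the reading of the threshold as "suppress any cell with fewer than 3 persons".

```python
from collections import Counter

# Invented example persons (not real data).
PEOPLE = [
    {"occupation": "nurse", "wage": 410_000},
    {"occupation": "nurse", "wage": 430_000},
    {"occupation": "nurse", "wage": 445_000},
    {"occupation": "nurse", "wage": 520_000},
    {"occupation": "pilot", "wage": 910_000},
]


def wage_band(wage, width=100_000):
    """Replace an exact wage with a coded category (a band label)."""
    low = (wage // width) * width
    return f"{low}-{low + width - 1}"


def publish_counts(records, keys, threshold=3):
    """Count persons per cell; cells below the threshold become None."""
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Step 1: use categories, not uncoded values.
coded = [
    {"occupation": p["occupation"], "wage_band": wage_band(p["wage"])}
    for p in PEOPLE
]

# Step 2: publish only cells with at least `threshold` persons.
table = publish_counts(coded, ["occupation", "wage_band"], threshold=3)

for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

Suppressing small cells is only a first step: if row or column totals are published alongside the table, a suppressed cell can sometimes be recovered by subtraction, so totals may need the same treatment.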