
Big Data presentation Mannheim

About Big Data, CBDS, Data Science and the skills needed

Published in: Government & Nonprofit

  1. 1. From Big Data to Official Statistics. Piet J.H. Daas and all my Big Data colleagues/data scientists at CBDS. 28 Jan., Mannheim. Statistics Netherlands. Current projects at Statistics Netherlands.
  2. 2. Overview
     • Big Data and Statistics Netherlands
     • A Big Data based official statistic
     • Skills needed
     • Results of other Big Data projects
     • Some concluding remarks
  3. 3. Statistics Netherlands – Where? Heerlen and Den Haag. We love Big Data!!
  4. 4. Center for Big Data Statistics (CBDS)
     • Produce new, real-time statistics, and enrich and deepen the statistics already produced (such as regional indicators)
     • Reduce the impact on society (‘response burden’)
     • Deepen the methodological knowledge and privacy considerations for using Big Data in official statistics
     • Stimulate cooperation by creating an ecosystem of partners
  5. 5. CBDS Scope: Big data in official statistics
     • Data scouting and data access
     • Ethics and privacy
     • Methodology and data integration
     • Social statistics: safety, housing and health
     • Sustainable Development Goals
     • Smart Cities
     • Statistics on Economics: internet economy, labour market, energy transition
     • Mobility: daytime population, traffic flows
  6. 6. Why is Big Data important? Big Data has the potential to deliver:
     – Shorter time to publication
     – The ability to respond to current events
     – Higher reliability
     – More detail
     – More efficient processes
     Considerations: infrastructure, skills, culture
  7. 7. Big data based official statistics
     – Big Data can be used for official statistics in several ways:
       1) As a single source (census-like)
       2) As an additional source, combined with survey data or with admin data
       3) Other ways, e.g. to add missing data for some variables and/or units
     – Road sensor data is used by our office to produce the first Big Data based official statistic!
       ‐ Use this to illustrate the (new) skills needed!
  8. 8. Road sensors: road sensor data
     – Passing vehicle counts for each minute (24/7) by about 60,000 sensors
     – 20,000 on the Dutch highways
     – Types of sensors: induction loop, camera, Bluetooth
     – Large volume: approx. 230 million records/day
  9. 9. Dutch highways
  10. 10. Dutch highways + road sensors: 20,000 sensors on highways
  11. 11. Minute data of 1 sensor for 196 days
  12. 12. ‘Afsluitdijk’ (IJsselmeer dam)
  13. 13. ‘Afsluitdijk’ (IJsselmeer dam) (2)
  14. 14. Overall process: (1) Transform + Select, (2) Cleaning, (A) Frame, (3) Estimation. Output: regional estimates per month/quarter/year.
  15. 15. ‘Reducing’ Big Data: Big Data steps (1), (2), (3)
  16. 16. Process steps: (1) Transform and Select, (2) Cleaning, (A) Frame, (3) Estimation. Skills needed for each step?
  17. 17. Skills needed: Data Science Venn Diagram
  18. 18. (1) Transform + Select
     – Convert raw data to more compact data (without information loss)
       ‐ Remove unneeded data (variables and erroneous records)
       ‐ Recalculate values
       ‐ Store as compactly as possible
       ‐ Implement the process as efficiently as possible
     – Reduces size by more than 1000x!
     Skills needed: Statistics, Statistics, IT, IT
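A minimal sketch of what a transform-and-select step could look like, not the actual CBS implementation. It assumes hypothetical per-minute records with the fields `sensor_id`, `minute` and `count` plus fields that are not needed; none of these names come from the slides. Erroneous records are dropped and each sensor-day is stored as one vector of 1440 counts.

```python
MINUTES_PER_DAY = 24 * 60


def transform_and_select(raw_records):
    """Turn verbose per-minute records into one compact count vector per sensor.

    raw_records: iterable of dicts with the hypothetical keys
    'sensor_id', 'minute' (0..1439) and 'count', plus fields we do not need.
    Returns {sensor_id: list of 1440 counts, None where no valid record exists}.
    """
    compact = {}
    for rec in raw_records:
        minute, count = rec.get("minute"), rec.get("count")
        # Select: skip erroneous records (missing, out-of-range or negative values).
        if not isinstance(minute, int) or not 0 <= minute < MINUTES_PER_DAY:
            continue
        if not isinstance(count, int) or count < 0:
            continue
        # Transform: keep only the count, indexed by minute of the day.
        day = compact.setdefault(rec["sensor_id"], [None] * MINUTES_PER_DAY)
        day[minute] = count
    return compact


# Made-up example records; 'lane' and 'vendor' stand in for unneeded variables.
raw = [
    {"sensor_id": "A", "minute": 0, "count": 12, "lane": 1, "vendor": "x"},
    {"sensor_id": "A", "minute": 1, "count": -5, "lane": 1, "vendor": "x"},  # erroneous
    {"sensor_id": "A", "minute": 2, "count": 9, "lane": 1, "vendor": "x"},
]
compact = transform_and_select(raw)
print(compact["A"][:3])  # [12, None, 9]
```

The dropped minute stays visible as `None`, so a later cleaning step can decide how to fill it.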
  19. 19. (2) Cleaning
     – Check quality of daily sensor data
     – Correct for missing data
     – Implement the process as efficiently as possible
     Method: Bayesian filter (‘a Kalman filter for a semi-Poisson process’)
     Skills needed: IT, Statistics, Statistics
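The slides do not spell out the Bayesian filter, so the sketch below swaps in a much simpler stand-in: a gamma-Poisson filter with a discount factor. It is only meant to illustrate the idea of filtering a count series recursively and filling missing minutes from the prediction. The function names, the prior and the discount value are all assumptions.

```python
def gamma_poisson_filter(counts, discount=0.9, prior_shape=1.0, prior_rate=0.1):
    """Filter a count series; None marks a missing minute.

    The traffic rate has a Gamma(shape, rate) belief. Each step the belief is
    discounted (same mean, larger variance), then updated if a count exists.
    Returns the filtered rate per minute; for a missing minute this is the
    prediction carried over from earlier observations.
    """
    shape, rate = prior_shape, prior_rate
    filtered = []
    for y in counts:
        # Predict: discounting keeps the mean shape/rate but widens the belief.
        shape, rate = discount * shape, discount * rate
        if y is not None:
            # Update: conjugate gamma-Poisson step for one observed count.
            shape, rate = shape + y, rate + 1.0
        filtered.append(shape / rate)
    return filtered


def impute_missing(counts, **kwargs):
    """Replace missing counts by the rounded filtered rate."""
    rates = gamma_poisson_filter(counts, **kwargs)
    return [round(r) if y is None else y for y, r in zip(counts, rates)]


series = [10, 12, None, 11, None, None, 9]  # made-up minute counts
cleaned = impute_missing(series)
print(cleaned)  # [10, 12, 11, 11, 11, 11, 9]
```

Observed counts are left untouched; only the gaps are filled. A production filter would also need to handle daily traffic patterns, which this sketch ignores.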
  20. 20. (A) Frame
     – Use sensors on the main routes of the Dutch highways
     – Project the geolocation of sensors on roads
     – Metadata quality checking and editing
     – Calculate weights for sensors on road segments
     Skills needed: Statistics, Statistics, IT, Statistics
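How the weights are calculated is not stated on the slide. As one possible illustration, the sketch below assumes each sensor gets an equal share of the length of the road segment it sits on; the weighting rule, the function name and the segment data are all hypothetical.

```python
from collections import Counter


def sensor_weights(sensor_segment, segment_length_km):
    """Give each sensor an equal share of the length of its road segment."""
    sensors_per_segment = Counter(sensor_segment.values())
    return {
        sensor: segment_length_km[segment] / sensors_per_segment[segment]
        for sensor, segment in sensor_segment.items()
    }


# Made-up frame: which sensor lies on which segment, and segment lengths in km.
sensor_segment = {"s1": "segment_1", "s2": "segment_1", "s3": "segment_2"}
segment_length_km = {"segment_1": 4.0, "segment_2": 2.5}
weights = sensor_weights(sensor_segment, segment_length_km)
print(weights)  # {'s1': 2.0, 's2': 2.0, 's3': 2.5}
```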
  21. 21. (3) Estimation
     – Calculate number of vehicles per road segment
     – Calculate traffic intensity per region
     – Check/compare time series
     – Adjust extremes where needed (if unexplained)
     Skills needed: Statistics, Statistics, Statistics, Content
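The first two estimation steps can be sketched as follows, under assumptions that are not in the slides: the vehicle count of a segment is taken as the mean over its sensors, and the traffic intensity of a region as the sum of vehicles times segment length (vehicle-kilometres). All names and numbers are made up.

```python
from collections import defaultdict
from statistics import mean


def vehicles_per_segment(daily_counts, sensor_segment):
    """Average the daily counts of all sensors that lie on the same segment."""
    by_segment = defaultdict(list)
    for sensor, count in daily_counts.items():
        by_segment[sensor_segment[sensor]].append(count)
    return {segment: mean(counts) for segment, counts in by_segment.items()}


def regional_intensity(segment_vehicles, segment_region, segment_length_km):
    """Sum vehicles x segment length (vehicle-kilometres) per region."""
    totals = defaultdict(float)
    for segment, vehicles in segment_vehicles.items():
        totals[segment_region[segment]] += vehicles * segment_length_km[segment]
    return dict(totals)


# Made-up inputs for one day.
daily_counts = {"s1": 1000, "s2": 1200, "s3": 500}
sensor_segment = {"s1": "segment_1", "s2": "segment_1", "s3": "segment_2"}
segment_length_km = {"segment_1": 4.0, "segment_2": 2.5}
segment_region = {"segment_1": "region_1", "segment_2": "region_1"}

per_segment = vehicles_per_segment(daily_counts, sensor_segment)
intensity = regional_intensity(per_segment, segment_region, segment_length_km)
print(per_segment)  # {'segment_1': 1100, 'segment_2': 500}
print(intensity)    # {'region_1': 5650.0}
```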
  22. 22. Skills when using Big Data: Statistics 10x, IT 4x, Content 1x. For Big Data we need Data Scientists (statisticians with IT skills!)
  23. 23. Data journalism and fast statistics: produce very rapidly available statistics. Example: traffic reduced by half because of glazed frost. Produced within two days!
  24. 24. Traffic intensity and GDP (series plotted: GDP and Traffic). Traffic precedes GDP by 1 quarter! Correlation: 91% from 2011-Q2 till 2014-Q4.
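A lead of one quarter is typically found by correlating one series against shifted versions of the other. The sketch below shows the mechanics on made-up numbers, constructed so that the second series trails the first by exactly one period; it does not reproduce the 91% reported on the slide, and the function names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(leading, lagging, lag):
    """Correlate leading[t] with lagging[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(leading, lagging)
    return pearson(leading[:-lag], lagging[lag:])


# Made-up quarterly series: 'gdp' is 'traffic' shifted by one quarter.
traffic = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0]
gdp = [0.0] + traffic[:-1]

best_lag = max(range(4), key=lambda k: lagged_correlation(traffic, gdp, k))
print(best_lag)  # 1
```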
  25. 25. Daytime population (mobile phone data)
     – Hourly changes of mobile phone activity
     – Only data for areas with > 15 events per hour
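The threshold of more than 15 events per hour amounts to a simple suppression rule. A minimal sketch, assuming hypothetical (area, hour) keys and made-up event counts:

```python
MIN_EVENTS_PER_HOUR = 15  # threshold from the slide: keep only cells above this


def suppress_sparse_cells(hourly_events):
    """Drop (area, hour) cells that do not exceed the minimum event count."""
    return {
        cell: events
        for cell, events in hourly_events.items()
        if events > MIN_EVENTS_PER_HOUR
    }


# Made-up hourly event counts per (area, hour of day).
events = {("area_1", 9): 240, ("area_2", 9): 15, ("area_3", 9): 16}
kept = suppress_sparse_cells(events)
print(sorted(kept))  # [('area_1', 9), ('area_3', 9)]
```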
  26. 26. Social media sentiment and consumer confidence
     - Correlation > 0.9; Facebook is the most important data source (Twitter is the other one)
     - Including social media in the survey-based consumer confidence increases the precision of the estimate
  27. 27. Social unrest indicator (near ‘real time’)
  28. 28. Social unrest indicator (2): Year, Month, Week, Day
  29. 29. Cyber security: study DDoS attacks in various sources. These are all reactions to the attack, not the attack itself.
  30. 30. Automatic Identification System data: data of ships (GPS signal), 200 million records/day worldwide. Courtesy of Maarten Pouwels.
  31. 31. New (and fun) indicators. ‘Pepernoten’ index: result of a data-driven exploratory study on scanner data (Friday afternoon projects). Turnover of ‘cookies’ specific to the Saint Nicholas festivities (2015 and 2016: weekly).
  32. 32. Spring in the Netherlands: flowering of the wood anemone. 2013: mean 2.5, 8 days below zero. 2014: mean 8.3, 0 days below zero.
  33. 33. Big Data and CBS [Figure: sources (bits) vs. statistics (bits) by source type: ‘Big Data’, administrative data, survey data; scanner data highlighted]
  34. 34. Concluding remarks
     – Big Data has potential for official statistics
     – There is one example, more are on the way
     – Interesting (first) results, but
       ‐ It is a relatively new area for official statistics, so a lot needs to be checked
       ‐ People need to adjust to the ‘Big Data’ way of working
     – The skill set of ‘statisticians’ needs to be extended
       ‐ Programming and optimization
     – Definite need for a methodological foundation
       ‐ Population view
       ‐ Interpret and assess data-driven results
  35. 35. Big Data !!!
  36. 36. The Future: the future of statistics looks BIG
  37. 37. Thank you for your attention! @pietdaas
  38. 38. Questions?
