SlideShare a Scribd company logo
1 of 30
Edwin de Jonge
In cooperation with Piet Daas, Martijn Tennekes, Marco Puts, Alex Priem
Big Data
Case studies in Official Statistics
From a Official Statistics point of view
Three types of data:
1. Survey data = data collected by SN
with questionnaires
2. Admin data = administrative (register) data
collected by third parties such
as theTax Office
3. Big data = machine generated
data of events
2
Big Data case studies
Big data = machine generated data of events
3
Source Statistics
Social media Sentiment (as indicator for
business cycle)
Mobile phone data Daytime population, tourism
statistics
Traffic loops Traffic index statistics
At the end of this talk:
Visualization methods for Big Data
Overview
4
• Big Data
• Research ‘theme’ at Stat. Netherlands
• Data driven approach
•Visualization as a tool
•Why?
•Examples in our office
• Issues & challenges
• From an official statistical perspective
Big data approach
5
Case study 1: Social media
– 3 billion messages as of 2009 gathered from Facebook,
Twitter, LinkedIn, Google+ by a Dutch intermediate
companyCoosto.
– Sentiment per message determined by classifying words
as negative or positive.
– Could be used as indicator for the business cycle. Could it
be fit to the consumer confidence, the leading business
cycle indicator?
6
Sentiment in social media
7
Platform specific sentiment
8
Table 1. Social media messages properties for various platforms and their correlation with consumer confidence
Correlation coefficient of
Social media platform Number of social Number of messages as monthly sentiment index and
media messages1
percentage of total (%) consumer confidence ( r )2
All platforms combined 3,153,002,327 100 0.75 0.78
Facebook 334,854,088 10.6 0.81* 0.85*
Twitter 2,526,481,479 80.1 0.68 0.70
Hyves 45,182,025 1.4 0.50 0.58
News sites 56,027,686 1.8 0.37 0.26
Blogs 48,600,987 1.5 0.25 0.22
Google+ 644,039 0.02 -0.04 -0.09
Linkedin 565,811 0.02 -0.23 -0.25
Youtube 5,661,274 0.2 -0.37 -0.41
Forums 134,98,938 4.3 -0.45 -0.49
1
period covered June 2010 untill November 2013
2
confirmed by visual inspecting scatterplots and additional checks (see text)
*cointegrated
Platform specific results
Granger causality reveals that Consumer Confidence precedes
Facebook sentiment ! (p-value < 0.001)
9
Case study 2: mobile phone metadata
– Pilot study with a cell phone provider with market share
of 1/3 in the Netherlands.
– Aggregated data is queried by intermediate company
Mezuro and delivered to SN. Privacy is guaranteed!
– Applications: daytime population, tourism statistics,
economic activity, mobility studies, etcetera.
10
Mobile phone population
11
MPRD (Municipal Personal Records Database) = Dutch population
Subpopulations model
12
Mobile phone metadata
weighted to the MPRD.
MPRD data &
Education Registers. MPRD data only.
Mobile phone metadata
13
Event Datail Records (EDR) contain metadata on mobile phone events (i.e. call,
SMS or data transfer).
Aggregated table: number of unique devices X time period X current region X
residential region.
Daytime population results
14
Almere: commuter town?
Foreigners at SchipholAirport
Dutch
population
totals
Case study 3: Traffic loops
Traffic loop data
‐ Each minute (24/7) the number of passing vehicles is
counted in around 20.000 ‘loops’ in the Netherlands
(100 million records a day)
‐ Nice data source for transport and traffic statistics
(and more)15
Traffic loops on main roads
16A close look at the highways around Utrecht
Traffic loops on main roads (2)
17Traffic loops everywhere…
Traffic loops on main roads (3)
18Highways simplified for analysis
Raw data: Total number of vehicles a day
19
Time (hour)
Correct for missing data: macro level
Sliding window of 5 min. Impute missing data.
Before After
Total = ~ 295 million detected vehicles Total = ~ 330 million (+ 12%)
detected vehicles
20
Data by type of vehicle
21
Small vehicles (<= 5.6 meter)
Medium vehicles (> 5.6 & <= 12.2 meter)
Long vehicles (> 12.2 meter)
All Dutch vehicles in September
Selectivity of big data
– Big Data sources may be selective when
‐ Only part of the population contributes to the data set (e.g. mobile phone
owners)
‐ The measurement mechanism is selective (e.g. traffic loops placement on
Dutch highways is not random)
– Many Big Data sources contain events
‐ How to associate events with units?
‐ Number of events per unit may vary.
– Correcting for selectivity
‐ Background characteristics – or features – are needed (linking with registers;
profiling)
‐ Use predictive modeling / machine learning to produce population estimates
23
Why Visualization?
October 1st 2013, Statistics Netherlands
Effective Display!
(seeTor Norretranders, “Band width of our senses)
Visualization of Big Data
– Large volume:
‐ Data binning or aggregation
– High velocity:
‐ Animations
‐ Dashboard / small multiples
– Large variety:
‐ Interactive interface
‐ Advanced visualization methods
26
Tableplot: Dutch (Virtual) Census
27
October 1st 2013, Statistics Netherlands
Heat map: Age vs. ‘Income’
16
Age
Income(euro)
October 1st 2013, Statistics Netherlands
17
amount
mount
Questions?
30

More Related Content

What's hot

Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City ApplicationsAmit Sheth
 
EGOV / ePart 2015 - Policy Compass Workshop Presentation
EGOV / ePart 2015 - Policy Compass Workshop PresentationEGOV / ePart 2015 - Policy Compass Workshop Presentation
EGOV / ePart 2015 - Policy Compass Workshop PresentationPolicyCompass
 
Dealing with Open Data in Istat
Dealing with Open Data in IstatDealing with Open Data in Istat
Dealing with Open Data in IstatGiovanni Barbieri
 
Big data as a source for official statistics
Big data as a source for official statisticsBig data as a source for official statistics
Big data as a source for official statisticsEdwin de Jonge
 
A proposed model_for_cybercrime_detectio
A proposed model_for_cybercrime_detectioA proposed model_for_cybercrime_detectio
A proposed model_for_cybercrime_detectioHossam Al-Ansary
 
Strata Big data presentation
Strata Big data presentationStrata Big data presentation
Strata Big data presentationPiet J.H. Daas
 

What's hot (8)

EW-Shopp: Interoperability Challenges and Solutions
EW-Shopp: Interoperability Challenges and SolutionsEW-Shopp: Interoperability Challenges and Solutions
EW-Shopp: Interoperability Challenges and Solutions
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City Applications
 
EGOV / ePart 2015 - Policy Compass Workshop Presentation
EGOV / ePart 2015 - Policy Compass Workshop PresentationEGOV / ePart 2015 - Policy Compass Workshop Presentation
EGOV / ePart 2015 - Policy Compass Workshop Presentation
 
Dealing with Open Data in Istat
Dealing with Open Data in IstatDealing with Open Data in Istat
Dealing with Open Data in Istat
 
Big data as a source for official statistics
Big data as a source for official statisticsBig data as a source for official statistics
Big data as a source for official statistics
 
Suds summary
Suds summarySuds summary
Suds summary
 
A proposed model_for_cybercrime_detectio
A proposed model_for_cybercrime_detectioA proposed model_for_cybercrime_detectio
A proposed model_for_cybercrime_detectio
 
Strata Big data presentation
Strata Big data presentationStrata Big data presentation
Strata Big data presentation
 

Viewers also liked

Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Impetus Technologies
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataRichard Vidgen
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataKaran Desai
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use CasesDeZyre
 
Big Data & Analytics for Government - Case Studies
Big Data & Analytics for Government - Case StudiesBig Data & Analytics for Government - Case Studies
Big Data & Analytics for Government - Case StudiesJohn Palfreyman
 
Big Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBig Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBlueData, Inc.
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Yaman Hajja, Ph.D.
 

Viewers also liked (12)

Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big data Introduction by Mohan
Big data Introduction by MohanBig data Introduction by Mohan
Big data Introduction by Mohan
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
BIG DATA and USE CASES
BIG DATA and USE CASESBIG DATA and USE CASES
BIG DATA and USE CASES
 
Big Data & Analytics for Government - Case Studies
Big Data & Analytics for Government - Case StudiesBig Data & Analytics for Government - Case Studies
Big Data & Analytics for Government - Case Studies
 
Big Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBig Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 Telco
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
5 Big Data Use Cases for 2013
5 Big Data Use Cases for 20135 Big Data Use Cases for 2013
5 Big Data Use Cases for 2013
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 

Similar to Big data experiments

Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaPiet J.H. Daas
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...Piet J.H. Daas
 
P. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsP. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsIstituto nazionale di statistica
 
What does “BIG DATA” mean for official statistics?
What does “BIG DATA” mean for official statistics?What does “BIG DATA” mean for official statistics?
What does “BIG DATA” mean for official statistics?Vincenzo Patruno
 
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
 
Big Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in EindhovenBig Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in EindhovenPiet J.H. Daas
 
From eGov 2.0 to eGov 3.0: The Research Agenda
From eGov 2.0 to eGov 3.0: The Research AgendaFrom eGov 2.0 to eGov 3.0: The Research Agenda
From eGov 2.0 to eGov 3.0: The Research Agendasamossummit
 
A forecasting of stock trading price using time series information based on b...
A forecasting of stock trading price using time series information based on b...A forecasting of stock trading price using time series information based on b...
A forecasting of stock trading price using time series information based on b...IJECEIAES
 
FIWARE Global Summit - The Digital Single Market - Benefits and Solutions for...
FIWARE Global Summit - The Digital Single Market - Benefits and Solutions for...FIWARE Global Summit - The Digital Single Market - Benefits and Solutions for...
FIWARE Global Summit - The Digital Single Market - Benefits and Solutions for...FIWARE
 
Age Friendly Economy - Introduction to Big Data
Age Friendly Economy - Introduction to Big DataAge Friendly Economy - Introduction to Big Data
Age Friendly Economy - Introduction to Big DataAgeFriendlyEconomy
 
Big Data, the Future of Statistics: Experiences at Statistics Netherlands
Big Data, the Future of Statistics: Experiences at Statistics NetherlandsBig Data, the Future of Statistics: Experiences at Statistics Netherlands
Big Data, the Future of Statistics: Experiences at Statistics NetherlandsPiet J.H. Daas
 
Power from big data - Are Europe's utilities ready for the age of data?
Power from big data - Are Europe's utilities ready for the age of data?Power from big data - Are Europe's utilities ready for the age of data?
Power from big data - Are Europe's utilities ready for the age of data?Steve Bray
 
An overview of big data analysis
An overview of big data analysisAn overview of big data analysis
An overview of big data analysisjournalBEEI
 
Big Data presentation Mannheim
Big Data presentation MannheimBig Data presentation Mannheim
Big Data presentation MannheimPiet J.H. Daas
 
Smart Policies: Uso de las TIC para mejorar la estructuración de políticas p...
 Smart Policies: Uso de las TIC para mejorar la estructuración de políticas p... Smart Policies: Uso de las TIC para mejorar la estructuración de políticas p...
Smart Policies: Uso de las TIC para mejorar la estructuración de políticas p...Comisión de Regulación de Comunicaciones
 
eGovernance Research Grand Challenges
eGovernance Research Grand ChallengeseGovernance Research Grand Challenges
eGovernance Research Grand ChallengesYannis Charalabidis
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)Sonu Gupta
 

Similar to Big data experiments (20)

Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics Canada
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 
P. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsP. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European Statistics
 
What does “BIG DATA” mean for official statistics?
What does “BIG DATA” mean for official statistics?What does “BIG DATA” mean for official statistics?
What does “BIG DATA” mean for official statistics?
 
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
 
Big Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in EindhovenBig Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in Eindhoven
 
From eGov 2.0 to eGov 3.0: The Research Agenda
From eGov 2.0 to eGov 3.0: The Research AgendaFrom eGov 2.0 to eGov 3.0: The Research Agenda
From eGov 2.0 to eGov 3.0: The Research Agenda
 
A forecasting of stock trading price using time series information based on b...
A forecasting of stock trading price using time series information based on b...A forecasting of stock trading price using time series information based on b...
A forecasting of stock trading price using time series information based on b...
 
FIWARE Global Summit - The Digital Single Market - Benefits and Solutions for...
FIWARE Global Summit - The Digital Single Market - Benefits and Solutions for...FIWARE Global Summit - The Digital Single Market - Benefits and Solutions for...
FIWARE Global Summit - The Digital Single Market - Benefits and Solutions for...
 
Big Data @ CBS
Big Data @ CBSBig Data @ CBS
Big Data @ CBS
 
Vtt intelligent data analytics - Ville Könönen
Vtt intelligent data analytics - Ville KönönenVtt intelligent data analytics - Ville Könönen
Vtt intelligent data analytics - Ville Könönen
 
Age Friendly Economy - Introduction to Big Data
Age Friendly Economy - Introduction to Big DataAge Friendly Economy - Introduction to Big Data
Age Friendly Economy - Introduction to Big Data
 
Big Data, the Future of Statistics: Experiences at Statistics Netherlands
Big Data, the Future of Statistics: Experiences at Statistics NetherlandsBig Data, the Future of Statistics: Experiences at Statistics Netherlands
Big Data, the Future of Statistics: Experiences at Statistics Netherlands
 
Power from big data - Are Europe's utilities ready for the age of data?
Power from big data - Are Europe's utilities ready for the age of data?Power from big data - Are Europe's utilities ready for the age of data?
Power from big data - Are Europe's utilities ready for the age of data?
 
An overview of big data analysis
An overview of big data analysisAn overview of big data analysis
An overview of big data analysis
 
Big Data presentation Mannheim
Big Data presentation MannheimBig Data presentation Mannheim
Big Data presentation Mannheim
 
Smart Policies: Uso de las TIC para mejorar la estructuración de políticas p...
 Smart Policies: Uso de las TIC para mejorar la estructuración de políticas p... Smart Policies: Uso de las TIC para mejorar la estructuración de políticas p...
Smart Policies: Uso de las TIC para mejorar la estructuración de políticas p...
 
eGovernance Research Grand Challenges
eGovernance Research Grand ChallengeseGovernance Research Grand Challenges
eGovernance Research Grand Challenges
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)
 
Presentation by Carmen Raluca, EC (ENG) Second SIGMA Regional ENP East Confe...
Presentation by Carmen Raluca, EC (ENG)  Second SIGMA Regional ENP East Confe...Presentation by Carmen Raluca, EC (ENG)  Second SIGMA Regional ENP East Confe...
Presentation by Carmen Raluca, EC (ENG) Second SIGMA Regional ENP East Confe...
 

More from Edwin de Jonge

Validatetools, resolve and simplify contradictive or data validation rules
Validatetools, resolve and simplify contradictive or data validation rulesValidatetools, resolve and simplify contradictive or data validation rules
Validatetools, resolve and simplify contradictive or data validation rulesEdwin de Jonge
 
Data error! But where?
Data error! But where?Data error! But where?
Data error! But where?Edwin de Jonge
 
Daff: diff, patch and merge for data.frame
Daff: diff, patch and merge for data.frameDaff: diff, patch and merge for data.frame
Daff: diff, patch and merge for data.frameEdwin de Jonge
 
Chunked, dplyr for large text files
Chunked, dplyr for large text filesChunked, dplyr for large text files
Chunked, dplyr for large text filesEdwin de Jonge
 
Uncertainty visualisation
Uncertainty visualisationUncertainty visualisation
Uncertainty visualisationEdwin de Jonge
 
Heatmaps best practices Strata Hadoop
Heatmaps best practices Strata HadoopHeatmaps best practices Strata Hadoop
Heatmaps best practices Strata HadoopEdwin de Jonge
 
Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R,  user2014Docopt, beautiful command-line options for R,  user2014
Docopt, beautiful command-line options for R, user2014Edwin de Jonge
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data VisualizationEdwin de Jonge
 
ffbase, statistical functions for large datasets
ffbase, statistical functions for large datasetsffbase, statistical functions for large datasets
ffbase, statistical functions for large datasetsEdwin de Jonge
 
Tabplotd3, interactive inspection of large data
Tabplotd3, interactive inspection of large dataTabplotd3, interactive inspection of large data
Tabplotd3, interactive inspection of large dataEdwin de Jonge
 
Statmine, Visuele dataexploratie
Statmine, Visuele dataexploratieStatmine, Visuele dataexploratie
Statmine, Visuele dataexploratieEdwin de Jonge
 
StatMine (New Technologies and Techniques for Statistics)
StatMine (New Technologies and Techniques for Statistics)StatMine (New Technologies and Techniques for Statistics)
StatMine (New Technologies and Techniques for Statistics)Edwin de Jonge
 
StatMine, visual exploration of output data
StatMine, visual exploration of output dataStatMine, visual exploration of output data
StatMine, visual exploration of output dataEdwin de Jonge
 

More from Edwin de Jonge (15)

sdcSpatial user!2019
sdcSpatial user!2019sdcSpatial user!2019
sdcSpatial user!2019
 
Validatetools, resolve and simplify contradictive or data validation rules
Validatetools, resolve and simplify contradictive or data validation rulesValidatetools, resolve and simplify contradictive or data validation rules
Validatetools, resolve and simplify contradictive or data validation rules
 
Data error! But where?
Data error! But where?Data error! But where?
Data error! But where?
 
Daff: diff, patch and merge for data.frame
Daff: diff, patch and merge for data.frameDaff: diff, patch and merge for data.frame
Daff: diff, patch and merge for data.frame
 
Chunked, dplyr for large text files
Chunked, dplyr for large text filesChunked, dplyr for large text files
Chunked, dplyr for large text files
 
Uncertainty visualisation
Uncertainty visualisationUncertainty visualisation
Uncertainty visualisation
 
Heatmaps best practices Strata Hadoop
Heatmaps best practices Strata HadoopHeatmaps best practices Strata Hadoop
Heatmaps best practices Strata Hadoop
 
Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R,  user2014Docopt, beautiful command-line options for R,  user2014
Docopt, beautiful command-line options for R, user2014
 
StatMine
StatMineStatMine
StatMine
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
ffbase, statistical functions for large datasets
ffbase, statistical functions for large datasetsffbase, statistical functions for large datasets
ffbase, statistical functions for large datasets
 
Tabplotd3, interactive inspection of large data
Tabplotd3, interactive inspection of large dataTabplotd3, interactive inspection of large data
Tabplotd3, interactive inspection of large data
 
Statmine, Visuele dataexploratie
Statmine, Visuele dataexploratieStatmine, Visuele dataexploratie
Statmine, Visuele dataexploratie
 
StatMine (New Technologies and Techniques for Statistics)
StatMine (New Technologies and Techniques for Statistics)StatMine (New Technologies and Techniques for Statistics)
StatMine (New Technologies and Techniques for Statistics)
 
StatMine, visual exploration of output data
StatMine, visual exploration of output dataStatMine, visual exploration of output data
StatMine, visual exploration of output data
 

Recently uploaded

Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsApsara Of India
 
RE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechRE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechNewman George Leech
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation SlidesKeppelCorporation
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Dipal Arora
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Tina Ji
 
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Neil Kimberley
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...lizamodels9
 
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneVIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneCall girls in Ahmedabad High profile
 
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc.../:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...lizamodels9
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.Aaiza Hassan
 
rishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdfrishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdfmuskan1121w
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Servicediscovermytutordmt
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst SummitHolger Mueller
 

Recently uploaded (20)

Best Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting PartnershipBest Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting Partnership
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 
RE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechRE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman Leech
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
 
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
 
Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
 
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneVIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
 
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc.../:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.
 
rishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdfrishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdf
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 

Big data experiments

  • 1. Edwin de Jonge In cooperation with Piet Daas, Martijn Tennekes, Marco Puts, Alex Priem Big Data Case studies in Official Statistics
  • 2. From a Official Statistics point of view Three types of data: 1. Survey data = data collected by SN with questionnaires 2. Admin data = administrative (register) data collected by third parties such as theTax Office 3. Big data = machine generated data of events 2
  • 3. Big Data case studies Big data = machine generated data of events 3 Source Statistics Social media Sentiment (as indicator for business cycle) Mobile phone data Daytime population, tourism statistics Traffic loops Traffic index statistics At the end of this talk: Visualization methods for Big Data
  • 4. Overview 4 • Big Data • Research ‘theme’ at Stat. Netherlands • Data driven approach •Visualization as a tool •Why? •Examples in our office • Issues & challenges • From an official statistical perspective
  • 6. Case study 1: Social media – 3 billion messages as of 2009 gathered from Facebook, Twitter, LinkedIn, Google+ by a Dutch intermediate companyCoosto. – Sentiment per message determined by classifying words as negative or positive. – Could be used as indicator for the business cycle. Could it be fit to the consumer confidence, the leading business cycle indicator? 6
  • 9. Table 1. Social media messages properties for various platforms and their correlation with consumer confidence Correlation coefficient of Social media platform Number of social Number of messages as monthly sentiment index and media messages1 percentage of total (%) consumer confidence ( r )2 All platforms combined 3,153,002,327 100 0.75 0.78 Facebook 334,854,088 10.6 0.81* 0.85* Twitter 2,526,481,479 80.1 0.68 0.70 Hyves 45,182,025 1.4 0.50 0.58 News sites 56,027,686 1.8 0.37 0.26 Blogs 48,600,987 1.5 0.25 0.22 Google+ 644,039 0.02 -0.04 -0.09 Linkedin 565,811 0.02 -0.23 -0.25 Youtube 5,661,274 0.2 -0.37 -0.41 Forums 134,98,938 4.3 -0.45 -0.49 1 period covered June 2010 untill November 2013 2 confirmed by visual inspecting scatterplots and additional checks (see text) *cointegrated Platform specific results Granger causality reveals that Consumer Confidence precedes Facebook sentiment ! (p-value < 0.001) 9
  • 10. Case study 2: mobile phone metadata – Pilot study with a cell phone provider with market share of 1/3 in the Netherlands. – Aggregated data is queried by intermediate company Mezuro and delivered to SN. Privacy is guaranteed! – Applications: daytime population, tourism statistics, economic activity, mobility studies, etcetera. 10
  • 11. Mobile phone population 11 MPRD (Municipal Personal Records Database) = Dutch population
  • 12. Subpopulations model 12 Mobile phone metadata weighted to the MPRD. MPRD data & Education Registers. MPRD data only.
  • 13. Mobile phone metadata 13 Event Datail Records (EDR) contain metadata on mobile phone events (i.e. call, SMS or data transfer). Aggregated table: number of unique devices X time period X current region X residential region.
  • 14. Daytime population results 14 Almere: commuter town? Foreigners at SchipholAirport Dutch population totals
  • 15. Case study 3: Traffic loops Traffic loop data ‐ Each minute (24/7) the number of passing vehicles is counted in around 20.000 ‘loops’ in the Netherlands (100 million records a day) ‐ Nice data source for transport and traffic statistics (and more)15
  • 16. Traffic loops on main roads 16A close look at the highways around Utrecht
  • 17. Traffic loops on main roads (2) 17Traffic loops everywhere…
  • 18. Traffic loops on main roads (3) 18Highways simplified for analysis
  • 19. Raw data: Total number of vehicles a day 19 Time (hour)
  • 20. Correct for missing data: macro level Sliding window of 5 min. Impute missing data. Before After Total = ~ 295 million detected vehicles Total = ~ 330 million (+ 12%) detected vehicles 20
  • 21. Data by type of vehicle 21 Small vehicles (<= 5.6 meter) Medium vehicles (> 5.6 & <= 12.2 meter) Long vehicles (> 12.2 meter)
  • 22. All Dutch vehicles in September
  • 23. Selectivity of big data – Big Data sources may be selective when ‐ Only part of the population contributes to the data set (e.g. mobile phone owners) ‐ The measurement mechanism is selective (e.g. traffic loops placement on Dutch highways is not random) – Many Big Data sources contain events ‐ How to associate events with units? ‐ Number of events per unit may vary. – Correcting for selectivity ‐ Background characteristics – or features – are needed (linking with registers; profiling) ‐ Use predictive modeling / machine learning to produce population estimates 23
  • 24. Why Visualization? October 1st 2013, Statistics Netherlands
  • 25. Effective Display! (seeTor Norretranders, “Band width of our senses)
  • 26. Visualization of Big Data – Large volume: ‐ Data binning or aggregation – High velocity: ‐ Animations ‐ Dashboard / small multiples – Large variety: ‐ Interactive interface ‐ Advanced visualization methods 26
  • 28. October 1st 2013, Statistics Netherlands Heat map: Age vs. ‘Income’ 16 Age Income(euro)
  • 29. October 1st 2013, Statistics Netherlands 17 amount mount