http://statbel.fgov.be
Marc Debusschere
Coordinator Administrative & Big Data
Statbel and big data
Data Science Meetup Leuven, 27 June 2018
Overview
❑ Context
Statbel, big data and the third data revolution in statistics
❑ Big data and Statbel
Projects, accomplishments and problems
❑ European collaboration
❑ Big data and official statistics
Provisional conclusions and way forward
❑ Big data, Statbel and data science
Statbel =
❑ Statistics Belgium
❑ The institute formerly known as Nationaal Instituut voor de Statistiek (NIS) / Institut national de Statistique (INS)
❑ Administratively part of the FPS (‘ministry’) Economy
❑ Member of the European Statistical System (ESS) =
Eurostat + 32 EU & EFTA national statistical institutes + associated statistics producers
Big data
❑ = data impossible to process in a ‘normal’ way
‘normal’ is relative …
❑ The 3 Vs: volume, velocity, variety
❑ Result of societal and technological changes
Satellites, cameras and sensors, internet and e-mail, social media, mobile phones and tablets, e-business, e-government, machine-to-machine (internet of things, IoT)
❑ Result: data explosion, data deluge
Big data and statistics
❑ Big data = ‘digital footprint’
❑ Containing valuable information, statistically exploitable
(but also commercially …)
❑ Resulting in the third data revolution in statistics
After surveys (since 1846) and administrative data (since 2000), now: big data!
❑ Possible data sources – list far from exhaustive!
❖ Scanner data, electronic payments, credit card data
❖ Webscraping for job vacancies, enterprise characteristics
❖ Traffic cameras and detection loops
❖ Smart meters (electricity, gas, water)
❖ Last but not least: mobile phone data!
The future of statistics …

… big data!
Instant statistics based on big data, complemented and/or
validated by administrative data and small and specific ad hoc
surveys.
Also known as: smart statistics …
Big data and Statbel: Big Data Team
❑ Start at the end of 2015
❑ Restricted group, operating informally and ad hoc
❑ Focus on mobile phone data, webscraping job vacancies
❑ Tasks:
❖ Reflection on strategy and priorities
❖ External contacts concerning big data with data owners, potential users, federal and regional authorities, academia, the EU, international organisations, …
❖ Analysing big datasets and connecting them to statistical ones (see below)
Big data and Statbel: in production
❑ For the consumer price index (CPI)
❖ Scanner data from supermarkets and retail chains
❖ Webscraping prices (e.g. airplane tickets, webshops)
Big data and Statbel: not planned (at present)
❑ Social media, internet search results, text analytics, …
= ‘high-hanging fruit’: access and interpretation very problematic!
❑ Smart meters
Political decision of the regions (2012) not to deploy them
=> no data (about to change in Flanders)
❑ Traffic cameras, traffic loops
Regional competence and data
Big data and Statbel: projects
❑ Mobile phone data
❖ Project Statbel-Proximus-Eurostat
❖ Border Region Data Collection (BRDC)
❖ City data from LFS and Big Data
❑ Webscraping
❖ Job vacancies
❑ Satellite data and aerial photography
❖ Deep Solaris
Big data in production
❑ Scanner data for CPI
❖ Based on agreement with data owners, facilitated by political pressure
❖ Legal basis (HICP regulation) but cooperative model
❖ Being expanded gradually with new supermarket and retail chains
❖ Extremely smooth and cost-efficient after initial set-up
❑ Webscraping prices for CPI
❖ Collecting prices on webshops (e.g. airplane tickets)
❖ For efficiency but also out of necessity: e-commerce is expanding fast!
❖ Legal issues possible
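Because scanner data record turnover and quantities sold per product, a weighted price index can be computed directly from the transactions. The sketch below assumes a Törnqvist index over products sold in both periods, using unit values (turnover / quantity) as prices. This is a common textbook choice for scanner data, not necessarily the method Statbel applies to the CPI; the function name, product codes and figures are hypothetical.

```python
import math


def tornqvist_index(base: dict[str, tuple[float, float]],
                    current: dict[str, tuple[float, float]]) -> float:
    """Törnqvist price index (base period = 100) over products sold in both periods.

    Each dict maps a (hypothetical) product code to (turnover, quantity sold).
    """
    # Keep only products with positive sales in both periods.
    matched = [k for k in base.keys() & current.keys()
               if base[k][1] > 0 and current[k][1] > 0]
    if not matched:
        raise ValueError("no products sold in both periods")
    # Turnover totals over the matched products, used to normalise the shares.
    total0 = sum(base[k][0] for k in matched)
    total1 = sum(current[k][0] for k in matched)
    log_index = 0.0
    for k in matched:
        p0 = base[k][0] / base[k][1]        # unit value in the base period
        p1 = current[k][0] / current[k][1]  # unit value in the current period
        # Weight = average of the product's expenditure share in both periods.
        weight = (base[k][0] / total0 + current[k][0] / total1) / 2
        log_index += weight * math.log(p1 / p0)
    return 100 * math.exp(log_index)


base = {"gtin_a": (100.0, 100), "gtin_b": (300.0, 150)}
current = {"gtin_a": (110.0, 100), "gtin_b": (300.0, 150), "gtin_c": (50.0, 10)}
print(round(tornqvist_index(base, current), 2))  # prints 102.5
```

Products that appear in only one period (here `gtin_c`) drop out of the matched set; how to handle such product churn is one of the main methodological choices in scanner-data indices.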
Big data almost in production
❑ Webscraping job vacancies: about to go into production …
❖ Methodological and practical issues
❖ Stand-alone results not sufficient: need to combine them with the existing Job vacancy survey (JVS) and ‘administrative’ data from the regional employment agencies (VDAB, FOREM, Actiris)
❖ Linked to European project (ESSnet Big Data, see below): https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/WP1_Webscraping_job_vacancies
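One practical issue with scraped job vacancies is that the same ad is often posted on several sites. A minimal sketch of exact-key deduplication, assuming each scraped ad is a dict with hypothetical `title`, `employer` and `location` fields; real pipelines would typically need fuzzier matching than this.

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(ads: list[dict]) -> list[dict]:
    """Keep the first ad for each normalised (title, employer, location) key."""
    seen = set()
    unique = []
    for ad in ads:
        key = (normalise(ad["title"]), normalise(ad["employer"]),
               normalise(ad["location"]))
        if key not in seen:
            seen.add(key)
            unique.append(ad)
    return unique


ads = [  # hypothetical scraped ads
    {"title": "Data Scientist (m/f)", "employer": "Acme NV", "location": "Liège"},
    {"title": "data scientist m/f", "employer": "ACME NV", "location": "Liege"},
    {"title": "Statistician", "employer": "Acme NV", "location": "Brussels"},
]
print(len(deduplicate(ads)))
```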
Project Statbel, Proximus, Eurostat
❑ Start December 2015, first results April 2016
❑ Step-by-step approach:
❖ First: actual present population
❖ Basis for: resident population (via place of residence), workplace,
‘usual environment’, tourism, labour migration, migration, time use, …
❑ Innovative:
❖ First collaboration between an NSI and a mobile operator in the EU => ‘real’ data
❖ Not ‘call detail records’ but network signals: 10 times more frequent!
❖ Combining mobile phone data with statistical datasets
❑ Via geo-coupling of aggregates: no privacy issues!
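Geo-coupling works on aggregates only: counts per network cell are redistributed over the grid squares or statistical units used by other datasets, so no individual records are needed. A minimal sketch, assuming simple areal weighting (devices spread uniformly over each cell's coverage area) and overlap areas precomputed in a GIS; the function name and identifiers are hypothetical, and the project's actual allocation method may differ.

```python
from collections import defaultdict


def cells_to_grid(cell_counts: dict[str, float],
                  overlaps: dict[tuple[str, str], float]) -> dict[str, float]:
    """Spread aggregate counts per network cell over statistical grid squares.

    overlaps maps (cell_id, grid_id) to the overlap area in km²; each cell's
    count is shared out in proportion to the part of its area in each square.
    """
    cell_area = defaultdict(float)
    for (cell, _grid), area in overlaps.items():
        cell_area[cell] += area
    grid_counts = defaultdict(float)
    for (cell, grid), area in overlaps.items():
        if cell in cell_counts and cell_area[cell] > 0:
            grid_counts[grid] += cell_counts[cell] * area / cell_area[cell]
    return dict(grid_counts)


counts = {"A": 1000, "B": 500}  # hypothetical devices per cell
overlaps = {("A", "g1"): 0.6, ("A", "g2"): 0.4, ("B", "g2"): 0.5}
print(cells_to_grid(counts, overlaps))
```

With 1 km² grid squares, the resulting counts are directly a density per km², which can be set against census counts on the same grid.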
Project Statbel, Proximus, Eurostat

Results and next steps
❑ Results
❖ 10 publications: see De Meersman et al., Debusschere et al., Demunter et al., Reis et al., Seynaeve et al., Wirthmann et al., all at https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/WP5_Documentation
❑ Our objectives:
❖ Further exploration for both statistical and commercial applications
❖ Concrete use cases with a view to statistical production lines
❑ Unfortunately …
❖ March 2017: data blocked for everyone
❖ In the meantime: new initiatives via MIT/Univ. Newcastle, Eurostat
Some results
Population density per km²:
Mobile phone data (left) versus Census 2011 (right)
Some results, continued
Weekday cells identified as ‘residential’, ‘commuting’ or ‘work’, with geographical
representation
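The slides do not detail how cells end up labelled ‘residential’, ‘work’ or ‘commuting’. Purely as an illustration, a hypothetical rule could compare a cell's average weekday activity at night, during office hours and during rush hours, and pick the period in which it peaks. The hour bands and function name below are assumptions, not the project's definitions.

```python
from statistics import mean

# Hypothetical hour bands (index = hour of the day).
NIGHT = range(0, 6)
OFFICE = range(10, 16)
RUSH = (7, 8, 17, 18)


def classify_cell(hourly: list[float]) -> str:
    """Label one cell from its 24 average weekday device counts."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    scores = {
        "residential": mean(hourly[h] for h in NIGHT),
        "work": mean(hourly[h] for h in OFFICE),
        "commuting": mean(hourly[h] for h in RUSH),
    }
    # The label of the band with the highest average activity wins.
    return max(scores, key=scores.get)


print(classify_cell([100.0] * 6 + [50.0] * 18))
```

Comparing bands within a single cell avoids having to normalise for the very different absolute traffic levels between cells.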
Some results, continued
‘Residential’, ‘work’ and ‘commuting’-cells in the Brussels-Leuven area
Other projects
❑ Border Region Data Collection (BRDC)
❖ Grant EC DG Regio, 1 year, July 2017 – July 2018
❖ Cross-border living place-workplace mobility through combining Labour force survey (LFS), administrative data and mobile phone data
❖ With CBS Netherlands (lead), Destatis Germany, Insee France, GUS Poland, SURS Slovenia
Other projects, continued
❑ Deep Solaris
❖ Grant Eurostat, 1 year, kick-off 19 Feb. 2018
❖ Detecting solar panels on the basis of satellite data and aerial photography
❖ Via machine learning
❖ With CBS Netherlands (lead), Destatis Germany, IT.NRW (Düsseldorf, DE) and BISS (Heerlen, NL)
Other projects, continued
❑ City data from LFS and Big Data
❖ Grant EC DG Regio, 1 year, Jan. – Dec. 2018
❖ Mapping metropolitan areas on the basis of Labour force survey (LFS) and mobile phone data
❖ With CBS Netherlands (lead), Destatis Germany, Insee France, Statistics Austria
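Mapping metropolitan areas typically starts from a living place-workplace matrix. As a minimal sketch, assuming one (living place, workplace) pair per employed person, the code below builds that matrix and applies the 15% commuting threshold of the EU-OECD functional urban area definition to delimit a commuting zone around a given core. The project itself may use other criteria; the function name, municipality names and counts are hypothetical.

```python
from collections import Counter, defaultdict


def commuting_zone(pairs: list[tuple[str, str]], core: set[str],
                   threshold: float = 0.15) -> set[str]:
    """Return non-core municipalities where at least `threshold` of the
    employed residents work in the core."""
    matrix = Counter(pairs)  # living place-workplace matrix: (home, work) -> persons
    employed = defaultdict(int)
    to_core = defaultdict(int)
    for (home, work), n in matrix.items():
        employed[home] += n
        if work in core:
            to_core[home] += n
    return {home for home in employed
            if home not in core and to_core[home] / employed[home] >= threshold}


pairs = ([("city", "city")] * 5
         + [("suburb", "city")] * 3 + [("suburb", "suburb")] * 7
         + [("rural", "city")] * 1 + [("rural", "rural")] * 9)
print(commuting_zone(pairs, {"city"}))
```

Here 30% of the suburb's workers commute to the city, so it joins the zone; the rural municipality, at 10%, does not.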
From exploration to exploitation

Developing use cases
❑ Scanner data and webscraping for CPI
❑ Webscraping job vacancies
❑ Validation of Census living place and workplace
❑ Living place-workplace matrix
❑ … (population, migration, tourism, mobility, transport, time use,
environment, agriculture, …)
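Validating census living places against mobile phone data comes down to comparing two sets of counts for the same areas. A minimal sketch, assuming both sources are already aggregated to common area codes; the function name, the 20% tolerance and the area codes are arbitrary placeholders.

```python
def flag_deviations(mobile: dict[str, float], census: dict[str, float],
                    tolerance: float = 0.2) -> dict[str, float]:
    """Return the relative deviation (mobile - census) / census for every area
    present in both sources where it exceeds the tolerance."""
    flagged = {}
    for area in mobile.keys() & census.keys():
        if census[area] <= 0:
            continue  # relative deviation undefined
        rel = (mobile[area] - census[area]) / census[area]
        if abs(rel) > tolerance:
            flagged[area] = round(rel, 3)
    return flagged


mobile = {"a": 1100, "b": 1500, "c": 500, "d": 80}  # hypothetical counts
census = {"a": 1000, "b": 1000, "c": 1000}
print(flag_deviations(mobile, census))
```

Flagged areas are candidates for closer inspection rather than proof of census error, since mobile phone counts carry their own biases (market share, multiple devices, non-users).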
Big data in the ESS (European Statistical System)

Initiatives and projects
Big data and official statistics

Provisional conclusions
❑ Size and complexity of datasets
❖ Less problematic than anticipated (thanks to pre-processing and at least some structure in the data)
❖ Consequently, less focus on IT infrastructure and software
Big data and official statistics

Provisional conclusions, continued
❑ Biggest obstacle: access!
❖ Data owned by private, profit-oriented enterprises
❖ They see no advantage in collaborating, rather the opposite (mistakenly!)
❖ Imposing a legal obligation seems unavoidable …
❑ Link to privacy issues: fear of reputational damage
Big data and official statistics

Provisional conclusions, continued
❑ Additional challenge: methodology
❖ All the old headaches are still there …
… with a lot of new ones added!
Big data and official statistics

The next stage: smart statistics
❑ Monitoring systems which are:
❖ integrated
❖ flexible
❖ multi-source
❖ real-time and highly detailed
❑ Some examples:
❖ continuous tracking of air quality
❖ highly granular actual present population (time, location, characteristics)
❖ smart farming statistics
For discussion:

big data, Statbel and data science
❑ Statbel owns numerous geocoded datasets (population, employment, income, dwellings, …)
❑ and might gain access to big data sources …
❑ … but lacks data science expertise, i.e. the capability to analyse big data
❑ Two possible solutions:
❖ collaboration with academia, researchers
❖ hiring …
Questions?
Comments?
