New data sources for
statistics
Experiences at Statistics Netherlands


Piet Daas, Marko Roos, Chris de Blois,
Rutger Hoekstra, Olav ten Bosch, and Yinyi Ma




                                                NTTS 2011
Why new data sources?
• Many national statistical institutes (NSIs) traditionally use:
     • Surveys
     • Administrative data (registers)

• But there are other sources of information out
  there (especially the electronic ones)
     • Are they really useful?
     • Investigate it! (studies are supported by DG of Stat. Neth.)


NTTS2011 New data sources for statistics: Exp. Stat. Neth.
                                                                1
Examples of new data sources
Four examples, of which the first three will be discussed:
     1. Product prices on the internet

     2. Mobile phone location data

     3. Twitter text messages

     4. Global Positioning System (GPS) data
        (and traffic loop information)


1) Product prices on the internet
• Ubiquitous on the World Wide Web
     • Stat. Neth. already uses web price data for the
       Consumer Price Index:
          • E.g. prices of airline tickets, books, CDs, DVDs

     • But this data is manually collected
     • Why not automatically (and more often)?
          • With a web robot (web spider)
          • Which is a script or (commercial) tool
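
A minimal sketch of what such a script might do, assuming a page marks each price with a `class="price"` attribute. The sample page, the routes, the class name, and the helper `extract_prices` are all made up for illustration; real sites differ and a production robot would also need to fetch the pages. The parser assumes prices without thousands separators.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment; real airline sites are far more complex.
SAMPLE_PAGE = """
<html><body>
  <div class="flight"><span class="route">AMS-LHR</span>
    <span class="price">EUR 129,00</span></div>
  <div class="flight"><span class="route">AMS-CDG</span>
    <span class="price">EUR 98,50</span></div>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collects the first number found inside each element with class 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            match = re.search(r"\d+(?:[.,]\d{2})?", data)
            if match:
                # Accept both a decimal comma and a decimal point.
                self.prices.append(float(match.group().replace(",", ".")))
                self._in_price = False

    def handle_endtag(self, tag):
        self._in_price = False


def extract_prices(html):
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


prices = extract_prices(SAMPLE_PAGE)
print(prices)
```

Because the extraction depends on the page structure, a layout change on the website would break a script like this until it is adapted.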



Product prices on the internet (2)
Collected data:
      • Daily (over a 10-month period)
      • 6 websites: 4 airlines, 1 housing, and 1 (unstaffed) petrol station
        site

Works well but:
      • Some websites are very complicated
           •    Especially dynamic websites; if possible, tap directly into the underlying database
      • Sometimes websites change their layout
           •    3 of the 4 airline websites did this (in test period)
           •    Affects the cost efficiency (redesigning scripts takes a lot of time)
                  • Manual data collection is much easier and cheaper!
                  • But: Automatic data collection has its own merits



Product prices on the internet (3)
Example: Airline ticket prices (shown over a 116-day period)

[Figure annotation: manually collected 1 day before departure date]


2) Mobile phone location data
• Almost everybody has a mobile phone
     • in the Netherlands ~92%
• People use it a lot
• For every outgoing and incoming call the phone
  connects to a nearby telephone mast (‘cell’)
     • Source of location information!
     • Every mobile phone and every mast has a unique ID
     • This data is logged by mobile phone companies
          • Used for billing purposes & network maintenance



Mobile phone location data (2)
• Could be an interesting source of information
• Let's study a dataset
     • Obtained data from 1 large Dutch mobile phone company
          • over 5 million different phones active on their own network
     • Dataset covered a 14 day period
     • Contained 550 million records (call events)
     • Every record contains a unique phone ID, date-time stamp, and
       mast (cell) connection ID (= location info)
     • Phone IDs were scrambled to avoid identification
          • Scrambled IDs were stable over the 14-day period
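
The slides do not say how the phone IDs were scrambled. One common way to get IDs that are unrecognisable yet stable over a period is a keyed hash; the sketch below assumes that approach. The key, the function name `scramble_phone_id`, and the example numbers are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that would stay with the data provider for the whole period.
SECRET_KEY = b"example-key-kept-by-the-data-provider"


def scramble_phone_id(phone_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym: the same ID and key always give the same result."""
    digest = hmac.new(key, phone_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


a = scramble_phone_id("31600000001")
b = scramble_phone_id("31600000001")
c = scramble_phone_id("31600000002")
print(a == b, a == c)
```

With a fixed key, records from the same phone can still be linked across the 14 days, which is what makes the movement and activity analyses possible.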




Mobile phone location data (3)




           Typical day in the Netherlands (call activity)
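
A daily activity profile like the one in this figure can be derived by counting call events per hour. The sketch below assumes records shaped as (scrambled phone ID, date-time stamp, cell ID), as described for the dataset; the sample records and the timestamp format are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Invented call events: (scrambled phone ID, date-time stamp, cell ID).
records = [
    ("a1", "2010-05-03 08:15:00", "cell_17"),
    ("b2", "2010-05-03 08:40:10", "cell_17"),
    ("a1", "2010-05-03 12:05:30", "cell_03"),
    ("c3", "2010-05-03 17:59:59", "cell_22"),
    ("b2", "2010-05-03 18:00:00", "cell_22"),
]


def calls_per_hour(records):
    """Return a list of 24 counts, one per hour of the day."""
    counts = Counter()
    for _phone_id, timestamp, _cell_id in records:
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
        counts[hour] += 1
    return [counts.get(h, 0) for h in range(24)]


profile = calls_per_hour(records)
print(profile)
```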
Movie of call activity during the day




Mobile phone location data (4)
This type of location information could possibly
  be used to study:
     • Overall daytime movement of people
          • Perhaps also: movement of individual mobile phones during the
            day
     • Distinguish regions of different economic activity
          • Different behaviour during the week and on ‘specific’ days
     • For tourism
          • Roaming info: Activity of non-‘Dutch’ phones in the Netherlands

     • BUT!


Mobile phone location data (5)

However: methodological issues
     1.       Translation of call intensity per cell to call intensity
              per region (cell coverage area)
     2.       Representativity
          •     Phone IDs vs. Dutch population
          •     Only used data of one mobile phone company
          •     Some people are more active callers than others
     3.       How does the number of calls relate to the number of
              people present at the location?
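
For issue 1, one simple option is to distribute each cell's calls over regions in proportion to how much of the cell's coverage area falls in each region. This is only an illustrative assumption (calls spread uniformly over the coverage area), not the method used in the study; the cell names, region names, and fractions below are invented.

```python
from collections import defaultdict


def calls_per_region(calls_per_cell, coverage):
    """Allocate cell counts to regions.

    coverage[cell][region] is the fraction of the cell's coverage area
    that lies in the region; fractions for one cell should sum to 1.
    """
    totals = defaultdict(float)
    for cell, calls in calls_per_cell.items():
        for region, fraction in coverage[cell].items():
            totals[region] += calls * fraction
    return dict(totals)


calls_per_cell = {"cell_A": 100, "cell_B": 40}
coverage = {
    "cell_A": {"region_1": 0.75, "region_2": 0.25},
    "cell_B": {"region_2": 1.0},
}

regional = calls_per_region(calls_per_cell, coverage)
print(regional)
```

This addresses only the spatial translation; issues 2 and 3 (who is calling, and how calls relate to people present) remain open.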


3) Twitter text messages
• Social media is used more and more
  intensively in the Netherlands & Europe
• Potential source of personal information,
  opinions, and sentiments
• But what type of information is actually
  exchanged?
     • Investigated Twitter (as an example)
          • Easily accessible (text) data, and used a lot in
            the Netherlands


Twitter text messages (2)
• Twitter is a micro blogging service
     •   Text messages of 140 characters max
     •   Called ‘tweets’
     •   Posted to the public or to friends only
     •   Hash sign (#) is used to highlight ‘keywords’
          • Example: #Eurostat, #NTTS


     • A few examples

Twitter text messages (3)
• Identify topics discussed on Twitter in the
  Netherlands by collecting ‘tweets’
• Use this information to decide if Twitter (and
  perhaps social media in general) is of interest for
  Stat. Neth.
     • Collect tweets from ‘all’ Dutch Twitter users!
          • People located in the Netherlands
          • Use location info provided by users
          • Try to get a complete overview




Twitter text messages (4)
• Collect tweets
  • First studies used the Twitter search option
        • Radius of 200 km from Utrecht + Dutch language filter
        • BUT: Twitter appears to apply a ‘quality filter’, so the data was incomplete

  • Best alternative: first ‘crawl’ for users, then collect tweets
        • Search through the user tree: start with users that have a large number of
          followers (‘friends’), then expand the search from these users
        • A user is considered Dutch if the location includes ‘Netherlands’ or the name of a
          Dutch municipality
        • Collected 380,415 unique usernames
        • For every user, collect up to 200 tweets; obtained ~12 million tweets
        • Identify topics discussed (first approach: used hashtags, manually)
             • ~1.8 million tweets contained a hashtag
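
The crawl-then-filter approach above can be sketched on a toy follower graph. Everything in this example is hypothetical: the usernames, the locations, the three-name municipality list, and the in-memory dictionaries that stand in for calls to Twitter. The sketch expands the search through every user it meets; whether the original crawl expanded only through Dutch users is not stated in the slides.

```python
from collections import deque

# Tiny illustrative subset; a real list would hold all Dutch municipalities.
MUNICIPALITIES = {"amsterdam", "utrecht", "rotterdam"}


def is_dutch(location: str) -> bool:
    """Location rule from the slides: 'Netherlands' or a municipality name."""
    text = location.lower()
    return "netherlands" in text or any(name in text for name in MUNICIPALITIES)


def crawl_users(seeds, followers_of, location_of):
    """Breadth-first walk over the follower graph, keeping Dutch users."""
    seen = set(seeds)
    queue = deque(seeds)
    dutch_users = []
    while queue:
        user = queue.popleft()
        if is_dutch(location_of.get(user, "")):
            dutch_users.append(user)
        for follower in followers_of.get(user, []):
            if follower not in seen:
                seen.add(follower)
                queue.append(follower)
    return dutch_users


followers_of = {
    "news_nl": ["anna", "bob"],
    "anna": ["carl"],
    "bob": [],
    "carl": ["news_nl"],
}
location_of = {
    "news_nl": "Hilversum, The Netherlands",
    "anna": "Utrecht",
    "bob": "London",
    "carl": "amsterdam-west",
}

dutch = crawl_users(["news_nl"], followers_of, location_of)
print(dutch)
```

Plain substring matching is crude: short municipality names can match unrelated text, and users who leave the location empty are missed, which feeds into the representativity concerns raised for this source.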



Twitter text messages (5)
• Results of #hashtag classification

Table 1. Classification of Twitter messages collected according to hashtags used

Category      Description                                  Examples                        Top 500#   Top 500# no   All# no
                                                                                           only (%)   Other (%)     Other (%)
Twitter       Twitter/internet specific language & slang   #durftevragen, #fail, #twexit      12          19           19
Sports        Sports, clubs, and sports events             #WK2010, #ajax, #oranje             9          14           14
Applications  Twitter specific programs                    #nowplaying, #lastfm, #in           8          13           12
Politics      Political debates, leaders, and parties      #tk2010, #NOSdebat, #formatie       7          11           11
TV            Dutch TV-programs (no political & no news)   #dwdd, #ohohcherso, #tvoh           6          10           11
Emotions      Sentiment and feelings                       #moe, #LOL, #zucht, #heerlijk       6          10           10
Locations     References to a location or municipality     #amsterdam, #utrecht                3           5            5
Products      Referring to products                        #iPhone, #iPad, #android            3           4            4
Events        Non-sport and non-political happenings       #twibbon, #LL10, #lowlands          3           4            4
News          Referring to news programs                   #nos, #pownews, #Nujij              2           4            4
Companies     Referring to companies                       #ns, #google, #tmobile, #KPN        2           4            4
Radio         Dutch radio programs                         #3fm, #53j8, #radio1                1           2            2
Other         Rest group, mostly unrelated tags            #koffie, #goedemorgen              38           -            -

[Figure: two pie charts of category shares, one for the Top 500 #hashtags (Other: 38%) and one for All #hashtags (Other: 72%)]
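
A manual classification like the one in Table 1 can be stored as a lookup from hashtag to category and applied to the collected text. The sketch below is illustrative only: the mapping covers a handful of the example hashtags from the table, the tweet texts are invented, and it counts hashtag occurrences, which may differ from how the shares in the table were computed.

```python
import re
from collections import Counter

# Small excerpt of a hand-made mapping; categories follow Table 1.
CATEGORY_OF = {
    "durftevragen": "Twitter",
    "fail": "Twitter",
    "wk2010": "Sports",
    "ajax": "Sports",
    "oranje": "Sports",
    "tk2010": "Politics",
    "nosdebat": "Politics",
    "dwdd": "TV",
}

HASHTAG = re.compile(r"#(\w+)")


def classify(tweets):
    """Return the percentage of hashtag occurrences per category."""
    counts = Counter()
    for text in tweets:
        for tag in HASHTAG.findall(text):
            # Anything not in the mapping falls into the rest group.
            counts[CATEGORY_OF.get(tag.lower(), "Other")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: round(100 * n / total) for category, n in counts.items()}


# Invented tweet texts using hashtags from the table.
tweets = [
    "Watching #dwdd tonight",
    "#WK2010 #oranje #ajax",
    "Does anyone know a good bike shop? #durftevragen",
    "Train delayed again #fail",
    "#koffie #goedemorgen",
    "Tonight: #NOSdebat #tk2010",
]

shares = classify(tweets)
print(shares)
```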




Twitter text messages (6)
• First conclusions (based on topics identified by
  using hashtags)
     • Potentially interesting for politics and events
          • Overall study suggests ~5% in our total dataset
          • Around 600,000 tweets
     • Twitter could probably also be used:
          • for info on social and cultural participation and on social cohesion

• Need to further refine our studies
     • More in-depth studies of all tweets collected (also without #)
     • Use (more advanced) text mining techniques for classification
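
As a quick check of the figures quoted in these slides, the numbers are consistent with each other: about 5% of the ~12 million collected tweets is about 600,000, and the ~1.8 million tweets with a hashtag are 15% of the total. The variable names below are chosen for illustration.

```python
TOTAL_TWEETS = 12_000_000          # ~12 million tweets collected
WITH_HASHTAG = 1_800_000           # ~1.8 million contained a hashtag
SHARE_POLITICS_EVENTS = 0.05       # ~5% estimated for politics and events

hashtag_share = WITH_HASHTAG / TOTAL_TWEETS
politics_events_tweets = round(SHARE_POLITICS_EVENTS * TOTAL_TWEETS)

print(f"{hashtag_share:.0%} of tweets contain a hashtag")
print(f"about {politics_events_tweets:,} tweets on politics and events")
```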



Overall remarks
• Many interesting new data sources out there
  • But it is not easy to determine their usefulness
       • Automatically collect product prices from the web
          • Only in addition to the traditional manual process
          • To obtain more & more frequent data

       • Mobile phone & Twitter data
          • Representativity of the data is a key issue
              • Not all (Dutch) people are observed
              • Hardly any background information available
          • Is a major topic in future research


Thank you for your attention!
• #Questions?





More Related Content

What's hot

Big data as a source for official statistics
Big data as a source for official statisticsBig data as a source for official statistics
Big data as a source for official statistics
Edwin de Jonge
 
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
Mindtrek
 
Big-Data Computing on the Cloud
Big-Data Computing on the CloudBig-Data Computing on the Cloud
Big-Data Computing on the Cloud
Data Driven Innovation
 
Bigdatacooltools
BigdatacooltoolsBigdatacooltools
Bigdatacooltools
suresh sood
 
Spark Social Media
Spark Social Media Spark Social Media
Spark Social Media
suresh sood
 
Data stories
Data storiesData stories
Data stories
Elena Simperl
 
The impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesThe impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesPayamBarnaghi
 
Big Data and Nowcasting
Big Data and NowcastingBig Data and Nowcasting
Big Data and Nowcasting
Dario Buono
 
Datapreneurs
DatapreneursDatapreneurs
Datapreneurs
suresh sood
 
Extracting information from ' messy' social media data
Extracting information from ' messy' social media dataExtracting information from ' messy' social media data
Extracting information from ' messy' social media data
Piet J.H. Daas
 
Big Data - Gerami
Big Data - GeramiBig Data - Gerami
Big Data - Gerami
Mohammad Reza Gerami
 
The profile of the management (data) scientist: Potential scenarios and skill...
The profile of the management (data) scientist: Potential scenarios and skill...The profile of the management (data) scientist: Potential scenarios and skill...
The profile of the management (data) scientist: Potential scenarios and skill...
Juan Mateos-Garcia
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
Dr Anjan Krishnamurthy
 
Spark
SparkSpark
Systemof insight
Systemof insightSystemof insight
Systemof insight
suresh sood
 
An introduction to Data Mining
An introduction to Data MiningAn introduction to Data Mining
An introduction to Data Mining
Shobhita Dayal
 
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
KamleshKumar394
 
Big data
Big dataBig data
P. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsP. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European Statistics
Istituto nazionale di statistica
 
An introduction to Data Mining by Kurt Thearling
An introduction to Data Mining by Kurt ThearlingAn introduction to Data Mining by Kurt Thearling
An introduction to Data Mining by Kurt Thearling
Pim Piepers
 

What's hot (20)

Big data as a source for official statistics
Big data as a source for official statisticsBig data as a source for official statistics
Big data as a source for official statistics
 
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
 
Big-Data Computing on the Cloud
Big-Data Computing on the CloudBig-Data Computing on the Cloud
Big-Data Computing on the Cloud
 
Bigdatacooltools
BigdatacooltoolsBigdatacooltools
Bigdatacooltools
 
Spark Social Media
Spark Social Media Spark Social Media
Spark Social Media
 
Data stories
Data storiesData stories
Data stories
 
The impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesThe impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart cities
 
Big Data and Nowcasting
Big Data and NowcastingBig Data and Nowcasting
Big Data and Nowcasting
 
Datapreneurs
DatapreneursDatapreneurs
Datapreneurs
 
Extracting information from ' messy' social media data
Extracting information from ' messy' social media dataExtracting information from ' messy' social media data
Extracting information from ' messy' social media data
 
Big Data - Gerami
Big Data - GeramiBig Data - Gerami
Big Data - Gerami
 
The profile of the management (data) scientist: Potential scenarios and skill...
The profile of the management (data) scientist: Potential scenarios and skill...The profile of the management (data) scientist: Potential scenarios and skill...
The profile of the management (data) scientist: Potential scenarios and skill...
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
 
Spark
SparkSpark
Spark
 
Systemof insight
Systemof insightSystemof insight
Systemof insight
 
An introduction to Data Mining
An introduction to Data MiningAn introduction to Data Mining
An introduction to Data Mining
 
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
 
Big data
Big dataBig data
Big data
 
P. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsP. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European Statistics
 
An introduction to Data Mining by Kurt Thearling
An introduction to Data Mining by Kurt ThearlingAn introduction to Data Mining by Kurt Thearling
An introduction to Data Mining by Kurt Thearling
 

Viewers also liked

SCi Recruitment For Scientists
SCi   Recruitment For ScientistsSCi   Recruitment For Scientists
SCi Recruitment For Scientistscourtfoley
 
Checkitmobile - using Git for development
Checkitmobile - using Git for developmentCheckitmobile - using Git for development
Checkitmobile - using Git for developmentGerrit Wanderer
 
Checkitmobile Git Workshop
Checkitmobile Git WorkshopCheckitmobile Git Workshop
Checkitmobile Git Workshop
Gerrit Wanderer
 
Checkitmobile advanced git
Checkitmobile advanced gitCheckitmobile advanced git
Checkitmobile advanced gitGerrit Wanderer
 
Twitter as a data source for official statistics: first results.
Twitter as a data source for official statistics: first results.Twitter as a data source for official statistics: first results.
Twitter as a data source for official statistics: first results.
Piet J.H. Daas
 
Research on the Quality of Registers To Make Data Decisions in the Dutch Virt...
Research on the Quality of Registers To Make Data Decisions in the Dutch Virt...Research on the Quality of Registers To Make Data Decisions in the Dutch Virt...
Research on the Quality of Registers To Make Data Decisions in the Dutch Virt...
Piet J.H. Daas
 

Viewers also liked (6)

SCi Recruitment For Scientists
SCi   Recruitment For ScientistsSCi   Recruitment For Scientists
SCi Recruitment For Scientists
 
Checkitmobile - using Git for development
Checkitmobile - using Git for developmentCheckitmobile - using Git for development
Checkitmobile - using Git for development
 
Checkitmobile Git Workshop
Checkitmobile Git WorkshopCheckitmobile Git Workshop
Checkitmobile Git Workshop
 
Checkitmobile advanced git
Checkitmobile advanced gitCheckitmobile advanced git
Checkitmobile advanced git
 
Twitter as a data source for official statistics: first results.
Twitter as a data source for official statistics: first results.Twitter as a data source for official statistics: first results.
Twitter as a data source for official statistics: first results.
 
Research on the Quality of Registers To Make Data Decisions in the Dutch Virt...
Research on the Quality of Registers To Make Data Decisions in the Dutch Virt...Research on the Quality of Registers To Make Data Decisions in the Dutch Virt...
Research on the Quality of Registers To Make Data Decisions in the Dutch Virt...
 

Similar to New data sources for statistics: Experiences at Statistics Netherlands.

Jan Romportl, Chief Data Scientist at O2 Czech Republic
Jan Romportl, Chief Data Scientist at O2 Czech RepublicJan Romportl, Chief Data Scientist at O2 Czech Republic
Jan Romportl, Chief Data Scientist at O2 Czech Republic
Dataconomy Media
 
Big Data @ CBS
Big Data @ CBSBig Data @ CBS
Big Data @ CBS
Piet J.H. Daas
 
Big Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in EindhovenBig Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in Eindhoven
Piet J.H. Daas
 
Extracting Value from Big Data - Stuart Higgins
Extracting Value from Big Data - Stuart HigginsExtracting Value from Big Data - Stuart Higgins
Extracting Value from Big Data - Stuart Higgins
grhodes05
 
The data we want
The data we wantThe data we want
The data we want
Elena Simperl
 
Big Data World
Big Data WorldBig Data World
Big Data World
Hossein Zahed
 
Data sciences and marketing analytics
Data sciences and marketing analyticsData sciences and marketing analytics
Data sciences and marketing analytics
MJ Xavier
 
Van der Valk, J. - Using mobile phone data to understand functional geographies.
Van der Valk, J. - Using mobile phone data to understand functional geographies.Van der Valk, J. - Using mobile phone data to understand functional geographies.
Van der Valk, J. - Using mobile phone data to understand functional geographies.
OECDregions
 
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial DataA Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
Gloria Re Calegari
 
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris
 
Text mining
Text miningText mining
Text mining
Pankaj Thakur
 
Digital Forensics by William C. Barker (NIST)
Digital Forensics by William C. Barker (NIST)Digital Forensics by William C. Barker (NIST)
Digital Forensics by William C. Barker (NIST)AltheimPrivacy
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
BigData_Europe
 
Big Data basics-Unit-1.pptx
Big Data basics-Unit-1.pptxBig Data basics-Unit-1.pptx
Big Data basics-Unit-1.pptx
varun453331
 
Online data sources and information exposure
Online data sources and information exposureOnline data sources and information exposure
Online data sources and information exposure
University of Southampton
 
Steps towards a Data Value Chain
Steps towards a Data Value ChainSteps towards a Data Value Chain
Steps towards a Data Value Chain
PRELIDA Project
 
Data sharing between private companies and research facilities
Data sharing between private companies and research facilitiesData sharing between private companies and research facilities
Data sharing between private companies and research facilities
Institute of Contemporary Sciences
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptx
Samiksha880257
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sources
Laura Po
 
Using administrative data to measure public procurement of R&D: Opportunities...
Using administrative data to measure public procurement of R&D: Opportunities...Using administrative data to measure public procurement of R&D: Opportunities...
Using administrative data to measure public procurement of R&D: Opportunities...
STIEAS
 

Similar to New data sources for statistics: Experiences at Statistics Netherlands. (20)

Jan Romportl, Chief Data Scientist at O2 Czech Republic
Jan Romportl, Chief Data Scientist at O2 Czech RepublicJan Romportl, Chief Data Scientist at O2 Czech Republic
Jan Romportl, Chief Data Scientist at O2 Czech Republic
 
Big Data @ CBS
Big Data @ CBSBig Data @ CBS
Big Data @ CBS
 
Big Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in EindhovenBig Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in Eindhoven
 
Extracting Value from Big Data - Stuart Higgins
Extracting Value from Big Data - Stuart HigginsExtracting Value from Big Data - Stuart Higgins
Extracting Value from Big Data - Stuart Higgins
 
The data we want
The data we wantThe data we want
The data we want
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Data sciences and marketing analytics
Data sciences and marketing analyticsData sciences and marketing analytics
Data sciences and marketing analytics
 
Van der Valk, J. - Using mobile phone data to understand functional geographies.
Van der Valk, J. - Using mobile phone data to understand functional geographies.Van der Valk, J. - Using mobile phone data to understand functional geographies.
Van der Valk, J. - Using mobile phone data to understand functional geographies.
 
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial DataA Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
 
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
 
Text mining
Text miningText mining
Text mining
 
Digital Forensics by William C. Barker (NIST)
Digital Forensics by William C. Barker (NIST)Digital Forensics by William C. Barker (NIST)
Digital Forensics by William C. Barker (NIST)
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
 
Big Data basics-Unit-1.pptx
Big Data basics-Unit-1.pptxBig Data basics-Unit-1.pptx
Big Data basics-Unit-1.pptx
 
Online data sources and information exposure
Online data sources and information exposureOnline data sources and information exposure
Online data sources and information exposure
 
Steps towards a Data Value Chain
Steps towards a Data Value ChainSteps towards a Data Value Chain
Steps towards a Data Value Chain
 
Data sharing between private companies and research facilities
Data sharing between private companies and research facilitiesData sharing between private companies and research facilities
Data sharing between private companies and research facilities
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptx
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sources
 
Using administrative data to measure public procurement of R&D: Opportunities...
Using administrative data to measure public procurement of R&D: Opportunities...Using administrative data to measure public procurement of R&D: Opportunities...
Using administrative data to measure public procurement of R&D: Opportunities...
 

New data sources for statistics: Experiences at Statistics Netherlands.

  • 1. New data sources for statistics: Experiences at Statistics Netherlands
      Piet Daas, Marko Roos, Chris de Blois, Rutger Hoekstra, Olav ten Bosch, and Yinyi Ma
      NTTS 2011
  • 2. Why new data sources?
      • Many NSIs traditionally use:
          • Surveys
          • Administrative data (registers)
      • But there are other sources of information out there (especially the electronic ones)
          • Are they really useful?
          • Investigate it! (studies are supported by DG of Stat. Neth.)
      NTTS2011 New data sources for statistics: Exp. Stat. Neth.
  • 3. Examples of new data sources
      Three will be discussed:
      1. Product prices on the internet
      2. Mobile phone location data
      3. Twitter text messages
      4. Global Positioning System (GPS) data (and traffic loop information)
  • 4. 1) Product prices on the internet
      • Ubiquitous on the World Wide Web
          • Stat. Neth. already uses web price data for the Consumer Price Index:
              • E.g. prices of airline tickets, books, CDs, DVDs
          • But this data is manually collected
          • Why not automatically (and more often)?
              • With a web robot (web spider)
              • Which is a script or (commercial) tool
  • 5. Product prices on the internet (2)
      Collected data:
      • Daily (over a 10 month period)
      • 6 websites: 4 airlines, 1 housing, and 1 (unstaffed) petrol station site
      Works well but:
      • Some websites are very complicated
          • Especially dynamic websites; if possible, tap directly into the database
      • Sometimes websites change layout
          • 3 of the 4 airline websites did this (in test period)
          • Affects the cost efficiency (redesigning scripts takes a lot of time)
      • Manual data collection is much easier and cheaper!
      • But: Automatic data collection has its own merits
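As an illustration of the kind of script a web robot runs after downloading a page, here is a minimal sketch in Python. It is not the tool used at Statistics Netherlands: the `price` class name, the Dutch price notation, and the helper names `PriceParser`, `parse_euro`, and `extract_prices` are all assumptions made for this example. The error raised on an empty result reflects the layout-change problem noted on the slide.

```python
from html.parser import HTMLParser


def parse_euro(text):
    """Convert Dutch price notation such as '€ 1.234,50' to a float."""
    cleaned = text.replace("€", "").strip()
    # Dutch notation: '.' separates thousands, ',' marks the decimals.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return float(cleaned)


class PriceParser(HTMLParser):
    """Collects the text of elements whose class list contains 'price'.

    The class name 'price' is a hypothetical page convention.
    """

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current price element.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(parse_euro(data))


def extract_prices(html):
    """Return all prices on a page, or fail loudly if none are found."""
    parser = PriceParser()
    parser.feed(html)
    if not parser.prices:
        # An empty result usually means the site changed its layout.
        raise ValueError("no prices found; the page layout may have changed")
    return parser.prices


page = '<div><span class="price">€ 1.234,50</span><span class="price">€ 89,00</span></div>'
print(extract_prices(page))
```

Failing loudly when no price is found makes a layout change visible on the day it happens, rather than leaving a silent gap in the daily series.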
  • 6. Product prices on the internet (3)
      Example: Airline ticket prices (shown over 116 day period)
      Manually collected: 1 day before departure date
  • 7. 2) Mobile phone location data
      • Almost everybody has a mobile phone
          • In the Netherlands ~92%
      • People use it a lot
      • For every outgoing and incoming call the phone connects to a nearby telephone mast (‘cell’)
          • Source of location information!
          • Every mobile phone and every mast has a unique ID
      • This data is logged by mobile phone companies
          • Used for billing purposes & network maintenance
  • 8. Mobile phone location data (2)
      • Could be an interesting source of information
      • Let's study a dataset
      • Obtained data from 1 large Dutch mobile phone company
          • Over 5 million different phones active on their own network
      • Dataset covered a 14 day period
          • Contained 550 million records (call events)
          • Every record contains a unique phone ID, date-time stamp, and mast (cell) connection ID (= location info)
      • Phone IDs were scrambled to avoid identification
          • Scrambled IDs were stable over the 14 day period
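A record layout like the one described (scrambled phone ID, date-time stamp, cell ID) lends itself to simple aggregation. The following is a minimal sketch, assuming records arrive as `(phone_id, timestamp, cell_id)` tuples with ISO 8601 timestamps. The tuple layout, the sample values, and the function names are assumptions for illustration, not the format of the actual dataset.

```python
from collections import Counter, defaultdict
from datetime import datetime


def calls_per_cell_hour(records):
    """Count call events per (cell ID, hour of day)."""
    counts = Counter()
    for _phone_id, stamp, cell_id in records:
        hour = datetime.fromisoformat(stamp).hour
        counts[(cell_id, hour)] += 1
    return counts


def phones_per_cell(records):
    """Count distinct (scrambled) phone IDs seen at each cell."""
    phones = defaultdict(set)
    for phone_id, _stamp, cell_id in records:
        phones[cell_id].add(phone_id)
    return {cell: len(ids) for cell, ids in phones.items()}


# Hypothetical sample records: (scrambled phone ID, timestamp, cell ID)
sample = [
    ("a91f", "2010-05-03T08:15:00", "cell-17"),
    ("a91f", "2010-05-03T08:40:00", "cell-17"),
    ("c27b", "2010-05-03T08:55:00", "cell-17"),
    ("c27b", "2010-05-03T18:05:00", "cell-42"),
]
print(calls_per_cell_hour(sample))
print(phones_per_cell(sample))
```

The two counts differ on purpose: the first measures call intensity, the second the number of distinct phones. Distinct phones can only be counted because the scrambled IDs stay stable over the period.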
  • 9. Mobile phone location data (3)
      Typical day in the Netherlands (call activity)
  • 10. Movie of call activity during the day
  • 11. Mobile phone location data (4)
      This type of location information could possibly be used to study:
      • Overall day time movement of people
          • Perhaps also: movement of individual mobile phones during the day
      • Distinguish regions of different economic activity
          • Different behaviour during the week and on ‘specific’ days
      • For tourism
          • Roaming info: Activity of non-‘Dutch’ phones in the Netherlands
      • BUT!
  • 12. Mobile phone location data (5)
      However: methodological issues
      1. Translation of call intensity per cell to call intensity per region (cell coverage area)
      2. Representativity
          • Phone IDs vs. Dutch population
          • Only used data of one mobile phone company
          • Some people are more active callers than others
      3. How does the number of calls relate to the number of people present at the location?
  • 13. 3) Twitter text messages
      • Social media is used more and more intensively in the Netherlands & Europe
          • Potential source of personal information, opinions, and sentiments
      • But what type of information is actually exchanged?
      • Investigated Twitter (as an example)
          • Easily accessible (text) data and used a lot in the Netherlands
  • 14. Twitter text messages (2)
      • Twitter is a micro blogging service
          • Text messages of 140 characters max
          • Called ‘tweets’
          • Posted to the public or to friends only
      • Hash sign (#) is used to highlight ‘keywords’
          • Example: #Eurostat, #NTTS
      • A few examples
  • 15. Twitter text messages (3)
      • Identify topics discussed on Twitter in the Netherlands by collecting ‘tweets’
          • Use this information to decide if Twitter (and perhaps social media in general) is of interest for Stat. Neth.
      • Collect tweets from ‘all’ Dutch Twitter users!
          • People located in the Netherlands
          • Use location info provided by users
          • Try to get a complete overview
  • 16. Twitter text messages (4)
      • Collect tweets
          • First studies by using Twitter search option
              • Radius of 200 km from Utrecht + Dutch language filter
              • BUT: Twitter appears to apply a ‘quality filter’ so data was incomplete
          • Best alternative: First ‘crawl’ for users, then collect tweets
              • Search through the user tree: select users with a large number of followers (‘friends’), then expand the search from these
              • User is Dutch if location includes ‘Netherlands’ or the name of a Dutch municipality
              • Collected 380,415 unique usernames
              • For every user collect up to 200 tweets; obtained ~12 million tweets
      • Identify topics discussed (first approach: used hashtags, manually)
          • ~1.8 million tweets contained a hashtag
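The location rule and the hashtag step described on this slide can be sketched in a few lines of Python. This is an illustration only: the `MUNICIPALITIES` set is a hypothetical three-name subset of the full list of Dutch municipalities, the function names are invented for this example, and the simple substring match and `#\w+` pattern are assumptions rather than the method actually used.

```python
import re
from collections import Counter

# Hypothetical subset; a real run would need the full municipality list.
MUNICIPALITIES = {"amsterdam", "utrecht", "rotterdam"}

HASHTAG = re.compile(r"#(\w+)")


def is_dutch(location):
    """Apply the slide's rule: the location mentions 'Netherlands' or a municipality."""
    loc = location.lower()
    return "netherlands" in loc or any(name in loc for name in MUNICIPALITIES)


def hashtag_counts(tweets):
    """Count hashtags over a collection of tweet texts, ignoring case."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


print(is_dutch("Utrecht, NL"))
print(hashtag_counts(["Go #Ajax #WK2010", "#ajax wins"]).most_common())
```

Ranking the counts with `most_common` gives the kind of top list that can then be classified by hand, as in the first approach described on the slide.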
  • 17. Twitter text messages (5)
      • Results of #hashtag classification
      Table 1. Classification of Twitter messages collected according to hashtags used

      Category      Description                                   Examples                         Top 500#   Top 500# no   All# no
                                                                                                   only (%)   Other (%)     Other (%)
      Twitter       Twitter/internet specific language & slang    #durftevragen, #fail, #twexit    12         19            19
      Sports        Sports, clubs, and sports events              #WK2010, #ajax, #oranje          9          14            14
      Applications  Twitter specific programs                     #nowplaying, #lastfm, #in        8          13            12
      Politics      Political debates, leaders, and parties       #tk2010, #NOSdebat, #formatie    7          11            11
      TV            Dutch TV-programs (no political & no news)    #dwdd, #ohohcherso, #tvoh        6          10            11
      Emotions      Sentiment and feelings                        #moe, #LOL, #zucht, #heerlijk    6          10            10
      Locations     References to a location or municipality      #amsterdam, #utrecht             3          5             5
      Products      Referring to products                         #iPhone, #iPad, #android         3          4             4
      Events        Non-sport and non-political happenings        #twibbon, #LL10, #lowlands       3          4             4
      News          Referring to news programs                    #nos, #pownews, #Nujij           2          4             4
      Companies     Referring to companies                        #ns, #google, #tmobile, #KPN     2          4             4
      Radio         Dutch radio programs                          #3fm, #53j8, #radio1             1          2             2
      Other         Rest group, mostly unrelated tags             #koffie, #goedemorgen            38         -             -

      [Figure: two pie charts of the classification, ‘All #hashtags’ (Other 72%) and ‘Top 500 #hashtags’ (Other 38%)]
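The ‘no Other’ percentages in Table 1 appear to be the category shares rescaled after dropping the rest group. A minimal sketch of that rescaling follows, assuming this reading of the column headers is right. The function name and the toy numbers are hypothetical.

```python
def shares_without_other(shares, rest_key="Other"):
    """Rescale percentage shares so the remaining categories sum to 100."""
    kept = {name: value for name, value in shares.items() if name != rest_key}
    total = sum(kept.values())
    return {name: 100.0 * value / total for name, value in kept.items()}


# Toy example, not the table's data.
print(shares_without_other({"A": 30, "B": 20, "Other": 50}))
```

Applied to the rounded table values, this only approximates the published column: a share of 12% with 38% Other rescales to 12/62, about 19.4%, against the 19 printed for Twitter. Small differences are expected because the table percentages were already rounded.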
  • 18. Twitter text messages (6)
      • First conclusions (based on topics identified by using hashtags)
          • Potentially interesting for politics and events
              • Overall study suggests ~5% in our total dataset
              • Around 600,000 tweets
          • Twitter could probably also be used:
              • For info on social and cultural participation and on social cohesion
      • Need to further refine our studies
          • More in-depth studies of all tweets collected (also without #)
          • Use (more advanced) text mining techniques for classification
  • 19. Overall remarks
      • Many interesting new data sources out there
          • But it's not easy to determine their usefulness
      • Automatically collect product prices from the web
          • Only in addition to the traditional manual process
          • To obtain more & more frequent data
      • Mobile phone & Twitter data
          • Representativity of the data is a key issue
              • Not all (Dutch) people are observed
              • Hardly any background information available
          • Is a major topic in future research
  • 20. Thank you for your attention!
      • #Questions?