Analyzing Twitter data
Issues, Challenges and Opportunities



RC33 Conference, Sydney Australia,
9-13 July 2012



Maurice Vergeer
m.vergeer@maw.ru.nl / www.mauricevergeer.nl / blog.mauricevergeer.nl
Radboud University Nijmegen, the Netherlands
   Many platforms
    -   Facebook
    -   Twitter
    -   LinkedIn
    -   Hyves
    -   RenRen
    -   Cyworld
    -   Orkut
    -   YouTube
    -   Flickr
    -   Plurk
    -   Sina Weibo
    -   Etc.

   Empty platform / infrastructure
    -   Facility

   User generated content
    -   Text
    -   Audio
    -   Video
    -   Pictures



Social media
[Figure: Number of articles on politics, Internet and social media, per year, 1995 to 2012. Y-axis: number of articles (0 to 180). Series: Internet and politics (query 1); Social media and politics (query 2); Internet, social media and politics (query 3).]
Source: Vergeer (in press / 2012) in New Media & Society
Focus on Twitter
The Netherlands



  A special case?
   Opportunities
    ◦ Methodological/technical
       Time series analysis
       Network analysis
        ◦ Actors
        ◦ Content
        ◦ Diffusion of information through online social networks
        ◦ Social media activities

   Limitations
    ◦ Twitter
       Reliability of Twitter API




Outline
•   Within Twitter (using the API)
    • Username
    • Account creation date
    • # of followers
      • And the actual usernames of these followers
    • # of accounts followed
      • And the actual usernames of those being followed
    • Tweet text

    • And many more (see dev.twitter.com)




Data sources
   Tweet
    ◦ Tweet text

    ◦ Whether or not it was a reply to another tweet
       To whom it was a reply (username/screenname and numerical
        userid)

    ◦ Whether or not it was a retweet (according to Twitter)
       Which tweet was retweeted (numerical tweetid)
   Message of tweet

   Whether or not it was a directed tweet
    (sent to someone in particular)
    ◦ Identified by an @-sign


   Whether or not it was a retweet
    ◦ Identified by RT




Type of content
   Undirected tweet
    ◦ RCMP Commissioner appearing before Public Safety Cmte now.
      What a popular guy - he has his own paparazzi!

   Directed tweet
    ◦ Fantastic blog by my good friend @GlenPearson -
      http://bit.ly/hlAKXp #lpc

   Directed tweet to two usernames
    ◦ @miken32 @CBCEdmonton probably because that is NOT what I
      said--more commercially viable is different than not needed.

   Retweet
    ◦ RT @liberal_party: Think Durham deserves better than Bev Oda?
      Join @BobRaeMP for a rally tomorrow at 1pm http://lpc.ca/durham
      #cdnpoli #lpc




Tweet examples
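
The three tweet types above can be told apart from the text alone. A minimal Python sketch, assuming a retweet starts with "RT @user" and a directed tweet contains at least one @-address; the patterns and the function name classify_tweet are illustrative assumptions, not rules taken from the slides or from Twitter:

```python
import re

# Assumed patterns (not Twitter's own rules): a retweet starts with "RT @user";
# an @-address is "@" followed by letters, digits or underscores.
RETWEET = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)
MENTION = re.compile(r"(?<!\w)@(\w+)")


def classify_tweet(text: str) -> str:
    """Return 'retweet', 'directed' or 'undirected' for one tweet text."""
    if RETWEET.search(text):
        return "retweet"
    if MENTION.search(text):
        return "directed"
    return "undirected"


examples = [
    "RCMP Commissioner appearing before Public Safety Cmte now.",
    "Fantastic blog by my good friend @GlenPearson - http://bit.ly/hlAKXp #lpc",
    "@miken32 @CBCEdmonton probably because that is NOT what I said",
    "RT @liberal_party: Think Durham deserves better than Bev Oda?",
]
for tweet in examples:
    print(classify_tweet(tweet), "|", tweet)
```

The retweet check runs first because a retweet also contains an @-address and would otherwise be counted as a directed tweet.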
   Traditional material
    ◦ Produced by professional actors
    ◦ Newspapers
    ◦ Public administration documents

   Social media
    ◦ Produced by
       professional actors
       general public




Content analysis of tweets
   Large quantities of data

   Word frequencies
    ◦ Identifying the most important words in the corpus
    ◦ Code these words into more general categories

   Switch to SPSS (or other type of data management tool)
    ◦ Search for the words in the actual tweets
    ◦ Assign tweet to a specific code

   Improvements in SPSS
    ◦ Compute command facilitates many new text operators
    ◦ Char.index, Char.substr, etc

   Alternative
    ◦ Regular expressions
    ◦ complex




Data extraction
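
The word-frequency and coding steps above can also be done outside SPSS. A minimal Python sketch, assuming a tiny invented corpus and an invented codebook (the tweets, the category names and the helper names tokenize and code_tweet are all hypothetical):

```python
import re
from collections import Counter

# Hypothetical mini-corpus and codebook, for illustration only.
tweets = [
    "Join the rally tomorrow #lpc",
    "Great debate on health care tonight",
    "Rally for better health care #cdnpoli",
]
codebook = {
    "campaign": {"rally", "debate"},
    "policy": {"health", "care"},
}


def tokenize(text):
    """Lowercase the text and return its words (a hashtag keeps its word part)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


# Step 1: word frequencies over the whole corpus.
frequencies = Counter(word for t in tweets for word in tokenize(t))


# Step 2: assign each tweet every code whose keywords it contains.
def code_tweet(text):
    words = set(tokenize(text))
    return sorted(code for code, keys in codebook.items() if words & keys)


print(frequencies.most_common(5))
for t in tweets:
    print(code_tweet(t), "|", t)
```

In practice the codebook would be built by inspecting the most frequent words first, as the slide describes, and a tweet may receive more than one code.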
   Publicly available data sources on
    parliament, election council

   Time series
    ◦ Identifying societal/political events
      relevant for the study at hand
      Ex. 1: temporary shutdown of the election campaign
       due to the crash of a passenger plane with many
       Dutch passengers in Libya, May 2010
      Ex. 2: deregistration of the People's Political Power
       Party of Canada




External data sources
[Figure: bar chart by media type (newspaper, broadcasting, radio, news agency, magazine, online only, local). Series: institutional Twitter account; personal Twitter account. Y-axis: 0 to 900.]
Source: Vergeer & Hermans (forthcoming / 2013)
in Journal of Computer-Mediated Communication
[Figure: daily time series per party, 1 May 2010 to 9 June 2010. Y-axis: 0 to 1000. Parties: CDA, PvdA, SP, VVD, PVV, GL, CU, D66, PvdD, SGP, NN, TON, MenS, HNL, Partij1, Piraten.]
   Date and time

   For longitudinal analysis and cross-national comparisons
    ◦ take note of the time differences and correct if necessary.
        Time zones
        Daylight saving time

   What to do with countries having multiple time zones?
    ◦ Depends on RQs
       Communication patterns: keep a single time zone
       Focus on individual daily patterns: adjust for time zones
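
Adjusting for time zones and daylight saving time can be left to a time zone database. A minimal Python sketch, assuming time stamps are stored in UTC as "YYYY-MM-DD HH:MM:SS" (an assumption about the data file, not something stated on the slide); the function name to_local is hypothetical, and zoneinfo needs Python 3.9+ with time zone data installed:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_local(utc_text: str, zone: str) -> datetime:
    """Parse a UTC time stamp 'YYYY-MM-DD HH:MM:SS' and convert it to `zone`."""
    utc = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return utc.astimezone(ZoneInfo(zone))


stamp = "2010-06-09 21:30:00"
# Daylight saving time is applied automatically for the date in question.
print(to_local(stamp, "Europe/Amsterdam"))
print(to_local(stamp, "America/Edmonton"))
```

For communication patterns, all tweets would be converted to one common zone; for individual daily patterns, each tweet would be converted to the zone of its sender.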
   Total tweets by candidates, followers and followed:
    ◦ 4,536,854 tweets

   Breakdown
    ◦ Tweets among candidates: approx. 2%
    ◦ Tweets to inner circles (followers or being followed): approx. 18%
    ◦ Tweets to outer circle: approx. 33%
    ◦ Tweets not directed to anyone in particular: approx. 49%

    ◦ Extracting users from tweets (@-addresses)




Communication network analysis
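
Extracting @-addresses turns tweets into a directed network of who addresses whom. A minimal Python sketch, assuming each record is a (sender, text) pair; the sender names, the pattern and the helper names mention_edges and restrict_to are hypothetical:

```python
import re
from collections import Counter

# Assumed pattern: "@" followed by letters, digits or underscores.
MENTION = re.compile(r"(?<!\w)@(\w+)")


def mention_edges(tweets):
    """Count directed (sender, addressee) pairs over (sender, text) records."""
    edges = Counter()
    for sender, text in tweets:
        # findall returns every @-address, not only the first one.
        for addressee in MENTION.findall(text):
            edges[(sender.lower(), addressee.lower())] += 1
    return edges


def restrict_to(edges, actors):
    """Keep only edges whose sender and addressee are both in `actors`."""
    keep = {a.lower() for a in actors}
    return {pair: n for pair, n in edges.items() if pair[0] in keep and pair[1] in keep}


# Hypothetical records; the sender names are invented for illustration.
records = [
    ("candidate_a", "@miken32 @CBCEdmonton probably because that is NOT what I said"),
    ("candidate_b", "Fantastic blog by my good friend @GlenPearson"),
    ("candidate_a", "Thanks @miken32"),
]
edges = mention_edges(records)
candidates_only = restrict_to(edges, {"candidate_a", "candidate_b", "GlenPearson"})
print(edges)
print(candidates_only)
```

The resulting edge list, with counts as weights, can be loaded into a network analysis tool. Restricting the actor set corresponds to a network of candidates only, excluding the general public.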
 Communication network based on
  candidates identified in tweets
 Excluding the general public




Communication network analysis
   See http://tinyurl.com/blzajsl for
    animated version.
   Retrospective
    ◦ 3200 tweets back in time

   Cost / technical
    ◦ Access to firehose for real time data




Limitations in data collection
   Date of tweet
    ◦ A small fraction of tweets is time-stamped with the wrong date
   Solution
    ◦ Estimate date and time using the tweetid

   Status of tweet as retweet
    ◦ RT
   Solution:
       Use text search operators to identify real retweets (“RT ”, “rt ”)
        Also see http://tinyurl.com/bohhjzn

   Reply to tweets
    ◦ Only the first address is identified
   Solution
    ◦ Search for multiple @-addresses using text extraction methods



Reliability of data as provided by
the API
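
The slides do not say how date and time were estimated from the tweetid. One possible route, sketched here in Python, applies only to Twitter's "Snowflake" IDs, in use since November 2010, which store the creation time in milliseconds in the bits above bit 22. Older sequential IDs, such as those of tweets from May and June 2010, carry no time stamp and would instead need interpolation between tweets with known dates. The function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Snowflake IDs count milliseconds from this custom epoch (4 November 2010, UTC).
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Decode the creation time (UTC) stored in a Snowflake tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def datetime_to_snowflake(moment: datetime) -> int:
    """Smallest Snowflake ID that could have been issued at `moment`."""
    ms = (moment - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (ms - TWITTER_EPOCH_MS) << 22


conference_day = datetime(2012, 7, 9, 12, 0, tzinfo=timezone.utc)
boundary_id = datetime_to_snowflake(conference_day)
print(boundary_id, snowflake_to_datetime(boundary_id))
```

The second function is useful for selecting all tweets before or after a given moment by comparing IDs instead of time stamps.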
BIG DATA

The buzzword of the day
 Not gigabytes or terabytes,
 But petabytes and exabytes of data
 Only for the few
 Specific hardware requirements
    ◦ Computing power
    ◦ Data storage
   The data presented in this presentation
    ◦ Approx. 4.5 million records equal approx. 1
      gigabyte: not that big
There is still so much to be done
with…
•   Focus on specific cases
    ◦ Political communication
         politicians, candidates in elections
    ◦ Fan studies
         celebrities
         cast of popular soap operas
    ◦ Journalism studies
         journalists and newspapers





Focus on specific cases
 actor information
 information on societal events
 accumulate data over time using the
  same data structure
    ◦ Prolonged analysis
    ◦ Multiple case studies, cross-national
      comparative analysis




Enrich existing Twitter data with
external data
   Traditional process (textbook approach)
    ◦ RQ -> research design

   Practice, particularly with secondary (i.e. third-party) data
    ◦ Data -> RQ -> research design
    ◦ Data -> research design -> RQ

Twitter
    Content analysis
    Longitudinal analysis
    Network analysis

   Different research designs require different techniques
   Collaborate



Look at the data from different
angles, i.e. research designs
Thank you for your attention
