Twittering Dissent
   Social Web Data Streams as a Basis for

Agent Based Models of Opinion Dynamics




Presentation held at GOR09 08/04/2009
Authors
   Pascal Jürgens              Andreas Jungherr
 pascal.juergens@gmail.com      andreas.jungherr@gmail.com
      Graduate Student               Graduate Student
Department of Communication    Department of Political Science
University of Mainz, Germany   University of Mainz, Germany
1


The Challenge
           To gain

valid, representative, relevant

insight into human behavior

      from online data.
2

     Audience Dissent at SXSW
           Conference

"Never, ever have I seen such a train wreck of
an interview," said Jason Pontin on Twitter.
"Talk about something interesting," one attendee
yelled about halfway through the keynote. The remark was met with
waves of cheering and applause
                                — Wired.com article about the event
3


The Event of Interest
•   Interview with Mark Zuckerberg of Facebook at
    SXSW conference, Austin (Tx), March 9th, 2008
•   During the interview, negative comments
    posted by audience members spread through
    the microblogging service Twitter
•   Unrest from the electronic backchannel spills
    over into the room, leading to booing, shouted
    interjections and eventually the abortion of the
    interview
4


        Microblogging Overview
                1. long term, one-way, mass audience communication


                                      normal
                                      message
user                                                                       subscribers
                       @
                     message




                                  non - subscribers


       2. ad hoc, bidirectional, mass audience + select recipient communication
5

             The Event:
         Interesting Aspects
•   Unique interaction of online/offline behavior

•   Backchannel (invisible communication)

•   Ad-hoc formation and group action in a small
    group setting

•   Possibly an example of “augmented reality”

•   All within strict spatial and temporal
    constraints
6


     Research Challenges
•   Problem: Ex Post proof of causality is difficult
    (qualitative in-depth interviews might be an
    option)
•   Goal: Relate online data to offline situation
    without data
•   Therefore: Our approach: Infuse online data into
    mathematical model on the basis of established
    social-psychological theories - then use model
    as basis for hypothesis generation
7


          Data — Benefits

•   (mostly) unique users

•   Central data repository

•   User-coded intents:
    replies (@user), topics (#topic)

•   Low syntactical complexity
    as a result of limited message length (140
    characters)
8


            Data — Risks

•   No guarantee of data availability

•   Usage limit — limited data acquisition

•   Message size may discourage complex issues as
    conversation topics

•   Low temporal resolution (time zone ambiguity,
    latencies)
9


                      Our Data Set




      Robert Scoble
(well connected blogger)   All users directly addressed    Messages during
 over 2700 subscribers      by R. Scoble (@-messages)     interview (N = 259)
10


                Our Data Set
    Interview                                      Replacement Session

                             all tweets incl. neutral
                             negative tweets




•   Messages hand-coded for relevance and
    negativity

•   Time series supports role of backchannel
11


             Our Data Set
60%                 58%   57%



45%                                                 Neutral Tweets
      39%                                           Positive Tweets
                                          36%
                                                    Negative Tweets

30%

                                                   N total = 814
                                                  N during = 259
15%
                                  7%
            3%
0%      During Keynote     Complete Time Series
12


              Modeling
•   Known from physics, econonomics, recently
    spawned field of “econophysics”
•   Roots in game theory, particle simulation
•   Limited number of variables, Monte Carlo
    approach to find out behavior
•   Good in conditions of high uncertainty
    regarding interplay of factors (explorative
    theory building)
•   Bad for forecasts, establishing causality
13

      Established Models of
         Communication

•   Models of Opinion Dynamics and Formation

•   Hegselmann / Krause Model: Repeated
    averaging of opinions with neighbors

•   “In the HK model, each agent moves to the
    average opinion of all agents which lie in her
    area of confidence (including herself ).” (Lorenz
    2007)
14


     Building the Model




Snapshot:               Time plot:
Grid with backchannel   Opinion space over time
15


     Model Step-by-Step
•   Start with random distribution of opinions
    (gauss distribution) on 32 x 32 grid
•   Twenty randomly chosen agents are
    interconnected via a backchannel. Each of those
    agents is connected to ten peer agents
•   On each tick, every agent builds a new opinion
    Oj from the current opinions Oi of her eight
    immediate neighbors and her own
•   Connected agents build a new opinion from the
    eight neighbors and all connected agents
16


    Introducing Tolerance

•   Tolerance = acceptance or rejection of opinions
•   Rejected opinions are disregarded in the process
    of opinion formation
•   In modeling: bounded confidence (ε)
•   Only agents within a certain distance on the
    one-dimensional opinion space are considered
17


Sample Model Run
18


                              Model Performance
                                                                           .4 .5
                       1000                                                .3

                       800
n of negative agents




                                                                      bounded
                       600             negative backchannel          confidence
                                       backchannel with positive /
                                       negative ratio from data
                       400
                                                                           0 1

                       200
                                                                           .2

                                 100      200                  300   400   t
19


       Model Findings
•   Tolerance of agents (bounded confidence) is the
    key factor for consensus / overwhelming

•   The number of connected agents (connection
    density) speeds up the process, but does not
    affect the final outcome

•   A surprisingly simple model captures social
    interaction under the influence of pervasive
    electronic communication
20

Modeling for Hypothesis Generation
        from Online Data



•   Similar operationalizations
•   Can be based on empirical data
•   Based on existing theories / models
•   Good science workflow (explicit assumptions)
•   Known / measured factors can be plugged in
21


    Further Investigations

•   Bifurcation analysis of model design: create
    general description and describe relevant
    factors and their turning points

•   Controlled experimental corroboration

•   Events with longer time-scale and without
    spatial restrictions
Thank you!
 For your interest
i


Logistics: Lessons Learned
In early summer 2008, Twitter cut access to
archives before april 2008 (subsequently
extended).
➡ Corroboration requires published data sets

Both data acquisition and modeling require
substantial computational capacity. Some data is
only accessible through special agreements with
companies.
➡ Joined research efforts might be needed
ii


                    Literature
•   Why we twitter: understanding microblogging usage and communities.
    Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on
    Web mining and social network analysis (2007) pp. 56-65

•   Rainer Hegselmann and Ulrich Krause. Opinion Dynamics and
    Bounded Confidence, Models, Analysis and Simulation. Journal of
    Artificial Societies and Social Simulation, 5(3), 2002.

•   Zheng and Hui. Dynamics of opinion formation in a small-world
    network. arXiv (2005) vol. physics.soc-ph

•   Lorenz. Continuous Opinion Dynamics under Bounded Confidence: A
    Survey. arXiv (2007) vol. physics.soc-phJava et al.

•   Photo credit: “Zuckerberg Keynote” by pescatello (CC Attribution 2.0)

Twittering Dissent

  • 1.
    Twittering Dissent Social Web Data Streams as a Basis for Agent Based Models of Opinion Dynamics Presentation held at GOR09 08/04/2009
  • 2.
    Authors Pascal Jürgens Andreas Jungherr pascal.juergens@gmail.com andreas.jungherr@gmail.com Graduate Student Graduate Student Department of Communication Department of Political Science University of Mainz, Germany University of Mainz, Germany
  • 3.
    1 The Challenge To gain valid, representative, relevant insight into human behavior from online data.
  • 4.
    2 Audience Dissent at SXSW Conference "Never, ever have I seen such a train wreck of an interview," said Jason Pontin on Twitter. "Talk about something interesting," one attendee yelled about halfway through the keynote. The remark was met with waves of cheering and applause — Wired.com article about the event
  • 5.
    3 The Event ofInterest • Interview with Mark Zuckerberg of Facebook at SXSW conference, Austin (Tx), March 9th, 2008 • During the interview, negative comments posted by audience members spread through the microblogging service Twitter • Unrest from the electronic backchannel spills over into the room, leading to booing, shouted interjections and eventually the abortion of the interview
  • 6.
    4 Microblogging Overview 1. long term, one-way, mass audience communication normal message user subscribers @ message non - subscribers 2. ad hoc, bidirectional, mass audience + select recipient communication
  • 7.
    5 The Event: Interesting Aspects • Unique interaction of online/offline behavior • Backchannel (invisible communication) • Ad-hoc formation and group action in a small group setting • Possibly an example of “augmented reality” • All within strict spatial and temporal constraints
  • 8.
    6 Research Challenges • Problem: Ex Post proof of causality is difficult (qualitative in-depth interviews might be an option) • Goal: Relate online data to offline situation without data • Therefore: Our approach: Infuse online data into mathematical model on the basis of established social-psychological theories - then use model as basis for hypothesis generation
  • 9.
    7 Data — Benefits • (mostly) unique users • Central data repository • User-coded intents: replies (@user), topics (#topic) • Low syntactical complexity as a result of limited message length (140 characters)
  • 10.
    8 Data — Risks • No guarantee of data availability • Usage limit — limited data acquisition • Message size may discourage complex issues as conversation topics • Low temporal resolution (time zone ambiguity, latencies)
  • 11.
    9 Our Data Set Robert Scoble (well connected blogger) All users directly addressed Messages during over 2700 subscribers by R. Scoble (@-messages) interview (N = 259)
  • 12.
    10 Our Data Set Interview Replacement Session all tweets incl. neutral negative tweets • Messages hand-coded for relevance and negativity • Time series supports role of backchannel
  • 13.
    11 Our Data Set 60% 58% 57% 45% Neutral Tweets 39% Positive Tweets 36% Negative Tweets 30% N total = 814 N during = 259 15% 7% 3% 0% During Keynote Complete Time Series
  • 14.
    12 Modeling • Known from physics, econonomics, recently spawned field of “econophysics” • Roots in game theory, particle simulation • Limited number of variables, Monte Carlo approach to find out behavior • Good in conditions of high uncertainty regarding interplay of factors (explorative theory building) • Bad for forecasts, establishing causality
  • 15.
    13 Established Models of Communication • Models of Opinion Dynamics and Formation • Hegselmann / Krause Model: Repeated averaging of opinions with neighbors • “In the HK model, each agent moves to the average opinion of all agents which lie in her area of confidence (including herself ).” (Lorenz 2007)
  • 16.
    14 Building the Model Snapshot: Time plot: Grid with backchannel Opinion space over time
  • 17.
    15 Model Step-by-Step • Start with random distribution of opinions (gauss distribution) on 32 x 32 grid • Twenty randomly chosen agents are interconnected via a backchannel. Each of those agents is connected to ten peer agents • On each tick, every agent builds a new opinion Oj from the current opinions Oi of her eight immediate neighbors and her own • Connected agents build a new opinion from the eight neighbors and all connected agents
  • 18.
    16 Introducing Tolerance • Tolerance = acceptance or rejection of opinions • Rejected opinions are disregarded in the process of opinion formation • In modeling: bounded confidence (ε) • Only agents within a certain distance on the one-dimensional opinion space are considered
  • 19.
  • 20.
    18 Model Performance .4 .5 1000 .3 800 n of negative agents bounded 600 negative backchannel confidence backchannel with positive / negative ratio from data 400 0 1 200 .2 100 200 300 400 t
  • 21.
    19 Model Findings • Tolerance of agents (bounded confidence) is the key factor for consensus / overwhelming • The number of connected agents (connection density) speeds up the process, but does not affect the final outcome • A surprisingly simple model captures social interaction under the influence of pervasive electronic communication
  • 22.
    20 Modeling for HypothesisGeneration from Online Data • Similar operationalizations • Can be based on empirical data • Based on existing theories / models • Good science workflow (explicit assumptions) • Known / measured factors can be plugged in
  • 23.
    21 Further Investigations • Bifurcation analysis of model design: create general description and describe relevant factors and their turning points • Controlled experimental corroboration • Events with longer time-scale and without spatial restrictions
  • 24.
    Thank you! Foryour interest
  • 25.
    i Logistics: Lessons Learned Inearly summer 2008, Twitter cut access to archives before april 2008 (subsequently extended). ➡ Corroboration requires published data sets Both data acquisition and modeling require substantial computational capacity. Some data is only accessible through special agreements with companies. ➡ Joined research efforts might be needed
  • 26.
    ii Literature • Why we twitter: understanding microblogging usage and communities. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis (2007) pp. 56-65 • Rainer Hegselmann and Ulrich Krause. Opinion Dynamics and Bounded Confidence, Models, Analysis and Simulation. Journal of Artificial Societies and Social Simulation, 5(3), 2002. • Zheng and Hui. Dynamics of opinion formation in a small-world network. arXiv (2005) vol. physics.soc-ph • Lorenz. Continuous Opinion Dynamics under Bounded Confidence: A Survey. arXiv (2007) vol. physics.soc-phJava et al. • Photo credit: “Zuckerberg Keynote” by pescatello (CC Attribution 2.0)