SlideShare a Scribd company logo
1 of 8
Download to read offline
Validation of Dunbar’s number in Twitter conversations

                                                                         Bruno Gon¸alves1,2 , Nicola Perra1,3 , and Alessandro Vespignani1,2,4
                                                                                  c
                                                                                    1
                                                                                      Center for Complex Networks and Systems Research,
                                                                        School of Informatics and Computing, Indiana University, IN 47408, USA
                                                                            2
                                                                              Pervasive Technology Institute, Indiana University, IN 47404, USA
                                                                       3
                                                                         Linkalab, Complex Systems Computational Lab. - 09100 Cagliari Italy and
                                                                                   4
                                                                                     Institute for Scientific Interchange, Turin 10133, Italy

                                                               Modern society’s increasing dependency on online tools for both work and recreation opens up
                                                             unique opportunities for the study of social interactions. A large survey of online exchanges or
                                                             conversations on Twitter, collected across six months involving 1.7 million individuals is presented
                                                             here. We test the theoretical cognitive limit on the number of stable social relationships known as
arXiv:1105.5170v1 [physics.soc-ph] 25 May 2011




                                                             Dunbar’s number. We find that users can entertain a maximum of 100 − 200 stable relationships
                                                             in support for Dunbar’s prediction. The “economy of attention” is limited in the online world by
                                                             cognitive and biological constraints as predicted by Dunbar’s theory. Inspired by this empirical
                                                             evidence we propose a simple dynamical mechanism, based on finite priority queuing and time
                                                             resources, that reproduces the observed social behavior.


                                                                 I.   INTRODUCTION                               measure typical group size using two different methods
                                                                                                                 and obtained a number of 291, roughly twice Dunbar’s
                                                    Modern society’s increasing dependence on online             estimate.
                                                 tools for both work and recreation has generated an             Biological constraints on social interaction go along with
                                                 unprecedented amount of data regarding social behavior.         other real-world physical limitations. After all, a persons
                                                 While this dependence has made it possible to redefine           time is finite and each person must make her own choices
                                                 the way we study social behavior, new online commu-             about how best to use it given the priority of personal
                                                 nication tools and media are also constantly redefining          preferences, interests, needs, etc. The idea that attention
                                                 social acts and relations. Recently, the divide between         and time are scarce resources led H. Simon [10] to apply
                                                 the physical world and online social realities has been         standard economic tools to study these constraints
                                                 blurred by the new possibilities afforded by real-time           and introduce the concept of an Attention Economy
                                                 communication and broadcasting, which appear to                 with mechanisms similar to our everyday monetary
                                                 greatly enhance our social and cognitive capabilities in        economy. The increasingly fast pace of modern life and
                                                 establishing and maintaining social relations. The com-         overwhelming availability of information has brought
                                                 bination of mobile devices with new tools like Twitter,         a renewed interest in the study of the economy of
                                                 Foursquare, Blippy, Tumblr, Yahoo! Meme, Google                 attention with important applications both in business
                                                 Hotspot, etc., are defining a new era in which we can be         [11] and the study collective human behavior [12]. On
                                                 continuously connected with an ever-increasing number           the one hand it can be argued that microblogging tools
                                                 of individuals through constant digital communication           facilitate the way we handle social interactions and
                                                 composed of small messages and bits of information.             that this results in an online world where human social
                                                 Thus, while new data and computational approaches to            limits are finally lifted, making predictions such as the
                                                 social science [1–3] finally enable us to answer a large         Dunbar’s number obsolete. Microblogging and online
                                                 number of long-standing questions [4–6], we are also            tools on the other hand, might be analogous to a pocket
                                                 increasingly confronted with new questions related to           calculator that, while speeding up the way we can do
                                                 the way social interaction and communication change             simple math, does not improve our cognitive capabilities
                                                 in online social environments: What is the impact that          for mathematics. In this case, the basic cognitive
                                                 modern technology has on social interaction? How do             limits to social interactions are not surpassed in the
                                                 we manage the ever-increasing amount of information             digital world. In this paper we show that the latter
                                                 that demands our attention? In 1992, R. I. M. Dunbar            hypothesis is supported by the analysis of real world
                                                 [7] measured the correlation between neocortical volume         data that identify the presence of Dunbars limit in Twit-
                                                 and typical social group size in a wide range of primates       ter, one of the most successful online microblogging tools.
                                                 and human communities. The result was as surprising
                                                 as it was far-reaching. The limit imposed by neocortical
                                                 processing capacity appears to define the number of
                                                 individuals with whom it is possible to maintain stable                           II.   THE DATASET
                                                 interpersonal relationships. Therefore, the size of the
                                                 brain’s neocortex represents a biological constraint on            Having been granted temporary access to Twitters fire-
                                                 social interaction that limits humans’ social network size      hose we mined the stream for over 6 months to identify a
                                                 to between 100 and 200 individuals [8], i.e. Dunbar’s           large sample of active user accounts. Using the API, we
                                                 number. McCarty et al. [9] independently attempted to           then queried for the complete history of 3 million users,
2

        A)                                                                      B)                    k in = 3           k in = 1
                                                                                                      k out = 3          k out = 1
                                                                                                      w in = 4           w in = 1
                                   4                                                                  w out = 3          w out = 2
                                 Alice
                                                            10                                          B                    C
                                                           Ellie                          k in = 2
                        2                                                                 k out = 1
                       Bob                                                                w in = 2
                                               7                                          w out = 1
                                              Bob
           1                       5                                                        A
         Alice                   Cathy
                                                             9
                                                                                                                             E
                                                           Cathy
                                                                                                             k in = 1    k in = 0
                        3                                                                                    k out = 1   k out = 1
                       Dan                                                                                   w in = 1    w in = 0
                                                                       11                               D    w out = 1   w out = 1
                                                                      Bob
                                  6
                                 Bob




FIG. 1: Reply trees and user network. A) The set of all trees is a forest. Each time a user replies, the corresponding tweet
is connected to another one, resulting in a tree structure. B) Combining all the trees in the forest and projecting them onto
the users results in a directed and weighted network that can be used as a proxy for relationships between users. The number
of outgoing (incoming) connections of a given user is called the out (in) degree and is represented by kout (kin ). The number
of messages flowing along each edge is called the degree, ω. The probability density function P (kout ) (P (kin )) indicates the
probability that any given node has kout (kin ) out (in) degree and it is called the out (in) degree distribution and is a measure
of node diversity on the network.


resulting in a total of over 380 million individual tweets         about the id of the original tweet but also the user that
covering almost 4 years of user activity on Twitter. Ta-           sent it. Using this information, each reply tweet maps
ble I provides some basic statistics about our dataset.            directly to a directed edge. Individual trees can be iden-
Here we analyze this massive dataset of Twitter conver-            tified by using depth first search [17] to identify connected
sations accrued over the span of six months and investi-           components in the resulting tweet-tweet graph. To en-
gate the possibility of deviation from Dunbar’s number             sure that the full tree is found and not just a part of
in the number of stable social relations mediated by this          it, we treat each link as undirected for the purposes of
tool. The pervasive nature of Twitter, along with its              this identification. In this way we are able to extract the
widespread adoption by all layers of society, makes it an          complete tree even if we happen to start on one of the
ideal proxy for the study of social interactions [13–16].          leaves. For each tree the root is then found by locating
We have analyzed over 380 million tweets from which                the node with kin ≡ 0, and distances from the root are
we were able to extract 25 million conversations. Each             measured by rerunning the DFS algorithm starting from
Twitter conversation takes on the form of a tree of tweets,        the root and respecting the direction of each edge.
where each tweet comes as a reply to another. By pro-                 The underlying reply network can be extracted by pro-
jecting this forest of trees onto the users that author each       jecting the tweet trees to a user graph: User A is con-
tweet, we are able to generate a weighted social network           nected to user B by a directed outgoing edge if A replied
connecting over 1.7 million individuals (see Figure 1).            to a tweet sent by B. Over time, any pair of users can ex-
                                                                   change multiple replies either in a single “conversation”
      Tweets                           381, 652, 990               (tree) or through multiple conversations. The number of
      Timelined Users                    3, 006, 180               messages sent from one user to another is used as the
      Scraping Period Nov. 20, 2008 – May 29, 2009
                                                                   weight of the corresponding directed edge and is taken
                                                                   to signify the strength of the connection between the two
      Time span                              4 years
                                                                   users, with higher weights representing stronger connec-
                                                                   tions.
      Trees                                 25, 273, 871
      Tweets in Trees                       81, 728, 252
      User in Trees                          1, 720, 320                             B.     Online conversations
      User-User Edges                       68, 459, 592
                                                                     Each reply creates a connection between two tweets
                  TABLE I: Dataset Statistics.                     and their authors, so we can define a conversation as
                                                                   a branching process of consecutive replies, resulting in
                                                                   a tree of tweets. From our dataset we extracted and
                                                                   analyzed a forest of over 25 million trees. Trees vary
        A.       Tree Identification and Projection                 broadly in size and shape, with most conversations re-
                                                                   maining small while a few grow to include thousands of
  All tweets in our dataset that constituted a reply were          tweets and hundreds of users, as shown in Figure 2.
collected. Each such tweet contains information not only             A directed user-user network can be built by projecting
3



                 0                                                                         0
              10                                                                         10
                                                                       A)                                                               B)
                                                          slope = -3                                                       slope = -5
                 -2
           10                                                                        10
                                                                                          -2


                 -4
           10                                                                             -4
                                                                                     10
      P(Ns)




                                                                                 P(Ds)
                 -6
           10                                                                             -6
                                                                                     10
                 -8
           10                                                                             -8
                                                                                     10
                -10
           10
                                                                                         -10
                                                                                    10
                -12
           10          0    1         2          3            4              5                    0    1     2         3                     4
                 10        10        10        10         10            10                10          10   10         10                 10
                                          Ns                                                               Ds


                   0                                                                       4
              10                                                                         10
                                                                       C)                                                  slope = 1
                                                                                                                                        D)
                                                          slope = -3
                 -2
           10
                                                                                              3
                                                                                         10
                 -4
           10
      P(Nu)




                 -6
                                                                                 Nt      10
                                                                                           2
           10

                 -8
           10                                                                                 1
                                                                                         10

                -10
           10
                                                                                           0
                       0         1         2          3                     4            10 0          1     2         3                     4
                 10             10        10         10                 10                 10         10   10         10                 10
                                          Nu                                                               Ds


FIG. 2: Tree characterization. A)Distribution of the number of tweets in a tree. B) Distribution of the number of shells. C)
Distribution of the number of users. D) Tree size vs depth. The broad tailed nature of all of these quantities indicates the
diversity of behaviors displayed by the users in our dataset.


conversation trees to detail how users interact and estab-                         to the number of different nodes it receives a reply from.
lish relationships among themselves. Bidirectional edges                           When A follows B, A subscribes to receive all the up-
signify mutual interactions, with stronger weights imply-                          dates published by B. A is then one of B’s followers and
ing a more frequent or prolonged interaction between two                           B is one of A’s friends. Previous studies have mostly
individuals.                                                                       focused on the network induced by this follower-friend
   All of our analysis will be performed on this user-user                         relationship [15], [18–20]. In any study about stable so-
conversation network. We consider a user to have out                               cial relations in online media, as indicated by studies
degree kout if he or she replies to kout other users, re-                          about Dunbar’s number, it is important to discount oc-
gardless of the number of explicit followers or friends the                        casional social interactions. For this reason we focus on
given user has. By focusing on direct interactions we are                          stronger relationships in our study[13], considering just
able to eliminate the confounding effect of users that have                         active communication from one user to another by means
tens or hundreds of thousands of followers with whom                               of a genuine social interaction between them. In our net-
they have no contact and are able to focus on real person                          work [21, 22] we introduce the weight ωij of each edge,
to person interactions [13].                                                       defined as the number of times user i replies to user j as
                                                                                   a direct measurement of the interaction strength between
                                                                                   two users and stable relations will be those with a large
    III.      DUNBAR’S NUMBER IN OUR DATA                                          weight. A simple way to measure this effect is to calculate
                                                                                   the average weight of each interaction by a user as a func-
   In the generated network each node corresponds to a                             tion of his total number of interactions. Users that have
single user. The out-degree of the nodes is the number of                          only recently joined Twitter will have few friends and
users the node replies to, while the in-degree corresponds                         very few interactions with them. As time goes by, stable
4

users will acquire more and more friends, but the number       are the keys ingredients in fixing the Dunbar’s number.
of replies that they send to other users will increase con-    We model this considering that when user i receives a
sistently only in stable social interactions. Eventually, a    message it places it in an internal queue that allows up
point is reached where the number of contacts surpasses        to qmax,i messages to be handled at each time step. In
the user’s ability to keep in contact with them.               the presence of finite resources each agent has to make
This saturation process will necessarily lead to some rela-    decisions on what are the most important messages to
tionships being more valued than others. Each individual       answer. We set the priority of each message to be pro-
tries to optimize her resources by prioritizing these inter-   portional to the total degree of the sender j. For each
actions. To quantify the strength of these interactions,       user the we studied is the average number of interactions
                            out                                                 out
we studied the quantity ωi , defined as the average so-         per connection ωi (T ) as defined in the Eq. (1). At each
cial strength of active initiate relationship:                 time step each agent goes through its queue and performs
                                                               the following simple operations:
                   out          j   ωij (T )
                  ωi (T ) =       out                   (1)
                                 ki                               • The agent replies to a random number St of mes-
                                                                    sages between 0 and the number of messages qi
   This quantity corresponds to the average weight per              present in the queue. The messages to be replied
outgoing edge of each individual where T represents the             to are selected proportionally to the priority of the
time window for data aggregation. We measure this                   sending agent (its total degree). A message is then
quantity in our data set as shown in Figure 3A. The data            sent to j, the node we are replying to, and the cor-
shows that this quantity reaches a maximum between 100              responding weight ωij is incremented by one.
and 200 friends, in agreement with Dunbar’s prediction
(see figure 2A). This finding suggests that even though
                                                                  • Messages the agent has replied to are deleted from
modern social networks help us to log all the people with
                                                                    the queue and all incoming messages are added to
whom we meet and interact, they are unable to overcome
                                                                    the queue in a prioritized order until the number of
the biological and physical constraints that limit stable
                                                                    messages reaches qmax . Messages in excess of qmax
social relations. In Figure 2B, we plot , the number of
                                                                    are discarded.
reciprocated connections, as a function of the number
of the in-degree. saturates between 200 and 300 even
though the number of incoming connections continues to         The dynamic process is then repeated for a total num-
increase. This saturation indicates that after this point      ber of time steps T . In order to initialize the process
the system is in a new regime; new connections can be re-      and take into account the effect of endogenous random
ciprocated, but at a much smaller rate than before. This       effects, each agent can broadcast a message to all of its
can be accounted for by spurious exchanges we make with        contacts with some small probability p. One may think of
some contacts with whom we do not maintain an active           this message as a common status change, or a TV appear-
relationship.                                                  ance, news story, or any other information not necessarily
                                                               authored by the sending agent. Since these messages are
                                                               not specifically directed from one user to another, they do
                  IV.    THE MODEL                             not contribute to the weight of the edges through which
                                                               they flow. We have studied this simple model by using an
                                                               underlying network of N = 105 nodes and different scale-
  Let us consider a static network G, characterized by a
                                                               free topologies. For each simulation T = 2 × 104 time
degree distribution P (k). Each user (node) i is connected
                                                               steps have been considered and the plots are made evalu-
to all its nearest neighbors js through two weighted di-
                                                               ating the medians among at least 103 runs. In Figure 4 we
rected edges, i → j and j → i so that:
                                                               report the results of simulations in a directed heavy-tailed
                 out  in       ki                              network with a power-law tail similar to those observed
                ki = ki =         , ∀i ∈ G.             (2)    for the measured network [19]. The figures clearly show a
                               2
                                                               behavior compatible with the empirical data. The peak
         out
Where ki is the out degree, the number of out-going            that maximizes the information output per connection
             in
links, and ki is the in degree, the number of in-going         is linearly proportional to qmax , supporting the idea that
links, of the user i. Each node uses its out links to          the physical constraints entailed in the queue’s maximum
send messages to its contacts and it will receive messages     capacity along with the prioritization that gives impor-
from its contacts through its in links. In this way is easy    tance to popular senders are at the origin of the observed
to distinguish between incoming and outgoing messages.         behavior. We have also performed an extensive sensitiv-
Whenever a message is sent from node i to node j, the          ity analysis on the broadcasting probability p, the time
weight of the (i, j) edge, wij is increased by one. The to-    scale T , and have investigated the effect of agent hetero-
tal number of sent messages of each user is given by the       geneity by studying populations in where each agent’s
sum over all of its outgoing edges. Users communicate          capacity qmax ,i is randomly distributed according to a
with each other by replying to messages. The assump-           Gaussian distribution centered around qmax with stan-
tion of our model is that biological and time constraints      dard deviation σ.
5




                          A)                                                                                                                            B)




                                                                                                       50 100 150 200 250 300 350 400 450 500 550 600
                 8
                 7
                 6
                 5
          ωout




                                                                                                       ρ
                 4
                 3
                 2




                                                                                                       0
                 1




                          0     50   100 150 200 250 300 350 400 450 500 550 600                                                                        0    50   100 150 200 250 300 350 400 450 500 550 600
                                                                          out                                                                                                      in
                                                                      k                                                                                                          k

FIG. 3: A) Out-weight as a function of the out-degree. The average weight of each outward connection gradually increases
until it reaches a maximum near 150 − 200 contacts, signaling that a maximum level of social activity has been reached. Above
this point, an increase in the number of contacts can no longer be sustained with the same amount of dedication to each. The
red line corresponds to the average out-weight, while the gray shaded area illustrates the 50% confidence interval. B) Number of
reciprocated connections, ρ, as a function of kin . As the number of people demanding our attention increases, it will eventually
saturate our ability to reply leading to the flat behavior displayed in the dashed region.


                                                                                                           queues start to get messages in and out. After a while
                 3
         20
              x 10                                                                                         we can aspect that the system reaches a dynamical equi-
                                                        80
                                                                    slope = 0.2
                                                                                          qmax = 50        librium. In Figure (5-A) we show the behavior of our
                                                        60
                                                                                          qmax = 100       observable ω out for different values of T , in particular we
                                                 Peak




                                                        40
         15                                                                               qmax = 150
                                                        20
                                                                                          qmax = 200       chose T = 104 , 1.5 × 104 , 2 × 104 . The effect of time is
                                                         0
                                                                                          qmax = 250
                                                             0   50 100 150 200 250 300
                                                                         qmax
                                                                                          qmax = 300
                                                                                                           clearly a shift on the y axis and a small change in the
   out




         10                                                                                                position of the peak. The first effect is due to the fact
   ω




                                                                                                           that the number of messages circulating in the systems
          5                                                                                                increase linearly with T . The second effect is due to the
                                                                                                           reduction of fluctuations when more messages are sent.
          0
                                                                                                           The peak becomes more clear and defined.
           0          50             100       150                   200           250
                                           out
                                           k

                                                                                                                                                              B.     Effect of broadcast probability p
FIG. 4: Result of running our model on a heterogeneous
network made of N = 105 , nodes with degree distribution
P (k) k−γ with γ = −2.4 and σ = 10. Different curves cor-                                                      The effect of the broadcast probability is different on
respond to different queue size. The inset shows the linear                                                 respect to the effect of the time window T . First of all
dependence of the peak on the queue size q. Each curve is                                                  our observable ω out is linearly proportional to T in all
the median of 103 to 2 × 103 runs of T = 2 × 104 time steps                                                regimes of k out this is not true for p. The effect of p is
                                                                                                           crucial for users with a small number of contacts. As the
                                                                                                           p increases they will receive more messages and their ac-
                                                                                                           tivity will increase too, this does not occur in the other
                     A.        Effect of the time window T
                                                                                                           limit. When the saturation takes place the ω out becomes
                                                                                                           completely independent of p. As show in details in a
   One of the parameters of our model is the time window                                                   mean-field approach (Section (IV D)) for values of k out
T during which we study the dynamics. This parame-                                                         small with respect to the queue size, ω out scales linearly
ter regulates the maximum number of messages that will                                                     with p. Instead for a number of contacts much bigger
circulate in the network. In the first time steps the first                                                  than the queue size ω out is independent of p. These con-
messages will start to being sent among users and the                                                      siderations are validated by our simulations as shown in
6

Figure (5-B). We see a clear dependence on p for small           as the number of in-coming connections, the number of
values of k out instead the same behavior for bigger values      messages it get scale as the square of its degree.
of k out .                                                       Two different regimes are easily found: ki      qmax,i and
                                                                 vice versa.
                                                                 In the first case the user is not popular. The number
          C.    Effect of network’s properties                    of messages that the user will receive is small then. In
                                                                 principle it can reply to all of them at each time step.
   Inspired by several studies [13, 15, 19] we fix the base-      We can assume that in this regime its queue is never
line of our model using scale-free networks. It is im-           completely full. We will refer to Rt as the number of
portant then to study how differences in the network              messages that the user reiceive at the time step t. After
structure affect the results. In this section we consider         one time step the number of replies is:
the effect of the exponent γ. As show in Figure (5-C)
we run our model on top of scale-free networks with                                       S1 = ξ1 R1 ,                               (6)
γ = −2.2, −2.4, −2.6, −2.8. As clear from the plot for
smaller values of γ (bigger value in absolute value) gaps        where ξ1 is a random number uniformly distributed be-
on k out start to emerge. These are due to the network           tween 0 and 1. The number of messages, S2 that the user
structure. The shape and position of the peak is the             send at the second time step is a random fraction of the
same for all the curves. The differences are evident just         messages present in its queue:
on the peak height that increase as γ decreases. This is
                                                                                 S2 = [R1 (1 − ξ1 ) + R2 ] ξ2 .                      (7)
due the different redistribution of degrees and to the fact
that with small γ the selection effect is more and more           For t = 3 we get:
important. So we can say that the result are robust on
γ.                                                                   S3 = {[R1 (1 − ξ1 ) + R2 ] (1 − ξ2 ) + R3 } ξ3
                                                                        = [R1 (1 − ξ1 )(1 − ξ2 ) + R2 (1 − ξ2 ) + R3 ] ξ3 , (8)
        D.     Single user: analytical approach
                                                                 and so on. We can approximate these equations using
                                                                 the average number of received messages R . For the
   In order to get a better understanding of the mecha-          general t it is possible to show that:
nisms we describe, we analyzed, in a mean-field approach,                                                
the behavior of a single user i.                                                                       t   t−1
Let us focus on a user i characterized by degree ki and                      St ∼ R ξt 1 +                     (1 − ξi )
          out
qmax,i . ki = ki /2 are the out-going links that it uses                                            j=1 i=j
                                          in
to send messages to its ki /2 contacts. ki = ki /2 are
the in-coming through which it receives messages from                            = R ξt 2 − ξt−1 + O(ξ 2 )
it contacts. We set as kj the priority of each neighbor                          = R 2ξt − ξt ξt−1 + O(ξ 3 ) .                       (9)
that we extract for a distribution P(k). The rules of the
model that we described in the previous sections are ap-         The total number of messages sent is the numerator of
plied for T time steps. The probability that a neighbor          our measure ω out and the sum of all the St :
j will send a message to the user i is:                                                 t=T
                                  ki                                                          St ∼ T R ,                            (10)
                     pji = p +          ,                  (3)                          t=0
                               < k > kj
where p is the broadcast probability. We can evaluate the        considering that each sum of product random numbers is
average number of messages that the user will receive at         order T . We can write then:
each time step t:                                                                    t=T                          out
                                                                                     t=0 St        1             ki     (k out )2
                                                                    out
                                                                   ωi (T ) =          out     ∼    out T    p         +c i
                              in       ki           1                                ki           ki              2     2<k>
          R =         pji = pki +                      .   (4)
                 j
                                      <k>      in
                                                    kj                       ∼ T       out
                                                                                     ki .                                           (11)
                                            j∈ki

We extracted kj from the same distribution, the sum
                                                                                                                      out
                                                  in
scales then linearly with the number of element: ki . We           In this regime we get a linear increase with ki of the
can write:                                                       average number of replies per connections. As show in
                                            2
                                                                 Figure (6) this is confirmed in the simulations.
                                    ki     ki                    The other regime is found for a number of contacts bigger
               R =        pji = p      +c      ,           (5)
                      j
                                    2     2<k>                   than the queue size. In this case the user is very popular
                                                                 and at each time step it gets a lot of messages and is not
where c is a constant fixed by the distribution. Since the        able to handle all of it. In this limit the saturation process
priority of the user is proportional to its degree as well       takes place and it will reply just to a small fraction of
7

                             3                                                                            3                                                                     3
                          x 10                                                                         x 10                                                                  x 10
               A)    10

                                                             T = 10
                                                                    4                   B)        20

                                                                                                                                             p = 10
                                                                                                                                                      -4
                                                                                                                                                                 C)                                        γ = -2.2
                      8                                                  4                                                                                 -4                                              γ = -2.4
                                                             T = 1.5 x 10                                                                    p = 5x10                  10
                                                                       4                          15                                                                                                       γ = -2.6
                                                             T = 2 x 10                                                                      p = 10
                                                                                                                                                      -3
                                                                                                                                                                                                           γ = -2.8
                      6




                                                                                                                                                                 out
               out




                                                                                            out
                                                                                                  10




                                                                                                                                                                     ω
                ω




                                                                                             ω
                      4                                                                                                                                                  5

                                                                                                   5
                      2


                      0                                                                            0                                                                     0
                       0         50    100       150         200         250                        0         50     100       150     200        250                     0         50   100   150   200     250      300
                                             out                                                                           out                                                                 out
                                             k                                                                             k                                                                   k


FIG. 5: A) ω out as a function of kout for qmax = 100, σ = 10, scale-free network with γ = −2.4 and different values of T . We
present the medians over 500 runs B) ω out as a function of kout for qmax = 100, σ = 10, scale-free network with γ = −2.4,
T = 104 and different values of p. We present the medians over 500 runs C) ω out as a function of kout for qmax = 100, σ = 10,
T = 104 , p = 5 × 10−4 , scale-free network with different values of γ. We present the median over 500 runs.


                                                                                                                           In this regime then we get a different scaling behavior
           x 10
                 3                                                                                                         typical of saturation problems. As shown in Figure (6)
          1
                                                                                                                           these arguments are in perfect agreement with the nu-
                                                                               σ = 20
                                                                               σ = 10                                      merical results.
                                                                               σ=5
                                                                               slope = +1                                  We have shown two different regimes. A linear increasing
                                                                               slope = -1
                                                                                                                           behavior and a decreasing one. In the between of these
                                                                                                                           opposite cases we will find a maximum of the function.
   out




                                                                                                                           The position of these peak is in general function of the
   ω




                                                                                                                           queue size.

         0.1



           1                          10                           100                            1000
                                                   out
                                                 k

FIG. 6: Results for the single user and different values of σ,                                                                                                   V.       CONCLUSIONS
the inter-user queue size variance. We fixed the average queue
size at qmax,i = 50 and extracted the priorities of user neigh-
bors from a power-law statistical distribution with exponent
γ = −2.1. For each ki we run T = 500 time steps and present                                                                   Social networks have changed they way we use to com-
the medians among 103 runs                                                                                                 municate. It is now easy to be connected with a huge
                                                                                                                           number of other individuals. In this paper we show that
                                                                                                                           social networks did not change human social capabili-
the total number of messages prioritizing them. At each                                                                    ties. We analyze a large dataset of Twitter conversations
time step this number is a random variable uniformly                                                                       collected across six months involving millions of individ-
distributed between 0 and qmax,i . We have then:                                                                           uals to test the theoretical cognitive limit on the number
                                                                                                                           of stable social relationships known as Dunbar’s num-
                                                         T                                                                 ber. We found that even in the online world cognitive
                             out                 1
                            ωi (T ) ∼         out              ξt qmax,i .                                    (12)         and biological constraints holds as predicted by Dunbar’s
                                             ki          t=0                                                               theory limiting users social activities. We propose a sim-
                                                                                                                           ple model for users’ behavior that includes finite priority
The ξt s are random variable uniformly distributed be-                                                                     queuing and time resources that reproduces the observed
tween 0 and 1. At each time step the number of replies is                                                                  social behavior. This simple model offers a basic explana-
a random fraction of the queue size. For T large enough                                                                    tion of a seemingly complex phenomena observed in the
we get:                                                                                                                    empirical patterns on Twitter data and offers support to
                                                     T                                                                     Dunbar’s hypothesis of a biological limit to the number
                                  out
                                 ωi (T ) ∼           out qmax,i .                                             (13)         of relationships.
                                                   2ki



 [1] David Lazer, Alex Pentland, Lada Adamic, Sinan Aral,                                                                            takis, Noshir Contractor, James Fowler, Myron Gut-
     Albert-Laszlo Barabasi, Devon Brewer, Nicholas Chris-
8

       mann, Tony Jebara, Gary King, Michael Macy, Deb Roy,              2007.
       and Marshall Van Alstyne. Computational social science.      [13] B. A. Huberman, D. M. Romero, and F. Wu. Social net-
       Science, 323:721, Feb 2009.                                       works that matter: Twitter under the microscope. First
 [2]   D. J. Watts. The new science of networks. Ann. Rev.               Monday, 14:1, 2008.
       Socio., 30:243, 2004.                                        [14] A. Lashinsky. Discover. decipher. disrupt. the true mean-
 [3]   Adrian Cho. Ourselves and our interactions: The ulti-             ing of twitter. Fortune Magazine, 158:39, 2008.
       mate physics problem? Science, 325:406, Nov 2009.            [15] D. Boyd, S. Golder, and G. Lotan. Tweet, tweet, retweet:
 [4]   A.-L. Barab´si. The origin of bursts and heavy tails in
                    a                                                    Conversational aspects of retweeting on twitter. In 43rd
       human dynamics. Nature, 435:207, 2005.                            Hawaii International Conference on System Sciences,
 [5]   C. Castellano, S. Fortunato, and V. Loreto. Statisti-             page 412, 2008.
       cal physics of social dynamics. Rev. Mod. Phys., 81:591,     [16] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue
       2009.                                                             Moon. What is twitter, a social network or a news me-
 [6]   M. C. Gonz´lez, C. A. Hidalgo, and A.-L. Barab´si. Un-
                    a                                    a               dia?, 2010.
       derstanding individual human mobility patterns. Nature,      [17] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein.
       453:779, 2008.                                                    Introduction To Algorithms. The MIT Press, 2nd edition,
 [7]   R. I. M. Dunbar. Neocortex size as a constraint on group          2001.
       size in primates. J. Human Evo., 22:469, 1992.               [18] B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps
 [8]   R. I. M Dunbar. The social brain hypothesis. Evo. An-             about twitter. In Proceedings of the first workshop on
       thro., 6:178, 1998.                                               Online social networks, page 19, 2008.
 [9]   Christopher McCarty, Peter D Killworth, H Russell            [19] A. Java, X. Song, T. Finin, and B. Tseng. Why we twit-
       Bernard, Eugene C Johnsen, and Gene A Shelley. Com-               ter: understanding microblogging usage and communi-
       paring two methods for estimating network size. Human             ties. In Proceedings of the 9th WebKDD and 1st SNA-
       Organization, 60(1):28, Nov 2001.                                 KDD 2007 workshop on Web mining and social network
[10]   H. Simon. Computers, Communication, and the Pub-                  analysis, page 56, 2007.
       lic Interest, chapter Designing Organizations for an         [20] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto,
       Information-Rich World, pages 38–52. Johns Hopkins                and Krishna P. Gummadi. Measuring user influence in
       Press, 1971.                                                      twitter: The million follower fallacy, 2010.
[11]   T. Davenport and J. C Beck. The Attention Economy:           [21] A. Barrat, M. Barthelemy, and A. Vespignani. Dynamical
       Understanding the New Currency of Business. Harvard               Processes On Complex Networks. Cambridge University
       Business Press, 2002.                                             Press, 2008.
[12]   B. A. Huberman and F. Wu. The economics of attention:        [22] Mark Newman. Networks: An Introduction. Oxford Uni-
       maximizing user value in information-rich environments.           versity Press, 2010.
       In Proceedings of the 1st international workshop on Data
       mining and audience intelligence for advertising, page 16,

More Related Content

What's hot

2009 - Connected Action - Marc Smith - Social Media Network Analysis
2009 - Connected Action - Marc Smith - Social Media Network Analysis2009 - Connected Action - Marc Smith - Social Media Network Analysis
2009 - Connected Action - Marc Smith - Social Media Network AnalysisMarc Smith
 
Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"summersocialwebshop
 
Data protection and social networks
Data protection and social networksData protection and social networks
Data protection and social networksblogzilla
 
Connecting To Congress
Connecting To CongressConnecting To Congress
Connecting To CongressInes Mergel
 
UCAS - what is social media? Plenary session Michelle Goodall
UCAS - what is social media? Plenary session Michelle Goodall UCAS - what is social media? Plenary session Michelle Goodall
UCAS - what is social media? Plenary session Michelle Goodall Michelle Goodall
 
Suazo, martínez & elgueta english version
Suazo, martínez & elgueta english versionSuazo, martínez & elgueta english version
Suazo, martínez & elgueta english version2011990
 
2009-HICSS-42-Best paper-A conceptual and operational definition of ‘social r...
2009-HICSS-42-Best paper-A conceptual and operational definition of ‘social r...2009-HICSS-42-Best paper-A conceptual and operational definition of ‘social r...
2009-HICSS-42-Best paper-A conceptual and operational definition of ‘social r...Marc Smith
 
Sensory transformation
Sensory transformationSensory transformation
Sensory transformationKarlos Svoboda
 
Preconditions for Participation
Preconditions for ParticipationPreconditions for Participation
Preconditions for ParticipationThe New School
 
Building Online Communities
Building Online CommunitiesBuilding Online Communities
Building Online CommunitiesLisa Trager
 
Crisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeCrisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeAxel101
 
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
2010-November-8-NIA - Smart Society and Civic Culture - Marc SmithMarc Smith
 
Social computation of emergent networks on user generated content
Social computation of emergent networks on user generated contentSocial computation of emergent networks on user generated content
Social computation of emergent networks on user generated contentMarkus Strohmaier
 
Digital Trails Dave King 1 5 10 Part 1 D3
Digital Trails   Dave King   1 5 10   Part 1 D3Digital Trails   Dave King   1 5 10   Part 1 D3
Digital Trails Dave King 1 5 10 Part 1 D3Dave King
 
Twitter Poster
Twitter PosterTwitter Poster
Twitter PosterjiMMUni
 
Social computing meet & greet
Social computing meet & greetSocial computing meet & greet
Social computing meet & greetAngela Brandt
 
Future of the Internet Predictions March 2014 PIP Report
Future of the Internet Predictions March 2014 PIP ReportFuture of the Internet Predictions March 2014 PIP Report
Future of the Internet Predictions March 2014 PIP ReportVasily Ryzhonkov
 

What's hot (20)

2009 - Connected Action - Marc Smith - Social Media Network Analysis
2009 - Connected Action - Marc Smith - Social Media Network Analysis2009 - Connected Action - Marc Smith - Social Media Network Analysis
2009 - Connected Action - Marc Smith - Social Media Network Analysis
 
Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"
 
Data protection and social networks
Data protection and social networksData protection and social networks
Data protection and social networks
 
Social Intranet
Social IntranetSocial Intranet
Social Intranet
 
Connecting To Congress
Connecting To CongressConnecting To Congress
Connecting To Congress
 
UCAS - what is social media? Plenary session Michelle Goodall
UCAS - what is social media? Plenary session Michelle Goodall UCAS - what is social media? Plenary session Michelle Goodall
UCAS - what is social media? Plenary session Michelle Goodall
 
Suazo, martínez & elgueta english version
Suazo, martínez & elgueta english versionSuazo, martínez & elgueta english version
Suazo, martínez & elgueta english version
 
2009-HICSS-42-Best paper-A conceptual and operational definition of ‘social r...
2009-HICSS-42-Best paper-A conceptual and operational definition of ‘social r...2009-HICSS-42-Best paper-A conceptual and operational definition of ‘social r...
2009-HICSS-42-Best paper-A conceptual and operational definition of ‘social r...
 
Sensory transformation
Sensory transformationSensory transformation
Sensory transformation
 
Preconditions for Participation
Preconditions for ParticipationPreconditions for Participation
Preconditions for Participation
 
Building Online Communities
Building Online CommunitiesBuilding Online Communities
Building Online Communities
 
Thematic unit 2
Thematic unit 2Thematic unit 2
Thematic unit 2
 
Crisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeCrisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 Age
 
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
 
Social computation of emergent networks on user generated content
Social computation of emergent networks on user generated contentSocial computation of emergent networks on user generated content
Social computation of emergent networks on user generated content
 
Digital Trails Dave King 1 5 10 Part 1 D3
Digital Trails   Dave King   1 5 10   Part 1 D3Digital Trails   Dave King   1 5 10   Part 1 D3
Digital Trails Dave King 1 5 10 Part 1 D3
 
Twitter Poster
Twitter PosterTwitter Poster
Twitter Poster
 
Social computing meet & greet
Social computing meet & greetSocial computing meet & greet
Social computing meet & greet
 
Dv31821825
Dv31821825Dv31821825
Dv31821825
 
Future of the Internet Predictions March 2014 PIP Report
Future of the Internet Predictions March 2014 PIP ReportFuture of the Internet Predictions March 2014 PIP Report
Future of the Internet Predictions March 2014 PIP Report
 

Similar to Validation of Dunbar's number in Twitter conversations

THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...cscpconf
 
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...csandit
 
Towards Decision Support and Goal AchievementIdentifying Ac.docx
Towards Decision Support and Goal AchievementIdentifying Ac.docxTowards Decision Support and Goal AchievementIdentifying Ac.docx
Towards Decision Support and Goal AchievementIdentifying Ac.docxturveycharlyn
 
Community Evolution in the Digital Space and Creation of Social Information C...
Community Evolution in the Digital Space and Creation of SocialInformation C...Community Evolution in the Digital Space and Creation of SocialInformation C...
Community Evolution in the Digital Space and Creation of Social Information C...Saptarshi Ghosh
 
Estudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social MediaEstudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social Mediaeliasvillagran
 
Selas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedSelas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedZiya NISANOGLU
 
Web Observatories and e-Research
Web Observatories and e-ResearchWeb Observatories and e-Research
Web Observatories and e-ResearchDavid De Roure
 
Emerged computer interaction with humanity social computing
Emerged computer interaction with humanity social computingEmerged computer interaction with humanity social computing
Emerged computer interaction with humanity social computingijcsa
 
Sociological Research Methods – Group Research ProjectThe Ev.docx
Sociological Research Methods – Group Research ProjectThe Ev.docxSociological Research Methods – Group Research ProjectThe Ev.docx
Sociological Research Methods – Group Research ProjectThe Ev.docxjensgosney
 
Social networking .ppt
Social networking .pptSocial networking .ppt
Social networking .pptPRANJAL SAIKIA
 
Multimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorMultimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorIAEME Publication
 
303 final project-Communication Technology in the year 2024
303 final project-Communication Technology in the year 2024303 final project-Communication Technology in the year 2024
303 final project-Communication Technology in the year 2024jonmitch502
 
2014 IE Application- Social Interaction SJW
2014 IE Application- Social Interaction SJW2014 IE Application- Social Interaction SJW
2014 IE Application- Social Interaction SJWSeebas85
 
Unit 1 cape sociology
Unit 1 cape sociologyUnit 1 cape sociology
Unit 1 cape sociologyAndreen18
 
Using ICTs to Promote Cultural Change: A Study from a Higher Education Context
Using ICTs to Promote Cultural Change: A Study from a Higher Education ContextUsing ICTs to Promote Cultural Change: A Study from a Higher Education Context
Using ICTs to Promote Cultural Change: A Study from a Higher Education Contextac2182
 
2021 adt imc_u_ba19_t0563_abah_stephany_mbong_680829564_task_9
2021 adt imc_u_ba19_t0563_abah_stephany_mbong_680829564_task_92021 adt imc_u_ba19_t0563_abah_stephany_mbong_680829564_task_9
2021 adt imc_u_ba19_t0563_abah_stephany_mbong_680829564_task_9PuwaCalvin
 
Suazo%2c martínez & elgueta english version
Suazo%2c martínez & elgueta english versionSuazo%2c martínez & elgueta english version
Suazo%2c martínez & elgueta english version2011990
 

Similar to Validation of Dunbar's number in Twitter conversations (20)

THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
 
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
 
Towards Decision Support and Goal AchievementIdentifying Ac.docx
Towards Decision Support and Goal AchievementIdentifying Ac.docxTowards Decision Support and Goal AchievementIdentifying Ac.docx
Towards Decision Support and Goal AchievementIdentifying Ac.docx
 
Community Evolution in the Digital Space and Creation of Social Information C...
Community Evolution in the Digital Space and Creation of SocialInformation C...Community Evolution in the Digital Space and Creation of SocialInformation C...
Community Evolution in the Digital Space and Creation of Social Information C...
 
Estudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social MediaEstudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social Media
 
Selas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedSelas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media Excerpted
 
Web Observatories and e-Research
Web Observatories and e-ResearchWeb Observatories and e-Research
Web Observatories and e-Research
 
Emerged computer interaction with humanity social computing
Emerged computer interaction with humanity social computingEmerged computer interaction with humanity social computing
Emerged computer interaction with humanity social computing
 
Sociological Research Methods – Group Research ProjectThe Ev.docx
Sociological Research Methods – Group Research ProjectThe Ev.docxSociological Research Methods – Group Research ProjectThe Ev.docx
Sociological Research Methods – Group Research ProjectThe Ev.docx
 
Social networking .ppt
Social networking .pptSocial networking .ppt
Social networking .ppt
 
The Internet of Things and what it mean for librarians
The Internet of Things and what it mean for librariansThe Internet of Things and what it mean for librarians
The Internet of Things and what it mean for librarians
 
CIC Networked Learning Practices Workshop - Caroline Haythornthwaite
CIC Networked Learning Practices Workshop - Caroline HaythornthwaiteCIC Networked Learning Practices Workshop - Caroline Haythornthwaite
CIC Networked Learning Practices Workshop - Caroline Haythornthwaite
 
Multimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorMultimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behavior
 
303 final project-Communication Technology in the year 2024
303 final project-Communication Technology in the year 2024303 final project-Communication Technology in the year 2024
303 final project-Communication Technology in the year 2024
 
2014 IE Application- Social Interaction SJW
2014 IE Application- Social Interaction SJW2014 IE Application- Social Interaction SJW
2014 IE Application- Social Interaction SJW
 
Unit 1 cape sociology
Unit 1 cape sociologyUnit 1 cape sociology
Unit 1 cape sociology
 
Using ICTs to Promote Cultural Change: A Study from a Higher Education Context
Using ICTs to Promote Cultural Change: A Study from a Higher Education ContextUsing ICTs to Promote Cultural Change: A Study from a Higher Education Context
Using ICTs to Promote Cultural Change: A Study from a Higher Education Context
 
Informatics moments
Informatics momentsInformatics moments
Informatics moments
 
2021 adt imc_u_ba19_t0563_abah_stephany_mbong_680829564_task_9
2021 adt imc_u_ba19_t0563_abah_stephany_mbong_680829564_task_92021 adt imc_u_ba19_t0563_abah_stephany_mbong_680829564_task_9
2021 adt imc_u_ba19_t0563_abah_stephany_mbong_680829564_task_9
 
Suazo%2c martínez & elgueta english version
Suazo%2c martínez & elgueta english versionSuazo%2c martínez & elgueta english version
Suazo%2c martínez & elgueta english version
 

More from augustodefranco .

Franco, Augusto (2017) Conservadorismo, liberalismo econômico e democracia
Franco, Augusto (2017) Conservadorismo, liberalismo econômico e democraciaFranco, Augusto (2017) Conservadorismo, liberalismo econômico e democracia
Franco, Augusto (2017) Conservadorismo, liberalismo econômico e democraciaaugustodefranco .
 
Franco, Augusto (2018) Os diferentes adversários da democracia no brasil
Franco, Augusto (2018) Os diferentes adversários da democracia no brasilFranco, Augusto (2018) Os diferentes adversários da democracia no brasil
Franco, Augusto (2018) Os diferentes adversários da democracia no brasilaugustodefranco .
 
A democracia sob ataque terá de ser reinventada
A democracia sob ataque terá de ser reinventadaA democracia sob ataque terá de ser reinventada
A democracia sob ataque terá de ser reinventadaaugustodefranco .
 
Algumas notas sobre os desafios de empreender em rede
Algumas notas sobre os desafios de empreender em redeAlgumas notas sobre os desafios de empreender em rede
Algumas notas sobre os desafios de empreender em redeaugustodefranco .
 
APRENDIZAGEM OU DERIVA ONTOGENICA
APRENDIZAGEM OU DERIVA ONTOGENICA APRENDIZAGEM OU DERIVA ONTOGENICA
APRENDIZAGEM OU DERIVA ONTOGENICA augustodefranco .
 
CONDORCET, Marquês de (1792). Relatório de projeto de decreto sobre a organiz...
CONDORCET, Marquês de (1792). Relatório de projeto de decreto sobre a organiz...CONDORCET, Marquês de (1792). Relatório de projeto de decreto sobre a organiz...
CONDORCET, Marquês de (1792). Relatório de projeto de decreto sobre a organiz...augustodefranco .
 
NIETZSCHE, Friederich (1888). Os "melhoradores" da humanidade, Parte 2 e O qu...
NIETZSCHE, Friederich (1888). Os "melhoradores" da humanidade, Parte 2 e O qu...NIETZSCHE, Friederich (1888). Os "melhoradores" da humanidade, Parte 2 e O qu...
NIETZSCHE, Friederich (1888). Os "melhoradores" da humanidade, Parte 2 e O qu...augustodefranco .
 
100 DIAS DE VERÃO BOOK DO PROGRAMA
100 DIAS DE VERÃO BOOK DO PROGRAMA100 DIAS DE VERÃO BOOK DO PROGRAMA
100 DIAS DE VERÃO BOOK DO PROGRAMAaugustodefranco .
 
Nunca a humanidade dependeu tanto da rede social
Nunca a humanidade dependeu tanto da rede socialNunca a humanidade dependeu tanto da rede social
Nunca a humanidade dependeu tanto da rede socialaugustodefranco .
 
Um sistema estatal de participação social?
Um sistema estatal de participação social?Um sistema estatal de participação social?
Um sistema estatal de participação social?augustodefranco .
 
Quando as eleições conspiram contra a democracia
Quando as eleições conspiram contra a democraciaQuando as eleições conspiram contra a democracia
Quando as eleições conspiram contra a democraciaaugustodefranco .
 
Democracia cooperativa: escritos políticos escolhidos de John Dewey
Democracia cooperativa: escritos políticos escolhidos de John DeweyDemocracia cooperativa: escritos políticos escolhidos de John Dewey
Democracia cooperativa: escritos políticos escolhidos de John Deweyaugustodefranco .
 
RELATÓRIO DO HUMAN RIGHTS WATCH SOBRE A VENEZUELA
RELATÓRIO DO HUMAN RIGHTS WATCH SOBRE A VENEZUELARELATÓRIO DO HUMAN RIGHTS WATCH SOBRE A VENEZUELA
RELATÓRIO DO HUMAN RIGHTS WATCH SOBRE A VENEZUELAaugustodefranco .
 
Diálogo democrático: um manual para practicantes
Diálogo democrático: um manual para practicantesDiálogo democrático: um manual para practicantes
Diálogo democrático: um manual para practicantesaugustodefranco .
 

More from augustodefranco . (20)

Franco, Augusto (2017) Conservadorismo, liberalismo econômico e democracia
Franco, Augusto (2017) Conservadorismo, liberalismo econômico e democraciaFranco, Augusto (2017) Conservadorismo, liberalismo econômico e democracia
Franco, Augusto (2017) Conservadorismo, liberalismo econômico e democracia
 
Franco, Augusto (2018) Os diferentes adversários da democracia no brasil
Franco, Augusto (2018) Os diferentes adversários da democracia no brasilFranco, Augusto (2018) Os diferentes adversários da democracia no brasil
Franco, Augusto (2018) Os diferentes adversários da democracia no brasil
 
Hiérarchie
Hiérarchie Hiérarchie
Hiérarchie
 
A democracia sob ataque terá de ser reinventada
A democracia sob ataque terá de ser reinventadaA democracia sob ataque terá de ser reinventada
A democracia sob ataque terá de ser reinventada
 
JERARQUIA
JERARQUIAJERARQUIA
JERARQUIA
 
Algumas notas sobre os desafios de empreender em rede
Algumas notas sobre os desafios de empreender em redeAlgumas notas sobre os desafios de empreender em rede
Algumas notas sobre os desafios de empreender em rede
 
AS EMPRESAS DIANTE DA CRISE
AS EMPRESAS DIANTE DA CRISEAS EMPRESAS DIANTE DA CRISE
AS EMPRESAS DIANTE DA CRISE
 
APRENDIZAGEM OU DERIVA ONTOGENICA
APRENDIZAGEM OU DERIVA ONTOGENICA APRENDIZAGEM OU DERIVA ONTOGENICA
APRENDIZAGEM OU DERIVA ONTOGENICA
 
CONDORCET, Marquês de (1792). Relatório de projeto de decreto sobre a organiz...
CONDORCET, Marquês de (1792). Relatório de projeto de decreto sobre a organiz...CONDORCET, Marquês de (1792). Relatório de projeto de decreto sobre a organiz...
CONDORCET, Marquês de (1792). Relatório de projeto de decreto sobre a organiz...
 
NIETZSCHE, Friederich (1888). Os "melhoradores" da humanidade, Parte 2 e O qu...
NIETZSCHE, Friederich (1888). Os "melhoradores" da humanidade, Parte 2 e O qu...NIETZSCHE, Friederich (1888). Os "melhoradores" da humanidade, Parte 2 e O qu...
NIETZSCHE, Friederich (1888). Os "melhoradores" da humanidade, Parte 2 e O qu...
 
100 DIAS DE VERÃO BOOK DO PROGRAMA
100 DIAS DE VERÃO BOOK DO PROGRAMA100 DIAS DE VERÃO BOOK DO PROGRAMA
100 DIAS DE VERÃO BOOK DO PROGRAMA
 
Nunca a humanidade dependeu tanto da rede social
Nunca a humanidade dependeu tanto da rede socialNunca a humanidade dependeu tanto da rede social
Nunca a humanidade dependeu tanto da rede social
 
Um sistema estatal de participação social?
Um sistema estatal de participação social?Um sistema estatal de participação social?
Um sistema estatal de participação social?
 
Quando as eleições conspiram contra a democracia
Quando as eleições conspiram contra a democraciaQuando as eleições conspiram contra a democracia
Quando as eleições conspiram contra a democracia
 
100 DIAS DE VERÃO
100 DIAS DE VERÃO100 DIAS DE VERÃO
100 DIAS DE VERÃO
 
Democracia cooperativa: escritos políticos escolhidos de John Dewey
Democracia cooperativa: escritos políticos escolhidos de John DeweyDemocracia cooperativa: escritos políticos escolhidos de John Dewey
Democracia cooperativa: escritos políticos escolhidos de John Dewey
 
MULTIVERSIDADE NA ESCOLA
MULTIVERSIDADE NA ESCOLAMULTIVERSIDADE NA ESCOLA
MULTIVERSIDADE NA ESCOLA
 
DEMOCRACIA E REDES SOCIAIS
DEMOCRACIA E REDES SOCIAISDEMOCRACIA E REDES SOCIAIS
DEMOCRACIA E REDES SOCIAIS
 
RELATÓRIO DO HUMAN RIGHTS WATCH SOBRE A VENEZUELA
RELATÓRIO DO HUMAN RIGHTS WATCH SOBRE A VENEZUELARELATÓRIO DO HUMAN RIGHTS WATCH SOBRE A VENEZUELA
RELATÓRIO DO HUMAN RIGHTS WATCH SOBRE A VENEZUELA
 
Diálogo democrático: um manual para practicantes
Diálogo democrático: um manual para practicantesDiálogo democrático: um manual para practicantes
Diálogo democrático: um manual para practicantes
 

Validation of Dunbar's number in Twitter conversations

  • 1. Validation of Dunbar’s number in Twitter conversations Bruno Gon¸alves1,2 , Nicola Perra1,3 , and Alessandro Vespignani1,2,4 c 1 Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, IN 47408, USA 2 Pervasive Technology Institute, Indiana University, IN 47404, USA 3 Linkalab, Complex Systems Computational Lab. - 09100 Cagliari Italy and 4 Institute for Scientific Interchange, Turin 10133, Italy Modern society’s increasing dependency on online tools for both work and recreation opens up unique opportunities for the study of social interactions. A large survey of online exchanges or conversations on Twitter, collected across six months involving 1.7 million individuals is presented here. We test the theoretical cognitive limit on the number of stable social relationships known as arXiv:1105.5170v1 [physics.soc-ph] 25 May 2011 Dunbar’s number. We find that users can entertain a maximum of 100 − 200 stable relationships in support for Dunbar’s prediction. The “economy of attention” is limited in the online world by cognitive and biological constraints as predicted by Dunbar’s theory. Inspired by this empirical evidence we propose a simple dynamical mechanism, based on finite priority queuing and time resources, that reproduces the observed social behavior. I. INTRODUCTION measure typical group size using two different methods and obtained a number of 291, roughly twice Dunbar’s Modern society’s increasing dependence on online estimate. tools for both work and recreation has generated an Biological constraints on social interaction go along with unprecedented amount of data regarding social behavior. other real-world physical limitations. After all, a persons While this dependence has made it possible to redefine time is finite and each person must make her own choices the way we study social behavior, new online commu- about how best to use it given the priority of personal nication tools and media are also constantly redefining preferences, interests, needs, etc. The idea that attention social acts and relations. Recently, the divide between and time are scarce resources led H. Simon [10] to apply the physical world and online social realities has been standard economic tools to study these constraints blurred by the new possibilities afforded by real-time and introduce the concept of an Attention Economy communication and broadcasting, which appear to with mechanisms similar to our everyday monetary greatly enhance our social and cognitive capabilities in economy. The increasingly fast pace of modern life and establishing and maintaining social relations. The com- overwhelming availability of information has brought bination of mobile devices with new tools like Twitter, a renewed interest in the study of the economy of Foursquare, Blippy, Tumblr, Yahoo! Meme, Google attention with important applications both in business Hotspot, etc., are defining a new era in which we can be [11] and the study collective human behavior [12]. On continuously connected with an ever-increasing number the one hand it can be argued that microblogging tools of individuals through constant digital communication facilitate the way we handle social interactions and composed of small messages and bits of information. that this results in an online world where human social Thus, while new data and computational approaches to limits are finally lifted, making predictions such as the social science [1–3] finally enable us to answer a large Dunbar’s number obsolete. Microblogging and online number of long-standing questions [4–6], we are also tools on the other hand, might be analogous to a pocket increasingly confronted with new questions related to calculator that, while speeding up the way we can do the way social interaction and communication change simple math, does not improve our cognitive capabilities in online social environments: What is the impact that for mathematics. In this case, the basic cognitive modern technology has on social interaction? How do limits to social interactions are not surpassed in the we manage the ever-increasing amount of information digital world. In this paper we show that the latter that demands our attention? In 1992, R. I. M. Dunbar hypothesis is supported by the analysis of real world [7] measured the correlation between neocortical volume data that identify the presence of Dunbars limit in Twit- and typical social group size in a wide range of primates ter, one of the most successful online microblogging tools. and human communities. The result was as surprising as it was far-reaching. The limit imposed by neocortical processing capacity appears to define the number of individuals with whom it is possible to maintain stable II. THE DATASET interpersonal relationships. Therefore, the size of the brain’s neocortex represents a biological constraint on Having been granted temporary access to Twitters fire- social interaction that limits humans’ social network size hose we mined the stream for over 6 months to identify a to between 100 and 200 individuals [8], i.e. Dunbar’s large sample of active user accounts. Using the API, we number. McCarty et al. [9] independently attempted to then queried for the complete history of 3 million users,
  • 2. 2 A) B) k in = 3 k in = 1 k out = 3 k out = 1 w in = 4 w in = 1 4 w out = 3 w out = 2 Alice 10 B C Ellie k in = 2 2 k out = 1 Bob w in = 2 7 w out = 1 Bob 1 5 A Alice Cathy 9 E Cathy k in = 1 k in = 0 3 k out = 1 k out = 1 Dan w in = 1 w in = 0 11 D w out = 1 w out = 1 Bob 6 Bob FIG. 1: Reply trees and user network. A) The set of all trees is a forest. Each time a user replies, the corresponding tweet is connected to another one, resulting in a tree structure. B) Combining all the trees in the forest and projecting them onto the users results in a directed and weighted network that can be used as a proxy for relationships between users. The number of outgoing (incoming) connections of a given user is called the out (in) degree and is represented by kout (kin ). The number of messages flowing along each edge is called the degree, ω. The probability density function P (kout ) (P (kin )) indicates the probability that any given node has kout (kin ) out (in) degree and it is called the out (in) degree distribution and is a measure of node diversity on the network. resulting in a total of over 380 million individual tweets about the id of the original tweet but also the user that covering almost 4 years of user activity on Twitter. Ta- sent it. Using this information, each reply tweet maps ble I provides some basic statistics about our dataset. directly to a directed edge. Individual trees can be iden- Here we analyze this massive dataset of Twitter conver- tified by using depth first search [17] to identify connected sations accrued over the span of six months and investi- components in the resulting tweet-tweet graph. To en- gate the possibility of deviation from Dunbar’s number sure that the full tree is found and not just a part of in the number of stable social relations mediated by this it, we treat each link as undirected for the purposes of tool. The pervasive nature of Twitter, along with its this identification. In this way we are able to extract the widespread adoption by all layers of society, makes it an complete tree even if we happen to start on one of the ideal proxy for the study of social interactions [13–16]. leaves. For each tree the root is then found by locating We have analyzed over 380 million tweets from which the node with kin ≡ 0, and distances from the root are we were able to extract 25 million conversations. Each measured by rerunning the DFS algorithm starting from Twitter conversation takes on the form of a tree of tweets, the root and respecting the direction of each edge. where each tweet comes as a reply to another. By pro- The underlying reply network can be extracted by pro- jecting this forest of trees onto the users that author each jecting the tweet trees to a user graph: User A is con- tweet, we are able to generate a weighted social network nected to user B by a directed outgoing edge if A replied connecting over 1.7 million individuals (see Figure 1). to a tweet sent by B. Over time, any pair of users can ex- change multiple replies either in a single “conversation” Tweets 381, 652, 990 (tree) or through multiple conversations. The number of Timelined Users 3, 006, 180 messages sent from one user to another is used as the Scraping Period Nov. 20, 2008 – May 29, 2009 weight of the corresponding directed edge and is taken to signify the strength of the connection between the two Time span 4 years users, with higher weights representing stronger connec- tions. Trees 25, 273, 871 Tweets in Trees 81, 728, 252 User in Trees 1, 720, 320 B. Online conversations User-User Edges 68, 459, 592 Each reply creates a connection between two tweets TABLE I: Dataset Statistics. and their authors, so we can define a conversation as a branching process of consecutive replies, resulting in a tree of tweets. From our dataset we extracted and analyzed a forest of over 25 million trees. Trees vary A. Tree Identification and Projection broadly in size and shape, with most conversations re- maining small while a few grow to include thousands of All tweets in our dataset that constituted a reply were tweets and hundreds of users, as shown in Figure 2. collected. Each such tweet contains information not only A directed user-user network can be built by projecting
  • 3. 3 0 0 10 10 A) B) slope = -3 slope = -5 -2 10 10 -2 -4 10 -4 10 P(Ns) P(Ds) -6 10 -6 10 -8 10 -8 10 -10 10 -10 10 -12 10 0 1 2 3 4 5 0 1 2 3 4 10 10 10 10 10 10 10 10 10 10 10 Ns Ds 0 4 10 10 C) slope = 1 D) slope = -3 -2 10 3 10 -4 10 P(Nu) -6 Nt 10 2 10 -8 10 1 10 -10 10 0 0 1 2 3 4 10 0 1 2 3 4 10 10 10 10 10 10 10 10 10 10 Nu Ds FIG. 2: Tree characterization. A)Distribution of the number of tweets in a tree. B) Distribution of the number of shells. C) Distribution of the number of users. D) Tree size vs depth. The broad tailed nature of all of these quantities indicates the diversity of behaviors displayed by the users in our dataset. conversation trees to detail how users interact and estab- to the number of different nodes it receives a reply from. lish relationships among themselves. Bidirectional edges When A follows B, A subscribes to receive all the up- signify mutual interactions, with stronger weights imply- dates published by B. A is then one of B’s followers and ing a more frequent or prolonged interaction between two B is one of A’s friends. Previous studies have mostly individuals. focused on the network induced by this follower-friend All of our analysis will be performed on this user-user relationship [15], [18–20]. In any study about stable so- conversation network. We consider a user to have out cial relations in online media, as indicated by studies degree kout if he or she replies to kout other users, re- about Dunbar’s number, it is important to discount oc- gardless of the number of explicit followers or friends the casional social interactions. For this reason we focus on given user has. By focusing on direct interactions we are stronger relationships in our study[13], considering just able to eliminate the confounding effect of users that have active communication from one user to another by means tens or hundreds of thousands of followers with whom of a genuine social interaction between them. In our net- they have no contact and are able to focus on real person work [21, 22] we introduce the weight ωij of each edge, to person interactions [13]. defined as the number of times user i replies to user j as a direct measurement of the interaction strength between two users and stable relations will be those with a large III. DUNBAR’S NUMBER IN OUR DATA weight. A simple way to measure this effect is to calculate the average weight of each interaction by a user as a func- In the generated network each node corresponds to a tion of his total number of interactions. Users that have single user. The out-degree of the nodes is the number of only recently joined Twitter will have few friends and users the node replies to, while the in-degree corresponds very few interactions with them. As time goes by, stable
  • 4. 4 users will acquire more and more friends, but the number are the keys ingredients in fixing the Dunbar’s number. of replies that they send to other users will increase con- We model this considering that when user i receives a sistently only in stable social interactions. Eventually, a message it places it in an internal queue that allows up point is reached where the number of contacts surpasses to qmax,i messages to be handled at each time step. In the user’s ability to keep in contact with them. the presence of finite resources each agent has to make This saturation process will necessarily lead to some rela- decisions on what are the most important messages to tionships being more valued than others. Each individual answer. We set the priority of each message to be pro- tries to optimize her resources by prioritizing these inter- portional to the total degree of the sender j. For each actions. To quantify the strength of these interactions, user the we studied is the average number of interactions out out we studied the quantity ωi , defined as the average so- per connection ωi (T ) as defined in the Eq. (1). At each cial strength of active initiate relationship: time step each agent goes through its queue and performs the following simple operations: out j ωij (T ) ωi (T ) = out (1) ki • The agent replies to a random number St of mes- sages between 0 and the number of messages qi This quantity corresponds to the average weight per present in the queue. The messages to be replied outgoing edge of each individual where T represents the to are selected proportionally to the priority of the time window for data aggregation. We measure this sending agent (its total degree). A message is then quantity in our data set as shown in Figure 3A. The data sent to j, the node we are replying to, and the cor- shows that this quantity reaches a maximum between 100 responding weight ωij is incremented by one. and 200 friends, in agreement with Dunbar’s prediction (see figure 2A). This finding suggests that even though • Messages the agent has replied to are deleted from modern social networks help us to log all the people with the queue and all incoming messages are added to whom we meet and interact, they are unable to overcome the queue in a prioritized order until the number of the biological and physical constraints that limit stable messages reaches qmax . Messages in excess of qmax social relations. In Figure 2B, we plot , the number of are discarded. reciprocated connections, as a function of the number of the in-degree. saturates between 200 and 300 even though the number of incoming connections continues to The dynamic process is then repeated for a total num- increase. This saturation indicates that after this point ber of time steps T . In order to initialize the process the system is in a new regime; new connections can be re- and take into account the effect of endogenous random ciprocated, but at a much smaller rate than before. This effects, each agent can broadcast a message to all of its can be accounted for by spurious exchanges we make with contacts with some small probability p. One may think of some contacts with whom we do not maintain an active this message as a common status change, or a TV appear- relationship. ance, news story, or any other information not necessarily authored by the sending agent. Since these messages are not specifically directed from one user to another, they do IV. THE MODEL not contribute to the weight of the edges through which they flow. We have studied this simple model by using an underlying network of N = 105 nodes and different scale- Let us consider a static network G, characterized by a free topologies. For each simulation T = 2 × 104 time degree distribution P (k). Each user (node) i is connected steps have been considered and the plots are made evalu- to all its nearest neighbors js through two weighted di- ating the medians among at least 103 runs. In Figure 4 we rected edges, i → j and j → i so that: report the results of simulations in a directed heavy-tailed out in ki network with a power-law tail similar to those observed ki = ki = , ∀i ∈ G. (2) for the measured network [19]. The figures clearly show a 2 behavior compatible with the empirical data. The peak out Where ki is the out degree, the number of out-going that maximizes the information output per connection in links, and ki is the in degree, the number of in-going is linearly proportional to qmax , supporting the idea that links, of the user i. Each node uses its out links to the physical constraints entailed in the queue’s maximum send messages to its contacts and it will receive messages capacity along with the prioritization that gives impor- from its contacts through its in links. In this way is easy tance to popular senders are at the origin of the observed to distinguish between incoming and outgoing messages. behavior. We have also performed an extensive sensitiv- Whenever a message is sent from node i to node j, the ity analysis on the broadcasting probability p, the time weight of the (i, j) edge, wij is increased by one. The to- scale T , and have investigated the effect of agent hetero- tal number of sent messages of each user is given by the geneity by studying populations in where each agent’s sum over all of its outgoing edges. Users communicate capacity qmax ,i is randomly distributed according to a with each other by replying to messages. The assump- Gaussian distribution centered around qmax with stan- tion of our model is that biological and time constraints dard deviation σ.
  • 5. 5 A) B) 50 100 150 200 250 300 350 400 450 500 550 600 8 7 6 5 ωout ρ 4 3 2 0 1 0 50 100 150 200 250 300 350 400 450 500 550 600 0 50 100 150 200 250 300 350 400 450 500 550 600 out in k k FIG. 3: A) Out-weight as a function of the out-degree. The average weight of each outward connection gradually increases until it reaches a maximum near 150 − 200 contacts, signaling that a maximum level of social activity has been reached. Above this point, an increase in the number of contacts can no longer be sustained with the same amount of dedication to each. The red line corresponds to the average out-weight, while the gray shaded area illustrates the 50% confidence interval. B) Number of reciprocated connections, ρ, as a function of kin . As the number of people demanding our attention increases, it will eventually saturate our ability to reply leading to the flat behavior displayed in the dashed region. queues start to get messages in and out. After a while 3 20 x 10 we can aspect that the system reaches a dynamical equi- 80 slope = 0.2 qmax = 50 librium. In Figure (5-A) we show the behavior of our 60 qmax = 100 observable ω out for different values of T , in particular we Peak 40 15 qmax = 150 20 qmax = 200 chose T = 104 , 1.5 × 104 , 2 × 104 . The effect of time is 0 qmax = 250 0 50 100 150 200 250 300 qmax qmax = 300 clearly a shift on the y axis and a small change in the out 10 position of the peak. The first effect is due to the fact ω that the number of messages circulating in the systems 5 increase linearly with T . The second effect is due to the reduction of fluctuations when more messages are sent. 0 The peak becomes more clear and defined. 0 50 100 150 200 250 out k B. Effect of broadcast probability p FIG. 4: Result of running our model on a heterogeneous network made of N = 105 , nodes with degree distribution P (k) k−γ with γ = −2.4 and σ = 10. Different curves cor- The effect of the broadcast probability is different on respond to different queue size. The inset shows the linear respect to the effect of the time window T . First of all dependence of the peak on the queue size q. Each curve is our observable ω out is linearly proportional to T in all the median of 103 to 2 × 103 runs of T = 2 × 104 time steps regimes of k out this is not true for p. The effect of p is crucial for users with a small number of contacts. As the p increases they will receive more messages and their ac- tivity will increase too, this does not occur in the other A. Effect of the time window T limit. When the saturation takes place the ω out becomes completely independent of p. As show in details in a One of the parameters of our model is the time window mean-field approach (Section (IV D)) for values of k out T during which we study the dynamics. This parame- small with respect to the queue size, ω out scales linearly ter regulates the maximum number of messages that will with p. Instead for a number of contacts much bigger circulate in the network. In the first time steps the first than the queue size ω out is independent of p. These con- messages will start to being sent among users and the siderations are validated by our simulations as shown in
  • 6. 6 Figure (5-B). We see a clear dependence on p for small as the number of in-coming connections, the number of values of k out instead the same behavior for bigger values messages it get scale as the square of its degree. of k out . Two different regimes are easily found: ki qmax,i and vice versa. In the first case the user is not popular. The number C. Effect of network’s properties of messages that the user will receive is small then. In principle it can reply to all of them at each time step. Inspired by several studies [13, 15, 19] we fix the base- We can assume that in this regime its queue is never line of our model using scale-free networks. It is im- completely full. We will refer to Rt as the number of portant then to study how differences in the network messages that the user reiceive at the time step t. After structure affect the results. In this section we consider one time step the number of replies is: the effect of the exponent γ. As show in Figure (5-C) we run our model on top of scale-free networks with S1 = ξ1 R1 , (6) γ = −2.2, −2.4, −2.6, −2.8. As clear from the plot for smaller values of γ (bigger value in absolute value) gaps where ξ1 is a random number uniformly distributed be- on k out start to emerge. These are due to the network tween 0 and 1. The number of messages, S2 that the user structure. The shape and position of the peak is the send at the second time step is a random fraction of the same for all the curves. The differences are evident just messages present in its queue: on the peak height that increase as γ decreases. This is S2 = [R1 (1 − ξ1 ) + R2 ] ξ2 . (7) due the different redistribution of degrees and to the fact that with small γ the selection effect is more and more For t = 3 we get: important. So we can say that the result are robust on γ. S3 = {[R1 (1 − ξ1 ) + R2 ] (1 − ξ2 ) + R3 } ξ3 = [R1 (1 − ξ1 )(1 − ξ2 ) + R2 (1 − ξ2 ) + R3 ] ξ3 , (8) D. Single user: analytical approach and so on. We can approximate these equations using the average number of received messages R . For the In order to get a better understanding of the mecha- general t it is possible to show that: nisms we describe, we analyzed, in a mean-field approach,   the behavior of a single user i. t t−1 Let us focus on a user i characterized by degree ki and St ∼ R ξt 1 + (1 − ξi ) out qmax,i . ki = ki /2 are the out-going links that it uses j=1 i=j in to send messages to its ki /2 contacts. ki = ki /2 are the in-coming through which it receives messages from = R ξt 2 − ξt−1 + O(ξ 2 ) it contacts. We set as kj the priority of each neighbor = R 2ξt − ξt ξt−1 + O(ξ 3 ) . (9) that we extract for a distribution P(k). The rules of the model that we described in the previous sections are ap- The total number of messages sent is the numerator of plied for T time steps. The probability that a neighbor our measure ω out and the sum of all the St : j will send a message to the user i is: t=T ki St ∼ T R , (10) pji = p + , (3) t=0 < k > kj where p is the broadcast probability. We can evaluate the considering that each sum of product random numbers is average number of messages that the user will receive at order T . We can write then: each time step t: t=T out t=0 St 1 ki (k out )2 out ωi (T ) = out ∼ out T p +c i in ki 1 ki ki 2 2<k> R = pji = pki + . (4) j <k> in kj ∼ T out ki . (11) j∈ki We extracted kj from the same distribution, the sum out in scales then linearly with the number of element: ki . We In this regime we get a linear increase with ki of the can write: average number of replies per connections. As show in 2 Figure (6) this is confirmed in the simulations. ki ki The other regime is found for a number of contacts bigger R = pji = p +c , (5) j 2 2<k> than the queue size. In this case the user is very popular and at each time step it gets a lot of messages and is not where c is a constant fixed by the distribution. Since the able to handle all of it. In this limit the saturation process priority of the user is proportional to its degree as well takes place and it will reply just to a small fraction of
  • 7. 7 3 3 3 x 10 x 10 x 10 A) 10 T = 10 4 B) 20 p = 10 -4 C) γ = -2.2 8 4 -4 γ = -2.4 T = 1.5 x 10 p = 5x10 10 4 15 γ = -2.6 T = 2 x 10 p = 10 -3 γ = -2.8 6 out out out 10 ω ω ω 4 5 5 2 0 0 0 0 50 100 150 200 250 0 50 100 150 200 250 0 50 100 150 200 250 300 out out out k k k FIG. 5: A) ω out as a function of kout for qmax = 100, σ = 10, scale-free network with γ = −2.4 and different values of T . We present the medians over 500 runs B) ω out as a function of kout for qmax = 100, σ = 10, scale-free network with γ = −2.4, T = 104 and different values of p. We present the medians over 500 runs C) ω out as a function of kout for qmax = 100, σ = 10, T = 104 , p = 5 × 10−4 , scale-free network with different values of γ. We present the median over 500 runs. In this regime then we get a different scaling behavior x 10 3 typical of saturation problems. As shown in Figure (6) 1 these arguments are in perfect agreement with the nu- σ = 20 σ = 10 merical results. σ=5 slope = +1 We have shown two different regimes. A linear increasing slope = -1 behavior and a decreasing one. In the between of these opposite cases we will find a maximum of the function. out The position of these peak is in general function of the ω queue size. 0.1 1 10 100 1000 out k FIG. 6: Results for the single user and different values of σ, V. CONCLUSIONS the inter-user queue size variance. We fixed the average queue size at qmax,i = 50 and extracted the priorities of user neigh- bors from a power-law statistical distribution with exponent γ = −2.1. For each ki we run T = 500 time steps and present Social networks have changed they way we use to com- the medians among 103 runs municate. It is now easy to be connected with a huge number of other individuals. In this paper we show that social networks did not change human social capabili- the total number of messages prioritizing them. At each ties. We analyze a large dataset of Twitter conversations time step this number is a random variable uniformly collected across six months involving millions of individ- distributed between 0 and qmax,i . We have then: uals to test the theoretical cognitive limit on the number of stable social relationships known as Dunbar’s num- T ber. We found that even in the online world cognitive out 1 ωi (T ) ∼ out ξt qmax,i . (12) and biological constraints holds as predicted by Dunbar’s ki t=0 theory limiting users social activities. We propose a sim- ple model for users’ behavior that includes finite priority The ξt s are random variable uniformly distributed be- queuing and time resources that reproduces the observed tween 0 and 1. At each time step the number of replies is social behavior. This simple model offers a basic explana- a random fraction of the queue size. For T large enough tion of a seemingly complex phenomena observed in the we get: empirical patterns on Twitter data and offers support to T Dunbar’s hypothesis of a biological limit to the number out ωi (T ) ∼ out qmax,i . (13) of relationships. 2ki [1] David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, takis, Noshir Contractor, James Fowler, Myron Gut- Albert-Laszlo Barabasi, Devon Brewer, Nicholas Chris-
  • 8. 8 mann, Tony Jebara, Gary King, Michael Macy, Deb Roy, 2007. and Marshall Van Alstyne. Computational social science. [13] B. A. Huberman, D. M. Romero, and F. Wu. Social net- Science, 323:721, Feb 2009. works that matter: Twitter under the microscope. First [2] D. J. Watts. The new science of networks. Ann. Rev. Monday, 14:1, 2008. Socio., 30:243, 2004. [14] A. Lashinsky. Discover. decipher. disrupt. the true mean- [3] Adrian Cho. Ourselves and our interactions: The ulti- ing of twitter. Fortune Magazine, 158:39, 2008. mate physics problem? Science, 325:406, Nov 2009. [15] D. Boyd, S. Golder, and G. Lotan. Tweet, tweet, retweet: [4] A.-L. Barab´si. The origin of bursts and heavy tails in a Conversational aspects of retweeting on twitter. In 43rd human dynamics. Nature, 435:207, 2005. Hawaii International Conference on System Sciences, [5] C. Castellano, S. Fortunato, and V. Loreto. Statisti- page 412, 2008. cal physics of social dynamics. Rev. Mod. Phys., 81:591, [16] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue 2009. Moon. What is twitter, a social network or a news me- [6] M. C. Gonz´lez, C. A. Hidalgo, and A.-L. Barab´si. Un- a a dia?, 2010. derstanding individual human mobility patterns. Nature, [17] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. 453:779, 2008. Introduction To Algorithms. The MIT Press, 2nd edition, [7] R. I. M. Dunbar. Neocortex size as a constraint on group 2001. size in primates. J. Human Evo., 22:469, 1992. [18] B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps [8] R. I. M Dunbar. The social brain hypothesis. Evo. An- about twitter. In Proceedings of the first workshop on thro., 6:178, 1998. Online social networks, page 19, 2008. [9] Christopher McCarty, Peter D Killworth, H Russell [19] A. Java, X. Song, T. Finin, and B. Tseng. Why we twit- Bernard, Eugene C Johnsen, and Gene A Shelley. Com- ter: understanding microblogging usage and communi- paring two methods for estimating network size. Human ties. In Proceedings of the 9th WebKDD and 1st SNA- Organization, 60(1):28, Nov 2001. KDD 2007 workshop on Web mining and social network [10] H. Simon. Computers, Communication, and the Pub- analysis, page 56, 2007. lic Interest, chapter Designing Organizations for an [20] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, Information-Rich World, pages 38–52. Johns Hopkins and Krishna P. Gummadi. Measuring user influence in Press, 1971. twitter: The million follower fallacy, 2010. [11] T. Davenport and J. C Beck. The Attention Economy: [21] A. Barrat, M. Barthelemy, and A. Vespignani. Dynamical Understanding the New Currency of Business. Harvard Processes On Complex Networks. Cambridge University Business Press, 2002. Press, 2008. [12] B. A. Huberman and F. Wu. The economics of attention: [22] Mark Newman. Networks: An Introduction. Oxford Uni- maximizing user value in information-rich environments. versity Press, 2010. In Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, page 16,