Validation of Dunbar's number in Twitter conversations


Published on

Bruno Goncalves1;2, Nicola Perra1;3, and Alessandro Vespignani1;2;4
1Center for Complex Networks and Systems Research,
School of Informatics and Computing, Indiana University, IN 47408, USA
2Pervasive Technology Institute, Indiana University, IN 47404, USA
3Linkalab, Complex Systems Computational Lab. - 09100 Cagliari Italy and
4Institute for Scienti c Interchange, Turin 10133, Italy
Maio 2011

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Validation of Dunbar's number in Twitter conversations

  1. 1. Validation of Dunbar’s number in Twitter conversations Bruno Gon¸alves1,2 , Nicola Perra1,3 , and Alessandro Vespignani1,2,4 c 1 Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, IN 47408, USA 2 Pervasive Technology Institute, Indiana University, IN 47404, USA 3 Linkalab, Complex Systems Computational Lab. - 09100 Cagliari Italy and 4 Institute for Scientific Interchange, Turin 10133, Italy Modern society’s increasing dependency on online tools for both work and recreation opens up unique opportunities for the study of social interactions. A large survey of online exchanges or conversations on Twitter, collected across six months involving 1.7 million individuals is presented here. We test the theoretical cognitive limit on the number of stable social relationships known asarXiv:1105.5170v1 [physics.soc-ph] 25 May 2011 Dunbar’s number. We find that users can entertain a maximum of 100 − 200 stable relationships in support for Dunbar’s prediction. The “economy of attention” is limited in the online world by cognitive and biological constraints as predicted by Dunbar’s theory. Inspired by this empirical evidence we propose a simple dynamical mechanism, based on finite priority queuing and time resources, that reproduces the observed social behavior. I. INTRODUCTION measure typical group size using two different methods and obtained a number of 291, roughly twice Dunbar’s Modern society’s increasing dependence on online estimate. tools for both work and recreation has generated an Biological constraints on social interaction go along with unprecedented amount of data regarding social behavior. other real-world physical limitations. After all, a persons While this dependence has made it possible to redefine time is finite and each person must make her own choices the way we study social behavior, new online commu- about how best to use it given the priority of personal nication tools and media are also constantly redefining preferences, interests, needs, etc. The idea that attention social acts and relations. Recently, the divide between and time are scarce resources led H. Simon [10] to apply the physical world and online social realities has been standard economic tools to study these constraints blurred by the new possibilities afforded by real-time and introduce the concept of an Attention Economy communication and broadcasting, which appear to with mechanisms similar to our everyday monetary greatly enhance our social and cognitive capabilities in economy. The increasingly fast pace of modern life and establishing and maintaining social relations. The com- overwhelming availability of information has brought bination of mobile devices with new tools like Twitter, a renewed interest in the study of the economy of Foursquare, Blippy, Tumblr, Yahoo! Meme, Google attention with important applications both in business Hotspot, etc., are defining a new era in which we can be [11] and the study collective human behavior [12]. On continuously connected with an ever-increasing number the one hand it can be argued that microblogging tools of individuals through constant digital communication facilitate the way we handle social interactions and composed of small messages and bits of information. that this results in an online world where human social Thus, while new data and computational approaches to limits are finally lifted, making predictions such as the social science [1–3] finally enable us to answer a large Dunbar’s number obsolete. Microblogging and online number of long-standing questions [4–6], we are also tools on the other hand, might be analogous to a pocket increasingly confronted with new questions related to calculator that, while speeding up the way we can do the way social interaction and communication change simple math, does not improve our cognitive capabilities in online social environments: What is the impact that for mathematics. In this case, the basic cognitive modern technology has on social interaction? How do limits to social interactions are not surpassed in the we manage the ever-increasing amount of information digital world. In this paper we show that the latter that demands our attention? In 1992, R. I. M. Dunbar hypothesis is supported by the analysis of real world [7] measured the correlation between neocortical volume data that identify the presence of Dunbars limit in Twit- and typical social group size in a wide range of primates ter, one of the most successful online microblogging tools. and human communities. The result was as surprising as it was far-reaching. The limit imposed by neocortical processing capacity appears to define the number of individuals with whom it is possible to maintain stable II. THE DATASET interpersonal relationships. Therefore, the size of the brain’s neocortex represents a biological constraint on Having been granted temporary access to Twitters fire- social interaction that limits humans’ social network size hose we mined the stream for over 6 months to identify a to between 100 and 200 individuals [8], i.e. Dunbar’s large sample of active user accounts. Using the API, we number. McCarty et al. [9] independently attempted to then queried for the complete history of 3 million users,
  2. 2. 2 A) B) k in = 3 k in = 1 k out = 3 k out = 1 w in = 4 w in = 1 4 w out = 3 w out = 2 Alice 10 B C Ellie k in = 2 2 k out = 1 Bob w in = 2 7 w out = 1 Bob 1 5 A Alice Cathy 9 E Cathy k in = 1 k in = 0 3 k out = 1 k out = 1 Dan w in = 1 w in = 0 11 D w out = 1 w out = 1 Bob 6 BobFIG. 1: Reply trees and user network. A) The set of all trees is a forest. Each time a user replies, the corresponding tweetis connected to another one, resulting in a tree structure. B) Combining all the trees in the forest and projecting them ontothe users results in a directed and weighted network that can be used as a proxy for relationships between users. The numberof outgoing (incoming) connections of a given user is called the out (in) degree and is represented by kout (kin ). The numberof messages flowing along each edge is called the degree, ω. The probability density function P (kout ) (P (kin )) indicates theprobability that any given node has kout (kin ) out (in) degree and it is called the out (in) degree distribution and is a measureof node diversity on the network.resulting in a total of over 380 million individual tweets about the id of the original tweet but also the user thatcovering almost 4 years of user activity on Twitter. Ta- sent it. Using this information, each reply tweet mapsble I provides some basic statistics about our dataset. directly to a directed edge. Individual trees can be iden-Here we analyze this massive dataset of Twitter conver- tified by using depth first search [17] to identify connectedsations accrued over the span of six months and investi- components in the resulting tweet-tweet graph. To en-gate the possibility of deviation from Dunbar’s number sure that the full tree is found and not just a part ofin the number of stable social relations mediated by this it, we treat each link as undirected for the purposes oftool. The pervasive nature of Twitter, along with its this identification. In this way we are able to extract thewidespread adoption by all layers of society, makes it an complete tree even if we happen to start on one of theideal proxy for the study of social interactions [13–16]. leaves. For each tree the root is then found by locatingWe have analyzed over 380 million tweets from which the node with kin ≡ 0, and distances from the root arewe were able to extract 25 million conversations. Each measured by rerunning the DFS algorithm starting fromTwitter conversation takes on the form of a tree of tweets, the root and respecting the direction of each edge.where each tweet comes as a reply to another. By pro- The underlying reply network can be extracted by pro-jecting this forest of trees onto the users that author each jecting the tweet trees to a user graph: User A is con-tweet, we are able to generate a weighted social network nected to user B by a directed outgoing edge if A repliedconnecting over 1.7 million individuals (see Figure 1). to a tweet sent by B. Over time, any pair of users can ex- change multiple replies either in a single “conversation” Tweets 381, 652, 990 (tree) or through multiple conversations. The number of Timelined Users 3, 006, 180 messages sent from one user to another is used as the Scraping Period Nov. 20, 2008 – May 29, 2009 weight of the corresponding directed edge and is taken to signify the strength of the connection between the two Time span 4 years users, with higher weights representing stronger connec- tions. Trees 25, 273, 871 Tweets in Trees 81, 728, 252 User in Trees 1, 720, 320 B. Online conversations User-User Edges 68, 459, 592 Each reply creates a connection between two tweets TABLE I: Dataset Statistics. and their authors, so we can define a conversation as a branching process of consecutive replies, resulting in a tree of tweets. From our dataset we extracted and analyzed a forest of over 25 million trees. Trees vary A. Tree Identification and Projection broadly in size and shape, with most conversations re- maining small while a few grow to include thousands of All tweets in our dataset that constituted a reply were tweets and hundreds of users, as shown in Figure 2.collected. Each such tweet contains information not only A directed user-user network can be built by projecting
  3. 3. 3 0 0 10 10 A) B) slope = -3 slope = -5 -2 10 10 -2 -4 10 -4 10 P(Ns) P(Ds) -6 10 -6 10 -8 10 -8 10 -10 10 -10 10 -12 10 0 1 2 3 4 5 0 1 2 3 4 10 10 10 10 10 10 10 10 10 10 10 Ns Ds 0 4 10 10 C) slope = 1 D) slope = -3 -2 10 3 10 -4 10 P(Nu) -6 Nt 10 2 10 -8 10 1 10 -10 10 0 0 1 2 3 4 10 0 1 2 3 4 10 10 10 10 10 10 10 10 10 10 Nu DsFIG. 2: Tree characterization. A)Distribution of the number of tweets in a tree. B) Distribution of the number of shells. C)Distribution of the number of users. D) Tree size vs depth. The broad tailed nature of all of these quantities indicates thediversity of behaviors displayed by the users in our dataset.conversation trees to detail how users interact and estab- to the number of different nodes it receives a reply from.lish relationships among themselves. Bidirectional edges When A follows B, A subscribes to receive all the up-signify mutual interactions, with stronger weights imply- dates published by B. A is then one of B’s followers anding a more frequent or prolonged interaction between two B is one of A’s friends. Previous studies have mostlyindividuals. focused on the network induced by this follower-friend All of our analysis will be performed on this user-user relationship [15], [18–20]. In any study about stable so-conversation network. We consider a user to have out cial relations in online media, as indicated by studiesdegree kout if he or she replies to kout other users, re- about Dunbar’s number, it is important to discount oc-gardless of the number of explicit followers or friends the casional social interactions. For this reason we focus ongiven user has. By focusing on direct interactions we are stronger relationships in our study[13], considering justable to eliminate the confounding effect of users that have active communication from one user to another by meanstens or hundreds of thousands of followers with whom of a genuine social interaction between them. In our net-they have no contact and are able to focus on real person work [21, 22] we introduce the weight ωij of each edge,to person interactions [13]. defined as the number of times user i replies to user j as a direct measurement of the interaction strength between two users and stable relations will be those with a large III. DUNBAR’S NUMBER IN OUR DATA weight. A simple way to measure this effect is to calculate the average weight of each interaction by a user as a func- In the generated network each node corresponds to a tion of his total number of interactions. Users that havesingle user. The out-degree of the nodes is the number of only recently joined Twitter will have few friends andusers the node replies to, while the in-degree corresponds very few interactions with them. As time goes by, stable
  4. 4. 4users will acquire more and more friends, but the number are the keys ingredients in fixing the Dunbar’s number.of replies that they send to other users will increase con- We model this considering that when user i receives asistently only in stable social interactions. Eventually, a message it places it in an internal queue that allows uppoint is reached where the number of contacts surpasses to qmax,i messages to be handled at each time step. Inthe user’s ability to keep in contact with them. the presence of finite resources each agent has to makeThis saturation process will necessarily lead to some rela- decisions on what are the most important messages totionships being more valued than others. Each individual answer. We set the priority of each message to be pro-tries to optimize her resources by prioritizing these inter- portional to the total degree of the sender j. For eachactions. To quantify the strength of these interactions, user the we studied is the average number of interactions out outwe studied the quantity ωi , defined as the average so- per connection ωi (T ) as defined in the Eq. (1). At eachcial strength of active initiate relationship: time step each agent goes through its queue and performs the following simple operations: out j ωij (T ) ωi (T ) = out (1) ki • The agent replies to a random number St of mes- sages between 0 and the number of messages qi This quantity corresponds to the average weight per present in the queue. The messages to be repliedoutgoing edge of each individual where T represents the to are selected proportionally to the priority of thetime window for data aggregation. We measure this sending agent (its total degree). A message is thenquantity in our data set as shown in Figure 3A. The data sent to j, the node we are replying to, and the cor-shows that this quantity reaches a maximum between 100 responding weight ωij is incremented by one.and 200 friends, in agreement with Dunbar’s prediction(see figure 2A). This finding suggests that even though • Messages the agent has replied to are deleted frommodern social networks help us to log all the people with the queue and all incoming messages are added towhom we meet and interact, they are unable to overcome the queue in a prioritized order until the number ofthe biological and physical constraints that limit stable messages reaches qmax . Messages in excess of qmaxsocial relations. In Figure 2B, we plot , the number of are discarded.reciprocated connections, as a function of the numberof the in-degree. saturates between 200 and 300 eventhough the number of incoming connections continues to The dynamic process is then repeated for a total num-increase. This saturation indicates that after this point ber of time steps T . In order to initialize the processthe system is in a new regime; new connections can be re- and take into account the effect of endogenous randomciprocated, but at a much smaller rate than before. This effects, each agent can broadcast a message to all of itscan be accounted for by spurious exchanges we make with contacts with some small probability p. One may think ofsome contacts with whom we do not maintain an active this message as a common status change, or a TV appear-relationship. ance, news story, or any other information not necessarily authored by the sending agent. Since these messages are not specifically directed from one user to another, they do IV. THE MODEL not contribute to the weight of the edges through which they flow. We have studied this simple model by using an underlying network of N = 105 nodes and different scale- Let us consider a static network G, characterized by a free topologies. For each simulation T = 2 × 104 timedegree distribution P (k). Each user (node) i is connected steps have been considered and the plots are made evalu-to all its nearest neighbors js through two weighted di- ating the medians among at least 103 runs. In Figure 4 werected edges, i → j and j → i so that: report the results of simulations in a directed heavy-tailed out in ki network with a power-law tail similar to those observed ki = ki = , ∀i ∈ G. (2) for the measured network [19]. The figures clearly show a 2 behavior compatible with the empirical data. The peak outWhere ki is the out degree, the number of out-going that maximizes the information output per connection inlinks, and ki is the in degree, the number of in-going is linearly proportional to qmax , supporting the idea thatlinks, of the user i. Each node uses its out links to the physical constraints entailed in the queue’s maximumsend messages to its contacts and it will receive messages capacity along with the prioritization that gives impor-from its contacts through its in links. In this way is easy tance to popular senders are at the origin of the observedto distinguish between incoming and outgoing messages. behavior. We have also performed an extensive sensitiv-Whenever a message is sent from node i to node j, the ity analysis on the broadcasting probability p, the timeweight of the (i, j) edge, wij is increased by one. The to- scale T , and have investigated the effect of agent hetero-tal number of sent messages of each user is given by the geneity by studying populations in where each agent’ssum over all of its outgoing edges. Users communicate capacity qmax ,i is randomly distributed according to awith each other by replying to messages. The assump- Gaussian distribution centered around qmax with stan-tion of our model is that biological and time constraints dard deviation σ.
  5. 5. 5 A) B) 50 100 150 200 250 300 350 400 450 500 550 600 8 7 6 5 ωout ρ 4 3 2 0 1 0 50 100 150 200 250 300 350 400 450 500 550 600 0 50 100 150 200 250 300 350 400 450 500 550 600 out in k kFIG. 3: A) Out-weight as a function of the out-degree. The average weight of each outward connection gradually increasesuntil it reaches a maximum near 150 − 200 contacts, signaling that a maximum level of social activity has been reached. Abovethis point, an increase in the number of contacts can no longer be sustained with the same amount of dedication to each. Thered line corresponds to the average out-weight, while the gray shaded area illustrates the 50% confidence interval. B) Number ofreciprocated connections, ρ, as a function of kin . As the number of people demanding our attention increases, it will eventuallysaturate our ability to reply leading to the flat behavior displayed in the dashed region. queues start to get messages in and out. After a while 3 20 x 10 we can aspect that the system reaches a dynamical equi- 80 slope = 0.2 qmax = 50 librium. In Figure (5-A) we show the behavior of our 60 qmax = 100 observable ω out for different values of T , in particular we Peak 40 15 qmax = 150 20 qmax = 200 chose T = 104 , 1.5 × 104 , 2 × 104 . The effect of time is 0 qmax = 250 0 50 100 150 200 250 300 qmax qmax = 300 clearly a shift on the y axis and a small change in the out 10 position of the peak. The first effect is due to the fact ω that the number of messages circulating in the systems 5 increase linearly with T . The second effect is due to the reduction of fluctuations when more messages are sent. 0 The peak becomes more clear and defined. 0 50 100 150 200 250 out k B. Effect of broadcast probability pFIG. 4: Result of running our model on a heterogeneousnetwork made of N = 105 , nodes with degree distributionP (k) k−γ with γ = −2.4 and σ = 10. Different curves cor- The effect of the broadcast probability is different onrespond to different queue size. The inset shows the linear respect to the effect of the time window T . First of alldependence of the peak on the queue size q. Each curve is our observable ω out is linearly proportional to T in allthe median of 103 to 2 × 103 runs of T = 2 × 104 time steps regimes of k out this is not true for p. The effect of p is crucial for users with a small number of contacts. As the p increases they will receive more messages and their ac- tivity will increase too, this does not occur in the other A. Effect of the time window T limit. When the saturation takes place the ω out becomes completely independent of p. As show in details in a One of the parameters of our model is the time window mean-field approach (Section (IV D)) for values of k outT during which we study the dynamics. This parame- small with respect to the queue size, ω out scales linearlyter regulates the maximum number of messages that will with p. Instead for a number of contacts much biggercirculate in the network. In the first time steps the first than the queue size ω out is independent of p. These con-messages will start to being sent among users and the siderations are validated by our simulations as shown in
  6. 6. 6Figure (5-B). We see a clear dependence on p for small as the number of in-coming connections, the number ofvalues of k out instead the same behavior for bigger values messages it get scale as the square of its degree.of k out . Two different regimes are easily found: ki qmax,i and vice versa. In the first case the user is not popular. The number C. Effect of network’s properties of messages that the user will receive is small then. In principle it can reply to all of them at each time step. Inspired by several studies [13, 15, 19] we fix the base- We can assume that in this regime its queue is neverline of our model using scale-free networks. It is im- completely full. We will refer to Rt as the number ofportant then to study how differences in the network messages that the user reiceive at the time step t. Afterstructure affect the results. In this section we consider one time step the number of replies is:the effect of the exponent γ. As show in Figure (5-C)we run our model on top of scale-free networks with S1 = ξ1 R1 , (6)γ = −2.2, −2.4, −2.6, −2.8. As clear from the plot forsmaller values of γ (bigger value in absolute value) gaps where ξ1 is a random number uniformly distributed be-on k out start to emerge. These are due to the network tween 0 and 1. The number of messages, S2 that the userstructure. The shape and position of the peak is the send at the second time step is a random fraction of thesame for all the curves. The differences are evident just messages present in its queue:on the peak height that increase as γ decreases. This is S2 = [R1 (1 − ξ1 ) + R2 ] ξ2 . (7)due the different redistribution of degrees and to the factthat with small γ the selection effect is more and more For t = 3 we get:important. So we can say that the result are robust onγ. S3 = {[R1 (1 − ξ1 ) + R2 ] (1 − ξ2 ) + R3 } ξ3 = [R1 (1 − ξ1 )(1 − ξ2 ) + R2 (1 − ξ2 ) + R3 ] ξ3 , (8) D. Single user: analytical approach and so on. We can approximate these equations using the average number of received messages R . For the In order to get a better understanding of the mecha- general t it is possible to show that:nisms we describe, we analyzed, in a mean-field approach,  the behavior of a single user i. t t−1Let us focus on a user i characterized by degree ki and St ∼ R ξt 1 + (1 − ξi ) outqmax,i . ki = ki /2 are the out-going links that it uses j=1 i=j into send messages to its ki /2 contacts. ki = ki /2 arethe in-coming through which it receives messages from = R ξt 2 − ξt−1 + O(ξ 2 )it contacts. We set as kj the priority of each neighbor = R 2ξt − ξt ξt−1 + O(ξ 3 ) . (9)that we extract for a distribution P(k). The rules of themodel that we described in the previous sections are ap- The total number of messages sent is the numerator ofplied for T time steps. The probability that a neighbor our measure ω out and the sum of all the St :j will send a message to the user i is: t=T ki St ∼ T R , (10) pji = p + , (3) t=0 < k > kjwhere p is the broadcast probability. We can evaluate the considering that each sum of product random numbers isaverage number of messages that the user will receive at order T . We can write then:each time step t: t=T out t=0 St 1 ki (k out )2 out ωi (T ) = out ∼ out T p +c i in ki 1 ki ki 2 2<k> R = pji = pki + . (4) j <k> in kj ∼ T out ki . (11) j∈kiWe extracted kj from the same distribution, the sum out inscales then linearly with the number of element: ki . We In this regime we get a linear increase with ki of thecan write: average number of replies per connections. As show in 2 Figure (6) this is confirmed in the simulations. ki ki The other regime is found for a number of contacts bigger R = pji = p +c , (5) j 2 2<k> than the queue size. In this case the user is very popular and at each time step it gets a lot of messages and is notwhere c is a constant fixed by the distribution. Since the able to handle all of it. In this limit the saturation processpriority of the user is proportional to its degree as well takes place and it will reply just to a small fraction of
  7. 7. 7 3 3 3 x 10 x 10 x 10 A) 10 T = 10 4 B) 20 p = 10 -4 C) γ = -2.2 8 4 -4 γ = -2.4 T = 1.5 x 10 p = 5x10 10 4 15 γ = -2.6 T = 2 x 10 p = 10 -3 γ = -2.8 6 out out out 10 ω ω ω 4 5 5 2 0 0 0 0 50 100 150 200 250 0 50 100 150 200 250 0 50 100 150 200 250 300 out out out k k kFIG. 5: A) ω out as a function of kout for qmax = 100, σ = 10, scale-free network with γ = −2.4 and different values of T . Wepresent the medians over 500 runs B) ω out as a function of kout for qmax = 100, σ = 10, scale-free network with γ = −2.4,T = 104 and different values of p. We present the medians over 500 runs C) ω out as a function of kout for qmax = 100, σ = 10,T = 104 , p = 5 × 10−4 , scale-free network with different values of γ. We present the median over 500 runs. In this regime then we get a different scaling behavior x 10 3 typical of saturation problems. As shown in Figure (6) 1 these arguments are in perfect agreement with the nu- σ = 20 σ = 10 merical results. σ=5 slope = +1 We have shown two different regimes. A linear increasing slope = -1 behavior and a decreasing one. In the between of these opposite cases we will find a maximum of the function. out The position of these peak is in general function of the ω queue size. 0.1 1 10 100 1000 out kFIG. 6: Results for the single user and different values of σ, V. CONCLUSIONSthe inter-user queue size variance. We fixed the average queuesize at qmax,i = 50 and extracted the priorities of user neigh-bors from a power-law statistical distribution with exponentγ = −2.1. For each ki we run T = 500 time steps and present Social networks have changed they way we use to com-the medians among 103 runs municate. It is now easy to be connected with a huge number of other individuals. In this paper we show that social networks did not change human social capabili-the total number of messages prioritizing them. At each ties. We analyze a large dataset of Twitter conversationstime step this number is a random variable uniformly collected across six months involving millions of individ-distributed between 0 and qmax,i . We have then: uals to test the theoretical cognitive limit on the number of stable social relationships known as Dunbar’s num- T ber. We found that even in the online world cognitive out 1 ωi (T ) ∼ out ξt qmax,i . (12) and biological constraints holds as predicted by Dunbar’s ki t=0 theory limiting users social activities. We propose a sim- ple model for users’ behavior that includes finite priorityThe ξt s are random variable uniformly distributed be- queuing and time resources that reproduces the observedtween 0 and 1. At each time step the number of replies is social behavior. This simple model offers a basic explana-a random fraction of the queue size. For T large enough tion of a seemingly complex phenomena observed in thewe get: empirical patterns on Twitter data and offers support to T Dunbar’s hypothesis of a biological limit to the number out ωi (T ) ∼ out qmax,i . (13) of relationships. 2ki [1] David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, takis, Noshir Contractor, James Fowler, Myron Gut- Albert-Laszlo Barabasi, Devon Brewer, Nicholas Chris-
  8. 8. 8 mann, Tony Jebara, Gary King, Michael Macy, Deb Roy, 2007. and Marshall Van Alstyne. Computational social science. [13] B. A. Huberman, D. M. Romero, and F. Wu. Social net- Science, 323:721, Feb 2009. works that matter: Twitter under the microscope. First [2] D. J. Watts. The new science of networks. Ann. Rev. Monday, 14:1, 2008. Socio., 30:243, 2004. [14] A. Lashinsky. Discover. decipher. disrupt. the true mean- [3] Adrian Cho. Ourselves and our interactions: The ulti- ing of twitter. Fortune Magazine, 158:39, 2008. mate physics problem? Science, 325:406, Nov 2009. [15] D. Boyd, S. Golder, and G. Lotan. Tweet, tweet, retweet: [4] A.-L. Barab´si. The origin of bursts and heavy tails in a Conversational aspects of retweeting on twitter. In 43rd human dynamics. Nature, 435:207, 2005. Hawaii International Conference on System Sciences, [5] C. Castellano, S. Fortunato, and V. Loreto. Statisti- page 412, 2008. cal physics of social dynamics. Rev. Mod. Phys., 81:591, [16] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue 2009. Moon. What is twitter, a social network or a news me- [6] M. C. Gonz´lez, C. A. Hidalgo, and A.-L. Barab´si. Un- a a dia?, 2010. derstanding individual human mobility patterns. Nature, [17] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. 453:779, 2008. Introduction To Algorithms. The MIT Press, 2nd edition, [7] R. I. M. Dunbar. Neocortex size as a constraint on group 2001. size in primates. J. Human Evo., 22:469, 1992. [18] B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps [8] R. I. M Dunbar. The social brain hypothesis. Evo. An- about twitter. In Proceedings of the first workshop on thro., 6:178, 1998. Online social networks, page 19, 2008. [9] Christopher McCarty, Peter D Killworth, H Russell [19] A. Java, X. Song, T. Finin, and B. Tseng. Why we twit- Bernard, Eugene C Johnsen, and Gene A Shelley. Com- ter: understanding microblogging usage and communi- paring two methods for estimating network size. Human ties. In Proceedings of the 9th WebKDD and 1st SNA- Organization, 60(1):28, Nov 2001. KDD 2007 workshop on Web mining and social network[10] H. Simon. Computers, Communication, and the Pub- analysis, page 56, 2007. lic Interest, chapter Designing Organizations for an [20] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, Information-Rich World, pages 38–52. Johns Hopkins and Krishna P. Gummadi. Measuring user influence in Press, 1971. twitter: The million follower fallacy, 2010.[11] T. Davenport and J. C Beck. The Attention Economy: [21] A. Barrat, M. Barthelemy, and A. Vespignani. Dynamical Understanding the New Currency of Business. Harvard Processes On Complex Networks. Cambridge University Business Press, 2002. Press, 2008.[12] B. A. Huberman and F. Wu. The economics of attention: [22] Mark Newman. Networks: An Introduction. Oxford Uni- maximizing user value in information-rich environments. versity Press, 2010. In Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, page 16,