Influence and Homophily in Networked User Behavior
Eytan Bakshy, Facebook
Mining Social Network Dynamics Workshop @ WWW2012
April 16, 2012
Motivation
▪ To what extent do social networks shape our behaviors online?
▪ Homophily and heterogeneity confound social influence effects.
  ▪ Online behavior resembles well-studied forms of contagion
  ▪ Statistical controls are not enough (Shalizi & Thomas, 2011)
▪ How do we measure influence?
  ▪ Experiments.
Outline
▪ What is a reasonable model of social contagion on the Web?
▪ The homophily confound
▪ Study 1: Influence in information diffusion
▪ Study 2: Influence in sharing decisions
▪ Implications
Information as biological contagion
▪ Standard models assume a constant probability of infection
▪ Interesting things happen when reproduction rates are high (R₀ = β/γ > 1)
▪ On the Web, most information doesn't appear to spread
[Figure (Bakshy, Hofman, Mason, Watts 2011): (a) frequency distribution of cascade sizes; (b) distribution of cascade depths. Most events do not spread at all, and even moderate cascades are extremely rare.]
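A minimal sketch of the constant-probability model as a branching process. It assumes (hypothetically) that each adopter exposes a fixed number of fresh contacts, `n_contacts`, each of whom adopts independently with probability `p`, so the mean reproduction number is R = n_contacts · p. With R < 1, most seeds produce no adopters at all and the expected cascade size is 1/(1 − R), in line with the observation that most information does not spread. The function names and parameter values are illustrative; this is not the analysis behind the Bakshy, Hofman, Mason & Watts (2011) figure.

```python
import random

def cascade_size(rng: random.Random, n_contacts: int, p: float, max_size: int = 10_000) -> int:
    """Hypothetical helper: number of adopters in one cascade, seed included.

    Each new adopter exposes `n_contacts` fresh users (a tree, so no overlap), and
    each exposed user adopts independently with the same constant probability `p`.
    """
    size, frontier = 1, 1
    # Stop when a generation has no new adopters; max_size guards against R >= 1 runaways.
    while frontier and size < max_size:
        new = sum(rng.random() < p for _ in range(frontier * n_contacts))
        size += new
        frontier = new
    return size


def simulate(R: float, n_contacts: int = 10, trials: int = 20_000, seed: int = 0) -> list[int]:
    """Cascade sizes for a mean reproduction number R = n_contacts * p."""
    rng = random.Random(seed)
    p = R / n_contacts
    return [cascade_size(rng, n_contacts, p) for _ in range(trials)]


sizes = simulate(R=0.5)
print(sum(s == 1 for s in sizes) / len(sizes))  # share of seeds with no adopters; theory: 0.95**10, about 0.60
print(sum(sizes) / len(sizes))                  # mean cascade size; theory: 1 / (1 - R) = 2
```

Raising `R` above 1 lets a minority of cascades grow until they hit `max_size`, which is the "interesting things happen" regime.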
Threshold models of social contagion
▪ Threshold models: individuals become activated after k contacts are activated
  ▪ Not clear that local consensus factors into individual decisions in sharing content
▪ Positive externalities: e.g. adoption of a technology
  ▪ Utility of visiting a page is often unrelated to the number of visiting friends
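Watts & Dodds (2007) contrast the influence response functions of the SIR model and the deterministic threshold rule. A minimal sketch of both, assuming that in the SIR-style case each exposure transmits independently with a constant probability `beta`, and that the threshold rule activates an individual once `threshold` contacts are active. Both parameter values below are arbitrary.

```python
def sir_response(k: int, beta: float) -> float:
    """P(adopt) after k exposures, each transmitting independently with constant probability beta."""
    return 1.0 - (1.0 - beta) ** k


def threshold_response(k: int, threshold: int) -> float:
    """Deterministic threshold rule: adopt if and only if at least `threshold` contacts are active."""
    return 1.0 if k >= threshold else 0.0


# Columns: k, SIR-style adoption probability, threshold-rule adoption probability.
for k in range(6):
    print(k, round(sir_response(k, beta=0.2), 3), threshold_response(k, threshold=3))
```

Under constant-probability transmission each extra exposure adds less than the one before, while the threshold rule does nothing until the k-th contact and then jumps to certain adoption.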
Diffusion of innovations
▪ Focuses on the spread of ideas and technologies
  ▪ Entail costly decisions
▪ Embeddedness, authority, and interpersonal trust play an important role
▪ Much of online activity is cheap and informal
Some Similarities
[Figure panels: adoption probability as a function of the number of adopting contacts, from Backstrom et al. 2006 (probability of joining a LiveJournal community, or a DBLP conference, when k friends or coauthors are already members; error bars represent two standard errors), Leskovec et al. 2007 (probability of buying vs. number of incoming recommendations, for books and DVDs), Wei et al. 2010 (game invitation acceptance rate vs. number of invitations received), Watts & Dodds 2007 (influence response functions for the SIR model and the deterministic threshold rule), and Bakshy et al. 2009]
The Homophily Confound
[Diagram: ego i and alter j, each with known characteristics (X_i, X_j) and unknown characteristics (U_i, U_j; e.g. Web browsing behavior, interests), with an unknown correlation between friends' characteristics (expected to be stronger for closer friends). Nodes: Y_ja(t0), alter's sharing behavior; D_ija; Y_ia(t1), ego's sharing behavior.]
figure stolen from Bakshy, Eckles, Yan & Rosenn, 2012
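A toy simulation of the confound, under loudly hypothetical assumptions: both friends in a pair share a single latent interest U, and each shares with a probability that depends only on U. The ego never sees the alter's share, so there is no influence at all, yet the ego is several times as likely to share when the alter has shared. The function name and all probabilities are made up for illustration.

```python
import random

def simulate_homophily(n_pairs: int = 200_000, seed: int = 1) -> dict[bool, float]:
    """Hypothetical toy model with homophily but zero influence.

    Both friends share one latent interest U (an extreme form of homophily), and each
    friend's sharing probability depends only on U. The ego never sees the alter's
    share. Returns {alter_shared: P(ego shares | alter_shared)}.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # alter_shared -> [ego shares, pairs]
    for _ in range(n_pairs):
        interested = rng.random() < 0.10         # latent U, common to both friends
        p_share = 0.30 if interested else 0.01   # sharing depends only on U
        alter_shared = rng.random() < p_share
        ego_shared = rng.random() < p_share      # independent of alter_shared once U is fixed
        counts[alter_shared][0] += ego_shared
        counts[alter_shared][1] += 1
    return {shared: hits / n for shared, (hits, n) in counts.items()}


rates = simulate_homophily()
print(rates[True], rates[False])   # roughly 0.23 vs 0.03
print(rates[True] / rates[False])  # a several-fold "lift" with no causal effect
```

Conditioning on the known characteristics X would not remove this lift, because the shared factor is unobserved; randomizing exposure is what separates the two explanations.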
Influence (and homophily)
[Diagram: ego i and alter j, each with known characteristics (X_i, X_j) and unknown characteristics (U_i, U_j; e.g. Web browsing behavior, interests), with an unknown correlation between friends' characteristics (expected to be stronger for closer friends). Nodes: Y_ja(t0), alter's sharing behavior; D_ija, the mechanism (e.g. News Feed, social cues); other forms of influence; Y_ia(t1), ego's sharing behavior.]
figure stolen from Bakshy, Eckles, Yan & Rosenn, 2012
Study 1: Effect of Feed on Information Diffusion
with Itamar Rosenn, Cameron Marlow, and Lada Adamic
Published as The Role of Social Networks in Information Diffusion, WWW 2012.
Study 1: Outline
▪ Field experiment tests how much sharing would occur in the absence of exposure via the Facebook feed
▪ Answers causal questions about influence & diffusion:
  ▪ To what extent does feed increase sharing?
  ▪ Are weak ties responsible for disseminating information?*
  ▪ How is tie strength predictive of user activity?*
*To be continued on April 19th: The Role of Social Networks in Information Diffusion
Correlated Information Sources: External Influence
[Diagram: sources that can expose friends to the same content independently of Facebook: regularly visiting the same site (Web revisitation; Adar et al., 2009), visiting sites that link to the same content, mass + interpersonal media, interpersonal communication, blogs, news aggregators, RSS, face-to-face, telephone, IM, and email.]
Influence on Feed
[Diagram: ego i and alter j, each with known characteristics (X_i, X_j) and unknown characteristics (U_i, U_j; e.g. Web browsing behavior, interests), with an unknown correlation between friends' characteristics (expected to be stronger for closer friends). Nodes: Y_ja(t0), alter's sharing behavior; D_ija, the Facebook news feed; other forms of influence; Y_ia(t1), ego's sharing behavior.]
figure stolen from Bakshy, Eckles, Yan & Rosenn, 2012
Details
▪ Assignment procedure:
  ▪ (viewer, URL) pairs are deterministically assigned to the feed or no-feed condition
  ▪ Directed shares (via messages, wall posts) are not subject to treatment and are removed from the experiment
▪ Evaluating outcomes:
  ▪ Compare the likelihood of sharing in the feed (treatment) condition with the no-feed (control) condition
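The slide does not say how the deterministic assignment is implemented. One common approach, sketched here under stated assumptions, hashes each (viewer, URL) pair to a number in [0, 1) and compares it with a cutoff. The function name, the SHA-256 hash, the salt string, and the 5% no-feed share are all hypothetical choices, not the study's actual values.

```python
import hashlib

def assign_condition(viewer_id: int, url: str, no_feed_share: float = 0.05, salt: str = "feed-exp") -> str:
    """Hypothetical scheme: deterministically map a (viewer, URL) pair to 'feed' or 'no_feed'.

    Hash the salted pair with SHA-256, turn the first 6 bytes (48 bits) into a number
    u in [0, 1), and place the pair in 'no_feed' if u < no_feed_share. The same pair
    always receives the same condition, however often it is evaluated.
    """
    key = f"{salt}:{viewer_id}:{url}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    u = int.from_bytes(digest[:6], "big") / 2**48
    return "no_feed" if u < no_feed_share else "feed"


print(assign_condition(42, "http://example.com/story"))
```

Hashing the pair, rather than the viewer alone, lets the same user be treated for one URL and control for another, which matches the (viewer, URL) unit of randomization on the slide.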
Data
▪ Random sample of all (user, URL) pairs eligible to be shown in the Facebook news feed during a 7-week period in 2010
▪ 253,238,367 subjects
▪ 75,888,466 URLs
▪ 1,168,633,941 distinct subject-URL pairs (random trials)
Temporal Clustering
[Figure: cumulative density of sharing times in the feed and no-feed conditions, plotted as share time − alter's share time (days) and as share time − exposure time (days), in absolute time and relative to first exposure. Annotations: users shared at exact same time; shared within the first hour of exposure; within one day; within one week; shared before seeing story on feed.]
What is the overall effect of feed on sharing?
▪ Two methods for comparing probabilities:
  ▪ Average treatment effect on the treated (ATT): p_feed − p_no feed
  ▪ Relative risk ratio: p_feed / p_no feed
▪ Average effect: +0.2047 percentage points in the probability of sharing
▪ Risk ratio: 7.3 times as likely to share
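A sketch of both comparisons computed from share counts, with normal-approximation 95% confidence intervals (the usual large-sample standard errors for a difference in proportions and for the log risk ratio). The function name and the counts in the usage line are made up for illustration, and the sketch treats every trial as independent, which ignores the repeated appearance of the same users and URLs across trials.

```python
import math

def compare_conditions(shares_feed: int, n_feed: int, shares_nofeed: int, n_nofeed: int, z: float = 1.96) -> dict:
    """Hypothetical helper: difference and risk ratio of sharing probabilities, with CIs."""
    p1 = shares_feed / n_feed        # p_feed
    p0 = shares_nofeed / n_nofeed    # p_no feed
    diff = p1 - p0
    se_diff = math.sqrt(p1 * (1 - p1) / n_feed + p0 * (1 - p0) / n_nofeed)
    rr = p1 / p0
    # Large-sample standard error of log(RR); needs nonzero share counts in both arms.
    se_log_rr = math.sqrt(1 / shares_feed - 1 / n_feed + 1 / shares_nofeed - 1 / n_nofeed)
    return {
        "diff": diff,
        "diff_ci": (diff - z * se_diff, diff + z * se_diff),
        "rr": rr,
        "rr_ci": (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
    }


# Made-up counts, for illustration only.
result = compare_conditions(shares_feed=600, n_feed=200_000, shares_nofeed=50, n_nofeed=100_000)
print(result["diff"], result["rr"])  # roughly 0.0025 (0.25 percentage points) and 6.0
print(result["rr_ci"])
```

The two summaries answer different questions: the difference says how many extra shares exposure produces per trial, while the ratio says how much more likely an exposed user is to share, which matters when baseline rates are tiny.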
How does sharing increase with exposure?
[Figure: probability of sharing vs. number of sharing friends (1–6). Feed condition: influence on feed + external correlation; no-feed condition: external correlation.]
How does sharing increase with exposure?
[Figure: p_feed − p_no feed vs. number of sharing friends (1–6).]
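A sketch of how a curve of p_feed − p_no feed by number of sharing friends could be tabulated, with ±2 standard-error bars from the normal approximation. The function name and the record layout `(k, condition, shared)` are assumptions for illustration, not the study's actual schema.

```python
import math
from collections import defaultdict

def effect_by_k(trials):
    """Hypothetical helper: per-k difference p_feed - p_no feed and its 2-SE half-width.

    `trials` is an iterable of (k, condition, shared) tuples, where k is the number of
    sharing friends, condition is 'feed' or 'no_feed', and shared is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # (k, condition) -> [shares, trials]
    for k, condition, shared in trials:
        cell = counts[(k, condition)]
        cell[0] += int(shared)
        cell[1] += 1
    result = {}
    for k in sorted({k for k, _ in counts}):
        s1, n1 = counts.get((k, "feed"), (0, 0))
        s0, n0 = counts.get((k, "no_feed"), (0, 0))
        if n1 == 0 or n0 == 0:
            continue  # both conditions are needed to estimate a difference
        p1, p0 = s1 / n1, s0 / n0
        se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
        result[k] = (p1 - p0, 2 * se)
    return result


# Toy data: 3/10 shares with feed and 1/10 without, for k = 1 sharing friend.
toy = [(1, "feed", True)] * 3 + [(1, "feed", False)] * 7 + [(1, "no_feed", True)] + [(1, "no_feed", False)] * 9
print(effect_by_k(toy))
```

Because k itself is not randomized here, each per-k difference is causal only for the feed manipulation within that k, not for the number of friends.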
Study 1: Recap
▪ Experiments are necessary to disentangle influence from other factors
▪ Significant temporal clustering exists even for unexposed users
▪ Probability of sharing increases with the number of sharing friends
  ▪ Even if you don't see those friends!
  ▪ Influence appears stronger when more friends are shown
Study 2: Effect of Social Cues on Sharing Decisions
Chapter IV, Information Diffusion and Social Influence in Online Networks (dissertation chapter)
Motivation
▪ Social influence in information diffusion occurs via two stages
  ▪ 1. Exposure (Study 1)
  ▪ 2. Decision to share
▪ Trend in previous experiment is not causal (the number of sharing friends was not randomized)
▪ Need a way to experimentally manipulate the number of social signals received by the user
[Figure: p_feed − p_no feed vs. number of sharing friends (1–6), from Study 1.]
Study 2: Outline
▪ Field experiment tests how the number of friends shown (social cues) affects sharing, via randomization of cues
▪ Answers causal questions about influence:
  ▪ How does seeing a certain number of peers affect information diffusion?
  ▪ How is tie strength predictive of user activity?
  ▪ Are strong ties more influential?
Experimental Design
▪ Subjects: users who arrive at pages independently of Facebook
  ▪ Are or would have been assigned to the no-feed condition in Study 1
  ▪ Not arriving via Facebook
▪ Assignment procedure: randomly assign (viewer, URL) pairs to a number of cues
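A minimal sketch of the cue randomization for one trial. It assumes (hypothetically) that the number of cues shown, o, is drawn uniformly between 0 and the number of friends who actually liked the page, k, and that the friends shown are a uniform random subset. The uniform draw and the function name are assumptions; what matters for the comparison is that, for each k, trials with o = 0 provide a no-cue baseline.

```python
import random

def assign_cues(liking_friends: list, rng: random.Random) -> tuple[int, list]:
    """Hypothetical cue assignment for one (viewer, URL) trial.

    Draws the number of friends shown, o, uniformly from 0..k (k = number of friends
    who actually liked the page), then shows a uniformly random subset of size o.
    o = 0 gives the no-cue baseline.
    """
    k = len(liking_friends)
    o = rng.randint(0, k)                  # randint is inclusive at both ends
    shown = rng.sample(liking_friends, o)  # which o friends appear in the cue
    return o, shown


rng = random.Random(7)
print(assign_cues(["alice", "bob", "carol"], rng))
```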
Data
▪ Same 7-week period as Study 1
▪ 1,891,768 randomized trials (unique subject-page pairs), consisting of:
  ▪ 1,156,608 unique subjects
  ▪ 470,089 distinct web pages
▪ Record demographic features and tie-strength measures between subjects and their alters for each impression and click event
Social Correlation
▪ Probability of sharing when the number of friends liking = number shown
[Figure: probability of sharing vs. number of sharing friends (k), 0–4, annotated "Not causal!"]
[Figure: probability of sharing vs. number of friends shown (o), in separate panels by number of actual liking friends (0–3). Within each panel, zero friends shown gives the baseline (homophily + heterogeneity), and the rise above it is the observed effect.]
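A sketch of a cue-response curve computed from randomized trials, assuming records of the form `(k, o, shared)` (a hypothetical layout). Within each stratum of actual liking friends k, the sharing rate for each number of cues shown o is divided by the o = 0 rate, which absorbs homophily and heterogeneity. Dividing by the baseline is one reasonable summary chosen here for illustration; a difference p(o) − p(0) would work the same way.

```python
from collections import defaultdict

def cue_response(trials):
    """Hypothetical helper: cue-response curves relative to the zero-cue baseline.

    `trials` is an iterable of (k, o, shared): k = friends who actually liked the
    page, o = friends shown, shared = bool.
    Returns {k: {o: P(share | k, o) / P(share | k, o=0)}}.
    """
    counts = defaultdict(lambda: [0, 0])  # (k, o) -> [shares, trials]
    for k, o, shared in trials:
        cell = counts[(k, o)]
        cell[0] += int(shared)
        cell[1] += 1
    curves = {}
    for (k, o), (s, n) in sorted(counts.items()):
        base_s, base_n = counts.get((k, 0), (0, 0))
        if base_s == 0:
            continue  # no usable baseline (no trials or no shares at o = 0) for this k
        curves.setdefault(k, {})[o] = (s / n) / (base_s / base_n)
    return curves


# Toy data for k = 1: 2/100 shares with no cue, 3/100 with the one friend shown.
toy = [(1, 0, True)] * 2 + [(1, 0, False)] * 98 + [(1, 1, True)] * 3 + [(1, 1, False)] * 97
print(cue_response(toy))  # relative rates of about 1.0 (o = 0) and 1.5 (o = 1)
```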
Predictive Power of Strong Ties
▪ Consider cases where 1 friend liked a page
[Figure: probability of sharing vs. tie strength (5–15), with the cue shown and with the cue not shown.]
Cues Matter More for Strong Ties
[Figure: relative risk ratio vs. tie strength (5–15).]
Study 2: Recap
▪ Experiments are necessary to understand the effect of cues on user behavior
▪ We introduce the cue-response function, which shows how the number of social signals received influences user behavior
▪ Tie strength is predictive of sharing decisions
▪ Strong ties appear more influential
Implications
▪ Correlated activity cannot be attributed to influence alone!
▪ Viral marketing and identifying “influencers”
  ▪ If it weren't for the influentials, would users still acquire the information?
  ▪ Do probability increases justify targeting clusters of individuals?
▪ Relevance
  ▪ Social data (tie strength, number of friends) allows us to identify relevant content
▪ Integrating social into Web products
  ▪ Social cues increase engagement rates
Upcoming Work
▪ Similar to Study 2, but with ads: Social Influence in Social Advertising: Evidence from Field Experiments. Eytan Bakshy, Dean Eckles, Rong Yan, Itamar Rosenn. EC 2012.
[Figure: normalized click rate vs. number of associated friends (1–6), 1 friend shown.]
Future Work
▪ Information diffusion:
  ▪ Constructed observational study
  ▪ Effect of cues during exposure via feed
▪ Social cues: individual differences in persuasion
  ▪ Do certain users respond differently to social cues?
  ▪ Simple or complex contagion?
  ▪ Are strong ties actually more influential?
Thanks!
▪ Collaborators: Lada Adamic, Dean Eckles, Cameron Marlow, Itamar Rosenn
▪ More to come on Thursday!
▪ Find out more about Data Science at Facebook here:
  ▪ http://www.facebook.com/data
▪ Questions?