MSND talk @ WWW 2012 (April 16, 2012)

Invited talk given at the Mining Social Network Data workshop at WWW 2012 in Lyon

Transcript

  • 1. Influence and Homophily in Networked User Behavior. Eytan Bakshy, Facebook. Mining Social Network Dynamics workshop @ WWW 2012, April 16, 2012
  • 2. Motivation ▪ To what extent do social networks shape our behaviors online? ▪ Homophily and heterogeneity confound social influence effects. ▪ Online behavior resembles well-studied forms of contagion ▪ Statistical controls are not enough (Shalizi & Thomas, 2011) ▪ How do we measure influence? ▪ Experiments.
  • 3. Outline ▪ What is a reasonable model of social contagion on the Web? ▪ The homophily confound ▪ Study 1: Influence in information diffusion ▪ Study 2: Influence in sharing decisions ▪ Implications
  • 4. Information as biological contagion ▪ Standard models assume constant probability of infection
  • 5. Information as biological contagion ▪ Standard models assume constant probability of infection ▪ Interesting things happen when reproduction rates are high: R ≥ β/γ
  • 6. Information as biological contagion ▪ Standard models assume constant probability of infection ▪ Interesting things happen when reproduction rates are high: R ≥ β/γ ▪ On the web, most information doesn’t appear to spread [Figure 4 from Bakshy, Hofman, Mason, Watts 2011: (a) frequency distribution of cascade sizes; (b) distribution of cascade depths. Most events do not spread at all, and even moderately sized cascades are extremely rare.]
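A minimal sketch of the constant-infection-probability model named on the slides above, assuming the standard SIR formulation. In that formulation the basic reproduction number is conventionally written R0 = β/γ (β is the transmission rate, γ the recovery rate), and an outbreak can grow only when R0 > 1. The function names and parameter values are illustrative assumptions, not taken from the talk.

```python
def basic_reproduction_number(beta, gamma):
    """R0 in the standard SIR model: transmission rate over recovery rate."""
    return beta / gamma


def simulate_sir(beta, gamma, i0=0.001, steps=2000, dt=0.1):
    """Euler-integrate the normalized SIR equations and return the
    fraction of the population that was ever infected."""
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return i + r


# Hypothetical parameters: one run below the R0 = 1 threshold, one above it.
subcritical = simulate_sir(beta=0.5, gamma=1.0)    # R0 = 0.5
supercritical = simulate_sir(beta=2.0, gamma=1.0)  # R0 = 2.0
print(f"R0=0.5 -> ever infected: {subcritical:.4f}")
print(f"R0=2.0 -> ever infected: {supercritical:.4f}")
```

Below the threshold the outbreak stays close to the initial seed; above it a large share of the population is eventually reached. That is the regime the slide contrasts with what is observed on the web.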
  • 7. Threshold models of social contagion ▪ Threshold models: become activated after k contacts are activated ▪ Not clear that local consensus factors into individual decisions in sharing content
  • 8. Threshold models of social contagion ▪ Threshold models: become activated after k contacts are activated ▪ Not clear that local consensus factors into individual decisions in sharing content ▪ Positive externalities: e.g. adoption of a technology ▪ Utility of visiting a page is often unrelated to number of visiting friends
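The threshold rule on the slides above ("become activated after k contacts are activated") can be sketched as follows. The ring-lattice network, the seed sets, and all names are assumptions chosen for illustration; the talk does not specify a network.

```python
def ring_lattice(n, reach=2):
    """Each node is linked to the `reach` nearest nodes on either side."""
    return {
        v: {(v + d) % n for d in range(-reach, reach + 1) if d != 0}
        for v in range(n)
    }


def run_threshold_model(neighbors, seeds, k):
    """Activate a node once at least k of its contacts are active;
    repeat until no further node changes state."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, contacts in neighbors.items():
            if node not in active and len(contacts & active) >= k:
                active.add(node)
                changed = True
    return active


graph = ring_lattice(20)
simple = run_threshold_model(graph, seeds={0}, k=1)           # one contact suffices
lone_seed = run_threshold_model(graph, seeds={0}, k=2)        # a single seed is stuck
paired_seeds = run_threshold_model(graph, seeds={0, 1}, k=2)  # adjacent seeds reinforce
print(len(simple), len(lone_seed), len(paired_seeds))
```

With k = 1 the rule behaves like simple contagion. With k = 2 a single seed cannot activate anyone, while two adjacent seeds can, because activation then depends on local reinforcement.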
  • 9. Diffusion of innovations ▪ Focuses on the spread of ideas and technologies ▪ Entail costly decisions
  • 10. Diffusion of innovations ▪ Focuses on the spread of ideas and technologies ▪ Entail costly decisions ▪ Embeddedness, authority, and interpersonal trust play an important role
  • 11. Diffusion of innovations ▪ Focuses on the spread of ideas and technologies ▪ Entail costly decisions ▪ Embeddedness, authority, and interpersonal trust play an important role ▪ Much of online activity is cheap and informal
  • 12. Some Similarities ▪ Theory: influence response functions for (A) the SIR model and (B) the deterministic threshold rule (Watts & Dodds 2007, Figure 11) ▪ Data: probability p of joining a LiveJournal community as a function of the number of friends k already in the community (Backstrom et al 2006, Figure 1); probability p of joining a DBLP community as a function of the number of friends k already in the community (Figure 2 of the same paper); acceptance rate over different number of invitations received by one user (Wei et al 2010, Figure 11); probability of buying vs. incoming recommendations for (a) Books and (b) DVD (Leskovec et al 2007, The Dynamics of Viral Marketing); Bakshy et al 2009
  • 13. The Homophily Confound [Causal diagram. Nodes: known characteristics Xi, Xj; unknown characteristics Ui, Uj (e.g. Web browsing behavior, interests); alter’s sharing behavior Yja(t0); Dija; ego’s sharing behavior Yia(t1). Annotation: unknown correlation between friends’ characteristics (expected to be stronger for closer friends).] Figure stolen from Bakshy, Eckles, Yan & Rosenn, 2012
  • 14. Influence (and homophily) [Same causal diagram, with two additions: "Other forms of influence" and Dija labeled as the mechanism (e.g. News Feed, social cues).] Figure stolen from Bakshy, Eckles, Yan & Rosenn, 2012
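The confound in the diagrams above can be illustrated with a small simulation in which there is no influence at all. Each friend's sharing depends only on their own unobserved interest (Ui, Uj), yet the ego's sharing rate still differs by whether the alter shared. The data-generating process, the `similarity` parameter, and all names are assumptions made for this sketch.

```python
import random


def simulate_friend_pairs(n_pairs, similarity, seed=0):
    """Return (alter_shared, ego_shared) pairs with homophily but no influence."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        u_alter = rng.random()  # alter's unobserved interest, U_j
        # Homophily: with probability `similarity` the ego has the same
        # interest as the alter; otherwise it is drawn independently.
        u_ego = u_alter if rng.random() < similarity else rng.random()
        alter_shared = rng.random() < u_alter
        ego_shared = rng.random() < u_ego  # does not depend on alter_shared
        pairs.append((alter_shared, ego_shared))
    return pairs


def ego_share_rate(pairs, alter_shared):
    """Ego's sharing rate among pairs where the alter did / did not share."""
    outcomes = [ego for alter, ego in pairs if alter == alter_shared]
    return sum(outcomes) / len(outcomes)


homophilous = simulate_friend_pairs(20000, similarity=0.8)
independent = simulate_friend_pairs(20000, similarity=0.0)

gap_homophily = ego_share_rate(homophilous, True) - ego_share_rate(homophilous, False)
gap_independent = ego_share_rate(independent, True) - ego_share_rate(independent, False)
print(f"gap with homophily:    {gap_homophily:.3f}")
print(f"gap without homophily: {gap_independent:.3f}")
```

The gap under homophily looks like an influence effect in observational data even though none was simulated, which is why the talk turns to experiments.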
  • 15. Study 1: Effect of Feed on Information Diffusion. With Itamar Rosenn, Cameron Marlow, and Lada Adamic. Published as The Role of Social Networks in Information Diffusion, WWW 2012.
  • 16. Study 1: Outline ▪ Field experiment tests how much sharing would occur in the absence of exposure via the Facebook feed ▪ Answers causal questions about influence & diffusion: ▪ To what extent does feed increase sharing? ▪ Are weak ties responsible for disseminating information?* ▪ How is tie strength predictive of user activity?* *to be continued on April 19th, The Role of Social Networks in Information Diffusion
  • 17. [Diagram] Correlated Information Sources: regularly visit same site; visit sites that link to the same content ▪ External Influence: mass + interpersonal media; interpersonal communication ▪ Channels shown: Web revisitation (Adar et al, 2009), Blogs, News Aggregators, RSS, Face-to-face, Telephone, IM, Email
  • 18. Influence on Feed [Same causal diagram as slide 14, with Dija labeled as the Facebook news feed.] Figure stolen from Bakshy, Eckles, Yan & Rosenn, 2012
  • 19. Details ▪ Assignment procedure: ▪ (viewer, URL) pairs are deterministically assigned into the feed and no feed conditions ▪ Directed shares (via messages, wall posts) are not subject to treatment and are removed from the experiment ▪ Evaluating outcomes: ▪ Compare the likelihood of sharing in the feed (treatment) with the no feed (control) condition
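The talk does not say how the deterministic assignment of (viewer, URL) pairs was implemented. One common way to get a stable assignment is to hash the pair and bucket the result, sketched below. The hash function, the salt, the 50/50 split, and the function name are all assumptions, not details from the study.

```python
import hashlib


def assign_condition(viewer_id, url, no_feed_fraction=0.5,
                     salt="hypothetical-experiment-salt"):
    """Map a (viewer, URL) pair to 'feed' or 'no feed'.

    The same pair always hashes to the same bucket, so the assignment
    is deterministic without storing any per-pair state.
    """
    key = f"{salt}|{viewer_id}|{url}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "no feed" if bucket < no_feed_fraction else "feed"


# Hypothetical viewers and URLs, used only to exercise the function.
conditions = [
    assign_condition(viewer, f"http://example.com/page/{page}")
    for viewer in range(100)
    for page in range(20)
]
no_feed_share = conditions.count("no feed") / len(conditions)
print(f"share of pairs in the no feed condition: {no_feed_share:.3f}")
```

Assigning at the pair level, rather than per viewer, means one viewer can be in the feed condition for some URLs and the no feed condition for others.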
  • 20. Data ▪ Random sample of all (user, URL) pairs eligible to be shown in the Facebook news feed during a 7-week period in 2010 ▪ 253,238,367 subjects ▪ 75,888,466 URLs ▪ 1,168,633,941 distinct subject-URL pairs (random trials)
  • 21. Temporal Clustering [Figure: cumulative density of share times for the feed and no feed conditions. Panels: absolute time (share time - alter's share time, in days) and relative to first exposure (share time - exposure time, in days). Annotations: users shared at exact same time; shared before seeing story on feed; shared within the first hour of exposure; within one day; within one week.]
  • 22. What is the overall effect of feed on sharing? ▪ Two methods for comparing probabilities: ▪ Average treatment effect on the treated: p_feed - p_no feed ▪ Relative risk ratio: p_feed / p_no feed ▪ Average effect: +0.2047% increase in sharing ▪ Risk Ratio: 7.3x more likely to share
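The two comparisons on the slide above can be computed directly from share counts in each condition. The counts below are made-up placeholders, not the study's data, and the function name is an assumption.

```python
def sharing_effects(shares_feed, trials_feed, shares_no_feed, trials_no_feed):
    """Compare sharing probabilities between the feed and no feed conditions."""
    p_feed = shares_feed / trials_feed
    p_no_feed = shares_no_feed / trials_no_feed
    return {
        "p_feed": p_feed,
        "p_no_feed": p_no_feed,
        "difference": p_feed - p_no_feed,  # absolute increase in probability
        "risk_ratio": p_feed / p_no_feed,  # multiplicative increase
    }


# Hypothetical counts for illustration only.
effects = sharing_effects(
    shares_feed=240, trials_feed=100_000,
    shares_no_feed=30, trials_no_feed=100_000,
)
print(f"difference: {effects['difference']:.4%}")
print(f"risk ratio: {effects['risk_ratio']:.1f}x")
```

The two measures answer different questions. When the baseline probability is small, as sharing probabilities are here, a small absolute difference can still correspond to a large risk ratio.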
  • 23. How does sharing increase with exposure? [Figure: probability of sharing vs. number of sharing friends (1 to 6). Feed curve: influence on feed + external correlation. No feed curve: external correlation.]
  • 24. How does sharing increase with exposure? [Figure: p_feed - p_no feed vs. number of sharing friends (1 to 6).]
  • 25. Study 1: Recap ▪ Experiments are necessary to disentangle influence from other factors ▪ Significant temporal clustering exists even for unexposed users ▪ Probability of sharing increases with number of friends ▪ Even if you don’t see those friends! ▪ Influence appears stronger when more friends are shown
  • 26. Study 2: Effect of Social Cues on Sharing Decisions. Chapter IV, Information Diffusion and Social Influence in Online Networks (dissertation chapter)
  • 27. Motivation ▪ Social influence in information diffusion occurs via two stages ▪ 1. Exposure (study 1) ▪ 2. Decision to share ▪ Trend in previous experiment is not causal ▪ Need a way to experimentally manipulate the number of social signals received by the user [Figure: p_feed - p_no feed vs. number of sharing friends (1 to 6).]
  • 28. Study 2: Outline ▪ Field experiment tests how the number of friends shown (social cues) increases sharing via randomization of cues ▪ Answers causal questions about influence: ▪ How does seeing a certain number of peers affect information diffusion? ▪ How is tie strength predictive of user activity? ▪ Are strong ties more influential?
  • 29. Experimental Design ▪ Subjects: Users that arrive at pages independent of Facebook ▪ Are or would have been assigned to the no feed condition in Study 1 ▪ Not arriving via Facebook ▪ Assignment procedure: randomly assign (viewer, URL) to a number of cues
  • 30. Data ▪ Same 7-week period as Study 1 ▪ 1,891,768 randomized trials (unique subject-page pairs) consisting of: ▪ 1,156,608 unique subjects ▪ 470,089 distinct web pages ▪ Record demographic features and tie strength measures between subjects and their alters for each impression and click event
  • 31. Social Correlation ▪ Probability when number of friends liking = number shown ▪ Not causal! [Figure: probability of sharing vs. number of sharing friends (k), 0 to 4.]
  • 32. [Figure: probability of sharing vs. number of friends shown (o), one panel per number of actual liking friends (0 to 3). Annotations: baseline: homophily + heterogeneity (zero friends shown); observed effect.]
  • 33. Predictive Power of Strong Ties ▪ Consider cases where 1 friend liked a page [Figure: probability of sharing vs. tie strength, with one curve for cue shown and one for cue not shown.]
  • 34. Cues Matter More for Strong Ties [Figure: relative risk ratio vs. tie strength.]
  • 35. Study 2: Recap ▪ Experiments are necessary to understand the effect of cues on user behavior ▪ We introduce the cue-response function, which shows how the number of social signals received influences user behavior ▪ Tie strength is predictive of sharing decisions ▪ Strong ties appear more influential
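A minimal sketch of how a cue-response function could be estimated from randomized trials: group trials by the number of cues shown and take the sharing rate in each group. Because the number of cues is randomized, differences between groups can be read as effects of the cues. The record format, the toy data, and the function name are assumptions; the talk does not describe its estimation code.

```python
from collections import defaultdict


def cue_response(trials):
    """Estimate P(share | number of cues shown).

    `trials` is an iterable of (cues_shown, shared) records, one per
    randomized (viewer, URL) trial.
    """
    counts = defaultdict(lambda: [0, 0])  # cues_shown -> [shares, trials]
    for cues_shown, shared in trials:
        counts[cues_shown][0] += int(shared)
        counts[cues_shown][1] += 1
    return {cues: shares / n for cues, (shares, n) in sorted(counts.items())}


# Hypothetical trial records for illustration only.
toy_trials = (
    [(0, True)] * 1 + [(0, False)] * 9
    + [(1, True)] * 2 + [(1, False)] * 8
    + [(2, True)] * 3 + [(2, False)] * 7
)
response = cue_response(toy_trials)
print(response)
```

To compare strong and weak ties in the same way, the grouping key could be extended to a (cues shown, tie strength bin) pair.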
  • 36. Implications ▪ Correlated activity cannot be attributed to influence alone! ▪ Viral marketing and identifying “influencers” ▪ If it weren’t for the influential, would users still acquire the information? ▪ Do probability increases justify targeting clusters of individuals? ▪ Relevance ▪ Social data (tie strength, number of friends) allows us to identify relevant content ▪ Integrating social into Web products ▪ Social cues increase engagement rates
  • 37. Upcoming Work ▪ Similar to Study 2, but with ads: Social Influence in Social Advertising: Evidence from Field Experiments. Eytan Bakshy, Dean Eckles, Rong Yan, Itamar Rosenn. EC 2012. [Figure: normalized click rate vs. number of associated friends (1 to 6), 1 friend shown.]
  • 38. Future Work ▪ Information diffusion: ▪ Constructed observational study ▪ Effect of cues during exposure via feed ▪ Social cues: Individual differences in persuasion ▪ Do certain users respond differently to social cues? ▪ Simple or complex contagion? ▪ Are strong ties actually more influential?
  • 39. Thanks! ▪ Collaborators: Lada Adamic, Dean Eckles, Cameron Marlow, Itamar Rosenn ▪ More to come on Thursday! ▪ Find out more about Data Science at Facebook here: ▪ http://www.facebook.com/data ▪ Questions?