Prasanta Bhattacharya - WESST - Social Networks and Causal Inference
1. Social Networks and Causal Inference
Prasanta Bhattacharya
Scientist, Institute of High Performance Computing (Singapore)
PhD (Information Systems), National University of Singapore
prasanta_bhattacharya@ihpc.a-star.edu.sg
2. Princeton – 1, Facebook - 0
• In 2014, researchers at Princeton published a paper titled “Epidemiological modeling of online
social network dynamics”
“…we then applied the model to the Google data for search query Facebook. Extrapolating the
best fit model into the future suggests that Facebook will undergo a rapid decline in the
coming years, losing 80% of its peak user base between 2015 and 2017.”
(Cannarella and Spechler, 2014)
3. Princeton – 1, Facebook – 1
• Researchers at Facebook replied with a “Debunking Princeton” post
• “In keeping with the scientific principle "correlation equals causation," our research
unequivocally demonstrated that Princeton may be in danger of disappearing entirely”
4. Princeton – 1, Facebook – 2
• “This trend suggests that Princeton will have only half its current enrollment by
2018, and by 2021 it will have no students at all, agreeing with the previous
graph of scholarly scholarliness”
5. Princeton – 1, Facebook – Inf.
• “While we are concerned for Princeton University, we are even more concerned about the fate
of the planet — Google Trends for "air" have also been declining steadily, and our projections
show that by the year 2060 there will be no air left”
6. Association vs. Causation
• Association: Changes in outcome (Y) you
would expect to see when a certain factor (X)
changes
e.g. rainfall and umbrella sightings
• Causation: Changes in outcome (Y) you would
expect to see when you change a certain factor
(X), everything else held constant
e.g. does opening umbrellas cause rainfall? (No: forcing umbrellas open leaves the rainfall unchanged)
7. What correlation is and isn’t..
• X and Y are associated if
- Knowing X provides some information about Y
- Observing the value of X changes the conditional distribution of the
observed Y, i.e. P(Y | X) ≠ P(Y)
- Observing X helps better predict observed Y (and vice versa)
- Observing X changes our beliefs about distribution of Y
- Association makes NO claims about the conditional distribution of Y if, instead of
simply observing X, we manipulate/disturb/intervene on X
(Tsamardinos et al. 2013)
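The gap between observing X and intervening on X can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: the hidden common cause Z, the probabilities, and the function names `simulate` and `p_y` are all assumptions chosen for illustration. In this toy model X has no causal effect on Y, yet observing X = 1 shifts the distribution of Y, while setting X = 1 by hand does not.

```python
import random

def simulate(n=100_000, intervene_x=None, seed=0):
    """Draw (x, y) pairs from a toy model where a hidden Z drives both X and Y.

    X has no causal effect on Y. If intervene_x is given, X is set by hand
    (an intervention) instead of being generated from Z.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden common cause (assumed)
        if intervene_x is None:
            x = rng.random() < (0.9 if z else 0.1)  # X merely reflects Z
        else:
            x = intervene_x                         # X forced, link to Z is cut
        y = rng.random() < (0.8 if z else 0.2)      # Y depends on Z only
        pairs.append((x, y))
    return pairs

def p_y(pairs, given_x=None):
    """Estimate P(Y=1), or P(Y=1 | X=given_x) when given_x is set."""
    ys = [y for x, y in pairs if given_x is None or x == given_x]
    return sum(ys) / len(ys)

observed = simulate()
p_marginal = p_y(observed)                # P(Y=1), about 0.5
p_observe = p_y(observed, given_x=True)   # P(Y=1 | X=1), about 0.74
p_do = p_y(simulate(intervene_x=True))    # P(Y=1) after forcing X=1, about 0.5
print(p_marginal, p_observe, p_do)
```

Under these assumed probabilities, X and Y are associated (the conditional differs from the marginal), but the intervention leaves Y at its marginal rate.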
8. What correlation is and isn’t..
• Causation doesn’t imply (linear) correlation either!
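A quick way to see this is a deterministic but non-linear dependence. The sketch below is an illustration under assumed choices (Y = X², X uniform on [-1, 1], a hand-written `pearson` helper); it is not from the slides. Y is entirely caused by X, yet the linear correlation is close to zero because the positive and negative halves cancel.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(50_000)]
ys = [x * x for x in xs]                      # Y is fully determined by X

r = pearson(xs, ys)                           # close to 0
r_abs = pearson([abs(x) for x in xs], ys)     # strongly positive
print(round(r, 3), round(r_abs, 3))
```

Correlating Y with |X| instead recovers a strong linear relationship, which shows the dependence was there all along.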
9. Why are we generally bad at this?
• Evolutionary bias: We are naturally inclined to think in terms of causation. In the
1940s, Albert Michotte theorized that we perceive causality directly, much as we perceive colors!
10. Why are we generally bad at this?
• Confirmation bias: Sometimes we just assume things to be correct,
because they “sound” correct
11. Why are we generally bad at this?
• Attribution bias: We do not think about hidden factors
“The American Voter” (1960)
12. Why are we generally bad at this?
• Prevalent myths:
  • Correlation implies causation unless proven otherwise
    - Necessary but not sufficient condition
  • We can uncover causation using observational data
    - Only true in specialized contexts
  • Correlation implies causation when the correlation is statistically significant
    - Nope
  • We can test causation using regressions
    - Only true if your data is “experimentally” generated
13. Causation checklist
1. Association (significant association exists between cause and effect)
2. Temporal precedence (cause precedes effect)
3. Isolation (no other external factor influences the cause and effect)
15. Counterfactuals & randomization
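$\text{Treatment Effect}_i = \text{Outcome}_{i,\,\text{treated}} - \text{Outcome}_{i,\,\text{not treated}}$
E.g. Effect of education on salary
$\Delta\text{Salary}_i = \text{Salary}_{i,\,\text{educated}} - \text{Salary}_{i,\,\text{uneducated}}$
But how can individual i be both educated and uneducated at the same time?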
• Randomization solves this problem! (RCTs)
• Construct 2 groups
• Randomly assign people to each group (identical construction)
• Treat one group while keeping the other group untreated
• Can you force people to go to grad school?
• Difference in salary outcomes between groups is the effect of education
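The randomization steps above can be sketched as a small simulation. Everything in it is an assumption made for illustration: the latent "ability" variable, the salary formula, the true effect of 5, and the helper names. It contrasts a self-selected comparison (high-ability people choose education) with a randomized one.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # assumed causal effect of education on salary

def make_population(n, rng):
    """Each person has both potential outcomes: salary without and with education."""
    people = []
    for _ in range(n):
        ability = rng.gauss(0, 1)                     # hidden factor (assumed)
        salary_uneducated = 50 + 10 * ability + rng.gauss(0, 1)
        people.append((ability, salary_uneducated, salary_uneducated + TRUE_EFFECT))
    return people

def difference_in_means(people, treated_flags):
    """Only one potential outcome per person is observed, depending on the group."""
    treated = [y1 for (_, _, y1), t in zip(people, treated_flags) if t]
    control = [y0 for (_, y0, _), t in zip(people, treated_flags) if not t]
    return mean(treated) - mean(control)

rng = random.Random(42)
people = make_population(20_000, rng)

# Observational world: people with above-average ability choose education.
self_selected = [ability > 0 for ability, _, _ in people]
# RCT: a coin flip decides who is treated, so the groups are comparable.
randomized = [rng.random() < 0.5 for _ in people]

naive_estimate = difference_in_means(people, self_selected)
rct_estimate = difference_in_means(people, randomized)
print(round(naive_estimate, 2), round(rct_estimate, 2))
```

In this toy setup the self-selected comparison badly overstates the effect because ability differs between groups, while the randomized comparison lands near the assumed true effect.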
18. Experimentation has a rich history
• Mention of treatment and control groups in the Book of Daniel (6th century
BC) (No randomization)
• Dr. James Lind (1747) experimented with different methods to treat scurvy
(No randomization)
• 17th-century Belgian physician Van Helmont
• Bloodletting vs. evacuation as a cure for everything (randomization)
• Famous streptomycin trial of 1948 (among the first medical trials to actually use an RCT)
19. RCT limitations
• Ethical limitations
  - Education -> Salary
    - Is it fair to design a study in which some people do not get an education?
  - Famous debate: Smoking -> Cancer?
    - Most studies use observational data
    - An RCT is impractical and unethical
• Theoretical challenges
  - Randomization at the dyadic and network level is non-trivial
20. From individual to dyads
• So far: intervention on Xi leads to outcome Yi
• What about intervention on Yj, where path(i,j) exists?
• Such peer-level causal processes are everywhere:
  - Adoption of brands/products (Bapna and Umyarov 2015; Aral and Walker 2011; Reingen et al. 1984)
  - Diffusion of innovations (practices) (Coleman et al. 1957; Banerjee et al. 2012)
  - Spread of disorders (Christakis and Fowler 2007; Valente 2010)
  - Voting behavior (Jones et al. 2017; Lazarsfeld et al. 1955; Watts and Dodds 2007)
  - Content consumption (Bakshy et al. 2015; Kramer et al. 2014)
21. Joke.
• 3 kinds of social science researchers
1. Those whose research focuses on social influence
2. Those whose research focuses on social influence, but who choose to ignore it
3. Those whose research focuses on social influence, but who don’t know it yet
22. Peer influence vs. homophily
• Users in a dyad generally show correlated behavior (as compared to a
random pair)
• Hypothetical story from Shalizi and Thomas (2011):
“Suppose that there are two friends named Ian and Joey, and Ian’s parents
ask him: “If your friend Joey jumped off a bridge, would you jump too?”
Why might Ian answer “yes”?”
23. (at least) 6 reasons..
1. Joey’s example inspired Ian (social contagion/influence)
2. Joey infected Ian with a parasite which suppresses fear of falling (biological contagion)
3. Joey and Ian are friends on account of their shared fondness for jumping off bridges (observed
homophily on the focal behavior)
4. Joey and Ian became friends through a thrill-seeking club, whose membership rolls are publicly
available (observed homophily on a secondary behavior)
5. Joey and Ian became friends through their shared fondness for roller-coasters, which was caused
by their common thrill-seeking propensity (latent homophily)
6. Joey and Ian realize that the bridge is about to collapse and think jumping early is the safer option
(shared external context)
24. Disentangling influence from all else
• Proving existence of homophily and influence is easy
• Quantifying homophily and influence is possible in “special” contexts
• Isolating the effects of influence from any homophily is hardest
• “Homophily and contagion are generically confounded in observational social network
studies” (Shalizi and Thomas 2011)
25. An RCT example: Social advertising
(Bakshy et al. 2012)
• Instead of assigning behavior to peers, manipulate cues
29. Challenges
1. Sampling individual users during randomization
• e.g. the treatment is a video chat application, but none of the user’s peers are in
the treatment group
2. Conventional A/B tests in networks are almost always vulnerable to
network interference
3. Preserving network structure for treatment vs. control groups
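Challenge 2 can be made concrete with a toy simulation. This is a sketch under stated assumptions, not a result from the slides: users sit on a ring, each user's outcome is an assumed direct effect of their own treatment plus an assumed spillover proportional to the share of treated neighbours, and the names `outcomes`, `DIRECT`, and `SPILLOVER` are hypothetical.

```python
import random

DIRECT, SPILLOVER = 1.0, 2.0  # hypothetical effect sizes (assumptions)

def outcomes(treated):
    """Outcome per node on a ring: own treatment plus share of treated neighbours."""
    n = len(treated)
    result = []
    for v in range(n):
        exposure = (treated[(v - 1) % n] + treated[(v + 1) % n]) / 2
        result.append(DIRECT * treated[v] + SPILLOVER * exposure)
    return result

def average(values):
    return sum(values) / len(values)

n = 10_000
rng = random.Random(7)

# Ground truth we care about: everyone treated versus no one treated.
global_effect = average(outcomes([1] * n)) - average(outcomes([0] * n))

# Conventional A/B test: each user is randomized independently of their peers.
treated = [int(rng.random() < 0.5) for _ in range(n)]
y = outcomes(treated)
ab_estimate = (average([yi for yi, t in zip(y, treated) if t])
               - average([yi for yi, t in zip(y, treated) if not t]))
print(global_effect, round(ab_estimate, 2))
```

Because treated and control users have, on average, the same share of treated neighbours, the spillover cancels out of the A/B comparison, which recovers only the direct effect and misses the rest of the global effect.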
35. Graph cluster randomization
(Ugander et al. 2013)
• Natural partitions? E.g. countries
• Scalable graph cutting methods
• Community detection
• Label propagation (Ugander and Backstrom 2013)
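The basic idea of randomizing at the cluster level can be sketched as follows. This is a simplified illustration, not the procedure from Ugander et al.: it assumes the partition is already known (as with natural partitions such as countries) instead of running a graph cutting or label propagation step, and the toy graph and function names are hypothetical.

```python
import random
from itertools import combinations

def toy_graph(n_clusters=6, size=5):
    """Dense clusters (cliques) joined in a ring by single bridge edges."""
    cluster_of, edges = {}, []
    for c in range(n_clusters):
        members = [c * size + i for i in range(size)]
        for v in members:
            cluster_of[v] = c
        edges.extend(combinations(members, 2))                         # within-cluster ties
        edges.append((members[-1], ((c + 1) % n_clusters) * size))     # bridge to next cluster
    return cluster_of, edges

def node_randomization(cluster_of, rng):
    """Conventional A/B assignment: one coin flip per user."""
    return {v: rng.random() < 0.5 for v in cluster_of}

def cluster_randomization(cluster_of, rng):
    """One coin flip per cluster; every member inherits the cluster's arm."""
    arm = {c: rng.random() < 0.5 for c in sorted(set(cluster_of.values()))}
    return {v: arm[c] for v, c in cluster_of.items()}

def same_arm_fraction(edges, treated):
    """Share of ties whose two endpoints received the same assignment."""
    return sum(treated[u] == treated[v] for u, v in edges) / len(edges)

rng = random.Random(1)
cluster_of, edges = toy_graph()
node_frac = same_arm_fraction(edges, node_randomization(cluster_of, rng))
cluster_frac = same_arm_fraction(edges, cluster_randomization(cluster_of, rng))
print(round(node_frac, 2), round(cluster_frac, 2))
```

With cluster-level assignment, every within-cluster tie connects two users in the same arm, so most of a user's peers share their treatment status. Only the bridge edges can cross arms.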
36. Observational approaches
• Modeling the counterfactual using structural approaches
  - Propensity scores (Aral et al. 2009; Eckles et al. 2017)
  - Bayesian/graphical approaches (Brodersen et al. 2015)
  - Variants of random graph models (Snijders et al. 2010; Steglich et al. 2010)
    - These relax the dyadic independence assumption, but are assumption-intensive!
    - E.g. strong assumptions on latent variables, reverse causality, etc.
• Natural experiments (shocks to networks)
  - Natural disasters (Phan et al. 2015)
  - Migration (Munshi 2003)
  - Group assignments (Algan et al. 2015; Sacerdote 2001)
37. Investigating The Impact Of Network Effects On Content Generation:
Evidence From A Large Online Student Network
(Bhattacharya, Phan and Airoldi, working paper)
• Social media postings are a fast-changing behavior
• Link formation is endogenous, e.g. there might be preferential attachment
• Experimentally manipulating peer behavior is neither realistic nor useful
38. The co-evolution dynamics
[Figure: observed states at Stage 1 (t0) and Stage 2 (t1), linked by simulated states reached through microsteps MS 1 and MS 2]
Q1. How frequent are the microsteps? Rate function.
Q2. How are microstep decisions made? Random utility function (objective function).
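One common way to answer Q1 and Q2 in actor-oriented co-evolution models is to draw waiting times between microsteps from an exponential distribution governed by the rate, and to choose among options with multinomial-logit probabilities derived from the utilities. The sketch below illustrates that generic mechanism only. The constant rate, the fixed utility values, and the function names are assumptions, not the specification used in the working paper.

```python
import math
import random

def choice_probabilities(utilities):
    """Multinomial-logit probabilities: P(option k) is proportional to exp(utility_k)."""
    top = max(utilities)                          # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_microsteps(rate, utilities, horizon, rng):
    """Simulate (time, chosen option) pairs between two observation moments.

    rate      -- expected number of microsteps per unit time (assumed constant)
    utilities -- deterministic part of the utility of each option (assumed fixed)
    horizon   -- length of the period between the two observed stages
    """
    probs = choice_probabilities(utilities)
    options = list(range(len(utilities)))
    t, steps = 0.0, []
    while True:
        t += rng.expovariate(rate)                # Q1: when does the next microstep occur?
        if t > horizon:
            break
        choice = rng.choices(options, weights=probs)[0]   # Q2: which change is made?
        steps.append((t, choice))
    return steps

rng = random.Random(3)
probs = choice_probabilities([0.0, 1.0, 2.0])
steps = simulate_microsteps(rate=2.0, utilities=[0.0, 1.0, 2.0], horizon=1000.0, rng=rng)
print(len(steps), [round(p, 3) for p in probs])
```

In a full model the rate and the utilities would depend on the current network and behavior state and would be re-evaluated after every microstep. They are held fixed here to keep the sketch short.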
40. From dyads to topologies..
• So far: intervention on Xi leads to change in outcome Yi; intervention on Xj, Yj
leads to change in outcome Yi
• But do network topologies matter? How about incompletely observed structures?
41. Concluding thoughts..
• Finding associations is easy
• Attributing causation is hard
• Attributing causation on networks is really hard
• Attributing causation on networks using observational data and unbiased and
scalable estimators is really really hard
42. Thank you