Social Influence & Homophily
Nitish Upreti
nzu100@cse.psu.edu
Quantifying the individual effects of Social Influence and Homophily in a Dataset.

Published in: Technology
Transcript of "Social Influence & Homophily"

  1. 1. Social Influence & Homophily Nitish Upreti nzu100@cse.psu.edu
  2. 2. OUTLINE • Introduction and Review • Motivation • Related Work • Problem Definition • Statistics Background • Methodology • Where to go from here? • Summary
  3. 3. PROBLEM DEFINITION “Identifying and measuring individual Homophily and Social Influence effects on a dataset.”
  4. 4. Quick Review • Social Influence: Our friendships and behavior are affected by social influence (the pressure to conform to our neighbors' values). • Selection: We have a tendency to be friends with people who are like us. • Homophily: A widely observed social phenomenon which states that “we tend to be similar to our friends”.
  5. 5. Quick Note before we start… We will refer to Selection as Homophily (Reason: Authors assume that if Homophily effects are present, we tend to select individuals with similar values)
  6. 6. MOTIVATION
  7. 7. Selection Vs social influence: Why do we care? • If Social Influence is a significant factor, then targeting key individuals and trying to modify undesirable behavior can be effective since we are then viewing such behavior as a process of influence spread. • Otherwise, focusing on a few individuals will at best change the behavior of a few individuals.
  8. 8. REAL WORLD SCENARIO • A firm selling products to consumers in a social network. • The firm knows that friends in the network often make similar purchases. • What is the reason behind this similarity? • Is it because they have similar tastes, since, after all, they are friends? • Is it because one influences the other’s decision, as they communicate frequently? Credits: (Homophily or Influence? – Analysis of Purchase Decisions in a Social Network Context Liye Ma, Alan Montgomery and Ramayya Krishnan )
  9. 9. How can the firm take advantage? • If it is taste similarity that drives the similar decisions, the firm should directly target friends of that customer by offering discounts to them. • If it is social influence that drives the similarity, the firm should incentivize that customer to promote the product or service to her friends. Credits: (Homophily or Influence? – Analysis of Purchase Decisions in a Social Network Context Liye Ma, Alan Montgomery and Ramayya Krishnan )
  10. 10. SELF ANALYSIS A Real World Problem worth Solving.
  11. 11. EXISTING WORK • A lot of research has gone into understanding “Homophily” and “Social Influence” in social networks. • Quickly mention studies which involve direct analysis of “Identifying and measuring Homophily and social influence effects”. • This problem area serves as one of the biggest open ended challenges to Social Scientists. ( will make a good class project as well :D )
  12. 12. SURVEY OF RELATED WORK
  13. 13. RELATED WORK - 1 • “Homophily or Influence? – Analysis of Purchase Decisions in a Social Network Context” http://people.stern.nyu.edu/bakos/wise/papers/wise2009-5b2_paper.pdf
  14. 14. QUICK LOOK AT THE STUDY • Phone call history dataset (3.7 Million) from an Indian Telecom company over a 6-month period for purchase records of monthly Caller Ring Back Tones (CRBT) subscription. • Both Social Influence & Homophily are studied. • The study builds a “Hierarchical Bayesian model” which simultaneously accounts for both Homophily and social influence effects in consumers’ decision process.
  15. 15. RELATED WORK - 2 • “Social selection and peer influence in an online social network.” http://www.irle.berkeley.edu/culture/conf2012/lewis_soc12.pdf
  16. 16. QUICK LOOK AT THE STUDY • Employs Facebook activity of college students. • The coevolution of friendship and tastes in music, movies and books over a 4-year time period is analyzed. • A “stochastic actor-based” model is employed to analyze the individual effects of Social Influence & Homophily.
  17. 17. RELATED WORK - 3 • “Distinguishing influence-based contagion from Homophily driven diffusion in dynamic networks.” http://www.pnas.org/content/106/51/21544.full.pdf
  18. 18. QUICK LOOK AT THE STUDY • Studies a longitudinal dataset that combines the global network of daily instant messaging (IM) traffic among 27.4 million users of Yahoo with day-by-day adoption of a mobile service application (Yahoo! Go). • A framework based on “matched sample estimation” is developed to distinguish influence from Homophily.
  19. 19. ANALYSIS OF EXISTING APPROACHES • Empirical Investigations (Focuses on demonstrating the presence of Homophily and Influence in real world data sets) • Significance Tests for Relational and Social network data (Focuses mostly on static networks) • Modeling Techniques for distinguishing Homophily & Influence. (Accuracy is impacted by suitability of model)
  20. 20. TODAY’S FOCUS “Randomization Tests for Distinguishing Social Influence and Homophily Effects.” https://www.cs.purdue.edu/homes/neville/papers/lafond-neville-www2010.pdf
  21. 21. INTRODUCTION • In a social network, connected instances are likely to have autocorrelated attribute values. • “Two friends are more likely to share a common political belief than two random strangers.” • The paper presents a randomization technique for temporal network data that measures the individual contributions of Homophily and Social Influence (details coming soon!).
  22. 22. THE EXPERIMENT / SUPPORT • A subset of data from a Facebook group at Purdue. • Time step from 2008 (t) to 2009 (t+1). • Hypotheses tested on: 1. Semi-synthetic data with no Homophily or Social Influence. 2. Semi-synthetic data with a strong Homophily or Influence effect. 3. The actual real dataset. • The approach was reported to be effective under all conditions.
  23. 23. PROBLEM DEFINITION • Relational data is represented as an undirected, attributed graph G = (V, E). • Each node v in V has a number of attributes (X1, …, Xm). • From one time step t to the next, both the attributes and the relationships can change. • Significant Influence: Attributes at t+1 depend on the link structure at t. • Significant Homophily: Link structure at t+1 depends on the attributes at t. (Keep them in mind! We will come back to them)
  24. 24. BACKGROUND • In Statistics, an association is a relationship between two statistically dependent quantities. • ‘Relational Autocorrelation’: Statistical dependency between values of the same variable on related objects. (Abundant in our dataset) Why? • In this work we use the Chi-Square statistic.
  25. 25. STATISTICS 101
  26. 26. CHI-SQUARE STATISTICS • How likely is it that an observed distribution is due to chance? • Observe 100 students to see “whether attending class influences how students perform on the exam”. • Four categories: – Students who attend class and pass. – Students who attend class and do not pass. – Students who do not attend class and pass. – Students who do not attend class and do not pass. • Null Hypothesis: Attending class makes no difference to passing.
  27. 27. CHI-SQUARE Continued…. • The test compares the observed data to a model that distributes the data according to the expectation that the variables are independent. Wherever the observed data doesn't fit the model, the evidence that the variables are dependent becomes stronger, which counts against the null hypothesis. • Degrees of freedom: The number of values in the final calculation that are free to vary. • Calculate the Chi-Square value. (How?) • Calculate the more interesting ‘p’ value (the probability of seeing a Chi-Square value at least this large if the null hypothesis were true).
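To make the "(How?)" concrete, here is a minimal sketch of the Chi-Square calculation for the class-attendance example. The slide gives no counts, so the numbers in `observed` are invented for illustration, and `chi_square_2x2` is a hypothetical helper name. For a 2x2 table there is one degree of freedom, so the p-value can be written with the standard library's `math.erfc`.

```python
import math

def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for 100 students (not from the slides):
# rows = attends / does not attend, columns = pass / does not pass.
observed = [[40, 10],
            [20, 30]]
chi2, p_value = chi_square_2x2(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.5f}")
```

A small p-value here would mean that counts this lopsided are unlikely if attendance and passing were independent.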
  28. 28. Calculating Relational Autocorrelation
  29. 29. CORRELATION GAIN gain(t, t+1) = C(X_{t+1}, G_{t+1}) – C(X_t, G_t) (The gain could be due to Homophily or Social Influence)
  30. 30. HOMOPHILY Continued… If a Homophily effect is present in the data, the autocorrelation will increase when we consider the link changes from time t to time t+1: C(X_t, G_{t+1}) – C(X_t, G_t) (The Chi-Square value is a single number that adds up all the differences between our actual data and the data expected.)
  31. 31. SOCIAL INFLUENCE Continued… If an influence effect is present in the data, the autocorrelation will increase when we consider the attribute changes from time t to time t+1: C(X_{t+1}, G_t) – C(X_t, G_t) (The Chi-Square value is a single number that adds up all the differences between our actual data and the data expected.)
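The three differences on the slides above can be sketched in a few lines. This assumes that C(X, G) is a Chi-Square statistic computed over the pairs of attribute values found at the two ends of each edge; the slides say Chi-Square is used but do not spell out the table, so that construction, the single categorical attribute per node, and the names `autocorrelation` and `gains` are illustrative assumptions.

```python
from collections import Counter
from itertools import product

def autocorrelation(attrs, edges):
    """C(X, G): chi-square over attribute-value pairs on linked nodes (assumed form)."""
    pairs = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions so the table is symmetric.
        pairs[(attrs[u], attrs[v])] += 1
        pairs[(attrs[v], attrs[u])] += 1
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    values = sorted(set(attrs.values()))
    marginal = {a: sum(pairs[(a, b)] for b in values) for a in values}
    chi2 = 0.0
    for a, b in product(values, repeat=2):
        expected = marginal[a] * marginal[b] / total
        if expected > 0:
            chi2 += (pairs[(a, b)] - expected) ** 2 / expected
    return chi2

def gains(attrs_t, edges_t, attrs_t1, edges_t1):
    """Total, homophily and influence gains relative to C(X_t, G_t)."""
    base = autocorrelation(attrs_t, edges_t)
    return {
        "total": autocorrelation(attrs_t1, edges_t1) - base,
        "homophily": autocorrelation(attrs_t, edges_t1) - base,  # links change only
        "influence": autocorrelation(attrs_t1, edges_t) - base,  # attributes change only
    }

# Toy data: attributes stay fixed, cross-group links are dropped at t+1.
attrs = {1: "a", 2: "a", 3: "b", 4: "b"}
edges_t = [(1, 2), (3, 4), (1, 3), (2, 4)]
edges_t1 = [(1, 2), (3, 4)]
print(gains(attrs, edges_t, attrs, edges_t1))
```

In this toy case only the links change, so the whole gain shows up in the homophily term and the influence term is zero.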
  32. 32. METHODOLOGY (Randomization Tests)
  33. 33. RANDOMIZATION TESTS • Provide a robust statistical technique for hypothesis testing. • Generate several pseudosamples (permutations of the original data set). • The correlation gain is calculated for each pseudosample. • The observed gain is then compared to the distribution of pseudosample gains. • An observed gain that lies far out in the tail of that distribution is deemed significant.
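The steps above can be sketched as a generic loop. This is an outline under stated assumptions, not the authors' implementation: `pseudo_gain` is a hypothetical callback that builds one pseudosample and returns its gain, and the add-one form of the empirical p-value is a common convention chosen here, not something stated on the slides.

```python
import random

def randomization_test(observed_gain, pseudo_gain, n_samples=1000, seed=0):
    """Compare an observed gain with the gains of randomly permuted pseudosamples."""
    rng = random.Random(seed)
    # One gain per pseudosample; together they approximate the null distribution.
    null_gains = [pseudo_gain(rng) for _ in range(n_samples)]
    # Share of pseudosamples whose gain is at least as large as the observed one.
    at_least_as_large = sum(g >= observed_gain for g in null_gains)
    p_value = (at_least_as_large + 1) / (n_samples + 1)
    return p_value, null_gains

# Stand-in null distribution for demonstration only: uniform noise in [0, 1).
p_value, _ = randomization_test(2.0, lambda rng: rng.random())
print(p_value)
```

In a real run, `pseudo_gain` would permute the data according to the chosen null hypothesis and recompute the correlation gain.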
  34. 34. ANALYSIS OF KEY ISSUES AND ASSUMPTIONS (For Randomization Tests) • Formulate an appropriate null hypothesis. • Permute the data in a way that accurately reflects the null hypothesis.
  35. 35. SELF ANALYSIS The approach is quite relevant and appropriate, as it makes no assumptions about the underlying model. Also, both the attribute values and the links change over time, which allows both Influence and Homophily to be assessed.
  36. 36. NULL HYPOTHESIS • H0H : Link changes are random and are not due to attribute values in t. • H0I : Attribute changes are random and are not due to friends in t. • H0F : Both attribute and link changes are random.
  37. 37. POSSIBLE PERMUTATIONS
  38. 38. CHOICE BASED RANDOMIZATION • For H0H we can maintain the edge additions in t+1 but randomize the choice of target node, so that each node keeps the same number of additions and deletions. • For H0I we can randomize the choice of attribute value to replace in t+1, so that any similarity of the value is destroyed. • This is popularly referred to as “choice-based” randomization, as we are randomizing the result of choices (attribute/link changes).
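As a rough illustration of the H0I case, the sketch below keeps track of which nodes changed their attribute but shuffles which new value each of them ends up with. It assumes one categorical attribute per node and simply permutes the new values among the changed nodes; that is a simplification chosen for illustration, and `randomize_attribute_changes` is a hypothetical name.

```python
import random

def randomize_attribute_changes(attrs_t, attrs_t1, rng):
    """Pseudosample of X_{t+1}: same changed nodes, new values shuffled among them."""
    changed = [n for n in attrs_t if attrs_t[n] != attrs_t1[n]]
    new_values = [attrs_t1[n] for n in changed]
    rng.shuffle(new_values)
    pseudo = dict(attrs_t1)  # nodes that did not change keep their value
    for node, value in zip(changed, new_values):
        pseudo[node] = value
    return pseudo

attrs_t = {1: "a", 2: "a", 3: "b", 4: "c", 5: "c"}
attrs_t1 = {1: "b", 2: "a", 3: "c", 4: "a", 5: "c"}
print(randomize_attribute_changes(attrs_t, attrs_t1, random.Random(0)))
```

Because only the assignment of new values is shuffled, the overall distribution of values at t+1 is preserved while any tie to friends' values at t is broken.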
  39. 39. CALCULATING CHOICE BASED RANDOMIZATION • Non-trivial problem. • A greedy assignment is involved. • Collect all the changes (edges & attributes). • Sort the nodes and attributes from those with the fewest random options to those with the most. • This prevents violating the underlying null hypothesis.
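A simplified sketch of the greedy idea for edge additions under H0H follows. The slides do not give the full procedure, so several things here are assumptions: each added edge (u, v) is treated as a choice made by u, only additions are handled, and the "fewest options first" ordering is computed once up front. The name `randomize_additions` is hypothetical.

```python
import random

def randomize_additions(nodes, edges_t, added, rng):
    """Re-draw the target of each added edge, keeping every source's number of additions."""
    taken = {frozenset(e) for e in edges_t}

    def options(u):
        # Legal targets: any other node that u is not already linked to.
        return [v for v in nodes if v != u and frozenset((u, v)) not in taken]

    # Greedy ordering: handle the most constrained sources first.
    order = sorted(added, key=lambda edge: len(options(edge[0])))
    new_edges = []
    for u, _ in order:
        candidates = options(u)
        if not candidates:
            continue  # no legal target left for this source in this sketch
        v = rng.choice(candidates)
        taken.add(frozenset((u, v)))
        new_edges.append((u, v))
    return new_edges

nodes = [1, 2, 3, 4]
edges_t = [(1, 2)]
added = [(1, 3), (2, 4)]  # edges that appeared between t and t+1
print(randomize_additions(nodes, edges_t, added, random.Random(0)))
```

Handling the most constrained sources first reduces the chance that a late choice has no legal target left, which is the motivation the slide gives for sorting.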
  40. 40. SELF ANALYSIS Where to go from here? • Changing the granularity of time step to investigate deeper. • Investigating why certain groups had more of Homophily or Social Influence? • Apart from friendship, considering other influential effects.
  41. 41. SUMMARY • Successfully employed a randomization technique for distinguishing Homophily and Social Influence. • Tested the hypotheses on semi-synthetic and real-world data sets. • The strength of Influence and Homophily varied across groups, depending on group properties.
  42. 42. PERSONAL TAKEAWAY Take a Statistics Class !
  43. 43. THANK YOU!