Identified points of information diffusion in physical social networks (villages in India).
Completed a network analysis of 75 villages in India and suggested an algorithm to select
leaders for training initiatives, accounting for diversity and economic cost.
2. India - Background
• 2nd most populous country
• Social networks play a big role in information diffusion and government
interventions
• Social network characteristics
• Population - 1.3 billion
• Religion - 8 different religions
• Caste (& sub-caste) - 4 castes
• Education
• Gender
• Languages - 122 major, 1600 other
3. Problem statement
• Who are the best k people to reach out to if the government
wants to spread awareness about a policy?
• Do people interact more with people of their own community (homophily)?
• What are the characteristics of the important nodes of a network?
• How can we spread the information to diverse nodes in the network?
4. Our Research Approach
1. Understanding homophily (with respect to religion)
2. Understanding network density (with respect to religion)
3. Understanding the general characteristics of the most connected nodes
4. Finding the most influential nodes
5. Data
• Social network data of 75 Indian villages
• Characteristics of edges:
• who gave advice to whom,
• who lent money to whom,
• who borrowed rice from whom, …
• Characteristics of nodes:
• religion
• caste
• gender
• Reference:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/21538
11. Key Finding on Density
• The density of villages is associated with the homogeneity of religion.
• i.e., villages with more than one religion tend to be more sparsely connected
than villages with one religion.
• Policy Impact: More effort has to be made to spread information in
villages with more than one religion.
12. Homophily by religion
• Homophily = No. of same religion friends / Total no. of friends
• Challenge: Network structure effects
13. Procedure
• Calculated homophily of the village graph
• Calculated homophily of a random graph
• Normalized by removing the homophily of the random graph
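The procedure above can be sketched in plain Python. This is a minimal illustration, not the original analysis code: the toy edge list and religion labels are made up, the function names are hypothetical, "removing" is read here as subtraction, and the random baseline is obtained by shuffling religion labels over the fixed network (one possible stand-in for the random graph on the slide).

```python
import random

def homophily(edges, religion):
    """Mean over nodes of (same-religion neighbors / total neighbors)."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    fractions = []
    for node, nbrs in neighbors.items():
        same = sum(1 for n in nbrs if religion[n] == religion[node])
        fractions.append(same / len(nbrs))
    return sum(fractions) / len(fractions)

def normalized_homophily(edges, religion, trials=200, seed=0):
    """Observed homophily minus the mean homophily under shuffled labels.

    Assumption: the baseline keeps the network fixed and permutes who holds
    which religion, so structure effects are present in both terms.
    """
    rng = random.Random(seed)
    nodes = list(religion)
    labels = [religion[n] for n in nodes]
    baseline = 0.0
    for _ in range(trials):
        rng.shuffle(labels)
        baseline += homophily(edges, dict(zip(nodes, labels)))
    return homophily(edges, religion) - baseline / trials

# Toy village (hypothetical data): two religious groups and one cross-group tie.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]
religion = {"a": "H", "b": "H", "c": "H", "d": "M", "e": "M"}

observed = homophily(edges, religion)
normalized = normalized_homophily(edges, religion)
print(observed, normalized)
```

A positive normalized value means people have more same-religion ties than the shuffled baseline would produce by chance.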
14. Normalized homophily
• The average homophily by religion is 9.91%
• What does this mean?
• If a person interacts with 100 people, about 90 of those interactions
happen irrespective of religion and about 10 are specifically with
people of the same religion
15. Homophily in a particular religion
• Hindus: average homophily = 10.25%
• Muslims: average homophily = 30.21%
16. Findings
• People prefer to interact more with people of the same religion
• Minority groups have larger homophily on average
• Since there is homophily, the government should pass the
information to people from all religions
17. Characteristics of Most Connected Nodes
• Degree Centrality
• Characteristics shortlisted for analysis
• Gender
• Religion
• Caste
• Education
• Work
• Native
20. Observations
• Males are twice as likely as females to be most connected
• People with a degree or diploma are 20% more likely to be most
connected than people with early, mid, or high school education
• People with no education are 20% less likely
• People who are working are 1.5 times more likely to be most
connected than the unemployed
• Natives are twice as likely as non-natives to be most connected
• About 98% of the high-degree nodes were Hindus!
21. Challenges
• The nodes with highest centrality measures might not be scattered
in the graph
• How should one identify leaders among nodes with
underrepresented characteristics?
• Population distributions are not balanced! Some characteristics
dominate over others.
• Is there an alternative approach?
22. Our Proposed Solution
1. Algorithm
a. Calculate the degree centrality of all individuals
b. Each node’s relevance measure is defined by the following formula:
i. spread_calc = centrality / neighbor’s centrality
High centrality ensures that the node is connected to a lot of people; low
neighborhood centrality ensures that its neighbors are less connected.
ii. Calculate the score for all the nodes and sort them.
iii. Pop the highest-scoring node and remove its neighbors.
1. Repeat this until no nodes remain or the limit on the number of people we can select is reached.
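The steps above can be sketched as a greedy selection in plain Python. This is one possible reading, not a definitive implementation: the slide does not say how "neighbor's centrality" is aggregated, so the mean centrality of a node's neighbors is assumed here, and scores are computed once up front rather than recomputed after each removal. The function name `select_leaders`, the parameter `k`, and the toy graph are hypothetical.

```python
def select_leaders(edges, k=None):
    """Greedy sketch: take the top-scoring node, drop it and its neighbors, repeat."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    # Degree centrality: degree divided by the maximum possible degree (n - 1).
    centrality = {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

    def spread_calc(node):
        # Assumption: "neighbor's centrality" is the mean centrality of the neighbors.
        nbrs = neighbors[node]
        nbr_mean = sum(centrality[m] for m in nbrs) / len(nbrs)
        return centrality[node] / nbr_mean

    # Score every node once and sort, highest score first.
    ranked = sorted(neighbors, key=spread_calc, reverse=True)
    removed, leaders = set(), []
    for node in ranked:
        if k is not None and len(leaders) >= k:
            break  # reached the limit on the number of people we can select
        if node in removed:
            continue  # already covered by an earlier leader
        leaders.append(node)
        removed.add(node)
        removed |= neighbors[node]
    return leaders

# Toy graph (hypothetical): two hubs joined by a bridge, each with a few leaves.
edges = [("h1", "a"), ("h1", "b"), ("h1", "c"),
         ("h2", "d"), ("h2", "e"), ("h1", "h2")]
print(select_leaders(edges))       # ['h1', 'd', 'e']
print(select_leaders(edges, k=1))  # ['h1']
```

In the toy graph, picking `h1` removes the second hub `h2` as well, so its leaves `d` and `e` are no longer covered by any leader and get selected individually. That behavior follows from removing neighbors after each pick.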
23. Introducing Diversity
Select d as the number of underrepresented people who can act as
leaders.
Run the same formula with an added diversity term:
i. spread_calc = centrality / neighbor’s centrality + difference between the factors that the node has
and the factors that it shares with the majorities.
ii. Add weights to express preference (this is a naive approach)
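A minimal sketch of the diversity term, under stated assumptions: the "difference" is read as the number of factors (religion, gender, and so on) on which a node differs from the majority value, optionally weighted per factor. The helper names, the attribute dictionary, and the weight values are hypothetical.

```python
from collections import Counter

def majority_profile(attributes):
    """Most common value of each factor across all nodes."""
    factors = next(iter(attributes.values())).keys()
    return {f: Counter(a[f] for a in attributes.values()).most_common(1)[0][0]
            for f in factors}

def diversity_bonus(node_attrs, majority, weights=None):
    """Weighted count of factors on which a node differs from the majority."""
    weights = weights or {}
    return sum(weights.get(f, 1.0)
               for f, value in node_attrs.items() if value != majority[f])

def diverse_spread_calc(base_score, node_attrs, majority, weights=None):
    # base_score stands for centrality / neighbor's centrality.
    return base_score + diversity_bonus(node_attrs, majority, weights)

# Toy attributes (hypothetical data).
attributes = {
    "a": {"religion": "Hindu", "gender": "M"},
    "b": {"religion": "Hindu", "gender": "M"},
    "c": {"religion": "Muslim", "gender": "F"},
}
majority = majority_profile(attributes)
print(diverse_spread_calc(1.5, attributes["a"], majority))  # 1.5
print(diverse_spread_calc(1.5, attributes["c"], majority))  # 3.5
```

Because the bonus is added to a ratio of centralities, the weights control how strongly diversity can outrank connectivity, which is why the slide calls the weighting naive.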
24. Leaders
• The authors identified leaders; can our algorithm perform
better than them?
• Assuming that identified leaders spread information everywhere,
the household leaders perform better
Number of leaders:
         Household leaders   Individual leaders from our algorithm
Median   23.00               42.0
Mean     24.51               45.2
25. Leaders
• If the information can be easily given, Household leaders are the
best.
• However, long-term information spread (like training) can be a
different case.
Total cost for households = Average number of people in household * Cost of training per person * Number of leader households * effort
(effort is the fraction of effort required to teach an entire household compared to an individual; for now this is a constant, but it could be a future research question)
Total cost for individuals = Cost of training per person * Number of leader individuals
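The two cost formulas translate directly into code. The sketch below plugs in the mean leader counts reported on the earlier "Leaders" slide (24.51 households, 45.2 individuals) and the household size of 4.8 quoted later in the deck. The unit training cost of 1.0 and the effort value of 1.0 are placeholders chosen for illustration, and the function and parameter names are hypothetical.

```python
def household_cost(avg_household_size, cost_per_person, n_leader_households, effort):
    """Total cost of training whole leader households."""
    return avg_household_size * cost_per_person * n_leader_households * effort

def individual_cost(cost_per_person, n_leader_individuals):
    """Total cost of training individually selected leaders."""
    return cost_per_person * n_leader_individuals

# Placeholder inputs: unit cost and effort are illustrative, not measured values.
hh = household_cost(avg_household_size=4.8, cost_per_person=1.0,
                    n_leader_households=24.51, effort=1.0)
ind = individual_cost(cost_per_person=1.0, n_leader_individuals=45.2)
print(hh, ind)
```

With no effort discount for teaching a household as a group, the household option comes out more expensive even though it involves fewer leaders.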
26. Experiment tunings
• We varied the average number of people in a household
and kept everything else constant.
• From a series of t-tests:
• Average number of people = 1: households should be preferred
• Average number of people = 2: no significant difference
• Average number of people > 2: our algorithm works better
cost-wise.
27. Group efforts are cheaper
The average Indian household size is 4.8, so an effort reduction of almost
80% is required for household training to become the cheaper option.
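One way to arrive at the "almost 80%" figure, assuming it refers to scaling a household of 4.8 people down to the effective size of 1 at which the t-tests favored households, is sketched below. The function name and the `target_effective_size` parameter are hypothetical.

```python
def required_effort_reduction(avg_household_size, target_effective_size=1.0):
    """Effort reduction needed so that household_size * effort equals the target."""
    required_effort = target_effective_size / avg_household_size
    return 1.0 - required_effort

reduction = required_effort_reduction(4.8)
print(f"{reduction:.1%}")  # prints 79.2%
```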
28. Assumptions
• Long-term training vs short-term information
• The effort in educating groups
• Household data is on the lower bound since the survey was not a
complete depiction of the villages.