Social network analysis part ii
- 2. Agenda
1. Intro
2. Measuring Networks
– Embedding Measures (Ties)
– Positions and Roles (Nodes)
– Group Concepts
3. Network Mechanisms
4. Network Theories
© Thomas Plotkowiak 2010
- 3. Introduction
Knoke information exchange network
In 1978, Knoke & Wood
collected data from
workers at 95 organizations in
Indianapolis. Respondents
indicated with which other
organizations their own
organization had any of 13
different types of relationships.
The example network shows the exchange of information
among ten organizations that
were involved in the local
political economy of social
welfare services in a Midwestern
city.
- 5. Embedding Measures
• Reciprocity (Dyad Census)
• Transitivity (Triad Census)
• Clustering
• Density
• Group-external and group-internal Ties
• Other Network Mechanisms
- 6. Reciprocity
• With symmetric data two actors are either connected or not.
• With directed data there are four possible dyadic
relationships:
– A and B are not connected
– A sends to B
– B sends to A
– A and B send to each other.
- 7. Reciprocity II
• What is the reciprocity in this network?
– Answer 1: reciprocated pairs / all possible pairs
• AB of {AB, AC, BC} = 0.33
– Answer 2: reciprocated pairs / existing (connected) pairs
• AB of {AB, BC} = 0.5
– Answer 3: reciprocated directed ties / all possible directed ties
• {AB, BA} of {AB, BA, AC, CA, BC, CB} = 0.33
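The three answers can be checked with a short sketch. The network below is an assumption reconstructed from the slide text: A and B send to each other, B and C share one unreciprocated tie (its direction is chosen arbitrarily here), and A and C are not connected.

```python
from itertools import combinations

# Hypothetical three-actor network: A <-> B, B -> C, no tie between A and C.
nodes = ["A", "B", "C"]
arcs = {("A", "B"), ("B", "A"), ("B", "C")}

pairs = list(combinations(nodes, 2))  # all possible unordered pairs
mutual = [p for p in pairs if (p[0], p[1]) in arcs and (p[1], p[0]) in arcs]
connected = [p for p in pairs if (p[0], p[1]) in arcs or (p[1], p[0]) in arcs]

answer1 = len(mutual) / len(pairs)            # reciprocated pairs / all possible pairs
answer2 = len(mutual) / len(connected)        # reciprocated pairs / connected pairs
answer3 = 2 * len(mutual) / (2 * len(pairs))  # reciprocated arcs / all possible arcs

print(round(answer1, 2), round(answer2, 2), round(answer3, 2))  # 0.33 0.5 0.33
```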
- 8. Transitivity
• With undirected data there are four possible types of triadic relations
– No ties
– One tie
– Two ties
– Three ties
• The count of the relative prevalence of these four types of relations is
called the "triad census". A population can be characterized by:
– "isolation"
– "couples only"
– "structural holes" (one actor is connected to two others, who are not
connected to each other)
– or "clusters"
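A minimal sketch of an undirected triad census, assuming a small hypothetical network (the function name `triad_census` and the toy data are illustrative, not from the slides):

```python
from itertools import combinations

def triad_census(nodes, edges):
    """Count triads with 0, 1, 2, or 3 undirected ties."""
    edge_set = {frozenset(e) for e in edges}
    census = [0, 0, 0, 0]
    for triple in combinations(nodes, 3):
        ties = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        census[ties] += 1
    return census

# Hypothetical toy network: a closed triangle 1-2-3 plus a pendant node 4.
toy_nodes = [1, 2, 3, 4]
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
census = triad_census(toy_nodes, toy_edges)
print(census)  # [no ties, one tie, two ties, three ties] -> [0, 1, 2, 1]
```

Many two-tie triads relative to three-tie triads would point to "structural holes"; many three-tie triads would point to "clusters".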
- 9. Transitivity II
Directed Networks
M-A-N number:
M = number of mutual positive dyads
A = number of asymmetric dyads
N = number of null dyads
Suffix: D = Down, U = Up, C = Cyclic, T = Transitive
- 10. Triad Census Models
• Linear Hierarchy Model: every triad is 030T
• Balance Model with Two Cliques (Heider Balance): triads are either 300 or 102
• Ranked Clusters Model (Hierarchy of Cliques): triads are 300, 102, 003, 120D, 120U, 030T, 021D, 021U
- 11. Example
Directed information exchange network
[Figure: directed graph of the ten organizations, nodes 1 to 10]
The exchange of information among ten organizations that were involved in the local
political economy of social welfare services in a Midwestern city.
- 12. Transitivity III
[Figure: triad with nodes A, B, C and ties labeled 1, 2, 3]
• How to measure transitivity?
– A) Divide the number of transitive triads found by the total number of
possible triplets (for 3 nodes there are 6 possibilities)
– B) Normalize the number of transitive triads by the number of cases where
a single link could complete the triad:
normalize {AB, BC, AC} by {AB, BC, anything}
(for 3 nodes there are 4 possibilities)
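Option B can be sketched as the share of two-paths a -> b -> c that are closed by a direct tie a -> c. The function name and the toy arcs are assumptions for illustration only.

```python
from itertools import permutations

def transitivity(nodes, arcs):
    """Share of two-paths a->b->c that are closed by a direct tie a->c."""
    arcs = set(arcs)
    two_paths = closed = 0
    for a, b, c in permutations(nodes, 3):
        if (a, b) in arcs and (b, c) in arcs:
            two_paths += 1
            closed += (a, c) in arcs
    return closed / two_paths if two_paths else 0.0

# Hypothetical directed toy network with three two-paths, one of them closed.
toy_arcs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
score = transitivity("ABCD", toy_arcs)
print(round(score, 2))  # 0.33
```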
- 14. Clustering
Most actors live in local neighborhoods and are connected to one
another. A large proportion of the total number of ties is highly
"clustered" into local neighborhoods.
- 16. Average Local Clustering coefficient
To measure how clustered the graph is, we examine the local
neighborhood of an actor (all actors who are directly connected to ego) and
calculate the density in this neighborhood (leaving out ego). After doing
this for all actors, we can characterize the degree of clustering as the average over
all the neighborhoods.
[Figure: three example neighborhoods with C = 1, C = 1/3, and C = 0]
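The procedure can be sketched for undirected ties as follows. The toy graph and the convention that an actor with fewer than two neighbors gets a coefficient of 0 are assumptions made here.

```python
from itertools import combinations

def local_clustering(adj, ego):
    """Density among ego's neighbors, leaving out ego itself."""
    neighbors = adj[ego]
    if len(neighbors) < 2:
        return 0.0  # assumed convention: fewer than two neighbors gives 0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)

def average_clustering(adj):
    """Mean of the local coefficients over all actors."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical undirected toy graph stored as neighbor sets.
adj = {
    "ego": {"a", "b", "c"},
    "a": {"ego", "b"},
    "b": {"ego", "a"},
    "c": {"ego"},
}
print(local_clustering(adj, "ego"))  # one of three neighbor pairs is tied: 1/3
print(average_clustering(adj))       # (1/3 + 1 + 1 + 0) / 4
```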
- 17. Individual local clustering coefficient
(in this case for directed ties)
Clustering can also be examined for each actor:
– Notice that actor 6 has three neighbors and hence only 3 possible ties among them. Of
these only one is present, so actor 6 is not highly clustered.
– Actor 8 has 6 neighbors and hence 15 pairs of neighbors and is
highly clustered.
2 edges out of 6 edges
- 18. Density for groups
Instead of calculating the density of the whole network (last
lecture), we can calculate the density of partitions of the network.
Governmental agencies
Non-governmental generalist
Welfare specialists
A social structure in which individuals were highly clustered
would display a pattern of high densities on the diagonal, and
low densities elsewhere.
- 19. Density for groups II
• Group 1 has dense in- and out-ties to one another and to the
other populations
• Group 2 has out-ties among themselves and with group 1
and has high densities of in-ties with all three subpopulations
The density in the 1,1 block is .6667. That is, of
the six possible directed ties among actors 1, 3,
and 5, four are actually present.
How well the block densities characterize all the
individuals within those blocks can be assessed by
looking at the standard deviations. The standard
deviations measure the lack of homogeneity within
the partition, or the extent to which the actors vary.
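The 1,1 block density can be recomputed from the adjacency matrix shown on the "Measuring Similarity" slide. This is a sketch under two assumptions: rows are senders and columns are receivers, and diagonal cells are never counted. The helper name `block_density` is illustrative.

```python
# Adjacency matrix of the Knoke information network (diagonal stored as 0).
KNOKE = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],  # 1 Coun
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],  # 2 Comm
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],  # 3 Educ
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],  # 4 Indu
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # 5 Mayr
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],  # 6 WRO
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],  # 7 News
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],  # 8 UWay
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],  # 9 Welf
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],  # 10 West
]

def block_density(matrix, senders, receivers):
    """Density of ties from `senders` to `receivers` (1-based actor ids)."""
    cells = [(i, j) for i in senders for j in receivers if i != j]
    present = sum(matrix[i - 1][j - 1] for i, j in cells)
    return present / len(cells)

group1 = [1, 3, 5]
density_11 = block_density(KNOKE, group1, group1)
print(round(density_11, 4))  # 0.6667, i.e. four of six possible ties
```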
- 20. E-I Index
• The E-I (external – internal) index takes the number of
ties of group members to outsiders, subtracts the number of
ties to other group members, and divides by the total number
of ties.
(1-4)/7 = -3/7 (1-2)/7 = -1/7
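A minimal sketch of the E-I index on undirected ties. The group labels and ties are hypothetical toy data; the last line reuses the counts from the whole-population slide (25 external and 7 internal tied pairs).

```python
def ei_index(edges, group):
    """(external - internal) / total over undirected ties; group maps node -> label."""
    external = sum(1 for u, v in edges if group[u] != group[v])
    internal = len(edges) - external
    return (external - internal) / len(edges)

# Hypothetical toy data: three internal ties and one external tie.
group = {"a": "X", "b": "X", "c": "X", "d": "Y"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(ei_index(edges, group))  # (1 - 3) / 4 = -0.5

# Whole-population counts quoted on the slides: 25 external, 7 internal pairs.
print((25 - 7) / (25 + 7))  # 0.5625
```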
- 21. E-I Index II
• The resulting E-I index ranges from -1 (all ties internal) to +1
(all ties external). The direction of ties is ignored, so either an
in-tie or an out-tie counts as a tie.
• The E-I index can be applied at three levels:
– entire population
– each group
– each individual
Notice: The relative size of subpopulations (e.g. 10 vs. 1000) has dramatic
consequences for the degree of internal and external contacts, even when
individuals choose contacts at random.
- 22. E-I Index for groups
Notice that the data has
been symmetrized
- 23. E-I Index for the entire population
Notice that the data has
been symmetrized
Internal: 7*2/64 = 22%
External: 25*2/64 = 78%
E-I: (50-14)/64 = 56%
- 24. Permutation Tests
To assess whether the E-I index value is significantly different from
what would be expected under random mixing, a permutation test is
performed.
Notice: Under random mixing, the E-I index would be expected to have a
value of .467, which is not much different from the observed .563, especially given the standard
error of .078 (the difference of about .10 could arise just by chance).
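One way to sketch such a permutation test is to shuffle the group labels over the nodes and recompute the E-I index each time. The toy data, the number of trials, and the two-sided p-value definition are assumptions made here, not the procedure of any particular software package.

```python
import random

def ei_index(edges, group):
    """(external - internal) / total for undirected ties."""
    external = sum(1 for u, v in edges if group[u] != group[v])
    return (2 * external - len(edges)) / len(edges)

def permutation_test(edges, group, trials=2000, seed=0):
    """Compare the observed E-I index with E-I under shuffled group labels."""
    rng = random.Random(seed)
    nodes = list(group)
    labels = [group[n] for n in nodes]
    observed = ei_index(edges, group)
    simulated = []
    for _ in range(trials):
        rng.shuffle(labels)
        simulated.append(ei_index(edges, dict(zip(nodes, labels))))
    mean = sum(simulated) / trials
    # Share of permutations at least as far from the permutation mean as observed.
    p_value = sum(abs(s - mean) >= abs(observed - mean) for s in simulated) / trials
    return observed, mean, p_value

# Hypothetical toy data: two groups of three with one bridging tie.
group = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]
observed, expected, p_value = permutation_test(edges, group)
print(observed, expected, p_value)
```

A small p-value would indicate that the observed index is unlikely under random mixing with the same group sizes.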
- 25. E-I Index for individuals
Notice: Several actors (4, 6, 9) tend toward closure, while
others (10, 1) tend toward creating ties outside their groups.
- 27. Positions & Roles
• Structural Equivalence
• Automorphic Equivalence
• Regular Equivalence
• Measuring similarity/dissimilarity
• Visualizing similarity and distance
• Measuring automorphic equivalence
• Measuring regular equivalence
• Blockmodelling
- 29. Positions and Roles
• Positions: Actors that show a similar structure of relationships
and are thus similarly embedded into the network.
• Roles: The pattern of relationships of members of same or
different positions.
• Note: Many of the category systems used by sociologists are
based on "attributes" of individual actors that are common
across actors.
- 30. Similarity
• The idea of "similarity" has to be rather precisely defined
• Nodes are similar if they fall in the same "equivalence class"
– We could come up with an equivalence class of out-degree zero, for
example
• There are three particular definitions of equivalence:
– Structural Equivalence
– Automorphic Equivalence (rarely used)
– Regular Equivalence
- 31. Structural Equivalence
• Structural Equivalence: Two structurally equivalent actors could
exchange their positions in a network without changing their
connections to the other actors in the network.
• Structural equivalence is the "strongest" form of equivalence.
• Problem: Imagine two teachers in Toronto and St. Gallen.
Rather than looking for connections to exactly the same
persons we would like to find connections to similar persons
but not exactly the same ones.
- 32. Automorphic Equivalence
• Automorphic Equivalence: Two persons could exchange their
positions in the network without changing the structure of
the network (notice that after the exchange they would be
partially connected to other persons than before).
• Problem: How large should the radius be in which
we analyze the structure of the network (1, 2, 3 … steps)?
• For the one-step radius we consider the NUMBER of:
– asymmetric outgoing,
– asymmetric incoming,
– symmetric in- and outgoing,
– and nonexistent ties.
- 33. 1 Step, 2 Step Equivalence
[Figure: 1-step and 2-step neighborhoods of a node]
- 34. Regular Equivalence
• Regular Equivalence: Two positions are considered similar
if every important aspect of the observed structure applies
(or does not apply) to both positions.
• For the one-step radius we consider the EXISTENCE of:
– asymmetric outgoing,
– asymmetric incoming,
– symmetric in- and outgoing,
– and nonexistent ties.
- 35.
[Figure: three example hierarchies with A at the top, B and C below it, and D to H at the bottom]
Panel 1: B and C are regularly equivalent
Panel 2: B and C are automorphically equivalent
Panel 3: B and C are structurally equivalent
- 37. Measuring Similarity
Adjacency Matrix
1 Coun 2 Comm 3 Educ 4 Indu 5 Mayr 6 WRO 7 News 8 UWay 9 Welf 10 West
1 Coun --- 1 0 0 1 0 1 0 1 0
2 Comm 1 --- 1 1 1 0 1 1 1 0
3 Educ 0 1 --- 1 1 1 1 0 0 1
4 Indu 1 1 0 --- 1 0 1 0 0 0
5 Mayr 1 1 1 1 --- 0 1 1 1 1
6 WRO 0 0 1 0 0 --- 1 0 1 0
7 News 0 1 0 1 1 0 --- 0 0 0
8 UWay 1 1 0 1 1 0 1 --- 1 0
9 Welf 0 1 0 0 1 0 1 0 --- 0
10 West 1 1 1 0 1 0 1 0 0 ---
- 38. Measuring Similarity
Concatenated Row & Column View
1 Coun 2 Comm 3 Educ 4 Indu 5 Mayr 6 WRO 7 News 8 UWay 9 Welf 10 West
--- 1 0 1 1 0 0 1 0 1
1 --- 1 1 1 0 1 1 1 1
0 1 --- 0 1 1 0 0 0 1
0 1 1 --- 1 0 1 1 0 0
1 1 1 1 --- 0 1 1 1 1
0 0 1 0 0 --- 0 0 0 0
1 1 1 1 1 1 --- 1 1 1
0 1 0 0 1 0 0 --- 0 0
1 1 0 0 1 1 0 1 --- 0
0 0 1 0 1 0 0 0 0 ---
--- 1 0 0 1 0 1 0 1 0
1 --- 1 1 1 0 1 1 1 0
0 1 --- 1 1 1 1 0 0 1
1 1 0 --- 1 0 1 0 0 0
1 1 1 1 --- 0 1 1 1 1
0 0 1 0 0 --- 1 0 1 0
0 1 0 1 1 0 --- 0 0 0
1 1 0 1 1 0 1 --- 1 0
0 1 0 0 1 0 1 0 --- 0
1 1 1 0 1 0 1 0 0 ---
- 39. Pearson correlation coefficients, covariances
and cross-products
• Pearson correlations (ranging from -1 to +1) summarize
pairwise structural equivalence.
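A minimal sketch of the Pearson correlation between two tie profiles. The profiles are the out-tie rows of actors 1 (Coun) and 9 (Welf) from the adjacency matrix; dropping columns 1 and 9, so that the ties between the two actors themselves do not enter the comparison, is an assumption made here, and in-ties are left out for brevity.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Out-ties to actors 2, 3, 4, 5, 6, 7, 8, 10 (columns 1 and 9 dropped).
coun = [1, 0, 0, 1, 0, 1, 0, 0]
welf = [1, 0, 0, 1, 0, 1, 0, 0]
r = pearson(coun, welf)
print(round(r, 3))  # identical out-tie profiles correlate at 1.0
```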
- 40. Pairwise Structural Equivalence
We can see, for example, that node 1 and node 9 have
identical patterns of ties.
The Pearson correlation measure does not pay attention to the overall
prevalence of ties (the mean of the row or column), and it does not pay
attention to differences between actors in the variances of their ties.
Often it is desirable to focus only on the pattern, rather than the mean
and variance, as aspects of similarity between actors.
[Figure: Knoke information exchange network, nodes 1 to 10]
- 41. Euclidean squared distances
Euclidean or squared Euclidean distances are not sensitive to the
linearity of association and can be used with valued or binary
data.
Other similar measures are the Jaccard coefficient and the
Hamming distance.
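The measures named above can be sketched for binary tie profiles as follows. The two profiles are hypothetical; note that the Jaccard coefficient as written here is a similarity (higher means more alike), while the other two are distances.

```python
from math import sqrt

def euclidean(x, y):
    """Euclidean distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hamming(x, y):
    """Number of positions in which two binary profiles differ."""
    return sum(a != b for a, b in zip(x, y))

def jaccard(x, y):
    """Shared ties divided by ties present in at least one profile."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 1.0

# Hypothetical binary tie profiles for two actors.
p = [1, 0, 0, 1, 1, 0, 1, 0]
q = [0, 1, 0, 0, 1, 0, 1, 0]
print(hamming(p, q))       # 3 differing positions
print(euclidean(p, q))     # sqrt(3) for binary data
print(jaccard(p, q))       # 2 shared ties out of 5 -> 0.4
```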
- 42. Going from pairs to groups of structural
equivalence
It is often useful to examine the similarities or distances to try to
locate groupings of actors (that is, larger than a pair) who are
similar. By studying the bigger patterns of which groups of actors
are similar to which others, we may also gain some insight into
"what about" the actors' positions is most critical in making them
more similar or more distant.
In the next two sections we will cover how multi-dimensional
scaling and hierarchical cluster analysis can be used to identify
patterns in actor-by-actor similarity/distance matrices.
Both of these tools are widely used in non-network analysis; there are large and
excellent literatures on the many important complexities of using these methods. Our
goal here is to provide just a very basic introduction.
- 43. Hierarchical Clustering
• Hierarchical Clustering:
– Initially places each case in its own cluster
– The two most similar cases are then combined
– This process is repeated until all cases are agglomerated into a single
cluster (once a case has been joined it is never re-classified)
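The agglomeration steps can be sketched as below. The slide does not say how the distance between two clusters is measured; single link (distance between the closest members) is an assumption made here, and the distance matrix is hypothetical.

```python
def single_link_clustering(dist):
    """Agglomerate items until one cluster remains; return the merge history."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest members.
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical symmetric distance matrix for four actors (0..3).
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
merges = single_link_clustering(dist)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```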
- 44. Multi Dimensional Scaling
• MDS represents the patterns of similarity or dissimilarity in
the profiles among the actors as a "map" in a multi-
dimensional space. This map lets us see how "close" actors are
and whether they "cluster".
– Stress is a measure of badness of fit
– The author has to determine the meaning of the dimensions
- 45. Finding automorphic equivalence
(for binary data)
• Brute Force Approach: All the nodes of a graph are
exchanged and the distances among all pairs of actors in the
new graph are compared to the original one. When the new
and the old graph have the same distances among nodes the
"swapping" that was done identified the automorphic position.
• Brute force is expensive (10! = 3,628,800 permutations for the ten-node example!)
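A brute-force sketch on a small hypothetical tree. Instead of comparing distance matrices as described above, this version checks the equivalent condition that a relabeling maps the set of arcs onto itself; the function name and data are illustrative.

```python
from itertools import permutations
from math import factorial

def automorphic_pairs(nodes, arcs):
    """Pairs (u, v), u != v, that some arc-preserving relabeling maps onto each other."""
    arcs = set(arcs)
    pairs = set()
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        if {(mapping[a], mapping[b]) for a, b in arcs} == arcs:
            pairs.update((u, v) for u, v in mapping.items() if u != v)
    return pairs

# Hypothetical tree: A sends to B and C; B sends to D; C sends to E.
nodes = ["A", "B", "C", "D", "E"]
arcs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")]
pairs = automorphic_pairs(nodes, arcs)
print(sorted(pairs))  # B and C can be swapped, together with D and E

# The cost grows factorially with the number of nodes.
print(factorial(5), factorial(10))  # 120 3628800
```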
- 46. Regular Equivalence
Block Matrix
Informal Definition: Two actors are regularly equivalent if they
have similar patterns of ties to equivalent others.
Problem: Each definition of each position depends on its relations
with other positions. Where to start?
Sender
Repeater
Receiver
- 47. Regular Equivalence
Block Matrix Block Image
• Create a matrix so that each actor in each partition has the
same pattern of connection to actors in the other partition.
– Notice: We don’t care about ties among members of the same regular
class!
– A sends to {BCD} but none of {EFGHI}
– {BCD} does not send to A but to {EFGHI}
– {EFGHI} does not send to A or {BCD}
Block Matrix
  A B C D E F G H I
A --- 1 1 1 0 0 0 0 0
B 0 --- 0 0 1 1 0 0 0
C 0 0 --- 0 0 0 1 0 0
D 0 0 0 --- 0 0 0 1 1
E 0 0 0 0 --- 0 0 0 0
F 0 0 0 0 0 --- 0 0 0
G 0 0 0 0 0 0 --- 0 0
H 0 0 0 0 0 0 0 --- 0
I 0 0 0 0 0 0 0 0 ---
Block Image
          A B,C,D E,F,G,H,I
A         --- 1 0
B,C,D     0 --- 1
E,F,G,H,I 0 0 ---
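The block image can be derived from the block matrix with a short sketch. The regularity check used here (a block is either empty, or every sender has a tie into it and every receiver gets one) is a common formalization and is stated as an assumption; ties within a class are skipped, as on the slide.

```python
# Arcs read off the block matrix on the slide.
arcs = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("B", "F"),
        ("C", "G"), ("D", "H"), ("D", "I")}
classes = [["A"], ["B", "C", "D"], ["E", "F", "G", "H", "I"]]

def block_is_regular(senders, receivers):
    """Null block, or every sender has a tie into the block and every receiver gets one."""
    ties = [(s, r) for s in senders for r in receivers if (s, r) in arcs]
    if not ties:
        return True
    return ({s for s, _ in ties} == set(senders)
            and {r for _, r in ties} == set(receivers))

# Image cell is 1 if any tie runs from class P to class Q; None on the diagonal.
image = [[int(any((s, r) in arcs for s in P for r in Q)) if P is not Q else None
          for Q in classes] for P in classes]
regular = all(block_is_regular(P, Q) for P in classes for Q in classes if P is not Q)

for row in image:
    print(row)
print(regular)  # True
```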
- 48. Algorithms for detection of Regular Equivalence
Tabu Search
• This method of blocking relies on extensive use of the
computer. Tabu search tries to implement the same idea of
grouping together actors who are most similar into a block.
• Tabu search does this by searching for sets of actors who, if
placed into blocks, produce the smallest sum of within-block
variances in the tie profiles.
• If actors in a block have similar ties, their variance around the
block mean profile will be small.
• So the partitioning that minimizes the sum of within-block
variances minimizes the overall variance in tie profiles.
- 49. Algorithms for detection of Regular Equivalence
Tabu Search Results
[Figure: Knoke network with nodes 1 to 10 grouped by the tabu search blocks]
The actors in (2, 5), for example, are pure "repeaters".
The set {6, 10, 3} sends to only two other types (not all three
other types) and receives from only one other type.
- 50. Blockmodeling
Blockmodeling is able to include all kinds of equivalences into one
analysis
Examples of blocks:
• Complete blocks (everybody is connected with each other
inside the block)
• Null blocks (people in this block are not connected to
anybody)
• Regular blocks (people in this block share the same regular
equivalence class)
- 52. Blockmodels
Student Government. Discussion relations among the eleven students who were members of the student
government at the University of Ljubljana in Slovenia. The students were asked to indicate with
which of their fellow members they informally discussed matters concerning the administration of the university.
- 58. Cohesive Subgroups
Cohesive subgroups: We hypothesize that cohesive subgroups
are the basis for solidarity, shared norms, identity and
collective behavior. Perceived similarity, for instance,
membership of a social group, is expected to promote
interaction. We expect similar people to interact a lot, at least
more often than with dissimilar people.
- 59. Example – Families in Haciendas (1948)
Each arc represents "frequent visits" from one family to another.
- 60. Components
A semiwalk from vertex u to vertex v is a sequence of lines such
that the end vertex of one line is the starting vertex of the next
line and the sequence starts at vertex u and ends at vertex v.
A walk is a semiwalk with the additional condition that none of its
lines is an arc of which the end vertex is the arc's tail.
Note that v5 v3 v4 v5 v3 is also a walk to v3.
- 61. Paths
A semipath is a semiwalk in which no vertex in between the first
and last vertex of the semiwalk occurs more than once.
A path is a walk in which no vertex in between the first and last
vertex of the walk occurs more than once.
- 62. Connectedness
A network is (weakly) connected if each pair of vertices is
connected by a semipath.
A network is strongly connected if each pair of vertices is
connected by a path.
This network is not connected
because v2 is isolated.
- 63. Connected Components
A (weak) component is a maximal (weakly) connected
subnetwork.
A strong component is a maximal strongly connected
subnetwork.
v1, v3, v4, v5 are a weak component
v3, v4, v5 are a strong component
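Weak and strong components can be sketched with simple reachability searches. The arcs below are hypothetical: they were chosen only to be consistent with the captions (v2 isolated, v3 -> v4 -> v5 -> v3 forming a cycle), since the figure itself is not available.

```python
# Hypothetical arcs consistent with the slide captions.
nodes = ["v1", "v2", "v3", "v4", "v5"]
arcs = [("v1", "v3"), ("v3", "v4"), ("v4", "v5"), ("v5", "v3")]

def reachable(start, neighbors):
    """All vertices reachable from start by following `neighbors`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

out = {n: set() for n in nodes}
inc = {n: set() for n in nodes}
for u, v in arcs:
    out[u].add(v)
    inc[v].add(u)
both = {n: out[n] | inc[n] for n in nodes}

# Weak components ignore direction (semipaths); strong components need
# paths in both directions between every pair of members.
weak = {frozenset(reachable(n, both)) for n in nodes}
strong = {frozenset(reachable(n, out) & reachable(n, inc)) for n in nodes}

print(sorted(sorted(c) for c in weak))
print(sorted(sorted(c) for c in strong))
```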
- 65. Cliques and Complete Subnetworks
A clique is a maximal complete subnetwork containing three
vertices or more. (cliques can overlap)
v2, v4, v5 is not a clique
v1, v6, v5 is a clique
v2, v3, v4, v5 is a clique
- 66. n-Clique & n-Clan
n-Clique: A maximal subgraph in which every pair of nodes is at most
distance n apart, with distances measured in the analyzed (whole) graph.
A clique is an n-clique with n=1.
n-Clan: An n-clique in which every pair of nodes is also at most
distance n apart within the resulting subgraph itself.
2-Clique
2-Clan
- 67. n-Clans & n-Cliques
[Figure: example graph with nodes 1 to 6]
2-Clans: 123,234,345,456,561,612
2-Cliques: 123,234,345,456,561,612 and 135,246
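The listed 2-cliques and 2-clans can be reproduced by brute force if the figure is assumed to be the six-node ring 1-2-3-4-5-6-1 (an assumption; the figure is not available). All function names are illustrative.

```python
from itertools import combinations

# Assumption: the example graph is the ring 1-2-3-4-5-6-1.
nodes = [1, 2, 3, 4, 5, 6]
edges = {frozenset((i, i % 6 + 1)) for i in nodes}
INF = float("inf")

def distances(members):
    """Shortest path lengths between members, using only ties among them."""
    dist = {}
    for s in members:
        seen, frontier, d = {s: 0}, [s], 0
        while frontier:
            d += 1
            nxt = [v for v in members if v not in seen
                   and any(frozenset((u, v)) in edges for u in frontier)]
            for v in nxt:
                seen[v] = d
            frontier = nxt
        for v, dv in seen.items():
            dist[s, v] = dv
    return dist

full = distances(nodes)  # distances in the whole graph

def n_cliques(n):
    """Maximal node sets whose members are within distance n in the whole graph."""
    groups = [set(g) for k in range(2, len(nodes) + 1)
              for g in combinations(nodes, k)
              if all(full.get((u, v), INF) <= n for u, v in combinations(g, 2))]
    return [g for g in groups if not any(g < other for other in groups)]

def n_clans(n):
    """n-cliques whose members are also within distance n inside the subgraph."""
    result = []
    for g in n_cliques(n):
        inner = distances(sorted(g))
        if all(inner.get((u, v), INF) <= n for u, v in combinations(g, 2)):
            result.append(g)
    return result

two_cliques = sorted(sorted(g) for g in n_cliques(2))
two_clans = sorted(sorted(g) for g in n_clans(2))
print(two_cliques)  # includes [1, 3, 5] and [2, 4, 6]
print(two_clans)    # the same list without [1, 3, 5] and [2, 4, 6]
```

The sets 135 and 246 are 2-cliques only because their members reach each other through outsiders, which is why they drop out as 2-clans.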
- 68. k-Plexes
k-Plex: A k-plex is a maximal subgraph with gs nodes in
which each node is connected to at least gs-k other nodes of the subgraph.
[Figure: example graph with nodes 1 to 6]
2-Plexes: 1234, 2345, 3456, 4561, 5612, 6123
In general, k-plexes are more robust than cliques and clans.
- 69. Overview Subgroups
[Figure: five example graphs on nodes 1 to 4]
Graph 1: 2 Components
Graph 2: 1 Component, 2 2-Clans (341, 412), 2 2-Cliques (341, 412)
Graph 3: 1 Component, 1 2-Clan (124), 1 2-Clique (124)
Graph 4: 1 Component, 1 2-Clan (1234), 1 2-Clique (1234), 1 2-Plex (1234)
Graph 5: 1 Component, 1 2-Clan (1234), 1 2-Clique (1234), 1 2-Plex (1234), 1 Clique
- 70. Overview Groupconcepts
• 1-Clique, 1-Clan and 1-Plex are identical
• An n-Clan is always included in a higher order n-Clique
Component
2-Clique
2-Clan
2-Plex
Clique
- 71. k-Cores
• A k-core is a maximal subnetwork in which each vertex has at
least degree k within the subnetwork.
Net > Components > {Strong, Weak}
- 72. k-Cores
k-cores are nested, which means that a vertex in a 3-core is also
part of a 2-core, but not all members of a 2-core belong to a
3-core.
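The nesting can be seen in a small sketch that peels off low-degree vertices until every remaining vertex has degree at least k. The graph is hypothetical, and the function returns the union of all k-core vertices rather than separate components.

```python
def k_core(adj, k):
    """Vertices left after repeatedly deleting vertices with degree below k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.remove(v)
                changed = True
    return alive

# Hypothetical undirected graph: a 4-clique a-b-c-d, e tied to a and b,
# and f tied only to e.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d", "e"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a", "b", "f"},
    "f": {"e"},
}
for k in (1, 2, 3, 4):
    print(k, sorted(k_core(adj, k)))
```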
- 73. k-Cores Application
• k-cores help to detect cohesive subgroups by removing the
lowest k-cores from the network until the network breaks up
into relatively dense components.
• Net > Partitions > Core > {Input, Output, All}
- 76. Network Mechanisms
• Tie Outdegree Effect
• Reciprocity
• Transitivity & Three-Cycles Effect
• Balance Effect
• In/Out Popularity Effect
• In/Out Activity Effect
• In/Out Assortativity Effect
• Covariate Similarity Effect
• Covariate Ego-Effect
• Covariate Alter-Effect
• Same Covariate Effect
- 77. Outdegree Effect
• The most basic effect is defined by the outdegree of actor i. It
represents the basic tendency to have ties at all.
• In a decision-theoretic approach this effect can be regarded as
the balance of benefits and costs of an arbitrary tie.
• Most networks are sparse (i.e., they have a density well below 0.5)
which can be represented by saying that for a tie to an arbitrary other
actor – arbitrary meaning here that the other actor has no
characteristics or tie pattern making him/her especially attractive to i –,
the costs will usually outweigh the benefits. Indeed, in most cases a
negative parameter is obtained for the outdegree effect.
- 78. Reciprocity Effect
• Another quite basic effect is the tendency toward reciprocity,
represented by the number of reciprocated ties of actor i. This
is a basic feature of most social networks (cf. Wasserman and
Faust, 1994, Chapter 13)
[Figure: reciprocated tie between actors i and j]
- 79. Transitivity and other triadic effects
• Next to reciprocity, an essential feature in most social
networks is the tendency toward transitivity, or transitive
closure (sometimes called clustering): friends of friends
become friends, or in graph-theoretic terminology: two-paths
tend to be, or to become, closed (e.g., Davis 1970, Holland
and Leinhardt 1971).
[Figure: two triads among actors i, j, and h, captioned "Transitive triplet" and "Three cycle"]
- 80. Balance Effect
• An effect closely related to transitivity is balance (Newcomb,
1962), which is the same as structural equivalence with
respect to out-ties (Burt, 1982): the tendency to have and
create ties to other actors who make the same choices as
ego.
A D
B C
- 81. In/Out Popularity Effect
• The degree-related popularity effect is based on indegree or
outdegree of an actor. Nodes with higher indegree, or higher
outdegree, are more attractive for others to send a tie to.
• That implies that high indegrees reinforce themselves, which
will lead to a relatively high dispersion of the indegrees (a
Matthew effect in popularity as measured by indegrees, cf.
Merton, 1968 and Price, 1976).
A
B C D
- 82. In/Out Activity Effect
• Nodes with higher indegree, or higher outdegree respectively,
will have an extra propensity to form ties to others.
• The outdegree-related activity effect again is a self-reinforcing
effect: when it has a positive parameter, the dispersion of
outdegrees will tend to increase over time, or to be sustained
if it already is high.
A
B C D
- 83. Preferential Attachment
• Notice: These four degree-related effects can be regarded as
the analogues in the case of directed relations of what was
called cumulative advantage by Price (1976) and preferential
attachment by Barabasi and Albert (1999) in their models for
dynamics of non-directed networks: a self-reinforcing process
of degree differentiation.
- 84. In/Out Assortativity Effect
• Actors' preferences may depend on their degrees. Depending
on their own out- and in-degrees, actors can have differential
preferences for ties to others who also have high or low out- and
in-degrees (Morris and Kretzschmar 1995; Newman 2002)
A D
B C E F
- 85. Covariate Similarity Effect
• The covariate similarity effect describes whether ties tend to
occur more often between actors with similar values on a
covariate (homophily effect). Tendencies toward homophily constitute
a fundamental characteristic of many social relations; see
McPherson, Smith-Lovin, and Cook (2001).
• Example: iPad owners tend to be friends with other iPad
owners.
- 86. Covariate Ego Effect
• The covariate ego effect describes whether actors with higher
values on a covariate tend to nominate more friends and
hence have a higher outdegree.
• Example: Heavier smokers have more friends.
- 87. Covariate Alter Effect
• The covariate alter effect describes whether actors with higher values
on a covariate V tend to be nominated by more others and hence have
higher indegrees.
• Example: Beautiful people have more friends.
- 88. Modeling networks
1. Actor-based modeling for longitudinal data
– SIENA (analysis of repeated measures on social networks and MCMC-
estimation of exponential random graphs)
2. Stochastic modeling for panels
– Pnet
objective function | Model 1 | Model 2 | Model 3
(columns per model: estim, s.e., p)
outdegree (density) -2,46 0,12 <0,0001* -4,04 0,23 <0,0001* -1,99 0,13 <0,0001*
reciprocity 2,57 0,20 <0,0001* 2,29 0,22 <0,0001* 3,02 0,21 <0,0001*
transitive triplets 0,07 0,01 <0,0001*
transitive mediated triplets -0,03 0,01 0,0005*
transitive ties 1,47 0,24 <0,0001*
3-cycles -0,06 0,02 0,0037*
attribute party 1,13 0,15 <0,0001* 0,73 0,15 <0,0001*
attribute gender -0,11 0,15 0.48
- 91. Homophily
• Homophily (i.e., love of the same) is the tendency of
individuals to associate and bond with similar others.
(Mechanisms of selection vs influence)
• In the study of networks, assortative mixing is a bias in favor of
connections between network nodes with similar characteristics. In the
specific case of social networks, assortative mixing is also known as
homophily. The rarer disassortative mixing is a bias in favor of connections
between dissimilar nodes.
Low Homophily High Homophily
- 92. Homophily II
Types (acc. to McPherson et al. 2001):
– Race and Ethnicity (Marsden 1987, 1988; Louch 2000; Kalleberg et al.
1996; Laumann 1973…)
– Sex and Gender (Maccoby 1998; Eder & Hallinan 1978; Shrum et al.
1988; Huckfeldt & Sprague 1995; Brass 1985…)
– Age (Fischer 1977, 1982; Feld 1982; Blau et al. 1991; Burt 1990, 1991…)
– Religion (Laumann 1973; Verbrugge 1977; Fischer 1977, 1982; Marsden
1988; Louch 2000…)
– Education, Occupation and Social Class (Laumann 1973; Marsden 1987;
Verbrugge 1977; Wright 1997; Kalmijn 1998…)
– Network Positions (Brass 1985; Burt 1982; Friedkin 1993…)
– Behavior (Cohen 1977; Kandel 1978; Knoke 1990…)
– Attitudes, Abilities, Beliefs and Aspirations (Jussim & Osgood 1989;
Huckfeldt & Sprague 1995; Verbrugge 1977, 1983; Knoke 1990)
- 95. Power Law distribution
• As a function of k, what fraction of pages on the Web have k
in-links?
• A natural guess: the normal, or Gaussian, distribution
• Central Limit Theorem (roughly): if we take any sequence of
small independent random quantities, then in the limit their sum
will be distributed according to the normal distribution
- 96. Power Law distribution
But when people measured the Web, they found something
very different: The fraction of Web pages that have k in-links is approximately
proportional to 1/k^2
• Power law function
• Popularity exhibits extreme imbalances: there are few very popular Web
pages that have extremely many in-links
True for other domains:
• the fraction of telephone numbers that receive k calls per day: 1/k^2
• the fraction of books bought by k people: 1/k^3
• the fraction of scientific papers that receive k citations: 1/k^3
- 97. Preferential attachment leads to power laws
• A preferential attachment process is any of a class of
processes in which some quantity, typically some form of
wealth or credit, is distributed among a number of individuals
or objects according to how much they already have, so that
those who are already wealthy receive more than those who
are not. Notice: "Preferential attachment" (A.-L. Barabási and
R. Albert 1999) is only the most recent of many names that
have been given to such processes.
• Notice: Preferential attachment can, under suitable
circumstances, generate power law distributions.
- 100. Balance Theory
Fritz Heider
Fritz Heider (1940): A person (P) feels uncomfortable when he
or she disagrees with his or her friend (O) on a topic (X).
P feels an urge to change this imbalance. He can adjust his
opinion, change his affection for O, or convince himself that O is
not really opposed to X.
- 101. Balance Theory
(a) + + + : three people are mutual friends
(b) + + - : A is friends with B and C, but B and C are enemies
(c) - - + : two people are friends, and they have a mutual enemy in the third
(d) - - - : all are enemies; this motivates two of them to “team up” against the third
(b) and (d) represent unstable relationships
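The four cases follow a simple rule that can be sketched in code: a triad is balanced when the product of its three tie signs is positive (friendship coded as +1, enmity as -1). The function name and the dictionary are illustrative.

```python
def is_balanced(signs):
    """A triad is balanced when the product of its three tie signs is positive."""
    product = 1
    for s in signs:
        product *= s
    return product > 0

triads = {
    "(a) + + +": (+1, +1, +1),
    "(b) + + -": (+1, +1, -1),
    "(c) - - +": (-1, -1, +1),
    "(d) - - -": (-1, -1, -1),
}
balanced = {label: is_balanced(signs) for label, signs in triads.items()}
for label, ok in balanced.items():
    print(label, "balanced" if ok else "unbalanced")
```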
- 102. Balance Theory
Community in a New England Monastery
Young Turks (1), Loyal Opposition (2), Outcasts (3), Interstitial Group (4)
- 105. Strength of Weak Ties
Mark Granovetter
• “One of the most influential sociology papers ever
written” (Barabasi)
– One of the most cited (Current Contents, 1986)
• Accepted by the American Journal of Sociology after
4 years of unsuccessful attempts elsewhere.
• Interviewed people and asked: “How did you find
your job?”
– Kept getting the same answer: “through an acquaintance,
not a friend”
- 106. Basic Argument
• Classify interpersonal relations as “strong”, “weak”, or “absent”
• Strength is (vaguely) defined as “a (probably linear)
combination of…
– the amount of time,
– the emotional intensity,
– the intimacy (mutual confiding),
– and the reciprocal services which characterize the tie”
• The stronger the tie between two individuals, the larger the
proportion of people to whom they are both tied (weakly or
strongly)
- 107. Strong Ties
• If person A has a strong tie to both B and C, then it is unlikely
for B and C not to share a tie.
A
B C
- 108. Weak Ties for Information Diffusion
“Intuitively speaking, this means that
whatever is to be diffused can reach a
larger number of people, and traverse
greater social distance, when passed
through weak ties rather than strong.”
- 110. Connectivity and the Small World
1. Travers and Milgram’s work on the small world is responsible
for the standard belief that “everyone is connected by a chain
of about 6 steps.”
2. Two questions:
– Given what we know about networks, what is the longest path (defined
by handshakes) that separates any two people?
– Is 6 steps a long distance or a short distance?
- 111. Example: Two Hermits on opposite sides of the country
[Figure: chain of acquaintances linking the two hermits]
OH Hermit - Store Owner - Truck Driver - Manager - Corporate Manager -
Corporate President - Congress Rep. - Congress Rep. - Corporate President -
Corporate Manager - Manager - Truck Driver - Store Owner - Mt. Hermit
- 112. Milgram's Test
Milgram’s test: Send a packet from sets of randomly selected
people to a stockbroker in Boston.
Experimental Setup: Arbitrarily select people from 3 pools:
– People in Boston
– Random people in Nebraska
– Stockholders in Nebraska
- 113. Results
• Most chains found their
way through a small
number of
intermediaries.
• What do these two
findings tell us of the
global structure of social
relations?
- 114. Results II
1. Social networks contain a lot of short paths
2. People acting without any sort of global ‘map’ are effective at
collectively finding these short paths
- 115. The Watts-Strogatz model
• Two main principles explaining short paths: homophily and
weak ties:
• Homophily: every node forms a link to all other nodes that lie within a
radius of r grid steps
• Weak ties: each node forms a link to k other random nodes
• Suppose everyone lives on a two-dimensional grid (as a
model of geographic proximity)
- 117. The Watts-Strogatz model
• Suppose we only allow one out of k nodes to have a
single random friend
• A k * k square then has k random links; consider it as a single node
• A surprisingly small amount of randomness is enough to make
the world “small”, with short paths between every pair of
nodes
- 118. Decentralized Search
• People are able to collectively find short paths to the
designated target even though they don’t know the global ‘map’ of all
connections
• Breadth-first search vs. tunneling
• Modeling:
– Can we construct a network where decentralized search succeeds?
– If yes, what are the qualitative properties of such a network?
- 119. A model for decentralized search
• A starting node s is given a message that it must forward to a
target node t
• s knows only the location of t on the grid, but s doesn’t know
the edges out of any other node
• Model must span all the intermediate ranges of scale as well
- 120. Modeling the process of decentralized search
• We adapt the model by introducing a clustering exponent q
• For two nodes v and w, d(v,w) is the number of grid steps between them
• Random edges are now generated with probability proportional
to d(v,w)^(-q)
• The model changes with different values of q:
– q = 0: links are chosen uniformly at random
– when q is very small: long-range links are “too random”
– when q is large: long-range links are “not random enough”
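A minimal sketch of this model, under stated assumptions: each node gets exactly one long-range contact, d(v,w) is the Manhattan distance on an n x n grid, and the message holder always forwards to whichever of its contacts is closest to the target (greedy forwarding). The function names, the grid size of 20, and the seed are illustrative choices, not taken from the slides.

```python
import random

def grid_distance(v, w):
    """d(v, w): number of grid steps between v and w."""
    return abs(v[0] - w[0]) + abs(v[1] - w[1])

def sample_long_range_contact(v, nodes, q, rng):
    """Pick one contact w != v with probability proportional
    to d(v, w)^(-q)."""
    others = [w for w in nodes if w != v]
    weights = [grid_distance(v, w) ** (-q) for w in others]
    return rng.choices(others, weights=weights, k=1)[0]

def greedy_search(s, t, n, long_range):
    """Forward a message from s to t. Each holder knows only its own
    contacts and the grid location of t, and passes the message to the
    contact closest to t. Returns the number of forwarding steps."""
    steps = 0
    current = s
    while current != t:
        x, y = current
        candidates = [(x + dx, y + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= x + dx < n and 0 <= y + dy < n]
        candidates.append(long_range[current])
        current = min(candidates, key=lambda w: grid_distance(w, t))
        steps += 1
    return steps

n = 20
rng = random.Random(1)
nodes = [(x, y) for x in range(n) for y in range(n)]
for q in (0, 2, 4):
    long_range = {v: sample_long_range_contact(v, nodes, q, rng) for v in nodes}
    pairs = [(rng.choice(nodes), rng.choice(nodes)) for _ in range(200)]
    mean_steps = sum(greedy_search(s, t, n, long_range) for s, t in pairs) / len(pairs)
    print(q, mean_steps)
```

The search always terminates because some grid neighbor is strictly closer to the target at every step. On a grid this small the differences between values of q are modest; the advantage of a particular exponent only becomes pronounced on much larger networks.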
- 122. Decentralized Search when q=2
Experiments show that decentralized search is most efficient
when q=2 (random links follow an inverse-square distribution)
- 123. What’s special about q=2
• Consider the group (ring) of nodes whose distance from a node v lies
between d and 2d
• Since area in the plane grows like the square of the radius, the
total number of nodes in this group is proportional to d^2
• Each single node in the group is linked with probability proportional
to d^(-2), so the two factors cancel: the probability that a random
edge links into some node in this ring is approximately independent
of the value of d
• Long-range weak ties are thus formed in a way that is spread
roughly uniformly over all different scales of resolution
Think of the postal system: country, state, city, street, and finally
the street number
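The cancellation can be checked numerically. The sketch below assumes an unbounded two-dimensional grid with Manhattan distance, where exactly 4j nodes lie at distance j from a given node, and takes the ring to be the nodes at distance between d and 2d. The function name `ring_weight` is a hypothetical helper introduced for this illustration.

```python
def ring_weight(d, q):
    """Unnormalized probability mass of all grid nodes at distance j
    from v with d <= j < 2d, when each node is weighted by j^(-q).
    On an unbounded grid exactly 4*j nodes lie at grid distance j."""
    return sum(4 * j * j ** (-q) for j in range(d, 2 * d))

# For q = 2 the mass per ring stays roughly constant as d grows;
# for q = 1 it grows with d, and for q = 3 it shrinks with d.
for q in (1, 2, 3):
    print(q, [round(ring_weight(d, q), 3) for d in (10, 100, 1000)])
```

Only for q = 2 does each scale of distance receive about the same share of long-range links, which is the property the postal-system analogy describes.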
- 124. Small-World Phenomenon
Conclusions I
1. Start from Milgram’s experiment: (1) there seem to be short
paths and (2) people know how to find them effectively
2. Build mathematical models for (1) and (2)
3. Make a prediction based on the models: clustering exponent
q=2
4. Validate this prediction using real data from large social
networks (LiveJournal, Facebook)
Why do social networks arrange themselves in a pattern of
friendships across distance that is close to optimal for forwarding
messages to far-off targets?
- 125. Small-World Phenomenon
Conclusions II
• If there are dynamic forces or selective pressures driving the
network toward this shape, they must be more implicit, and it
remains a fascinating open problem to determine whether
such forces exist and how they might operate.
• Robustness, Search, Spread of disease, opinion formation,
spread of computer viruses, gossip,…
• For example: Diseases move more slowly in highly clustered
graphs
• The dynamics are very non-linear -- with no clear pattern
based on local connectivity.
Implication: small local changes (shortcuts) can have dramatic
global outcomes (think of disease diffusion)
- 126. Small World Construction
• The network changes from structured to random
• Given 6 billion nodes, the average path length L starts at 3 million
and decreases to 4 (!)
• Clustering starts at 0.75 and decreases to zero
• Most important is what happens ALONG the way
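The transition from structured to random can be explored on a small scale. The sketch below assumes a ring lattice as the structured starting point and rewires each lattice edge with probability p; the network size (200 nodes, 10 neighbors each), the seed, and all function names are illustrative assumptions, far smaller than the 6 billion nodes mentioned above.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Structured start: each node links to its k nearest neighbors
    on a ring (k even, n > k)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            adj[v].add((v + step) % n)
            adj[(v + step) % n].add(v)
    return adj

def rewire(adj, k, p, rng):
    """Return a copy in which each lattice edge is, with probability p,
    replaced by an edge from the same node to a random non-neighbor."""
    n = len(adj)
    new = {v: set(ns) for v, ns in adj.items()}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            if rng.random() < p:
                candidates = [u for u in range(n)
                              if u != v and u not in new[v]]
                if candidates:
                    u = rng.choice(candidates)
                    new[v].discard(w)
                    new[w].discard(v)
                    new[v].add(u)
                    new[u].add(v)
    return new

def clustering(adj):
    """Average fraction of a node's neighbor pairs that are linked."""
    total = 0.0
    for v, ns in adj.items():
        ns = list(ns)
        d = len(ns)
        if d < 2:
            continue
        links = sum(1 for i in range(d) for j in range(i + 1, d)
                    if ns[j] in adj[ns[i]])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)

def average_path_length(adj):
    """Mean breadth-first-search distance over all reachable pairs."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(0)
lattice = ring_lattice(200, 10)
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(lattice, 10, p, rng)
    print(p, round(average_path_length(g), 2), round(clustering(g), 3))
```

Printing both measures for intermediate values of p makes it possible to see what happens along the way, rather than only at the structured and random extremes.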
- 128. Interactive Summary
The biggest advantage I can gain by using SNA is…
The most important fact about SNA for me is…
The concept that made the most sense for me today was…
The biggest danger in using SNA is …
If I will use SNA in the future, I will try to make sure that…
If I use SNA in my next project I will use it for …
I should change my perspective on networks in considering …
I have changed my opinion about SNA, finding out that…
I missed today that …
Before attending this seminar I didn't know that …
I wish we could have covered…
If I forget almost everything that I learned today, I will still remember …
The most important thing today for me was …