Social network analysis part ii

Social Network Analysis

Fundamental Concepts in Social Network
Analysis (Part 2)

Katarina Stanoevska-Slabeva, Miriam Meckel, Thomas Plotkowiak

Agenda

1. Intro
2. Measuring Networks
– Embedding Measures (Ties)
– Positions and Roles (Nodes)
– Group Concepts

3. Network Mechanisms
4. Network Theories

© Thomas Plotkowiak 2010

Introduction
Knoke information exchange network

In 1978, Knoke & Wood collected data from workers at 95 organizations in Indianapolis. Respondents indicated with which other organizations their own organization had any of 13 different types of relationships.

The exchange of information among ten organizations that were involved in the local political economy of social welfare services in a Midwestern city.


2. Network Measures
2.1 Network Measures for Ties
Embedding Measures

Embedding Measures

• Reciprocity (Dyad Census)
• Transitivity (Triad Census)
• Clustering
• Density
• Group-external and group-internal Ties
• Other Network Mechanisms


Reciprocity

• With symmetric data two actors are either connected or not.
• With directed data there are four possible dyadic
relationships:
– A and B are not connected
– A sends to B
– B sends to A
– A and B send to each other.


Reciprocity II

• What is the reciprocity in this network?
– Answer 1: pairs with reciprocated ties / all possible pairs
• AB of {AB, AC, BC} = 0.33
– Answer 2: pairs with reciprocated ties / existing pairs
• AB of {AB, BC} = 0.5
– Answer 3: reciprocated directed ties / all possible directed ties
• {AB, BA} of {AB, BA, AC, CA, BC, CB} = 0.33
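The three answers can be checked with a short script. This is a minimal Python sketch; since the slide's figure is not preserved, it assumes the network A ↔ B, B → C, which is consistent with all three answers. The function name `reciprocity_measures` is illustrative. It also reports the dyad census (mutual, asymmetric, null dyads).

```python
from itertools import combinations

def reciprocity_measures(nodes, arcs):
    """Dyad census and three reciprocity ratios for a directed graph.

    arcs: iterable of (sender, receiver) pairs.
    """
    arcs = set(arcs)
    pairs = list(combinations(nodes, 2))
    mutual = [(a, b) for a, b in pairs if (a, b) in arcs and (b, a) in arcs]
    connected = [(a, b) for a, b in pairs if (a, b) in arcs or (b, a) in arcs]
    possible_arcs = len(nodes) * (len(nodes) - 1)
    return {
        # M, A, N: mutual, asymmetric and null dyads
        "dyad census (M, A, N)": (len(mutual), len(connected) - len(mutual),
                                  len(pairs) - len(connected)),
        "answer 1": len(mutual) / len(pairs),         # mutual / all pairs
        "answer 2": len(mutual) / len(connected),     # mutual / existing pairs
        "answer 3": 2 * len(mutual) / possible_arcs,  # reciprocated arcs / possible arcs
    }

# Assumed network: A <-> B and B -> C (the slide's figure is not preserved)
result = reciprocity_measures(["A", "B", "C"], [("A", "B"), ("B", "A"), ("B", "C")])
for name, value in result.items():
    print(name, value)
```

For this network the three ratios come out as 1/3, 1/2 and 1/3, matching the slide.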


Transitivity

• With undirected data there are four possible types of triadic relations
– No ties
– One tie
– Two ties
– Three ties
• The count of the relative prevalence of these four types of relations is called the "triad census". A population can be characterized by:
– "isolation"
– "couples only"
– "structural holes" (one actor is connected to two others, who are not connected to each other)
– or "clusters"
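An undirected triad census only needs to count, for every set of three actors, how many of the three possible ties exist. The sketch below assumes a small hypothetical network (a triangle 1-2-3 plus a chain 3-4-5) and plain Python sets; the counts for 0, 1, 2 and 3 ties correspond to the isolation, couples-only, structural-hole and cluster patterns listed above.

```python
from itertools import combinations

def undirected_triad_census(nodes, edges):
    """Count triples of nodes by how many of their three possible ties are present."""
    edge_set = {frozenset(e) for e in edges}
    census = {0: 0, 1: 0, 2: 0, 3: 0}  # isolation, couples, structural holes, clusters
    for a, b, c in combinations(nodes, 3):
        ties = sum(frozenset(p) in edge_set for p in ((a, b), (a, c), (b, c)))
        census[ties] += 1
    return census

# Hypothetical network: triangle 1-2-3 plus the chain 3-4-5
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
print(undirected_triad_census(nodes, edges))
```

Of the ten triples here, six contain one tie, three are open two-paths (structural holes) and one is a closed triangle.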


Transitivity II
Directed Networks

M-A-N notation (labels for the 16 triad types):
M = number of mutual (positive) dyads
A = number of asymmetric dyads
N = number of null dyads
D = Down, U = Up, C = Cyclic, T = Transitive

Triad Census Models

Linear Hierarchy Model
Every triad is 030T

Balance Model with Two Cliques (Heider Balance)
Triads are either 300 or 102

Ranked Clusters Model (Hierarchy of Cliques)
Triads: 300, 102, 003, 120D, 120U, 030T, 021D, 021U


Example
Directed information exchange network
[Figure: directed network of the ten organizations, nodes 1 to 10]
The exchange of information among ten organizations that were involved in the local political economy of social welfare services in a Midwestern city.

Transitivity III
[Figure: triad of actors A, B and C]
• How to measure transitivity?
– A) Divide the number of transitive triads found by the total number of possible ordered triples (for 3 nodes there are 6 possibilities)
– B) Normalize the number of transitive triads by the number of cases where a single link could complete the triad: normalize {AB, BC, AC} by {AB, BC, anything} (for 3 nodes there are 4 possibilities)


Transitivity IV

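Information exchange network: 146 transitive triples
A) 146/720 = 20.3% (720 = 10 × 9 × 8 ordered triples)

B) 146/217 = 67.3% (217 two-paths i → j → k that a single tie i → k could close)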
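Both ratios can be reproduced from the adjacency matrix of the information exchange network, which is shown in the Measuring Similarity section (rows are senders). The sketch below loops over all ordered triples (i, j, k) of distinct actors, counts the two-paths i → j → k, and counts those closed by i → k; the function name is illustrative.

```python
# Knoke information exchange network (row = sender, column = receiver)
KNOKE_INFO = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
]

def transitivity_counts(adj):
    """Return (transitive triples, all ordered triples, two-paths)."""
    n = len(adj)
    transitive = two_paths = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue
                if adj[i][j] and adj[j][k]:   # a two-path i -> j -> k
                    two_paths += 1
                    if adj[i][k]:             # closed by the tie i -> k
                        transitive += 1
    return transitive, n * (n - 1) * (n - 2), two_paths

t, all_triples, paths = transitivity_counts(KNOKE_INFO)
print(f"A) {t}/{all_triples} = {t / all_triples:.3f}")
print(f"B) {t}/{paths} = {t / paths:.3f}")
```

On this matrix it finds 146 transitive triples, 720 ordered triples and 217 two-paths, i.e. the two ratios above.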


Clustering

Most actors live in local neighborhoods and are connected to one
another. A large proportion of the total number of ties is highly
"clustered" into local neighborhoods.



Global clustering coefficient

C = (number of closed triplets) / (number of connected triplets)
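The formula translates directly into code. A minimal sketch for undirected graphs, assuming an edge list as input; the toy network (a triangle 1-2-3 with the chain 3-4-5) is hypothetical. Each connected triplet is counted once per center node, and it is closed if its two outer nodes are also tied.

```python
from itertools import combinations

def global_clustering(nodes, edges):
    """Closed triplets / connected triplets for an undirected graph."""
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    triplets = closed = 0
    for center in nodes:
        for a, b in combinations(sorted(nbrs[center]), 2):  # the triplet a - center - b
            triplets += 1
            if b in nbrs[a]:                                 # closed if a and b are tied too
                closed += 1
    return closed / triplets if triplets else 0.0

nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
print(global_clustering(nodes, edges))
```

The toy graph has 6 connected triplets, 3 of which are closed (the triangle seen from each of its corners), so C = 0.5.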


Average Local Clustering coefficient

To measure how clustered the graph is, we examine the local neighborhood of each actor (all actors who are directly connected to ego) and calculate the density of ties in this neighborhood (leaving out ego). After doing this for all actors, we can characterize the degree of clustering as the average over all neighborhoods.

[Figure: example neighborhoods with C = 1, C = 1/3 and C = 0]


Individual local clustering coefficient
(in this case for directed ties)
Clustering can also be examined for each actor:
– Actor 6 has three neighbors and hence 3 pairs of neighbors, or 6 possible directed ties among them. Only 2 of these 6 ties are present, so actor 6 is not highly clustered.
– Actor 8 has 6 neighbors and hence 15 pairs of neighbors (30 possible directed ties), and is highly clustered.
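The neighborhood densities of individual actors can be computed from the information exchange matrix shown in the Measuring Similarity section. This sketch treats anyone with a tie to or from ego as a neighbor and divides the directed ties among the neighbors by k(k − 1); that neighbor definition is an assumption, and other software may define the neighborhood differently. Indices are 0-based, so actor 6 is index 5.

```python
KNOKE_INFO = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
]

def local_clustering_directed(adj, ego):
    """Density of directed ties among ego's neighbours (anyone ego sends to or receives from)."""
    n = len(adj)
    hood = [v for v in range(n) if v != ego and (adj[ego][v] or adj[v][ego])]
    k = len(hood)
    if k < 2:
        return None                      # undefined with fewer than two neighbours
    ties = sum(adj[a][b] for a in hood for b in hood if a != b)
    return ties / (k * (k - 1))          # k(k-1) possible directed ties

coefs = [local_clustering_directed(KNOKE_INFO, ego) for ego in range(len(KNOKE_INFO))]
print(f"actor 6: {coefs[5]:.2f}, actor 8: {coefs[7]:.2f}")
defined = [c for c in coefs if c is not None]
print(f"average local clustering: {sum(defined) / len(defined):.3f}")
```

With this definition actor 6 gets 2/6 ≈ 0.33 and actor 8 gets 24/30 = 0.80; the last line averages the actor values as in the average local clustering coefficient.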


Density for groups

Instead of calculating the density of the whole network (last
lecture), we can calculate the density of partitions of the network.

Groups: governmental agencies, non-governmental generalists, welfare specialists

A social structure in which individuals were highly clustered would display a pattern of high densities on the diagonal and low densities elsewhere.

Density for groups II

• Group 1 has dense in- and out-ties among its members and to the other populations
• Group 2 members have out-ties among themselves and with group 1, and have high densities of in-ties from all three subpopulations

The density in the 1,1 block is .6667. That is, of the six possible directed ties among actors 1, 3 and 5, four are actually present.

The extent to which these block densities characterize all the individuals within the blocks can be assessed by looking at the standard deviations. The standard deviations measure the lack of homogeneity within the partition, or the extent to which the actors vary.
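Block densities are the mean of the cells in a block of the adjacency matrix, with the diagonal left out. The sketch below computes the density and the population standard deviation of one block of the information exchange matrix; only the membership of block 1 (actors 1, 3 and 5) is taken from the slide, the function name is illustrative, and whether a given program uses the population or the sample standard deviation is not stated here.

```python
from statistics import pstdev

KNOKE_INFO = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
]

def block_stats(adj, rows, cols):
    """Density and population standard deviation of the ties from block `rows` to block `cols`."""
    cells = [adj[i][j] for i in rows for j in cols if i != j]  # diagonal left out
    return sum(cells) / len(cells), pstdev(cells)

block1 = [0, 2, 4]  # actors 1, 3 and 5 (0-based indices)
density, sd = block_stats(KNOKE_INFO, block1, block1)
print(f"1,1 block: density {density:.4f}, s.d. {sd:.4f}")
```

For the 1,1 block this gives 4/6 ≈ .6667, as on the slide.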


E-I Index

• The E-I (external – internal) index takes the number of
ties of group members to outsiders, subtracts the number of
ties to other group members, and divides by the total number
of ties.

(1-4)/7 = -3/7 (1-2)/7 = -1/7


E-I Index II

• The resulting E-I index ranges from -1 (all ties internal) to +1 (all ties external). At the group level, ties among members of other groups are ignored.
• The E-I index can be applied at three levels:
– entire population
– each group
– each individual

Notice: The relative sizes of the subpopulations (e.g. 10 vs. 1000) have dramatic consequences for the degree of internal and external contacts, even when individuals choose contacts at random.
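The index can be computed at all three levels from an edge list. A minimal sketch for undirected (symmetrized) ties, assuming a hypothetical five-actor network with two groups X and Y; at the group and individual level it uses only the ties that involve that group or actor. The function name is illustrative.

```python
def ei_index(edges, group):
    """E-I index, (external - internal) / (external + internal), for undirected ties.

    edges: list of (u, v) pairs; group: dict mapping each actor to a group label.
    Returns the population value, one value per group and one per actor.
    """
    def ei(ties):
        external = sum(1 for u, v in ties if group[u] != group[v])
        internal = len(ties) - external
        return (external - internal) / len(ties) if ties else None

    population = ei(edges)
    per_group = {g: ei([(u, v) for u, v in edges if g in (group[u], group[v])])
                 for g in sorted(set(group.values()))}
    per_actor = {a: ei([(u, v) for u, v in edges if a in (u, v)]) for a in group}
    return population, per_group, per_actor

group = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
population, per_group, per_actor = ei_index(edges, group)
print(population, per_group, per_actor)
```

Here 1 of the 5 ties is external, so the population value is (1 - 4)/5 = -0.6; group X gets -0.5 and group Y 0.0.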


E-I Index for groups

Notice that the data has
been symmetrized


E-I Index for the entire population

Notice that the data has
been symmetrized

Internal: 7 × 2 / 64 = 22%
External: 25 × 2 / 64 = 78%
E-I: (50 - 14) / 64 = 0.56


Permutation Tests

To assess whether the E-I index value is significantly different from what would be expected under random mixing, a permutation test is performed.

Notice: Under random mixing, the E-I index would be expected to have a value of .467, which is not much different from the observed .563, especially given the standard error of .078: the difference of about .10 could be due to chance.
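A permutation test of this kind can be sketched by repeatedly reshuffling the group labels (keeping group sizes fixed) and recomputing the index. This is an illustrative sketch, not necessarily how any particular package implements it; the number of permutations, the seed, the one-sided p-value and the toy network are assumptions.

```python
import random

def ei(edges, group):
    """(external - internal) / total ties."""
    external = sum(1 for u, v in edges if group[u] != group[v])
    return (2 * external - len(edges)) / len(edges)

def ei_permutation_test(edges, group, n_perm=5000, seed=1):
    rng = random.Random(seed)
    observed = ei(edges, group)
    actors = list(group)
    labels = [group[a] for a in actors]
    simulated = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # random mixing: same group sizes, members reassigned
        simulated.append(ei(edges, dict(zip(actors, labels))))
    mean = sum(simulated) / n_perm
    se = (sum((s - mean) ** 2 for s in simulated) / (n_perm - 1)) ** 0.5
    # one-sided: share of random mixings at least as "external" as observed
    p = sum(s >= observed for s in simulated) / n_perm
    return observed, mean, se, p

group = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
observed, expected, se, p = ei_permutation_test(edges, group)
print(f"observed {observed:.3f}, expected {expected:.3f}, s.e. {se:.3f}, p = {p:.3f}")
```

The expected value and standard error correspond to the .467 and .078 reported on the slide for the real data; the observed value is then judged against them.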


E-I Index for individuals

Notice: Several actors (4, 6, 9) tend toward closure, while others (10, 1) tend toward creating ties outside their groups.


2. Network Measures
2.2 Network Measures for Actors
Positions & Roles

Positions & Roles

• Structural Equivalence
• Automorphic Equivalence
• Regular Equivalence

• Measuring similarity/dissimilarity
• Visualizing similarity and distance
• Measuring automorphic equivalence
• Measuring regular equivalence

• Blockmodelling


Chinese Kinship Relations


Positions and Roles

• Positions: Actors that show a similar structure of relationships
and are thus similarly embedded into the network.
• Roles: The pattern of relationships of members of same or
different positions.

• Note: Many of the category systems used by sociologists are
based on "attributes" of individual actors that are common
across actors.


Similarity

• The idea of "similarity" has to be rather precisely defined
• Nodes are similar if they fall in the same "equivalence class"
– We could come up with an equivalence class of out-degree of zero, for example

• There are three particular definitions of equivalence:
– Structural Equivalence
– Automorphic Equivalence (rarely used)
– Regular Equivalence


Structural Equivalence

• Structural Equivalence: Two structurally equivalent actors could exchange their positions in a network without changing their connections to the other actors in the network.

• Structural equivalence is the "strongest" form of equivalence.

• Problem: Imagine two teachers, one in Toronto and one in St. Gallen. Rather than looking for connections to exactly the same persons, we would like to find connections to similar persons, but not necessarily the same ones.


Automorphic Equivalence

• Automorphic Equivalence: Two persons could exchange their positions in the network without changing the structure of the network (notice that after the exchange they would be partially connected to other persons than before)

• Problem: How large a radius should we use when analyzing the structure of the network (1, 2, 3 … steps)?
• For the one-step radius we consider the NUMBER of:
– asymmetric outgoing,
– asymmetric incoming,
– symmetric in- and outgoing,
– and non-existing ties.


1-Step, 2-Step Equivalence
[Figure: comparison of 1-step and 2-step equivalence]


Regular Equivalence

• Regular Equivalence: Two positions are considered similar if every important aspect of the observed structure applies (or does not apply) to both positions.
• For the one-step radius we consider the EXISTENCE of:
– asymmetric outgoing,
– asymmetric incoming,
– symmetric in- and outgoing,
– and non-existing ties.


[Figure: three versions of a small hierarchy with A at the top, B and C below it, and D, E, F, G, H at the bottom.
(1) B and C are regularly equivalent.
(2) B and C are automorphically equivalent.
(3) B and C are structurally equivalent.]


Computing Positional Similarity
Example Information exchange network


Measuring Similarity
Adjacency Matrix

1 Coun 2 Comm 3 Educ 4 Indu 5 Mayr 6 WRO 7 News 8 UWay 9 Welf 10 West
1 Coun --- 1 0 0 1 0 1 0 1 0
2 Comm 1 --- 1 1 1 0 1 1 1 0
3 Educ 0 1 --- 1 1 1 1 0 0 1
4 Indu 1 1 0 --- 1 0 1 0 0 0
5 Mayr 1 1 1 1 --- 0 1 1 1 1
6 WRO 0 0 1 0 0 --- 1 0 1 0
7 News 0 1 0 1 1 0 --- 0 0 0
8 UWay 1 1 0 1 1 0 1 --- 1 0
9 Welf 0 1 0 0 1 0 1 0 --- 0
10 West 1 1 1 0 1 0 1 0 0 ---


Measuring Similarity
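Concatenated Row & Column View
Top block: transposed matrix (row i shows who sends to actor i). Bottom block: original matrix (row i shows whom actor i sends to). Reading down a column gives that actor's out-ties (top) followed by its in-ties (bottom).
        1 Coun 2 Comm 3 Educ 4 Indu 5 Mayr 6 WRO 7 News 8 UWay 9 Welf 10 West
1 Coun  --- 1 0 1 1 0 0 1 0 1
2 Comm  1 --- 1 1 1 0 1 1 1 1
3 Educ  0 1 --- 0 1 1 0 0 0 1
4 Indu  0 1 1 --- 1 0 1 1 0 0
5 Mayr  1 1 1 1 --- 0 1 1 1 1
6 WRO   0 0 1 0 0 --- 0 0 0 0
7 News  1 1 1 1 1 1 --- 1 1 1
8 UWay  0 1 0 0 1 0 0 --- 0 0
9 Welf  1 1 0 0 1 1 0 1 --- 0
10 West 0 0 1 0 1 0 0 0 0 ---
1 Coun  --- 1 0 0 1 0 1 0 1 0
2 Comm  1 --- 1 1 1 0 1 1 1 0
3 Educ  0 1 --- 1 1 1 1 0 0 1
4 Indu  1 1 0 --- 1 0 1 0 0 0
5 Mayr  1 1 1 1 --- 0 1 1 1 1
6 WRO   0 0 1 0 0 --- 1 0 1 0
7 News  0 1 0 1 1 0 --- 0 0 0
8 UWay  1 1 0 1 1 0 1 --- 1 0
9 Welf  0 1 0 0 1 0 1 0 --- 0
10 West 1 1 1 0 1 0 1 0 0 ---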


Pearson correlation coefficients, covariances
and cross-products
• Pearson correlation (ranging from -1 to +1) summarizes pairwise structural equivalence.
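A sketch of pairwise Pearson correlations on the concatenated row-and-column profiles of the information exchange matrix. Assumptions: for each pair, the entries that refer to the two actors themselves are left out, and a pair in which one profile has no variance yields `nan`; packages such as UCINET may treat the diagonal differently, so values need not match published output exactly. The function names are illustrative.

```python
from math import sqrt

KNOKE_INFO = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")  # undefined for constant profiles

def profile(adj, i, skip):
    """Out-ties followed by in-ties of actor i, leaving out the actors in `skip`."""
    others = [k for k in range(len(adj)) if k not in skip]
    return [adj[i][k] for k in others] + [adj[k][i] for k in others]

def structural_similarity(adj):
    n = len(adj)
    return [[1.0 if i == j else
             pearson(profile(adj, i, {i, j}), profile(adj, j, {i, j}))
             for j in range(n)] for i in range(n)]

sim = structural_similarity(KNOKE_INFO)
for i, row in enumerate(sim, start=1):
    print(i, " ".join(f"{r:5.2f}" for r in row))
```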


Pairwise Structural Equivalence
[Figure: information exchange network, nodes 1 to 10]
We can see, for example, that node 1 and node 9 have identical patterns of ties.

The Pearson correlation measure does not pay attention to the overall prevalence of ties (the mean of the row or column), and it does not pay attention to differences between actors in the variances of their ties.

Often it is desirable to focus only on the pattern, rather than on the mean and variance, as aspects of similarity between actors.


Euclidean squared distances

Euclidean or squared Euclidean distances are not sensitive to the
linearity of association and can be used with valued or binary
data.

Other similar measures are the Jaccard and Hamming distances.
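For binary tie profiles these measures are easy to compute side by side. A minimal sketch on two hypothetical 0/1 profiles; for binary data the Hamming distance equals the squared Euclidean distance, while the Jaccard distance ignores positions where both actors lack a tie.

```python
from math import sqrt

def distances(x, y):
    """Compare two binary tie profiles (lists of 0/1) in several ways."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return {
        "squared_euclidean": sq,
        "euclidean": sqrt(sq),
        "hamming": sum(a != b for a, b in zip(x, y)),  # same as squared Euclidean for 0/1 data
        "jaccard_distance": 1 - both / either if either else 0.0,
    }

print(distances([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))
```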


Going from pairs to groups of structural
equivalence
It is often useful to examine the similarities or distances to try to
locate groupings of actors (that is, larger than a pair) who are
similar. By studying the bigger patterns of which groups of actors
are similar to which others, we may also gain some insight into
"what about" the actor's positions is most critical in making them
more similar or more distant.

In the next two sections we will cover how multi-dimensional
scaling and hierarchical cluster analysis can be used to identify
patterns in actor-by-actor similarity/distance matrices.

Both of these tools are widely used in non-network analysis; there are large and excellent literatures on the many important complexities of using these methods. Our goal here is only to provide a very basic introduction.


Hierarchical Clustering

• Hierarchical Clustering:
– Initially places each case in its own cluster
– The two most similar cases (clusters) are then combined
– This process is repeated until all cases are agglomerated into a single cluster (once a case has been joined it is never re-classified)
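The procedure above can be sketched in a few lines. This version assumes a precomputed symmetric distance matrix and uses single linkage (the distance between the closest members) by default; the linkage rule is a choice the slide does not specify, and passing `max` gives complete linkage instead. The four-case distance matrix is hypothetical.

```python
def agglomerate(dist, labels, linkage=min):
    """Agglomerative hierarchical clustering on a symmetric distance matrix.

    linkage=min is single linkage; linkage=max is complete linkage.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    index = {label: i for i, label in enumerate(labels)}
    clusters = [frozenset([label]) for label in labels]  # every case starts alone

    def cluster_distance(a, b):
        return linkage(dist[index[x]][index[y]] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        # find the two most similar (closest) clusters
        a, b = min(((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(a), sorted(b), cluster_distance(a, b)))
        # merged cases are never split again
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

labels = ["A", "B", "C", "D"]
dist = [[0, 1, 10, 10],
        [1, 0, 10, 10],
        [10, 10, 0, 2],
        [10, 10, 2, 0]]
for step in agglomerate(dist, labels):
    print(step)
```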


Multi Dimensional Scaling

• MDS represents the patterns of similarity or dissimilarity in the profiles among the actors as a "map" in a multidimensional space. This map lets us see how "close" actors are and whether they "cluster".
– Stress is a measure of badness of fit
– The analyst has to interpret the meaning of the dimensions


Finding automorphic equivalence
(for binary data)
• Brute Force Approach: The nodes of the graph are relabeled (swapped) in every possible way, and the distances among all pairs of actors in each new graph are compared with those in the original one. When the new and the old graph have the same distances among nodes, the "swapping" that was done identifies automorphically equivalent positions.
• Brute force is expensive: a network of 10 nodes already has 10! = 3,628,800 permutations.
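A brute-force search can be written with `itertools.permutations`. The sketch keeps every relabeling that maps the set of ties onto itself; for an undirected graph this is equivalent to preserving all pairwise distances, as described above. Actors that can be mapped onto one another fall into the same automorphic class. The seven-node tree is a hypothetical example.

```python
from itertools import permutations
from math import factorial

def automorphic_classes(nodes, edges):
    """Brute force: try every relabeling and keep those that leave the set of ties unchanged."""
    ties = {frozenset(e) for e in edges}
    orbit = {v: {v} for v in nodes}
    for image in permutations(nodes):
        mapping = dict(zip(nodes, image))
        if {frozenset(mapping[x] for x in t) for t in ties} == ties:
            for v in nodes:
                orbit[v].add(mapping[v])  # v can be swapped into mapping[v]'s position
    return {frozenset(s) for s in orbit.values()}

nodes = ["A", "B", "C", "D", "E", "F", "G"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E"), ("C", "F"), ("C", "G")]
print(automorphic_classes(nodes, edges))
print(factorial(10), "relabelings for a ten-node network")
```

For the tree the classes are {A}, {B, C} and {D, E, F, G}; already 7 nodes require 5,040 relabelings, which is why smarter algorithms are used in practice.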


Regular Equivalence
Block Matrix
Informal Definition: Two actors are regularly equivalent if they
have similar patterns of ties to equivalent others.

Problem: Each definition of each position depends on its relations
with other positions. Where to start?

Sender

Repeater

Receiver


Regular Equivalence
Block Matrix → Block Image
• Create a matrix so that each actor in each partition has the same pattern of connection to actors in the other partitions.
– Notice: We don't care about ties among members of the same regular class!
– A sends to {B,C,D} but to none of {E,F,G,H,I}
– {B,C,D} do not send to A but do send to {E,F,G,H,I}
– {E,F,G,H,I} do not send to A or to {B,C,D}

Block matrix:
     A   B   C   D   E   F   G   H   I
A    --- 1   1   1   0   0   0   0   0
B    0   --- 0   0   1   1   0   0   0
C    0   0   --- 0   0   0   1   0   0
D    0   0   0   --- 0   0   0   1   1
E    0   0   0   0   --- 0   0   0   0
F    0   0   0   0   0   --- 0   0   0
G    0   0   0   0   0   0   --- 0   0
H    0   0   0   0   0   0   0   --- 0
I    0   0   0   0   0   0   0   0   ---

Block image:
             A     B,C,D   E,F,G,H,I
A            ---   1       0
B,C,D        0     ---     1
E,F,G,H,I    0     0       ---
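Whether a given partition is regular, and what its block image is, can be checked mechanically. In this sketch a block between two different classes counts as 1 when every actor in the row class sends at least one tie into the column class and every actor in the column class receives at least one tie from the row class, and as 0 when it contains no ties; ties inside a class are ignored, as noted above. The function name is illustrative.

```python
def block_image(adj, labels, partition):
    """Return {(p, q): 0 or 1} if the partition is regular, otherwise None.

    Ties inside a class are ignored; a block between two classes must be either
    empty (0) or have at least one tie in every row and every column (1).
    """
    idx = {label: i for i, label in enumerate(labels)}
    image = {}
    for p, senders in enumerate(partition):
        for q, receivers in enumerate(partition):
            if p == q:
                continue
            rows = {any(adj[idx[a]][idx[b]] for b in receivers) for a in senders}
            cols = {any(adj[idx[a]][idx[b]] for a in senders) for b in receivers}
            if len(rows) > 1 or rows != cols:
                return None
            image[(p, q)] = int(rows.pop())
    return image

labels = list("ABCDEFGHI")
adj = [[0] * 9 for _ in range(9)]
for s, t in ["AB", "AC", "AD", "BE", "BF", "CG", "DH", "DI"]:
    adj[labels.index(s)][labels.index(t)] = 1

print(block_image(adj, labels, [["A"], ["B", "C", "D"], ["E", "F", "G", "H", "I"]]))
print(block_image(adj, labels, [["A"], ["B", "C", "D", "E"], ["F", "G", "H", "I"]]))
```

The first partition (sender, repeaters, receivers) reproduces the block image above; the second is rejected because E receives nothing from A.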

Algorithms for detection of Regular Equivalence
Tabu Search
• This method of blocking relies on extensive use of the computer. Tabu search tries to implement the same idea of grouping together the actors who are most similar into a block.
• Tabu search does this by searching for sets of actors who, if placed into blocks, produce the smallest sum of within-block variances in the tie profiles.
• If actors in a block have similar ties, their variance around the block mean profile will be small.
• So, the partitioning that minimizes the sum of within-block variances is minimizing the overall variance in tie profiles.


Algorithms for detection of Regular Equivalence
Tabu Search Results

[Figure: information exchange network, nodes 1 to 10]
Actors 2 and 5, for example, are pure "repeaters".

The set {6, 10, 3} sends to only two other types (not all three other types) and receives from only one other type.

Blockmodeling

Blockmodeling is able to include all kinds of equivalences into one
analysis

Examples of blocks:
• Complete blocks (everybody in the block is connected with everybody else in it)
• Null blocks (there are no ties in this block)
• Regular blocks (every row and every column of the block contains at least one tie, as required for regular equivalence)


Blockmodels
Matrix Permutation


Blockmodels

Student Government. Discussion relation among the eleven students who were members of the student
government at the University of Ljubljana in Slovenia. The students were asked to indicate with
whom of their fellows they discussed matters concerning the administration of the university
informally.

General Blockmodelling with predefined
partitions


Blockmodeling based on actors-attributes


Blockmodels
Matrix Representation


2. Network Measures
2.3 Network Measures for Subgroups
Cohesive Subgroups

Cohesive Subgroups

Cohesive subgroups: We hypothesize that cohesive subgroups
are the basis for solidarity, shared norms, identity and
collective behavior. Perceived similarity, for instance,
membership of a social group, is expected to promote
interaction. We expect similar people to interact a lot, at least
more often than with dissimilar people.


Example – Families in Haciendas (1948)

Each arc represents "frequent visits" from one family to another.

Components
A semiwalk from vertex u to vertex v is a sequence of lines such that the end vertex of one line is the starting vertex of the next line and the sequence starts at vertex u and ends at vertex v.

A walk is a semiwalk with the additional condition that none of its lines is an arc of which the end vertex is the arc's tail.

Note that v5 → v3 → v4 → v5 → v3 is also a walk to v3.

Paths

A semipath is a semiwalk in which no vertex in between the first
and last vertex of the semiwalk occurs more than once.

A path is a walk in which no vertex in between the first and last
vertex of the walk occurs more than once.


Connectedness

A network is (weakly) connected if each pair of vertices is
connected by a semipath.

A network is strongly connected if each pair of vertices is
connected by a path.

This network is not connected
because v2 is isolated.

Connected Components

A (weak) component is a maximal (weakly) connected
subnetwork.

A strong component is a maximal strongly connected
subnetwork.

v1, v3, v4, v5 are a weak component
v3, v4, v5 are a strong component
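Weak and strong components can be found with simple graph searches. The sketch assumes the arcs v1 → v3, v3 → v4, v4 → v5, v5 → v3 with v2 isolated, which is consistent with the components and the walk named above (the figure is not preserved). The strong-component routine compares reachability sets, which is simple but quadratic; Tarjan's or Kosaraju's algorithm is the usual linear-time choice.

```python
def weak_components(nodes, arcs):
    """Components when arc directions are ignored (connected by semipaths)."""
    nbrs = {v: set() for v in nodes}
    for u, v in arcs:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for w in nbrs[stack.pop()]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        components.append(component)
    return components

def reachable(start, succ):
    """All vertices reachable from start along arcs (paths), including start."""
    found, stack = {start}, [start]
    while stack:
        for w in succ[stack.pop()]:
            if w not in found:
                found.add(w)
                stack.append(w)
    return found

def strong_components(nodes, arcs):
    """u and v share a strong component iff each can reach the other."""
    succ = {v: set() for v in nodes}
    for u, v in arcs:
        succ[u].add(v)
    reach = {v: reachable(v, succ) for v in nodes}
    components = []
    for v in nodes:
        component = {w for w in reach[v] if v in reach[w]}
        if component not in components:
            components.append(component)
    return components

nodes = ["v1", "v2", "v3", "v4", "v5"]
arcs = [("v1", "v3"), ("v3", "v4"), ("v4", "v5"), ("v5", "v3")]
print(weak_components(nodes, arcs))
print(strong_components(nodes, arcs))
```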


Example Strong Components

1. Net > Components > {Strong, Weak}


Cliques and Complete Subnetworks

A clique is a maximal complete subnetwork containing three
vertices or more. (cliques can overlap)

v2,v4,v5 is not a clique

v1, v6, v5 is a clique
v2, v3, v4, v5 is a clique
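Maximal complete subnetworks can be enumerated with the Bron-Kerbosch algorithm. A minimal version without pivoting; the edge list is an assumed reconstruction consistent with the two cliques named above, and the minimum size of three follows the definition.

```python
def bron_kerbosch(nbrs, r, p, x):
    """Yield every maximal complete subgraph (no pivoting)."""
    if not p and not x:
        yield r
        return
    for v in list(p):
        yield from bron_kerbosch(nbrs, r | {v}, p & nbrs[v], x & nbrs[v])
        p = p - {v}   # v has been fully explored
        x = x | {v}   # ...and must not be reported again

def cliques(edges, min_size=3):
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    found = bron_kerbosch(nbrs, set(), set(nbrs), set())
    return [c for c in found if len(c) >= min_size]

edges = [("v1", "v5"), ("v1", "v6"), ("v5", "v6"),
         ("v2", "v3"), ("v2", "v4"), ("v2", "v5"),
         ("v3", "v4"), ("v3", "v5"), ("v4", "v5")]
print(cliques(edges))
```

The subset {v2, v4, v5} is complete but not maximal, so it is not reported as a clique.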

n-Clique & n-Clan
n-Clique: A maximal subgraph in which every pair of nodes is at a distance of at most n in the analyzed (whole) graph. A clique is an n-clique with n = 1.

n-Clan: An n-clique in which every pair of nodes is at a distance of at most n within the resulting subgraph itself.

2-Clique

2-Clan


n-Clans & n-Cliques

[Figure: network of six nodes, 1 to 6]

2-Clans: 123, 234, 345, 456, 561, 612
2-Cliques: 123, 234, 345, 456, 561, 612 and 135, 246
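The two lists can be reproduced by brute force. The sketch assumes the figure is the ring 1-2-3-4-5-6-1 (the layout residue suggests this, but it is an assumption). It finds the maximal sets of at least three nodes whose pairwise distance in the whole graph is at most n (n-cliques) and keeps those whose members are also within distance n inside the subgraph (n-clans).

```python
from collections import deque
from itertools import combinations

def distances(nodes, nbrs):
    """Shortest-path lengths (BFS) using only the vertices in `nodes`."""
    nodes = set(nodes)
    result = {}
    for s in nodes:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for w in nbrs[v]:
                if w in nodes and w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[s] = dist
    return result

def within(dist, group, n):
    return all(b in dist[a] and dist[a][b] <= n for a, b in combinations(group, 2))

def n_cliques_and_clans(nbrs, n, min_size=3):
    vertices = sorted(nbrs)
    whole = distances(vertices, nbrs)  # distances in the whole graph
    candidates = [set(g) for size in range(min_size, len(vertices) + 1)
                  for g in combinations(vertices, size) if within(whole, g, n)]
    n_cliques = [g for g in candidates if not any(g < h for h in candidates)]
    # an n-clan is an n-clique that stays within distance n inside the subgraph itself
    n_clans = [g for g in n_cliques if within(distances(g, nbrs), sorted(g), n)]
    return n_cliques, n_clans

# Assumed figure: the ring 1-2-3-4-5-6-1
ring = {i: {i % 6 + 1, (i - 2) % 6 + 1} for i in range(1, 7)}
two_cliques, two_clans = n_cliques_and_clans(ring, 2)
print("2-cliques:", [sorted(g) for g in two_cliques])
print("2-clans:  ", [sorted(g) for g in two_clans])
```

{1, 3, 5} is a 2-clique because its members are two steps apart via 2, 4 and 6, but not a 2-clan, since inside the subgraph they are not connected at all.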


k-Plexes

k-Plex: A k-plex is a maximal subgraph with g_s nodes in which each node is connected to at least g_s - k nodes.

[Figure: network of six nodes, 1 to 6]

2-Plexes: 1234, 2345, 3456, 4561, 5612, 6123
In general, k-plexes are more robust than cliques and clans.


Overview Subgroups

[Figure: five small networks on nodes 1 to 4]
Network 1: 2 components
Network 2: 1 component; 2 2-clans (341, 412); 2 2-cliques (341, 412)
Network 3: 1 component; 1 2-clan (124); 1 2-clique (124)
Network 4: 1 component; 1 2-clan (1234); 1 2-clique (1234); 1 2-plex (1234)
Network 5: 1 component; 1 2-clan (1234); 1 2-clique (1234); 1 2-plex (1234); 1 clique


Overview Groupconcepts

• 1-clique, 1-clan and 1-plex are identical
• An n-clan is always contained in an n-clique

Component

2-Clique
2-Clan
2-Plex

Clique


k-Cores

A k-core is a maximal subnetwork in which each vertex has at least degree k within the subnetwork.


k-Cores
k-cores are nested which means that a vertex in a 3-core is also
part of a 2-core but not all members of a 2-core belong to a 3-
core.


k-Cores Application

• K-cores help to detect cohesive subgroups by removing the lowest k-cores from the network until the network breaks up into relatively dense components.
• Net > Partitions > Core >{Input, Output, All}
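Core numbers can be computed by repeatedly removing a vertex of minimum degree. This peeling sketch (a simple quadratic variant of the standard approach) records for each vertex the largest k whose k-core contains it; the example network (a four-clique with a two-step tail) is hypothetical.

```python
def core_numbers(edges):
    """Core number of every vertex, by repeatedly peeling a vertex of minimum degree."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    degree = {v: len(s) for v, s in nbrs.items()}
    core, removed, k = {}, set(), 0
    while len(removed) < len(nbrs):
        v = min((u for u in nbrs if u not in removed), key=degree.get)
        k = max(k, degree[v])    # k never decreases while peeling
        core[v] = k
        removed.add(v)
        for w in nbrs[v]:
            if w not in removed:
                degree[w] -= 1   # degree within the remaining subnetwork
    return core

def k_core(edges, k):
    return {v for v, c in core_numbers(edges).items() if c >= k}

edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
print(core_numbers(edges))
print(k_core(edges, 3))
```

The nesting is visible here: the 3-core {1, 2, 3, 4} is also the 2-core, and the 1-core contains every vertex.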


Network Mechanisms

• Tie Outdegree Effect
• Reciprocity
• Transitivity & Three-Cycles Effect
• Balance Effect
• In/Out Popularity Effect
• In/Out Activity Effect
• In/Out Assortativity Effect
• Covariate Similarity Effect
• Covariate Ego-Effect
• Covariate Alter-Effect
• Same Covariate Effect


Outdegree Effect

• The most basic effect is defined by the outdegree of actor i. It represents the basic tendency to have ties at all.

• In a decision-theoretic approach this effect can be regarded as the balance of benefits and costs of an arbitrary tie.
• Most networks are sparse (i.e., they have a density well below 0.5), which can be represented by saying that for a tie to an arbitrary other actor (arbitrary meaning here that the other actor has no characteristics or tie pattern making him or her especially attractive to i), the costs will usually outweigh the benefits. Indeed, in most cases a negative parameter is obtained for the outdegree effect.


Reciprocity Effect

• Another quite basic effect is the tendency toward reciprocity,
represented by the number of reciprocated ties of actor i. This
is a basic feature of most social networks (cf. Wasserman and
Faust, 1994, Chapter 13)



Transitivity and other triadic effects

• Next to reciprocity, an essential feature in most social
networks is the tendency toward transitivity, or transitive
closure (sometimes called clustering): friends of friends
become friends, or in graph-theoretic terminology: two-paths
tend to be, or to become, closed (e.g., Davis 1970, Holland
and Leinhardt 1971).

[Figure: transitive triplet (i → j → h, closed by i → h) and three-cycle (i → j → h → i)]
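In actor-based models each effect enters the objective function of actor i through a statistic computed from i's ties. The sketch below computes four of these statistics from an adjacency matrix, following the usual definitions (outdegree Σ_j x_ij, reciprocity Σ_j x_ij x_ji, transitive triplets Σ_{j,h} x_ij x_jh x_ih, three-cycles Σ_{j,h} x_ij x_jh x_hi); the RSiena manual should be checked for the exact variants a given analysis uses. The three-actor network is hypothetical.

```python
def effect_statistics(x, i):
    """Statistics s_i(x) for actor i in an adjacency matrix x (x[i][j] = 1 if i -> j)."""
    others = [j for j in range(len(x)) if j != i]
    pairs = [(j, h) for j in others for h in others if j != h]
    return {
        "outdegree": sum(x[i][j] for j in others),
        "reciprocity": sum(x[i][j] * x[j][i] for j in others),
        "transitive triplets": sum(x[i][j] * x[j][h] * x[i][h] for j, h in pairs),
        "3-cycles": sum(x[i][j] * x[j][h] * x[h][i] for j, h in pairs),
    }

# Hypothetical network: 0 <-> 1, 0 <-> 2, 1 -> 2
x = [[0, 1, 1],
     [1, 0, 1],
     [1, 0, 0]]
print(effect_statistics(x, 0))
```

In the objective function each statistic is weighted by its estimated parameter, so a negative outdegree parameter and a positive reciprocity parameter (as in the table further below) mean that ties are costly unless they are reciprocated.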

Balance Effect

• An effect closely related to transitivity is balance (Newcomb, 1962), which is the same as structural equivalence with respect to out-ties (Burt, 1982): the tendency to have and create ties to other actors who make the same choices as ego.


In/Out Popularity Effect

• The degree-related popularity effect is based on indegree or
outdegree of an actor. Nodes with higher indegree, or higher
outdegree, are more attractive for others to send a tie to.
• That implies that high indegrees reinforce themselves, which
will lead to a relatively high dispersion of the indegrees (a
Matthew effect in popularity as measured by indegrees, cf.
Merton, 1968 and Price, 1976).



In/Out Activity Effect

• Nodes with higher indegree, or higher outdegree respectively,
will have an extra propensity to form ties to others.
• The outdegree-related activity effect again is a self-reinforcing
effect: when it has a positive parameter, the dispersion of
outdegrees will tend to increase over time, or to be sustained
if it already is high.



Preferential Attachment

• Notice: These four degree-related effects can be regarded as
the analogues in the case of directed relations of what was
called cumulative advantage by Price (1976) and preferential
attachment by Barabasi and Albert (1999) in their models for
dynamics of non-directed networks: a self-reinforcing process
of degree differentiation.


In/Out Assortativity Effect

• Preferences of actors dependent on their degrees. Depending
on their own out- and in-degrees, actors can have differential
preferences for ties to others with also high or low out- and
in-degrees (Morris and Kretzschmar 1995; Newman 2002)



Covariate Similarity Effect

• The covariate similarity effect describes whether ties tend to occur more often between actors with similar values on a covariate (homophily effect). Tendencies to homophily constitute a fundamental characteristic of many social relations, see McPherson, Smith-Lovin, and Cook (2001).

• Example: iPad owners tend to be friends with other iPad owners.


Covariate Ego Effect

• The covariate ego effect describes whether actors with higher values on a covariate tend to nominate more friends and hence have a higher outdegree.

• Example: Heavier smokers nominate more friends.


Covariate Alter Effect

• The covariate alter effect describes whether actors with higher values on a covariate V tend to be nominated by more others and hence have higher indegrees.

• Example: Beautiful people are nominated as friends by more others.


Modeling networks

1. Actor-based modeling for longitudinal data
– SIENA (analysis of repeated measures on social networks and MCMC-
estimation of exponential random graphs)
2. Stochastic modeling for panels
– Pnet

objective function | Model 1: estim s.e. p | Model 2: estim s.e. p | Model 3: estim s.e. p
outdegree (density) -2,46 0,12 <0,0001* -4,04 0,23 <0,0001* -1,99 0,13 <0,0001*

reciprocity 2,57 0,20 <0,0001* 2,29 0,22 <0,0001* 3,02 0,21 <0,0001*

transitive triplets 0,07 0,01 <0,0001*

transitive mediated triplets -0,03 0,01 0,0005*

transitive ties 1,47 0,24 <0,0001*

3-cycles -0,06 0,02 0,0037*

attribute party 1,13 0,15 <0,0001* 0,73 0,15 <0,0001*

attribute gender -0,11 0,15 0.48


3. Network Theories

Homophily & Assortativity
Power Laws & Preferential Attachment
The Strength of Weak Ties
Small Worlds
Social Capital

Homophily

• Homophily (i.e., love of the same) is the tendency of
individuals to associate and bond with similar others.
(Mechanisms of selection vs influence)
• In the study of networks, assortative mixing is a bias in favor of
connections between network nodes with similar characteristics. In the
specific case of social networks, assortative mixing is also known as
homophily. The rarer disassortative mixing is a bias in favor of connections
between dissimilar nodes.

[Figure: low homophily vs. high homophily]

Homophily II

Types (acc. to McPherson et al. 2001):
– Race and Ethnicity (Marsden 1987, 1988; Louch 2000; Kalleberg et al. 1996; Laumann 1973…)
– Sex and Gender (Maccoby 1998; Eder & Hallinan 1978; Shrum et al. 1988; Huckfeldt & Sprague 1995; Brass 1985…)
– Age (Fischer 1977, 1982; Feld 1982; Blau et al. 1991; Burt 1990, 1991…)
– Religion (Laumann 1973; Verbrugge 1977; Fischer 1977, 1982; Marsden 1988; Louch 2000…)
– Education, Occupation and Social Class (Laumann 1973; Marsden 1987; Verbrugge 1977; Wright 1997; Kalmijn 1998…)
– Network Positions (Brass 1985; Burt 1982; Friedkin 1993…)
– Behavior (Cohen 1977; Kandel 1978; Knoke 1990…)
– Attitudes, Abilities, Beliefs and Aspirations (Jussim & Osgood 1989; Huckfeldt & Sprague 1995; Verbrugge 1977, 1983; Knoke 1990)


Schelling's Segregation Demo


3.2 Power Laws & Preferential
Attachment

Power Law distribution

• As a function of k, what fraction of pages on the Web have k
in-links?

• A natural guess is the normal, or Gaussian, distribution

• Central Limit Theorem (roughly): if we take any sequence of
small independent random quantities, then in the limit their sum
will be distributed according to the normal distribution


Power Law distribution

But when people measured the Web, they found something
very different: The fraction of Web pages that have k in-links is approximately
proportional to 1/k^2

• Power law function
• Popularity exhibits extreme imbalances: there are few very popular Web
pages that have extremely many in-links

True for other domains:
• the fraction of telephone numbers that receive k calls per day: 1/k^2
• the fraction of books bought by k people: 1/k^3
• the fraction of scientific papers that receive k citations: 1/k^3


Preferential attachment leads to power laws

• A preferential attachment process is any of a class of
processes in which some quantity, typically some form of
wealth or credit, is distributed among a number of individuals
or objects according to how much they already have, so that
those who are already wealthy receive more than those who
are not. Notice: "Preferential attachment" (A.-L. Barabási and R. Albert 1999) is only the most recent of many names that
have been given to such processes.

• Notice: Preferential attachment can, under suitable
circumstances, generate power law distributions.
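A minimal simulation of linear preferential attachment in the spirit of Barabási and Albert: each new node attaches to m existing nodes chosen with probability proportional to their current degree. Picking a uniformly random entry from the list of all tie endpoints achieves exactly that, because a node appears in the list once per tie. The seed triangle, m = 2, the network size and the random seed are arbitrary choices.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=42):
    """Grow a network in which each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(0, 1), (1, 2), (0, 2)]           # seed triangle (arbitrary starting point)
    endpoints = [v for e in edges for v in e]  # node v appears here degree(v) times
    for new in range(3, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(2000)
degree = Counter(v for e in edges for v in e)
histogram = Counter(degree.values())
for k in (2, 4, 8, 16, 32, 64):
    print(f"degree {k:>2}: {histogram.get(k, 0)} nodes")
print("largest degree:", max(degree.values()))
```

The counts should fall off steeply with k while a few early nodes collect very large degrees (the exact numbers depend on the seed), in contrast to the concentration around a mean that a normal distribution would produce.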


Preferential Attachment Demo

DEMO with NETLOGO

Balance Theory
Franz Heider
Franz Heider (1940): A person (P) feels uncomfortable when he or she disagrees with his or her friend (O) on a topic (X).

P feels an urge to change this imbalance. He can adjust his
opinion, change his affection for O, or convince himself that O is
not really opposed to X.


Balance Theory

(a) + + + : three people are mutual friends
(b) + + - : A is a friend of B and C, but B and C are enemies
(c) - - + : two people are friends, and they have a mutual enemy in the third
(d) - - - : all are enemies; this motivates two of them to “team up” against the third

(b) and (d) represent unstable relationships
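The four cases reduce to one rule: a triangle is balanced when the product of its three signs is positive. A sketch that classifies the four cases and checks every triangle in a complete signed graph; the four-person example is hypothetical.

```python
from itertools import combinations

def is_balanced(s1, s2, s3):
    """A signed triangle (+1 / -1 ties) is balanced when the product of its signs is positive."""
    return s1 * s2 * s3 > 0

def unbalanced_triangles(signs):
    """signs: dict mapping frozenset({u, v}) to +1 or -1; every pair is assumed to be signed."""
    people = sorted(set().union(*signs))
    return [t for t in combinations(people, 3)
            if not is_balanced(*(signs[frozenset(pair)] for pair in combinations(t, 2)))]

for case, s in {"a": (1, 1, 1), "b": (1, 1, -1), "c": (-1, -1, 1), "d": (-1, -1, -1)}.items():
    print(case, "balanced" if is_balanced(*s) else "unbalanced")

# Hypothetical group: two camps {A, B} and {C, D}, friendly inside, hostile across
signs = {frozenset(p): (1 if set(p) <= {"A", "B"} or set(p) <= {"C", "D"} else -1)
         for p in combinations("ABCD", 2)}
print(unbalanced_triangles(signs))
```

Two mutually hostile camps produce no unbalanced triangles, which is the two-clique structure of the Heider balance model in the triad census section.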


Balance Theory
Community in a New England Monastery

Young Turks (1), Loyal Opposition (2), Outcasts (3), Interstitial Group (4)

Balance Theory
International Relations


Strength of Weak Ties
Mark Granovetter
• “One of the most influential sociology papers ever
written” (Barabasi)
– One of the most cited (Current Contents, 1986)

• Accepted by the American Journal of Sociology after
4 years of unsuccessful attempts elsewhere.

• Interviewed people and asked: “How did you find
your job?”
– Kept getting the same answer: “through an acquaintance,
not a friend”


Basic Argument

• Classify interpersonal relations as “strong”, “weak”, or “absent”
• Strength is (vaguely) defined as “a (probably linear)
combination of…
– the amount of time,
– the emotional intensity,
– the intimacy (mutual confiding),
– and the reciprocal services which characterize the tie

• The stronger the tie between two individuals, the larger the proportion of people to whom they are both tied (weakly or strongly)


Strong Ties

• If person A has a strong tie to both B and C, then it is unlikely
for B and C not to share a tie.



Weak Ties for Information Diffusion

„Intuitively speaking, this means that
whatever is to be diffused can reach a
larger number of people, and traverse
greater social distance, when passed
through weak ties rather than strong.“


Connectivity and the Small World

1. Travers and Milgram’s work on the small world is responsible
for the standard belief that “everyone is connected by a chain
of about 6 steps.”

2. Two questions:
– Given what we know about networks, what is the longest path (defined
by handshakes) that separates any two people?

– Is 6 steps a long distance or a short distance?


Example: Two Hermits on opposite sides of the country
OH Hermit → Store Owner → Truck Driver → Manager → Corporate Manager → Corporate President → Congress Rep. → Congress Rep. → Corporate President → Corporate Manager → Manager → Truck Driver → Store Owner → Mt. Hermit


Milgram's Test

Milgram’s test: Send a packet from sets of randomly selected
people to a stockbroker in Boston.

Experimental Setup: Arbitrarily select people from 3 pools:
– People in Boston
– Random in Nebraska
– Stockholders in Nebraska


Results

• Most chains found their
way through a small
number of
intermediaries.

• What do these two
findings tell us of the
global structure of social
relations?


Results II

1. Social networks contain a lot of short paths
2. People acting without any sort of global ‘map’ are effective at collectively finding these short paths


The Watts-Strogatz model

• Two main principles explaining short paths: homophily and
weak ties:
• Homophily: every node forms a link to all other nodes that lie within a
radius of r grid steps
• Weak ties: each node forms a link to k other random nodes

• Suppose, everyone lives on a two-dimensional grid (as a
model of geographic proximity)


Watts-Strogatz


The Watts-Strogatz model

• Suppose we allow only one out of every k nodes to have a single random friend
• Each k × k square then has k random links; consider it as a single node

• A surprisingly small amount of randomness is enough to make the world “small”, with short paths between every pair of nodes

Decentralized Search

• People are able to collectively find short paths to the designated target even though they don’t know the global ‘map’ of all connections

• Breadth-first search vs. tunneling

• Modeling:
– Can we construct a network where decentralized search succeeds?
– If yes, what are the qualitative properties of such a network?


A model for decentralized search

• A starting node s is given a message that it must forward to a
target node t
• s knows only the location of t on the grid, but s doesn’t know
the edges out of any other node

• Model must span all the intermediate ranges of scale as well


Modeling the process of decentralized search

• We adapt the model by introducing a clustering exponent q
• For two nodes v and w, let d(v,w) be the number of grid steps between them
• Random edges are now generated with probability proportional to d(v,w)^(-q)

• Model changes with different values q:
– q=0 : links are chosen uniformly at random
– when q is very small : long-range links are “too random”
– when q is large: long-range links are “not random enough”
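The model can be simulated on a small grid. The sketch gives every node its four grid neighbors plus one long-range contact drawn with probability proportional to d(v,w)^(-q), and routes greedily: each holder forwards the message to whichever of its known contacts is closest to the target on the grid, without any global map. Grid size, number of trials and seeds are arbitrary; on a grid this small the differences between values of q are modest, since the advantage of q = 2 is an asymptotic result.

```python
import random

def lattice_distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def build_network(n, q, seed=0):
    """n x n grid; every node also gets one long-range contact w,
    chosen with probability proportional to d(v, w) ** -q."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    contacts = {}
    for v in nodes:
        others = [w for w in nodes if w != v]
        weights = [lattice_distance(v, w) ** -q for w in others]
        contacts[v] = rng.choices(others, weights=weights, k=1)[0]
    return contacts

def greedy_route(s, t, n, contacts):
    """Decentralized search: each holder forwards to the known contact closest to t."""
    v, steps = s, 0
    while v != t:
        x, y = v
        grid = [(a, b) for a, b in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                if 0 <= a < n and 0 <= b < n]
        v = min(grid + [contacts[v]], key=lambda w: lattice_distance(w, t))
        steps += 1
    return steps

n = 20
pair_rng = random.Random(1)
cells = [(x, y) for x in range(n) for y in range(n)]
pairs = [tuple(pair_rng.sample(cells, 2)) for _ in range(300)]
for q in (0, 1, 2, 3):
    contacts = build_network(n, q)
    avg = sum(greedy_route(s, t, n, contacts) for s, t in pairs) / len(pairs)
    print(f"q={q}: average delivery time {avg:.1f} steps")
```

Greedy routing always terminates here, because some grid neighbor is always one step closer to the target; long-range contacts can only shorten the route.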


Varying clustering exponent


Decentralized Search when q=2

Experiments show that decentralized search is more efficient
when q=2 (random links follow inverse-square distribution)


What’s special about q=2

• Consider the ring of nodes at distance between d and 2d from a node v. Since area in the plane grows like the square of the radius, the total number of nodes in this ring is proportional to d^2
• Each of these nodes is linked to v with probability proportional to d^(-2), so the probability that a random edge links into this ring is approximately independent of the value of d
• Long-range weak ties are therefore formed in a way that is spread roughly uniformly over all different scales of resolution

Think of the postal system: country, state, city, street, and finally the street number


Small-World Phenomenon
Conclusions I
1. Start from Milgram’s experiment: (1) there seem to be short paths, and (2) people know how to find them effectively
2. Build mathematical models for (1) and (2)
3. Make a prediction based on the models: clustering exponent
q=2
4. Validate this prediction using real data from large social
networks (LiveJournal, Facebook)

Why do social networks arrange themselves in a pattern of
friendships across distance that is close to optimal for forwarding
messages to far-off targets?


Small-World Phenomenon
Conclusions II
• If there are dynamic forces or selective pressures driving the
network toward this shape, they must be more implicit, and it
remains a fascinating open problem to determine whether
such forces exist and how they might operate.
• Robustness, Search, Spread of disease, opinion formation,
spread of computer viruses, gossip,…
• For example: Diseases move more slowly in highly clustered
graphs
• The dynamics are very non-linear -- with no clear pattern
based on local connectivity.

Implication: small local changes (shortcuts) can have dramatic
global outcomes (think of disease diffusion)

Small World Construction

• Network changes from structured to random
• Given 6 billion nodes, the average path length L starts at 3 million and decreases to 4 (!)
• Clustering: starts at 0.75, decreases to zero
• Most important is what happens ALONG the way.


Small worlds demo


Interactive Summary
• The biggest advantage I can gain by using SNA is…
• The most important fact about SNA for me is…
• The concept that made the most sense for me today was…
• The biggest danger in using SNA is…
• If I will use SNA in the future, I will try to make sure that…
• If I use SNA in my next project I will use it for…
• I should change my perspective on networks in considering…
• I have changed my opinion about SNA, finding out that…
• I missed today that…
• Before attending this seminar I didn't know that…
• I wish we could have covered…
• If I forget almost everything I learned today, I will still remember…
• The most important thing today for me was…


Thanks for your attention!

Questions & Discussion
