Unit 4.pdf

Link Prediction in Social Networks
Evolution
Unit 4

Contents
 Evolution in Social Networks
 Models and Algorithms for Social Influence Analysis
 Algorithms and Systems for Expert Location in Social
Networks
 Link Prediction in Social Networks
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
2

• Informally, evolution refers to a change that
manifests itself across the time axis.
Evolution in Social Networks
3

Evolution in Social Networks
 Framework
 Challenges of Social Network Streams
 Incremental Mining for Community Tracing
 Tracing Smoothly Evolving Communities
4

Framework
 The activities of the participants of a social network
constitute a stream.
 Different perspectives for the study of evolution on
a social network
 Modeling a Network across the Time Axis
 Evolution across Four Dimensions
 Challenges of Social Network Streams
5

Modeling a Network across the
Time Axis
 The objective of a stream mining algorithm is to
maintain an up-to-date model of the data.
 This translates into adapting the model learned to
the currently visible records.
 Only records within the window called Sliding
Window are considered for model adaptation.
 One model Mi is built at each timepoint i and
adapted at the next timepoint into model Mi+1 using
the contents of the time window.
6

Evolution across Four Dimensions
 Four dimensions are associated to knowledge
discovery in social networks:
 Dealing with Time
 Objective of Study
 Definition of a Community
 Evolution as Objective vs Assumption
7

Dimension 1: Dealing with Time
 two different threads on the analysis of dynamic
social networks:
 learn a single model that captures the dynamics of the
network by exploiting information on how the network
has changed from one timepoint to the next.
 learn one model at each timepoint and adapt it to the
data arriving at the next timepoint.
8

Dimension 1: Dealing with Time
(Cont…)
 The object of study for the first thread is a time
series, for the second one it is a stream.
 First thread can exploit all information available
about the network. second thread cannot know how
the network will grow
 The second thread monitors communities, the first thread
derives laws of evolution that govern all communities in a
given network.
9

Dimension 2: Objective of Study
 Understanding how social networks are formed and how they
evolve.
 This is pursued mainly within the first thread of Dimension 1
 seeks to find laws of evolution that explain the dynamics of the large
social networks we see in the Web.
 Community evolution monitoring is pursued by algorithms that
derive and adapt models over a stream of user interactions or
user postings.
 The underpinnings come from two domains:
 stream clustering
 dynamic probabilistic modeling
10

Dimension 3: Definition of a
Community
 A community may be defined by structure
 It may be defined by proximity or similarity of
members
 depends strongly on the underlying model of
computation:
 if stream clustering is used, then communities are crisp
clusters;
 if fuzzy clustering is used, then communities are fuzzy
clusters
 if dynamic probabilistic modeling is used, then
communities are latent variables
11

Dimension 4: Evolution as
Objective vs Assumption
1. methods that discretize the time axis to study how
communities have changed from one timepoint to
the next.
2. methods that make the assumption of smooth
evolution across time and learn community models
that preserve temporal smoothness
12

A framework for Social Network
Evolution
13

Challenges of Social Network
Streams
 community monitoring involves multiple, interrelated
streams (challenge C1).
 community monitoring requires taking account of the
information on the data entities in each of the
streams involved (C2).
 community monitoring involves entities that have the
properties of perennial objects (C3)
14

Incremental Mining for
Community Tracing
 tracing the same community at different timepoints
 communities across time while ensuring temporal
smoothness
15

Tracing Smoothly Evolving
Communities
 Assume that communities are smoothly evolving
constellations of interacting entities.
 Then community learning over time translates into
finding a sequence of models
16

Temporal Smoothness for Clusters
 Two aspects of model quality or model cost:
 Snapshot cost captures the quality of the clustering
learned at each time interval
 Temporal cost measures the (dis-)similarity of a
clustering learned at a particular timepoint to the
clustering learned at the previous timepoint
17

Temporal Smoothness for
Clusters(Cont…)
 cluster monitoring translates to the optimization
problem of finding a sequence of models that
minimizes the overall cost
 cost of model ξt to be learned from the entity
similarity matrix Mt valid at timepoint t is
 The "change parameter" β expresses the
importance of temporal smoothness between the old
model ξt−1 and the new one ξt relatively to the
quality of the new model.
18

Clusters(Cont…)
 The matrix Mt is computed by considering
 local similarity of entities:"local" refers to the current
snapshot of the underlying bipartite graph,
 "temporal similarity", which reflects similarity with
entities at earlier moments.
 This approach assumes that entities (users) found to
be similar in the past are likely to be still similar.
19

Clusters(Cont…)
 Temporal Smoothness for Clusters
 slightly remodeled the cost function to allow for tuning
the impact of the snapshot cost explicitly:
 two ways of modeling temporal smoothness :
1. preservation of cluster membership at timepoint t
2. preservation of cluster quality at t
 Designed for a graph that has a fixed, a priori
known, number of nodes.
20

Dynamic Probabilistic Models
 Data generating process is described by a number
of latent variables.
 Behavior of each entity is described/ determined
by each latent variable to a different degree
(contribution), while all contributions add to 1.
 A community(Latent Variables) determines each
activity observed at each timepoint with some
probability
 latent variables describe the stream(s) of activities
rather than the (stream of) entities,
21

Dynamic Probabilistic Models
(Cont…)
 Discovering and Monitoring Latent Communities
 Dealing with Assumptions on the Data Generating
Process
 Extending the Notion of Community
 Complexity: quadratic to the number of nodes, if no
optimizations are undertaken
22

Models and Algorithms for Social
Influence Analysis

Contents
 Introduction
 Influence Related Statistics
 Social Similarity and Influence
 Influence Maximization in Viral Marketing
24

Social influence
 Social influence refers to the behavior change of
individuals affected by others in a network
 Advertising
 Social recommendation
 Marketing
 Social security
 …
25

Introduction
 The strength of social influence depends on many
factors
 the strength of relationships between people in the
network
 the network distance between users, temporal effects,
 characteristics of networks and individuals in the
network.
26

Influence Related Statistics
 Edge measures
 relate the influence-based concepts and measures on a
pair of nodes.
 explain simple influence-related processes and
interactions between individual nodes.
 Node measures
 Centrality t o measure the importance of a node in the
network
27

Edge Measures
 Tie strength
 Weak ties
 Edge betweenness
28

1.Tie Strength
 Tie strength between two nodes depends on the
overlap of their neighborhoods
 Tie Strength
 nA,nB are the neighborhoods of A and B
 The higher tie strength, the easier for two
individuals to trust one another.
29

Triadic Closure
 Triadic closure can be corollared from tie
strength.
 If A-B and A-C are strong ties, then B-C are likely to be
a strong tie.
 If A-B and A-C are weak ties, then B-C are less likely to
be strong tie.
 Related to social balance theory
30

Triadic Closure (Cont…)
 Triadic closure is measured by the clustering
coefficient of the network
 nΔ be the number of triangles in the network
 |E| be the number of edges
31

2.Weak Ties
 Weak tie
 The overlap of neighborhoods is small
 Local bridge
 No overlapping neighbors
 Global bridge
 Removal of A-B may result in the disconnection of the
connected component containing A and B .
 In real network, global bridges are rare compared to
local bridges
 The effect of local and global bridges is quite similar
32

3.Edge Betweenness
 Edge betweenness
 The total amount of flow across an edge A-B
 The information flow between A and B are evenly
distributed on the shortest paths between A and B.
 Graph partitioning is an application of edge
betweenness
 Gradually remove edges of high betweenness to turn
the network into disconnected components
33

Node Measures - centrality
 to measure the importance of a node in the network.
 centrality measures can be grouped into two categories
 Radial measures
 computed by random walks that start or end from a given node, including:
Degree ,Katz and its extensions ,Closeness
 Medial measures
 computed by the random walks that pass through a given node
 Examples: Node betweenness ,structural holes
34

Node Measures - centrality Cont…
 Radial measures are further categorized into:
 Volume measures
 fix the length of walks and find the volume ( of the walks
limited by the length.
 Length measures
 Fix the volume of the target nodes and find the length of
walks to reach the target volume.
35

Degree
 radial and volume-based measures.
 degree centrality
 The degree centrality of node i is defined to
be the degree of the node:
36

Degree Cont…
 The Katz centrality
 based on the diffusion behavior in the network.
 counts the number of walks starting from a node, while
penalizing longer walks.
 ei is a column vector whose ith element is 1, and all
other elements are 0.
 The value of β is a positive penalty constant
between 0 and 1.
37

Degree Cont…
 Bonacich centrality
 allows for negative values of β.
 negative weight allows to subtract the even-numbered
walks from the odd-numbered walks which have an
interpretation in exchange networks
38

Degree Cont…
 The Hubbell centrality
 X is a matrix and y is a vector.
 X = βA and y = βA1 lead to the Katz centrality
 X = βA and y = A1 lead to the Bonacich centrality.
39

Degree Cont…
 The eigenvector centrality
 the principal eigenvector of the matrix A, is related to
the Katz centrality
 the eigenvector centrality is the limit of the Katz
centrality as β approaches 1/λ from below
40

2.Closeness
 radial and length based measures.
 the length based measures count the length of the
walks.
 Freeman’s closeness centrality
 measures the centrality by computing the average of
the shortest distances to all other nodes.
 S be the matrix whose (i, j)th element contains the
length of the shortest path from node i to j
 1 is the all one vector.
41

3.Node Betweenness
 Freeman’s betweenness centrality
 measures how much a given node lies in the shortest
paths of other nodes.
 bjk is the number of shortest paths from node j to k,
 bjik be the number of shortest paths from node j to k that
pass through node i.
42

Node Betweenness Cont…
 Newman’s betweenness centrality
 Based on random walks on the graph.
 Idea: instead of considering shortest paths, it considers
all possible walks and computes the betweenness from
these different walks.
 matrix whose (j, k)th element k contains the
probability of a random walk from j to k, which contains i
as an intermediate node.
43

Structural holes
 A node is a structural hole if it is connected to
multiple local bridges.
 The actor who serves as a structural hole can
interconnect information originating from multiple
non-interacting parties.
 Therefore, this actor is structurally important to the
connectivity of diverse regions of the network.
44

Social Similarity and Influence
45

Social Similarity and Influence
 A central problem for social influence is to
understand the interplay between similarity and
social ties
 Homophily
 Existential Test for Social Influence
 Influence and actions
 Influence and Interaction
46

Homophily
 Homophily
 An actor in the social network tends to be similar to their connected
neighbors.
 Originated from different mechanisms
 Social influence
 Indicates people tend to follow the behaviors of their friends
 Selection
 Indicates people tend to create relationships with other people who are
already similar to them
 Confounding variables
 Other unknown variables exist, which may cause friends to behave similarly
with one another.
47

Generative models for selection
and influence
 Holme-Newman Generative model
 initially place the M edges of the network uniformly at
random between vertex pairs
 assign opinions to vertices uniformly at random.
 At each step either:
 move an edge to lie between two individuals whose opinions
agree (selection process)
 change the opinion of an individual to agree with one of
their neighbors (influence process)
 selection tend to generate a large number of small
clusters, while social influence will generate large
coherent clusters.
48

Generative models for selection
and influence
 Limitation of Holme-Newman Generative model
 Every vertex at a given time can only have one opinion.
 Crandall et al. introduced multi-dimensional opinion
vectors to better model complex social networks.
 Assume a set of m possible activities in the social network.
 Each node v at time t has an m-dimensional vector v(t),
where the ith coordinate of v(t) represents the extent to
which person v is engaging
 use cosine similarity to compute the similarity between two
people. in activity i.
 More data is required in order to learn the parameters.
49

Quantifying influence and
selection
 Numerator is the conditional probability that an
unlinked pair will become linked
 numerator is the same probability for unlinked pairs
whose similarity exceeds the threshold .
 Values greater than one indicate the presence of
selection.
50

Quantifying influence and
selection Cont…
 numerator is the conditional probability that similarity
increases from time t−1 to t between two nodes that
became linked at time t
 Denominator is the probability that the similarity increases
from time t−1 to t between two nodes that were not
linked at time t − 1.
 values greater than one indicate the presence of
influence.
51

Matrix alignment
framework
 incorporates the temporal information to learn the
weight of different attributes for establishing
relationships between users.
 This can be done by optimizing (minimizing) the
following objective function:
 diagonal elements of W correspond to the vector of weights
of attributes and denotes the Frobenius norm
 A distortion distance function is used to measure the degree
of influence and selection.
52

Matrix alignment
framework Cont…
 used to analyze influence and selection.
 it does not differentiate the influence from different
angles
 How do we quantify the strength of those social
influences?
 How do we construct a model and estimate the model
parameters for real large networks?
53

Topic-level Social Influence Analysis
on Co-author Network
54

Topical Factor Graph
(TFG)model
 to formalize the topic-level social influence analysis
into a unified graphical model,
 Presents Topical Affinity Propagation (TAP) for
model learning.
 the goal of the model is to simultaneously capture
user interests, similarity between users, and network
structure.
 The TFG model has:
 a set of observed variables {vi}Ni =1 and
 a set of hidden vectors {yi}Ni =1
55

Topical Factor Graph
(TFG)model
 The hidden vector yi ∈ {1, . . . ,N}T models the
topic-level influence from other nodes to node vi.
 Each element yz i takes the value from the set {1, . .
. ,N},
 represents the node that has the highest probability to
influence node vi on topic z.
56

Graphical Representation of TFG
model
57

Shuffle test for social influence
 Goal
 to test whether social influence exists in a social
network?
 Intuition
 if social influence does not play a role, then the timing
of activation should be independent of the timing of
activation of other users.
 Edge reversal test: if the social correlation is not
determined by social influence, then reversing the edge
direction will not change the social correlation estimate
58

Randomization Test of Social
Influence
 Model social network as a time-evolving graph
Gt=(V, Et), and the nodes with attribute at time t is
Xt
 Main idea
 Selection and social influence can be differentiated
through the autocorrelation between Xt and Gt
 Selection process
 Xt-1 determines the social network at Gt
 Social influence
 Gt-1 determines the node attributes at time t, i.e., Xt
59

Computational social influence
 Quantifying Influence from different topics
60

Influence and Action
61

Learning Influence Probabilities
 Goal: Learn user influence and action influence from
historical actions
 Assumption
 If user vi performs an action y at time t and later his
friend vj also perform the action, then there is an
influence from vi to vj
 User influence probability
62

Learning Influence Probabilities
 Action influence probability
 is the difference between the time when vj
performing the action and the time when user vi
performing the action, given eij=1; represents the
action propagation score
63

Social Action Modeling and
Prediction
64

Influence and Autocorrelatoin
 Autocorrelation
 A set of linked users eij=1 and an attribute matrix X
associated with these users, as the correlation between
the values of X on these instance pairs
 Social influence, diffusion processes, and the principle
of homophily give rise to auto correlated observations
 Behavior correlation
 utilize the observed behavior correlation to predict the
collective behavior in social media through
 Social dimension extraction
 Discriminative learning
65

Influence and Grouping Behavior
 Grouping behavior
 E.g., user’s participation behavior into a forum
 four factors in online forums that potentially influence people’s behavior in
joining communities
 Friends of Reply Relationship.
 Describe how users are influenced by the number of neighbors with whom they have
ever had any reply relationship.
 Community Sizes
 Quantify the ‘popularity’ of information
 Average Ratings of Top Posts
 Measure how the authority of information impacts user behavior.
 Similarities of Users
 If two users are ‘similar’ in a certain way, what is the correlation of the sets of
communities they join?
66

Influence maximization in viral
marketing
67

Influence Maximization
 Influence maximization
 Minimize marketing cost and more generally to
maximize profit.
 E.g., to get a small number of influential users to adopt
a new product, and subsequently trigger a large
cascade of further adoptions.
68

Problem Abstraction
 We associate each user with a status: active or
inactive
 The status of the chosen set of users (seed nodes) to
market is viewed as active
 Other users are viewed as inactive
 Influence maximization
 Initially all users are considered inactive
 Then the chosen users are activated, who may further
influence their friends to be active as well
69

Alg1: High-degree heuristic
 Choose the seed nodes according to their degree.
 Intuition
 The nodes with more neighbors would arguably tend to
impose more influence upon its direct neighbors.
 Know as “degree centrality”
70

Alg2: Low-distance Heuristic
 Consider the nodes with the shortest paths to other
nodes as seed nodes
 Intuition
 Individuals are more likely to be influenced by those
who are closely related to them.
71

Alg3: Degree Discount Heuristic
 General idea
 If u has been selected as a seed, then when considering
selecting v as a new seed based on its degree, we
should not count the edge v->u
 Referred to SingleDiscount
 Specifically, for a node v with dv neighbors of
which tv are selected as seeds, we should discount
v’s degree by 2tv +(dv-tv)tvp
72

Diffusion Influence Model
 Linear Threshold Model
 Cascade Model
73

Linear Threshold Model
 General idea
 Whether a given node will be active can be based on an
arbitrary monotone function of its neighbors that are
already active.
 Formalization
 fv : map subsets of v’s neighbors to real numbers in [0,1]
 θv : a threshold for each node
 S: the set of neighbors of v that are active in step t-1
 Node v will turn active in step t if fv(S)> θv
 Specifically, in [Kempe, 2003], fv is defined as ,
where bv,u can be seen as a fixed weight, satisfying
74

Cascade Model
 Cascade model
 pv(u,S) : the success probability of user u activating user v
 Independent cascade model
 pv(u,S) is a constant, meaning that whether v is to be active
does not depend on the order v’s neighbors try to activate
it.
 Key idea: Flip coins c in advance -> live edges
 Fc(A): People influenced under outcome c (set cover !)
 F(A) = Sum cP(c) Fc(A) is submodular as well
75

Algorithms and
Systems for Expert Location in Social Networks
76

Expert-location problem
 Given a particular task and a set of candidates,
one often wants to identify the right expert (or set
of experts) that can perform the given task.
 Expert location without graph constraints.
 an important aspect of expert location is finding out the
expertise of different individuals from the available
data.
 Expert location with score propagation.
 problem of inferring the expertise of individuals using
their connections with other experts
77

Expert team formation problem
 problem of identifying a team of experts that can
communicate well
78

Expert Location without Graph
Constraints
 Given a query Q that consists of a list of skills,
identify a subset of candidates X ⊆ X who have the
skills specified by the query.
79

Expert Loc. with Score Propagation
 Given a query Q that consists of a list of skills, and
the social graph G, identify an initial subset of
candidates X ⊆ X who have the skills specified by the
query.
 Use the input graph among experts to propagate the
initial expertise scores to rerank experts or to identify
new experts.
80

Expert Team Formation
 Given a query Q that consists of a list of skills, and
the social graph G, identify a subset of candidates X
⊆ X so that the chosen candidates cover the required
skills and the collaboration/communication cost of X
is minimized.
 The collaboration/communication cost is defined over
the input graph G.
81

Expert Location with Score Propagation
82

A two-step process
1. using language model or heuristic rules to compute
an initial expertise score for each candidate
2. using graph-based ranking algorithms to
propagate scores computed in the step 1 and
rerank experts.
83

Webpage ranking algorithms
 PageRank
 HITS
84

The PageRank Algorithm
 Models the probability that a web page will be
visited by a random surfer.
 At each time step, the surfer proceeds from his
current web page u to a randomly chosen web
page v following one of two strategies:
1. When u does not have outlinks, the surfer jumps to an
arbitrary page in the Web.
2. When u has outlinks, the surfer jumps to an arbitrary
page in the Web with probability α, and a random
page that u hyperlinks to with probability 1 − α.
85

The PageRank Algorithm (Cont…)
 when the surfer follows the above combined
process, the probability of him visiting a webpage
u, denoted by π(u), converges to a fixed, steady-
state quantity.
 We call this probability π(u) the PageRank of u.
86

HITS Algorithm
 There are two types of web pages useful as results for broad-
topic searches:
 Authority
 Authoritative source for the broad topic search
 Hubs
 there are many other pages that contain lists of links to authoritative pages
on a specific topic.
 A good hub is one that points to many good authorities; a
good authority page is one that is linked to by many
goodhubs.
 Therefore, each web page can be assigned two scores – an
authority score and a hub score.
87

Expert Score Propagation
 Algorithm 1:Assume user A has answered questions
for users U1, . . . , Un,then the expertise ranking score
of A, denoted by ER(A), is computed as follows:
 C(Ui) is the total number of users who have helped Ui
 α is the damping factor which can be set between 0
and 1.
88

Algorithm 2
 The expert score of a user vi, denoted by s(vi), is
updated using the following rules:
 w ((vj, vi), e) is the propagation coeffient and e ∈ Rji
is one kind of relationship from person vj to vi.
 U denotes the set of neighbors to vi and Rji is the set
of all relationships between vj and vi.
89

Algorithm 3
 Employee ej ’s expertise score p(q|ej) on a given
topic q is smoothed using the following equation:
 α is weighting parameter and Nj is the number of
neighbors for employee ej .
 p(q|ej ) and p(q|ei) are the initial scores for
employee ej and his neighbor ei, respectively.
90

Expert Team Formation
91

Introduction
 The objective is to find a group of experts in the
network who can
 collectively perform a task in an effective manner.
92

Metrics
 TheMyers-Briggs Type Indicator (MBTI)
 frequently-employed personality assessment tool.
 help people to identify the sort of jobs where they would be “most
comfortable and effective".
 used in improving and predicting team performance
 Herrmann Brain Dominance Instrument (HBDI)
 reflects individual’s affinity for creativity, facts, form and feelings
 used in several experimental settings
 Kolbe Conative Index (KCI)
 a psychometric system that measures conation
 effective in predicting team’s performance.
93

Forming Teams of Experts
 Problem 1 [Team Formation] Given a set of n
individuals denoted by X = {1, . . . , n}, a graph
G(X,E), and task T, find X ⊆ X, so that (∪i∈X’Xi) ∩ T
= T, and the communication cost Cc(X’) is minimized.
 Diameter (R): Given graph G(X,E) and a set of
individuals X ⊆ X, we define the diameter
communication cost of X, denoted by Cc-R (X), to be
the diameter of the subgraph G[X].
94

Forming Teams of Experts (Cont…)
 Minimum Spanning Tree (Mst): Given graph G(X,E)
and X ⊆ X, we define the Mst communication cost of
X, denoted by Cc-Mst (X), to be the cost of the
minimum spanning tree on the subgraph G[X].
 The Team Formation problem with communication
function Cc-R is called the Diameter-Tf problem.
 Team Formation problem with communication
function Cc-Mst is called the Mst-Tf problem.
95

The RarestFirst Algorithm for the
Diameter-TF Problem
96

The EnhancedSteiner Algorithm for
the MST-TF Problem
97

Link Prediction in Social Networks
98

The “Link Prediction Problem”
 Given a snapshot of a social network, can we infer
which new interactions among its members are likely to
occur in the near future?
 Based on “proximity” of nodes in a network
99

Variations
 Link completion
 Example: When a user buys five books online and the
name of one book is corrupted in transfer. A link
completion algorithm could infer the name of the
missing book based on the user's name and the other
books she bought
 Alice, Bob, and a third person attended a meeting.
Given people’s previous co-occurrences, who is the third
person?
 Link discovery
100

Definition of Link Prediction
 Generalized Definition
 Existence (i.e., does a link exist?)
 Classification (i.e., what type of the relationship?)
 Regression (i.e., how does the user rate the item?)
 Cardinality (i.e., is there more than one link between same pair of nodes?)
101

Introduction
 Natural examples of social networks:
Nodes = people/entities
Edges = interaction/ collaboration
Nodes Edges
Scientists in a discipline Co-authors of a paper
Employees in a large
company
Working on a project
Business Leaders Serve together on a
board
102

Motivation
 Understanding how social networks evolve
 The link prediction problem
 Given a snapshot of a social network at time t, we seek
to accurately predict the edges that will be added to the
network during the interval (t, t’)
?
103

Why?
 To suggest interactions or collaborations that haven’t yet
been utilized within an organization
 To monitor terrorist networks - to deduce possible
interaction between terrorists (without direct evidence)
 Used in Facebook and Linked In to suggest friends
104

Motivation
 Co-authorship network for scientists
 Scientists who are “close” in the network
will have common colleagues & circles –
likely to collaborate
Caveat: Scientists who have never collaborated might
in future - hard to predict
 Goal: make that intuitive notion precise;
understand which measures of
“proximity” lead to accurate predictions
A
B
C
D
105

Goals
 Present measures of proximity
 Understand relative effectiveness of network proximity
measures (adapted from graph theory, CS, social sciences)
 Prove that prediction by proximity outperforms random
predictions by a factor of 40 to 50
 Prove that subtle measures outperform more direct
measures
106

Challenges of Link Prediction
 Data sparse
 Missing data/relationship
 Imbalance
 So many possibilities, so few choices
 Referred as inaccurate problem due to low accuracy in
practice
 Accuracy vs. Scalability
 Modeling (unobserved/unknown factors)
 Tasks of approximation/optimization
107

Methods for Link Prediction
 Take the input graph during training period Gcollab
 Pick a pair of nodes (x, y)
 Assign a connection weight score(x, y)
 Make a list in descending order of score
 score is a measure of proximity
108

Proximity Measures for Link Prediction
109

Graph distance & Common Neighbors
 Graph distance: (Negated) length
of shortest path between x and y
 Common Neighbors: A and C have
2 common neighbors, more likely to
collaborate
A
B
C
D
E
(A, C) -2
(C, D) -2
(A, E) -3
A
B
C
D
E
110

Jaccard’s coefficient and Adamic / Adar
 Jaccard’s coefficient: same as
common neighbors, adjusted for
degree
 Adamic / Adar: weighting rarer
neighbors more heavily
A
B
C
D
E
111

Preferential Attachment
 Probability that a new collaboration involves x is
proportional to T(x), current neighbors of x
 score (x, y) :=
112

Considering all paths: Katz
 Katz: measure that sums over
the collection of paths,
exponentially damped by length
(to count short paths heavily)
 β is chosen to be a very small value (for
dampening)
A
B
C
D
E
113

Hitting time, PageRank
 Hitting time: expected number of steps for a random
walk starting at x to reach y
 Commute time:
 If y has a large stationary probability, Hx,y is small. To
counterbalance, we can normalize
 PageRank: to cut down on long random walks, walk can
return to x with a probablity α at every step y
114

SimRank
 Defined by this recursive definition: two nodes are
similar to the extent that they are joined by similar
neighbors
115

Low-rank approximation
 Treat the graph as an adjacency matrix
 Compute the rank-k matrix Mk (noise-reduction)
 x is a row, y is a row, score(x, y) = inner product of rows
r(x) and r(y)
-A B C
A 1 0
B 1 1
C 0 1
116

Unseen bigrams and Clustering
 Unseen bigrams: Derived from language modeling
 Estimating frequency of unseen bigrams – pairs of
words (nodes here) that co-occur in a test corpus but not
in the training corpus
 Clustering: deleting tenuous edges in Gcollab through a
clustering procedure and running predictors on the
“cleaned-up” subgraph
117

Results
 The results are presented as:
1. Factor improvement of proposed predictors over
 Random predictor
 Graph distance predictor
 Common neighbors predictor
2. Relative performance vs. the above predictors
3. Common Predictions
118

Feature based Link Prediction
119

Drawbacks of Unsupervised Algorithms
 Generality
 Unsupervised methods are domain-specific
 Unstable not only from one network to the other, but
also from one graph-instance to another
 Variance Reduction and Sampling Issues
 Ensemble could possibly remove important information
 Samples of networks with ill-behaving distributions
produce networks with different properties
 Imbalance
 Agnostic to class distributions by definition
120

Supervised Learning
 Feature extraction
 Classification
 SVM
 Decision Tree
121

Supervised Learning
122

Pros & Cons of Supervised
Algorithms
 Generality
 Generalize well to many environments in the sense that they
can adjust models depending on posteriorinformation.
 Variance Reduction and Sampling Issues
 Classification algorithms, like decision tree, can befit from
reduced variance by placing them in an ensemble
framework.
 Imbalance
 Capable of balancing data and focus on class boundaries.
 XLimitation
 Unavailability of labels for trainings.
123

Probabilistic Models
124

Motivation
 Most real-world data are stored in relational DBMS
 Few learning algorithms are capable of handling
data in its relational form; thus we have to resort to
“flattening” the data in order to do analysis
 As a result, we lose relational information which might
be crucial to understanding the data
125

Probabilistic Models of Bayesian
networks (BNs)
 BNs for a given domain involves a pre-
specified set of attributes whose relationship to
each other is fixed in advance
126

Bayesian Networks
Good Writer Smart
Quality
Accepted
Review
Length
nodes = domain variables
edges = direct causal influence
Network structure encodes conditional independencies:
I(Review-Length , Good-Writer | Reviewer-Mood)
Reviewer
Mood
0.6 0.4
w
s
w
0.3 0.7
0.1 0.9
0.4 0.6
s
w
s
s
w
S
W P(Q| W, S)
conditional probability table (CPT)
127

BN Semantics
 Compact & natural representation:
 nodes  k parents  O(2k n) vs. O(2n) params
 natural parameters
conditional
independencies
in BN structure
+ local
CPTs
full joint
distribution
over domain
=
W
M
S
Q
A
L
)
,
(
)
|
(
)
,
|
(
)
|
(
)
(
)
(
)
,
,
,
,
,
(
q
m
a
P
m
l
P
s
w
q
P
w
m
P
s
P
w
P
a
l
q
m
s
w
P
|

128

Reasoning in BNs
 Full joint distribution answers any query
 P(event | evidence)
 Allows combination of different types of reasoning:
 Causal: P(Reviewer-Mood | Good-Writer)
 Evidential: P(Reviewer-Mood | not Accepted)
 Intercausal: P(Reviewer-Mood | not Accepted,
Quality)
W
M
S
Q
A
L
129

 To compute 

l
q
m
s
w
a
l
q
m
s
w
P
a
P
,
,
,
,
)
,
,
,
,
,
(
)
(
 l
q
m
s
w
q
m
a
P
m
l
P
s
w
q
P
w
m
P
s
P
w
P
,
,
,
,
)
,
|
(
)
|
(
)
,
|
(
)
|
(
)
(
)
(
factors
mood good writer
good
pissy
good
pissy
false
true
true
false
0.7
0.3
0.1
0.9 A factor is a function from
values of variables to
positive real numbers
Variable Elimination
130

Some Other Inference Algorithms
 Exact
 Junction Tree [Lauritzen & Spiegelhalter 88]
 Cutset Conditioning [Pearl 87]
 Approximate
 Loopy Belief Propagation [McEliece et al 98]
 Likelihood Weighting [Shwe & Cooper 91]
 Markov Chain Monte Carlo [eg MacKay 98]
 Gibbs Sampling [Geman & Geman 84]
 Metropolis-Hastings [Metropolis et al 53, Hastings 70]
 Variational Methods [Jordan et al 98]
131

Parameter Estimation in BNs
 Assume known dependency structure G
 Goal: estimate BN parameters q
 entries in local probability models,
 q is good if it’s likely to generate observed data.
 MLE Principle: Choose q* so as to maximize l
 Alternative: incorporate a prior
)
,
|
(
log
)
,
:
( G
D
P
G
D
l q
q 
)
]
[
Pa
|
(
, u
X
x
X
P
u
x 


q
133

Learning With Complete Data
 Fully observed data: data consists of set of
instances, each with a value for all BN variables
 With fully observed data, we can compute
= number of instances with , and
 and similarly for other counts
 We then estimate
s
w
q
N ,
,
s
w
s
w
q
s
w
q
N
N
s
w
q
P
,
,
,
,
, )
,
|
( 

q
q w s
134

Learning with Missing Data:
Expectation-Maximization (EM)
 Can’t compute
 But
1. Given parameter values, can compute expected counts:
2. Given expected counts, estimate parameters:
 Begin with arbitrary parameter values
 Iterate these two steps
 Converges to local maximum of likelihood
s
w
q
N ,
,


i
i
i
i
i
s
w
q s
w
q
P
N
E
instances
,
, )
evidence
|
,
,
(
]
[
]
[
]
[
)
,
|
(
,
,
,
,
,
s
w
s
w
q
s
w
q
N
E
N
E
s
w
q
P 

q
this requires BN inference
135

Structure search
 Begin with an empty network
 Consider all neighbors reached by a search operator
that are acyclic
 add an edge
 remove an edge
 reverse an edge
 For each neighbor
 compute ML parameter values
 compute score(s) =
 Choose the neighbor with the highest score
 Continue until we reach a local maximum
)
(
log
)
,
|
(
log *
s
P
s
D
P s 
q
*
s
q
136

Limitations of BNs
 Inability to generalize across collection of individuals
within a domain
 if you want to talk about multiple individuals in a domain,
you have to talk about each one explicitly, with its own local
probability model
 Domains have fixed structure: e.g. one author, one
paper and one reviewer
 if you want to talk about domains with multiple inter-related
individuals, you have to create a special purpose network
for the domain
 For learning, all instances have to have the same set
of entities
137

Relational Schema
Author
Good Writer
Author of
Has Review
• Describes the types of objects and relations in the
database
Review
Paper
Quality
Accepted
Mood
Length
Smart
139

Probabilistic Relational Model
Length
Mood
Author
Good Writer
Paper
Quality
Accepted
Review
Smart
140

Length
Mood
Author
Good Writer
Paper
Quality
Accepted
Review
Smart
÷
÷
÷







Paper.Review.Mood
Paper.Quality,
Paper.Accepted |
P
141

Length
Mood
Author
Good Writer
Paper
Quality
Accepted
Review
Smart
3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
142

Fixed relational skeleton :
– set of objects in each class
– relations between them
Author A1
Paper P1
Author: A1
Review: R1
Review R2
Review R1
Author A2
Relational Skeleton
Paper P2
Author: A1
Review: R2
Paper P3
Author: A2
Review: R2
Primary Keys
Foreign Keys
Review R2
143

Author A1
Paper P1
Author: A1
Review: R1
Review R2
Review R1
Author A2
PRM defines distribution over instantiations of attributes
PRM w/ Attribute Uncertainty
Paper P2
Author: A1
Review: R2
Paper P3
Author: A2
Review: R2
Good Writer
Smart
Length
Mood
Quality
Accepted
Length
Mood
Review R3
Length
Mood
Quality
Accepted
Quality
Accepted
Good Writer
Smart
144

A Portion of the BN
P2.Accepted
P2.Quality r2.Mood
P3.Accepted
P3.Quality
3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
Pissy
Low
3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
r3.Mood
145

A Portion of the BN
P2.Accepted
P2.Quality r2.Mood
P3.Accepted
P3.Quality
Pissy
Low
r3.Mood
High 3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
Pissy
146

PRM: Aggregate Dependencies
Length
Mood
Paper
Quality
Accepted
Review
Review R1
Length
Mood
Review R2
Length
Mood
Review R3
Length
Mood
Paper P1
Accepted
Quality
149

PRM: Aggregate Dependencies
sum, min, max,
avg, mode, count
Length
Mood
Paper
Quality
Accepted
Review
Review R1
Length
Mood
Review R2
Length
Mood
Review R3
Length
Mood
Paper P1
Accepted
Quality
mode
3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
150

PRM with AU Semantics
))
.
(
|
.
(
)
,
,
|
( ,
.
A
x
parents
A
x
P
P S
x A
x


 



S
I
Attributes
Objects
probability distribution over completions I:
PRM relational skeleton 
+ =
Author
Paper
Review
Author
A1
Paper
P2
Paper
P1
Review
R3
Review
R2
Review
R1
Author
A2
Paper
P3
151

Unit 4.pdf

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Unit 4.pdf

Similar to Unit 4.pdf (20)

More from Jayaprasanna4

More from Jayaprasanna4 (20)

Recently uploaded

Recently uploaded (20)

Unit 4.pdf