14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (IEEE ICMLA’15)
Predicting New Friendships In Social Networks
Anvardh Nanduri
Department of Computer Science
George Mason University
Fairfax, VA 22030
snanduri@gmu.edu
Huzefa Rangwala
Department of Computer Science
George Mason University
Fairfax, VA 22030
rangwala@cs.gmu.edu
Abstract—Predicting new links in social networks is an im-
portant problem within the network analysis literature. Many
existing methods consider a single snapshot of the network as
input, neglecting an important aspect of these networks: their
temporal evolution. In this paper, we incorporate
temporal information to solve the link prediction problem.
Temporal information comes from the past interactions among
the nodes in the network. Using a supervised learning framework,
we train a binary classifier which predicts potential new links
in the network in the next epoch. Anonymized Facebook data of
New Orleans’ users over a period of 28 months has been used in
this study. Specifically, we train decision tree classifiers and feed-
forward neural networks to predict new relationships that are
going to emerge in the underlying network. Our experiments
clearly show that these models, when trained with temporal
information, perform better than models trained with no
temporal information.
I. INTRODUCTION
We live in extremely complex and interdependent societies,
where people join together in groups for various reasons
including mutual aid and protection [15]. The members of
these networks interact with each other all the time, reinforcing
their relationships and forming new relationships. Thus, the
connections among the members of a social network evolve
over time, and understanding the mechanisms by which they
evolve is a complex problem due to the large number of
variable parameters. One of the common challenges is to
understand the association between two specific members, also
known as nodes, in the network. In particular, the accurate
prediction of new relationships is an important problem in this
setting and is called the link prediction problem. The advent
of social networks has significantly increased online interac-
tions by fostering friendships (e.g., Facebook, Google Plus,
Renren), by forming professional networks (e.g., LinkedIn),
and by finding potential partners (e.g., Match.com). Besides
social networks, dynamic relational networks are ubiquitous in
a plethora of other scientific fields. Examples include the World
Wide Web, biological interaction networks, citation networks,
trade networks, transportation networks and ecological net-
works to name a few.
Recently, dynamic relational networks have emerged as
a powerful representation mechanism for the data captured
from these complex systems that evolve over time. The main
source of complexity in dynamic relational networks can be
attributed to the changes in the topological structure and/or
in the information associated with nodes and edges. For a
friendship network like Facebook, changes in its representative
relational network are seen when individuals join or leave the
network, i.e., addition and deletion of vertices, respectively.
Subsequently, new friendships can be formed or broken,
resulting in the addition or deletion of edges in the network.
Similarly, node-based attributes may change over time based
on the personal information of an individual.
The primary emphasis of this paper is to develop approaches
for predicting new friendships (links) within a popular, large
and evolving online social network with heterogeneous
information: Facebook. In this work we study the temporal aspect
of social network evolution in the link prediction task by
combining various static features with time-sensitive features.
We show that using temporal aspects of the social network
improves the overall prediction performance of the learned
classifier. In our experiments we trained the J48 decision tree
classifier from the WEKA data mining suite. We also trained a
feed-forward neural network with two hidden layers of 200 neurons
each using the big data analytics tool H2O [16].
The rest of the paper is organized as follows. In Section 2,
we formally present the problem of link prediction in temporal
networks with an example. In Section 3, we survey how past
research has tried to address the link prediction problem and
related variants. In Section 4, we present in
detail the methods used and techniques applied in this work.
In Section 5, we present the details of the Facebook data set
used, the evaluation metrics used to gauge the performance of
various methods and the experimental results. Finally, we
present ideas for future extensions.
II. TEMPORAL LINK PREDICTION
Given a snapshot of a network at time t, we seek to
accurately predict the edges that will be added to the network
during the interval from time t to a given future time t’ [1].
Given a snapshot of a graph G(V, E) at time t, link
prediction involves accurately determining the possibility of
a link between a pair of nodes (v, u) ∈ E’ in the graph G’(V’, E’)
at a future time t’. For example, in Figure 1, if the graph on
the left is the snapshot at time t with five nodes A, B, C, D
and E, we have to predict whether a link forms between B and
D in the next snapshot at time t+1 based on the information
available at time t. Assuming node E was newly added to the
network in the snapshot at time t, we also have to predict
whether E forms a link with B in the next snapshot at time t+1.
Figure 1: Evolution of links in Temporal Networks
A. Definitions & Notations
Network data is represented as a graph G(V, E), where V is
the set of vertices and E is the set of edges. Users or members
are represented as nodes (vertices) of the network, and
relationships, connections or interactions are represented by
the edges of the network. We use lowercase letters such as x, y
and z to denote nodes in a social network. For a node x, Γ(x)
denotes the set of neighbors of x. The degree of x is the size of
Γ(x), denoted |Γ(x)|.
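To make the notation concrete, here is a minimal Python sketch (using networkx on a toy graph of our own; the nodes and edges are illustrative, not drawn from the Facebook data set) that builds a snapshot and reads off Γ(x) and |Γ(x)|.

```python
import networkx as nx

# Toy snapshot of a friendship graph G(V, E) at time t.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])

def gamma(G, x):
    """Neighbor set Γ(x) of node x."""
    return set(G[x])

x = "C"
print(gamma(G, x))       # Γ(C) -> {'A', 'B', 'D'}
print(len(gamma(G, x)))  # degree |Γ(C)| -> 3
```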
III. RELATED WORK
Extensive research has been done pertaining to link predic-
tion in temporal networks and its applications [10].
A. Non temporal Link Prediction
Fire et al. studied the problem of computationally efficient
link prediction in social networks [4]. They observed that the
primary bottleneck in link prediction techniques is the extraction
of the structural features required for training the classifier.
They proposed two easy-to-compute features, (i) the friends
measure and (ii) the same-community measure, and showed that
they are equally effective in identifying missing/future links even
when applied to the harder setting of classifying links between
individuals with at least one common friend. They also presented
a method for calculating the amount of data needed in order to
build more accurate classifiers.
Hasan et al. surveyed representative link prediction methods
[11] by considering three types of approaches: (i) traditional
or non-Bayesian methods, which extract a set of features to
train a binary classification model; (ii) probabilistic methods,
which model the joint probability among the entities in a
network using Bayesian graphical models; and (iii) linear
algebraic methods, which compute the similarity between nodes
using rank-reduced similarity matrices.
Menon et al. proposed to solve the link prediction problem
using a supervised matrix factorization approach [6], where
the model learns latent features from the topological structure
of a graph and is shown to make better predictions than
popular unsupervised factorization techniques. They discussed
how latent features could be combined with optional explicit
features for nodes or edges to yield better performance than
using either type of feature exclusively.
Huang et al. compared various static features for link
prediction and presented the most useful features in terms of
their prediction accuracy [9]. In this paper we have considered
the combination of these features as our comparative baseline.
B. Temporal Link Prediction
Early studies conducted by Leskovec et al. [2] state that
little work has been done on analyzing long-term graph
trends. Their study found that, over time, networks densify
and the average distance between nodes decreases. This was
contrary to the existing beliefs that the average node degree
remains constant and average distance slowly increases. They
claimed that existing graph generation models are not realistic
and proposed a new forest-fire model, which simulates
behaviors such as shrinking diameters and densification.
Potgieter et al. used temporal analysis to aid link prediction
[3]. They showed that incorporating the temporal trends that
emerge in sociogram sequences greatly increases the accuracy
of link prediction. They introduced three types of temporal
features, the first two of which are well studied in the economics
literature. Return, the percentage increase or decrease of a
value over a period of time, captures the rate of change of a given
metric. Moving averages are used to extract long term trends
from short-term noise. Recency captures how much time has
elapsed since a node has communicated. Recency and moving
averages seemed very promising and we have implemented
them in this study as part of the temporal features.
Qiu et al. studied the evolution of node behavior in link
prediction [13]. They found that evolution of node behavior
is useful in modeling social networks, extracted temporal
features to characterize behavior evolution, and used these
temporal features to improve link prediction. They considered
four types of temporal features: (i) Simple Statistics features
such as recency and activeness, (ii) Local Pattern features
which calculate the frequency of a repetitive local pattern,
(iii) Prediction features that captured the global trends and
(iv) Interplay features that computed the joint likelihood of
two nodes being connected in the future based on the degree
to which the two nodes match each other’s preferences.
Tylenda et al. [5] investigated the value of incorporating the
historical information available on the interactions (or links)
of the current social network state, and showed that using
time-stamps of past interactions significantly improved the
prediction accuracy of new and recurrent links over rather
sophisticated previously proposed methods. Another interesting
contribution of this work is the dichotomy it introduces between
edge-centric and node-centric methodologies for solving the link prediction prob-
lem. They showed that edge-centric approaches have some
drawbacks, as they assume that users are interested in new
edges (relationships) irrespective of the vertices they connect
to and added that this may lead to situations where the top of
the list of edges is occupied by mutually uninteresting pairs
of vertices. They pointed out that edge-centric link prediction
often boils down to deciding whether two nodes have anything
in common – irrespective of the distance between them in the
social network. However, in reality, most links are established
between nodes that are close in terms of the graph distance
between them and similarity of attributes.
When link prediction is treated as a binary classification
problem, the following issue arises in the selection of training
and testing data points. If the negative points are random pairs
of vertices, it is highly unlikely that they have any properties
in common. It makes more sense to choose pairs of vertices
that are close to each other but are not yet connected. A similar
methodology is followed in this study when selecting negative
samples, and we refer to them as hard negatives while training
the classifier.
Zignani et al. introduced two novel temporal features in the
context of social networks, called link delay and triadic closure
delay, that capture the time delay between when a link or triadic
closure becomes possible and when it actually instantiates in the
trace [8]. They claim that, over time or across traces, the values
of these features can provide insight into the speed at which
users act in building and extending their social neighborhoods.
The two features capture the “eagerness” of users in building
social structure. Inspired by its ability to capture this eagerness
information, we have implemented the link delay metric in this
study.
Viswanath et al. [7] introduced a very interesting aspect
of analyzing the activity network of users in the Facebook net-
work. They observed that though initial studies have provided
insights on how an activity network is structurally different
from the social network itself, a natural and important aspect
of the activity network has been disregarded: the fact that over
time social links can grow stronger or weaker. By analyzing
the evolution of user interaction and the activity between users
in the social network they found that links in the activity
network tend to appear and disappear rapidly over time, and
the strength of ties exhibits a general decreasing trend of
activity as the social network link ages. Previous works [3], [8]
considered activity networks similar to the one suggested by
Viswanath et al. [7], but used them for link prediction within
the activity network itself. The past interactions between online
dating community members were used to predict new interactions
among users [3], and it is important to note that there is no
concrete notion of tangible friendship or permanent association.
Similarly, Zignani et al. [8] used the temporal information on
wall posts in the Facebook network (the same data used in this
paper) to predict new posts. In other words, given the past
interactions of a node with its neighbors, they predict new
future interactions with other nodes.
In this paper, we predict new associations or friendships
that are going to emerge in the next snapshot of the network,
given the past interactions of a node with its neighbors and the
interactions among the neighbors. Here a friendship is a much stronger
notion than an interaction. In previous works, interactions are
equivalent to the strengthening of existing links. Thus in this
study we solve a more challenging and interesting problem
as we predict future friendships or potential links between
unconnected pairs of nodes by analyzing the network’s activity
till a given point in time.
IV. METHODS
In this work, we approached the link prediction problem
from a supervised machine learning perspective by training a
binary classifier also called a predictor. A pair of nodes (u,v)
in the graph G is labeled as a positive example if an edge
has emerged between nodes u and v in the current snapshot
of graph G. Otherwise, the node pair is considered a
negative example. The job of a classifier is to observe the
given labeled training examples (either positive or negative),
learn from them and then predict the labels of unseen test
examples, both positive and negative. In this section we present
the major steps of our method to solve the temporal link
prediction problem. In the first step, we discuss the set of
features extracted; here we also explain the dichotomy between
the baseline non-temporal (static) features and the temporal features.
Next, we briefly discuss how the classifiers are trained on
the provided examples from the snapshot(s) of the network
and then how they are tested on unseen examples from the
subsequent snapshot.
A. Feature Extraction
The first step is to extract a set of features for the nodes in
the network.
Static Baseline Features: These features do not capture
any temporal information about a node within the network. They
are based on the nodes’ static characteristics in the network and
are presented in Huang et al. [9]. A short code sketch computing
these measures follows the list below.
• Common Neighbors Number (CN)- In a graph, the number
of common neighbors of nodes x and y measures the
similarity of the nodes and is given by:
CN(x, y) = |Γ(x) ∩ Γ(y)|
• Jaccard’s Coefficient (JC)- This measures the number of
common neighbors of x and y relative to the number of
nodes that are neighbors of either x or y.
JC(x, y) = |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|
• Adamic/Adar number (AA)- It computes features shared
by objects and measures the similarity between them as
follows:
AA(x, y) = Σ_{z ∈ Γ(x) ∩ Γ(y)} 1 / log|Γ(z)|
Here an object is a node and the features are its neighbors.
• Preferential Attachment score (PA)- The popularity of a node
is measured as the number of friends/neighbors it has.
The Preferential Attachment score is computed by multiplying
the numbers of neighbors of both vertices:
PA(x, y) = |Γ(x)| · |Γ(y)|
• Katz measure- Katz is a measure that sums weights of all
paths between two nodes exponentially damped by length
[14]. It is calculated as follows:
Katz(x, y) = Σ_{l=1}^{∞} β^l · |path^l_{x,y}|
where |path^l_{x,y}| is the number of paths between x and
y of length l, and 0 < β < 1 is the parameter that controls
the exponential damping of the sum of all paths by their
lengths, so that shorter paths weigh more heavily.
However, calculating the Katz measure is very expensive, as
it has cubic complexity. Hence we calculate the Friends-measure
(FM), an approximation of the Katz measure [4], for given
nodes x and y as the number of links between all the
neighbors of x and y:
FM(x, y) = Σ_{u ∈ Γ(x)} Σ_{v ∈ Γ(y)} δ(u, v)
where δ(u, v) = 1 if u = v or (u, v) ∈ E, and 0 otherwise.
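The following sketch shows one possible way to compute the five baseline measures above for a candidate pair, again using networkx on a toy graph. The function names, the toy graph, and the guard that skips common neighbors of degree one in the Adamic/Adar sum (to avoid dividing by log 1 = 0) are our own choices, not taken from the paper.

```python
import math
import networkx as nx

def gamma(G, x):
    return set(G[x])                                  # Γ(x)

def common_neighbors(G, x, y):                        # CN(x, y)
    return len(gamma(G, x) & gamma(G, y))

def jaccard(G, x, y):                                 # JC(x, y)
    union = gamma(G, x) | gamma(G, y)
    return len(gamma(G, x) & gamma(G, y)) / len(union) if union else 0.0

def adamic_adar(G, x, y):                             # AA(x, y)
    return sum(1.0 / math.log(len(gamma(G, z)))
               for z in gamma(G, x) & gamma(G, y)
               if len(gamma(G, z)) > 1)               # skip degree-1 neighbors (log 1 = 0)

def preferential_attachment(G, x, y):                 # PA(x, y)
    return len(gamma(G, x)) * len(gamma(G, y))

def friends_measure(G, x, y):                         # FM(x, y), approximation of Katz [4]
    return sum(1 for u in gamma(G, x) for v in gamma(G, y)
               if u == v or G.has_edge(u, v))

G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
x, y = "B", "D"
print(common_neighbors(G, x, y), jaccard(G, x, y), adamic_adar(G, x, y),
      preferential_attachment(G, x, y), friends_measure(G, x, y))
```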
Temporal Features: These features capture the temporal
information of the nodes within the network. We have im-
plemented recency of the nodes as the measure of when they
were last active [3]. With the help of wall post time-stamp
information, we know the snapshot of the network in which a
node participated in a wall post interaction. We also compute
moving averages of all static features over the last 2, 5 and 10
snapshots. Apart from
recency and moving averages, we also calculated the Link
Delay (LD) as a measure of time delay between when a link
could have formed, and when it actually formed [8]. A link
could be formed as soon as both the nodes are present in the
network. It is important to note that the presence of both nodes
is a necessary but not sufficient condition for a link to be formed
between them.
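As a rough illustration of these temporal features, the sketch below computes recency, a moving average, and link delay from hand-made interaction records. The snapshot indices, node arrival times and link formation times are invented for the example; in the study the features are derived from the wall post time-stamps in the data set.

```python
# Hypothetical wall-post interactions: (snapshot_index, poster, wall_owner).
interactions = [(3, "A", "B"), (5, "B", "C"), (7, "A", "C"), (9, "C", "D")]
# Hypothetical snapshot indices at which nodes joined and links formed.
joined_at = {"A": 1, "B": 2, "C": 4, "D": 6}
link_formed_at = {("A", "C"): 8, ("C", "D"): 9}

def recency(node, interactions, current_snapshot):
    """Snapshots elapsed since the node last took part in a wall-post interaction [3]."""
    last = max((t for t, u, v in interactions if node in (u, v)), default=None)
    return None if last is None else current_snapshot - last

def moving_average(values, k):
    """Moving average of a static feature over its values in the last k snapshots."""
    window = values[-k:]
    return sum(window) / len(window) if window else 0.0

def link_delay(u, v, joined_at, link_formed_at):
    """Link Delay: gap between when the link (u, v) became possible and when it formed [8]."""
    possible_since = max(joined_at[u], joined_at[v])
    formed = link_formed_at.get((u, v), link_formed_at.get((v, u)))
    return None if formed is None else formed - possible_since

print(recency("A", interactions, current_snapshot=10))  # 10 - 7 = 3
print(moving_average([1, 2, 4, 6], k=2))                # (4 + 6) / 2 = 5.0
print(link_delay("C", "D", joined_at, link_formed_at))  # 9 - 6 = 3
```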
Temporal Feature Combinations: For the purpose of a more
thorough evaluation, we developed three temporal feature
combinations: (i) the TempRec combination contains only recency
along with the static features mentioned above; (ii) the
TempRecMA2 combination contains the recency feature and
moving averages of the static features over the past 2 snapshots,
along with the static features; (iii) the AllTemp combination
contains recency and moving averages of the baseline features
over the past 2, 5 and 10 snapshots of the network, along with
all static features.
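A small sketch of how these combinations could be assembled per node pair, assuming the static values and the temporal values have already been computed; the function and argument names are ours, not the paper's.

```python
def build_feature_vector(static, recency_val, ma2, ma5, ma10, combo):
    """Assemble one node pair's feature vector for a named combination.
    `static` holds the baseline values (CN, JC, AA, PA, FM); ma2/ma5/ma10 hold
    the moving averages of those baselines over the last 2, 5 and 10 snapshots."""
    if combo == "Static":
        return list(static)
    if combo == "TempRec":
        return list(static) + [recency_val]
    if combo == "TempRecMA2":
        return list(static) + [recency_val] + list(ma2)
    if combo == "AllTemp":
        return list(static) + [recency_val] + list(ma2) + list(ma5) + list(ma10)
    raise ValueError(f"unknown combination: {combo}")
```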
B. Model Learning
1) Training Samples: In order to train the classifier we
have to first collect the training examples. All node pairs
whose edge is present in a given snapshot but was not present
in the previous snapshot are considered positive training
examples; that is, if a link between a node pair did not exist
in the previous snapshot but emerged in this snapshot, it is
treated as a positive training example. On the other hand, if a
pair of nodes does not form an edge, then it is a negative
training example. Moreover, if the distance
between unconnected nodes is greater than two they form
an easy negative example. If it is equal to two they form
a hard negative example. Dealing with unclosed triangles
is a challenging task for a link predictor [4]. Although we
conducted experiments using both easy and hard negative
examples, in this paper we report results only for hard
negative examples. Once we identify the pairs of nodes as
training examples, we calculate all the identified features for
them.
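The example below sketches how the training pairs could be collected from two consecutive snapshots with networkx: positives are edges present in the current snapshot but absent from the previous one, and hard negatives are pairs at distance exactly two in the previous snapshot that remain unconnected. The helper names and toy snapshots are ours; this is a sketch of the sampling rule described above, not the exact pipeline used in the paper.

```python
import networkx as nx
from itertools import combinations

def positive_pairs(G_prev, G_curr):
    """Edges in the current snapshot that were absent from the previous snapshot."""
    return [(u, v) for u, v in G_curr.edges()
            if G_prev.has_node(u) and G_prev.has_node(v)
            and not G_prev.has_edge(u, v)]

def hard_negative_pairs(G_prev, G_curr):
    """Unconnected pairs at distance exactly two (unclosed triangles) in the
    previous snapshot that are still unconnected in the current snapshot."""
    pairs = set()
    for z in G_prev.nodes():
        for u, v in combinations(G_prev[z], 2):       # u and v share the neighbor z
            if not G_prev.has_edge(u, v) and not G_curr.has_edge(u, v):
                pairs.add(tuple(sorted((u, v))))
    return sorted(pairs)

G_prev = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
G_curr = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])
print(positive_pairs(G_prev, G_curr))       # [('A', 'C')] -- the newly formed edge
print(hard_negative_pairs(G_prev, G_curr))  # [('B', 'D')] -- unclosed triangle B-C-D
```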
2) Classifier Training: We train the J48 decision tree classi-
fier with default options using the Weka Toolkit. The J48
classifier implicitly avoids over-fitting the model to the training
data through a technique called post-pruning. We also trained
a feed-forward neural network with error back-propagation
using the ‘deep learning’ feature of the big data analytics tool
H2O [16].
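Since Weka's J48 and H2O's deep learning are Java-based tools, the sketch below uses scikit-learn's DecisionTreeClassifier as a rough stand-in just to show the training step. It is not the configuration used in the paper: scikit-learn's CART tree does not post-prune the way J48 does, so cost-complexity pruning is used here as a loose substitute, and the feature matrix is a tiny made-up placeholder.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder feature matrix: one row per node pair (e.g., CN, JC, AA, PA, FM, recency);
# y = 1 if the pair forms a link in the next snapshot, 0 otherwise.
X_train = np.array([[2, 0.40, 1.10, 12, 3, 1],
                    [0, 0.00, 0.00,  4, 0, 7],
                    [3, 0.50, 1.80, 20, 5, 2],
                    [0, 0.00, 0.00,  2, 0, 9]])
y_train = np.array([1, 0, 1, 0])

# ccp_alpha enables cost-complexity pruning, a rough analogue of J48's post-pruning.
clf = DecisionTreeClassifier(ccp_alpha=0.001, random_state=0)
clf.fit(X_train, y_train)

scores = clf.predict_proba(X_train)[:, 1]   # probability of the positive (link-forming) class
labels = clf.predict(X_train)               # hard 0/1 predictions
```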
V. DISCUSSION AND RESULTS
A. Data Set: Facebook New Orleans Network
In this work we have used anonymized Facebook data of
users from New Orleans. In total the data has 90,269 users and
3,646,662 friendship links between those users for a period
of 28 months from Sept 2006 to Dec 2008. This accounts
for 52% of the users in the New Orleans network based on
the statistics provided by Facebook [7]. This captures the
friendship information among the users. In addition, wall post
information was collected for 63,731 (70.6%) of the users, who
were connected by 1,545,686 directed links in the social network
with an average node degree of 25.6.
B. Evaluation Metrics
For the purpose of evaluating the classifier, the precision
and recall for the positive class (links forming) have been
used. Recall (true positive rate) is defined as the ratio of
true positives to the sum of true positives and false negatives,
TP/(TP+FN), and is reported here as a number between 0
and 1. The numerator represents all the forming links we
correctly predicted; the denominator is the sum of these links
and the links we failed to predict. The aim of link prediction
is to maximize recall rather than the overall success rate given
by (TP+TN)/(TP+TN+FP+FN). However, we would also
like to minimize the number of false positives, so we considered
precision as an additional measure of classifier performance.
Precision is defined as the ratio of true positives to the sum of
true positives and false positives, TP/(TP+FP). Thus, fewer
false positives result in better precision
values. Furthermore, we have also considered the area under
the receiver operating characteristic curve (AUC ROC) metric
as a performance measure for the classifiers, as it is a robust
measure which is not influenced by the imbalance in the
classes.
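A minimal sketch of how these three metrics can be computed with scikit-learn on made-up predictions; the arrays below are placeholders, not results from the paper.

```python
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true   = [1, 1, 0, 0, 1, 0]                 # 1 = link actually formed in the next snapshot
y_pred   = [1, 0, 0, 1, 1, 0]                 # hard class predictions from the classifier
y_scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]     # classifier scores for the positive class

print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(roc_auc_score(y_true, y_scores))   # area under the ROC curve, robust to class imbalance
```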
C. Experimental Protocol
Details of training examples: Table I shows details of the
data used for training and testing the classifiers. The J48
experiments were conducted on data from May 2007 to Mar
2008. For the neural network experiments, we used two test
snapshots, viz., Jan 2008 and Feb 2008.
Snapshot   Time Period   #Positive Instances   #Negative Instances
17         Nov 2007      29971                 39604
18         Dec 2007      25029                 38071
19         Jan 2008      24704                 39518
20         Feb 2008      28674                 47066
21         Mar 2008      30014                 53970
22         Apr 2008      30352                 53587
Table I: Details of training examples collected for various time
periods (snapshots) of New Orleans Facebook network
Figure 2: Performance of the J48 decision tree classifier on the
baseline combination compared to various temporal feature
combinations. Panels: (a) Oct-Nov 2007, (b) Nov-Dec 2007,
(c) Dec 2007-Jan 2008, (d) Jan-Feb 2008.
1) Results - J48 Decision Tree: Figures 2a to 2d show
the recall, precision and AUC ROC for the positive class of
data (links forming in next snapshot). We compare the recall,
precision and AUC values of the baseline combination with
the three temporal feature combinations. The performance of
the classifier when trained on the data of two consecutive
months and tested on the data of the subsequent month is
recorded. For instance, Figure 2b shows the performance of
the classifier when it is trained on all the links added during
Nov 2007 and Dec 2007 and tested on the links that emerged in
Jan 2008. Our results show that in all cases, using temporal
features always performs better in comparison to baseline
methods without temporal information. Moreover, for all the
time periods tested, the recall of the positive class was better
when trained with the TempRec combination. The precision
values also improved significantly when trained with the AllTemp
combination in all cases. The AUC ROC values for AllTemp
combination outperformed the Static (baseline) combination
in all cases.
Figure 3: Performance of the Feed Forward Neural Network on
the baseline combination compared to various temporal feature
combinations. Panels: (a) Nov-Dec 2007, (b) Dec 2007-Jan 2008.
2) Results - Feed Forward Neural Network: The results of the
Feed Forward Neural Network agree unequivocally with those of
J48. The Feed Forward Neural Network is trained using the same
data as J48, and its performance is measured in terms of recall,
precision and AUC ROC values. The methodology for using the
data is also the same as explained before. Figure 3a
shows the recall, precision and AUC ROC values comparing
all four feature combinations when trained on Nov-Dec 2007
data and tested on data from Jan 2008.
Figure 3b shows the recall values comparing all the feature
combinations when trained on Dec 2007 and Jan 2008 data
and tested on Feb 2008’s data. For the AUC ROC metric, the
TempRec combination outperformed the Static baseline com-
bination. For neural networks the reported results are for 26
epochs/iterations over the training data and the neural net has
only two hidden layers with 200 neurons each. Experiments
were also conducted with another neural network design having
five hidden layers of 50 neurons each. The
results were very similar to those presented above, so we
are not presenting them here. More experiments have to be
conducted with various other network designs to empirically
decide which design performs the best for the given data.
3) Training with one month’s data: We have also conducted
the above experiments by training the classifiers with one
month's data and compared the performance against that of
training with two months' data. Figure 4a shows the results for
the J48 classifier over 11 monthly test periods (May 2007 to Mar
2008), where the classifier was trained on the previous month's
data. The AUC values of the baseline are compared against the
AUC values of the three temporal combinations. Figure 4b
shows the AUC ROC results of the J48 decision trees for 10
monthly test periods (Jun 2007 to Mar 2008), but in this case
the classifier was trained on the previous 2 months' data.
Figure 4: Comparison of AUC ROC of the temporal combinations
against the baseline combination for J48 decision trees.
Panels: (a) trained with 1 month of data, (b) trained with 2 months of data.
We have observed that the AllTemp combination was the
best performing in terms of AUC among all the temporal
combinations. Furthermore, in order to directly compare the 1
month training performance and 2 month training performance
we calculated the percentage improvement in AUC when trained
with the AllTemp combination over the Static (baseline)
combination.
Figure 5: Comparison of J48 performance when trained with
1 month of data vs. 2 months of data.
Figure 5 shows these comparison results for 10 monthly test
periods (Jun 2007 to Mar 2008). Interestingly, in some cases
the classifier trained with one month's data performed better
than the classifier trained with two months' data, while in other
cases the classifier trained with two months' data performed
better.
VI. CONCLUSION AND FUTURE WORK
In this paper, we have studied the link prediction problem
in evolving networks and trained decision trees and neural
networks to predict the new links being formed in the New
Orleans Facebook network. We found that incorporating tem-
poral information along with static features while training
the classifiers improves the prediction performance. In this
work, we analyzed the last time a user posted on the wall of
other user and captured that information as recency metric and
used it to predict- yet to emerge potential friendships among
neighboring users. We found that using temporal features
along with other static features improved the performance of
the trained classifier. This provides a strong motivation for us
to study more temporal features and investigate further how to
incorporate them in training the classifier to improve the link
prediction in evolving temporal networks. As part of future
work, we will study more effective temporal features similar
to recency and use them in the temporal feature combinations.
VII. ACKNOWLEDGMENTS
This work was partially supported by a grant from the National
Science Foundation (NSF IIS 1447489). We thank Dr. Mislove
for generously providing access to the Facebook New Orleans
data.
VIII. REFERENCES
[1] David Liben-Nowell, Jon Kleinberg, The Link Prediction
Problem for Social Networks, Proceedings of the Twelfth
Annual ACM International Conference on Information and
Knowledge Management (CIKM’03), Nov 2003
[2] Jure Leskovec, Jon Kleinberg, Christos Faloutsos, Graph
Evolution: Densification and Shrinking Diameters, ACM
Transactions on Knowledge Discovery from Data
[3] Anet Potgieter, Kurt A. April, Richard J.E. Cooke & Isaac
O. Osunmakinde, Temporality in Link Prediction: Understand-
ing Social Complexity, E:CO Issue Vol. 11 No. 1 2009 pp.
69-83
[4] Michael Fire, Lena Tenenboim-Chekina, Rami Puzis,
Ofrit Lesser, Lior Rokach, and Yuval Elovici, Computationally
Efficient Link Prediction in a Variety of Social Networks
[5] Tomasz Tylenda, Ralitsa Angelova, Srikanta Bedathur,
Towards Time-aware Link Prediction in Evolving Social Net-
works, In Proceedings of the 3rd Workshop on Social Network
Mining and Analysis, SNAKDD ’09
[6] A. K. Menon and C. Elkan. Link prediction via matrix
factorization, European conference on Machine learning and
knowledge discovery in databases- Volume Part II, ECML
PKDD’11, pages 437–452, Berlin, Heidelberg, 2011. Springer-
Verlag.
[7] Bimal Viswanath, Alan Mislove, Meeyoung Cha, Krishna
P. Gummadi, On the Evolution of User Interaction in Face-
book, WOSN’09, August 17, 2009, Barcelona, Spain.
[8] Matteo Zignani, Sabrina Gaito, Gian Paolo Rossi, Xiaohan
Zhao, Haitao Zheng, Ben Y. Zhao, Link and Triadic Clo-
sure Delay: Temporal Metrics for Social Network Dynamics,
Proceedings of the 8th International AAAI Conference on
Weblogs and Social Media
[9] Zan Huang, Xin Li, Hsinchun Chen, Link Prediction
Approach to Collaborative Filtering, JCDL’05, June 7–11,
2005, Denver, Colorado, USA
[10] Mohammad Al Hasan, Vineet Chaoji, Saeed Salem,
Mohammed Zaki, Link Prediction using Supervised Learning,
SIAM Workshop on Link Analysis with SIAM Data Mining
Conference, Bethesda, MD,2006
[11] Mohammad Al Hasan, and Mohammed J. Zaki, A Survey
of Link Prediction in Social Networks in Social Network Data
Analytics, pp 243-275, 2011, Springer
[12] Zan Huang, Dennis K. J. Lin, Time-Series Link Prediction
Problem with Applications in Communication Surveillance,
INFORMS Journal on Computing, Vol. 21, No. 2, Spring
2009, pp. 286–303
[13] Baojun Qiu, Qi He, John Yen, Evolution of Node Behavior
in Link Prediction, Proceedings of the Twenty-Fifth AAAI
Conference on Artificial Intelligence
[14] Katz, L. A new status index derived from sociometric
analysis. Psychometrika, 18, 1 (1953), 39-43.
[15] Richard F. Taflinger, Social Basis of Self-Preservation
from Social Basis of Human Behavior, 1996
[16] H2O: An Open Source Big Data Analytics Platform,
http://0xdata.com/
