Introduction Link Prediction and Metrics Results Conclusions 
Supervised Learning Link Recommendation in the 
DBLP co-authoring network 
Gabriel P Gimenes, Hugo Gualdron, Thiago R Raddo, Jose F 
Rodrigues Jr 
Instituto de Ciências Matemáticas e de Computação
Universidade de São Paulo
Av. Trabalhador São-carlense, 400-Centro, São Carlos, SP, Brasil
Click for paper: 
http://www.icmc.usp.br/pessoas/junio/PublishedPapers/Gimenes_et_al_IEEE-PerCom-SCI2014.pdf 
This work has financial support from FAPESP (2013/10026-7, 2011/13724-1)
Summary 
1 Introduction 
2 Link Prediction and Metrics 
3 Results 
4 Conclusions 
Context 
Advances in the WWW led to improved mechanisms for users 
to interact 
Data became abundant in several scenarios 
social networks, co-authoring networks, recommender systems, 
communication networks 
Need for tools that can assist in the decision-making process
Most of the networks produced in our daily lives are dynamic
- Link Recommendation 
Objectives 
Analysis of the Link Recommendation task on a co-authoring 
network - DBLP 
Comparison of the most widely used supervised learning
algorithms using performance metrics (AUC, F-measure,
Precision and Recall)
Including the use of meta-classifiers such as Bagging and
Random Forest
Detailed study of the parameters involved in the technique:
Core(k) and the intervals
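The performance metrics listed above can be illustrated with a minimal sketch. It assumes binary labels (1 means the pair of authors did collaborate later) and a classifier that outputs a score per pair. The function names, the toy labels, the scores and the 0.5 decision threshold are all assumptions made for this example, not values from the paper.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """AUC as the probability that a random positive pair is scored above a
    random negative pair (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): six candidate pairs of authors.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise AUC above is quadratic in the number of examples, which is acceptable for a sketch but would likely need a rank-based formulation on a network the size of DBLP.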
Link Prediction and Metrics 
Problem Definition
It is possible to model a co-authoring network as a graph:
nodes represent individuals and edges indicate a collaboration
between them
The idea is to predict/recommend new edges using only past
and present information about the network, by means of
supervised learning techniques
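The graph model described above can be sketched as follows. The record layout, a list of (year, authors) tuples, and the author names are assumptions made for this example; they are not the DBLP schema.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Return an adjacency map {author: set of co-authors}.

    `papers` is an iterable of (year, [author, ...]) tuples
    (a hypothetical layout chosen for this sketch).
    """
    graph = defaultdict(set)
    for _year, authors in papers:
        # Every pair of authors of the same paper is linked by an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


papers = [
    (2010, ["Ana", "Bruno"]),
    (2011, ["Ana", "Carla", "Davi"]),
    (2012, ["Bruno", "Carla"]),
]
graph = build_coauthor_graph(papers)
```

Storing neighbours as sets keeps the graph simple (no repeated edges), so two authors who publish together several times are still connected by a single edge.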
Problem Definition
Applications exist in different domains, such as:
Forecasting suspicious behavior on social networks (terrorism,
for example)
Identifying interactions that would need intense 
experimentation in biology 
Suggesting new collaborations/interactions to individuals on 
co-authoring networks 
Problem Definition
Given a snapshot of a network at time t, we are interested in
the edges that most likely should/could exist at t', where
t < t'.
Training a supervised classifier using topological features
extracted from the network to be able to analyze its dynamics
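One possible way to turn the two snapshots into a supervised learning problem is sketched below. It assumes publication records given as (year, authors) tuples and closed year intervals for training and test; these names and the choice of candidate pairs (authors seen in the training interval who are not yet connected) are assumptions for illustration, not necessarily the authors' exact protocol.

```python
from itertools import combinations


def edges_in_interval(papers, start, end):
    """Undirected co-authoring edges from papers with start <= year <= end."""
    edges = set()
    for year, authors in papers:
        if start <= year <= end:
            for a, b in combinations(sorted(set(authors)), 2):
                edges.add((a, b))  # stored with a < b
    return edges


def labelled_pairs(papers, train, test):
    """Label every not-yet-connected pair of training-interval authors.

    Label 1: the pair collaborates in the test interval; label 0 otherwise.
    """
    train_edges = edges_in_interval(papers, *train)
    test_edges = edges_in_interval(papers, *test)
    nodes = sorted({n for edge in train_edges for n in edge})
    return {
        pair: int(pair in test_edges)
        for pair in combinations(nodes, 2)
        if pair not in train_edges
    }


papers = [
    (2010, ["Ana", "Bruno"]),
    (2011, ["Bruno", "Carla"]),
    (2011, ["Carla", "Davi"]),
    (2012, ["Ana", "Carla"]),
    (2013, ["Bruno", "Elisa"]),
]
labels = labelled_pairs(papers, train=(2010, 2011), test=(2012, 2013))
```

In a full pipeline, the topological features of each candidate pair would be computed on the training-interval graph only, so that no information from t' leaks into the classifier's input.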
Core 
Core(k) is the subset of nodes of interest 
Nodes that have at least k edges in the training and test intervals
are considered to be in Core(k); the other nodes are not used
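A minimal sketch of the Core(k) filter follows. It assumes that "at least k edges" means at least k distinct co-authors, and that the threshold must hold separately in the training interval and in the test interval; both readings are assumptions, as are the function names and the toy edges.

```python
from collections import defaultdict


def degrees(edges):
    """Number of distinct neighbours of each node in an undirected edge set."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adj) for node, adj in neighbours.items()}


def core(train_edges, test_edges, k):
    """Nodes with at least k edges in the training AND in the test interval."""
    train_deg = degrees(train_edges)
    test_deg = degrees(test_edges)
    return {
        node
        for node, deg in train_deg.items()
        if deg >= k and test_deg.get(node, 0) >= k
    }


train_edges = {("Ana", "Bruno"), ("Ana", "Carla"),
               ("Bruno", "Carla"), ("Carla", "Davi")}
test_edges = {("Ana", "Davi"), ("Ana", "Bruno"), ("Bruno", "Carla")}

core_1 = core(train_edges, test_edges, k=1)
core_2 = core(train_edges, test_edges, k=2)
```

Raising k shrinks the set of nodes, which is why the value of k is one of the parameters studied in this work.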
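Topological Features
Metric                      Equation
Common Neighbours           $CN(x, y) = |\Gamma(x) \cap \Gamma(y)|$
Jaccard Coefficient         $JC(x, y) = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$
Preferential Attachment     $PA(x, y) = |\Gamma(x)| \cdot |\Gamma(y)|$
Adamic-Adar Coefficient     $AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}$
Geodesic Distance           shortest path between x and y
Resource Allocation Index   $RA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{|\Gamma(z)|}$
Local Paths                 LP(x, y) =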
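The features in the table can be computed directly from an adjacency map, as in the sketch below. It assumes that Γ(x) is the set of neighbours of x, that the graph is a dictionary of neighbour sets without self-loops, and that x and y are distinct nodes. The function names and the toy graph are hypothetical. Local Paths is left out because its equation is cut off in the source.

```python
import math
from collections import deque


def common_neighbours(g, x, y):
    return len(g[x] & g[y])


def jaccard(g, x, y):
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0


def preferential_attachment(g, x, y):
    return len(g[x]) * len(g[y])


def adamic_adar(g, x, y):
    # A common neighbour z of two distinct nodes has degree >= 2, so log > 0.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])


def resource_allocation(g, x, y):
    return sum(1.0 / len(g[z]) for z in g[x] & g[y])


def geodesic_distance(g, x, y):
    """Shortest-path length via breadth-first search; inf if unreachable."""
    if x == y:
        return 0
    seen = {x}
    queue = deque([(x, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in g[node]:
            if nb == y:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return math.inf


# Toy co-authoring graph (hypothetical): Elisa has no collaborators yet.
g = {
    "Ana": {"Bruno", "Carla"},
    "Bruno": {"Ana", "Carla", "Davi"},
    "Carla": {"Ana", "Bruno", "Davi"},
    "Davi": {"Bruno", "Carla"},
    "Elisa": set(),
}

features = [
    common_neighbours(g, "Ana", "Davi"),
    jaccard(g, "Ana", "Davi"),
    preferential_attachment(g, "Ana", "Davi"),
    adamic_adar(g, "Ana", "Davi"),
    resource_allocation(g, "Ana", "Davi"),
    geodesic_distance(g, "Ana", "Davi"),
]
```

A vector such as `features` would be one input row for the supervised classifier, with the label telling whether the pair collaborates in the later interval.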
