Link Prediction
Class: Data Mining Technology for Business and Society
Program: M.Sc. Data Science
University: Sapienza University of Rome
Semester: Spring 2016
Lecturer: Carlos Castillo http://chato.cl/
Sources:
● Chapter 10 of Zafarani, Abbasi, and Liu's book Social Media Mining [slides]
● Sarkar, Chakrabarti, Moore: [slides] [slides]
Problem definition
● Given a graph G=(V,E) at time t
– Or a series of snapshots of the graph at times t_i ≤ t
● Predict the state of the graph at a later time t' > t
● Sometimes we assume V stays the same and only E changes
Applications
● Accelerating the formation of connections in professional social networks
● Helping you find your offline friends online
● Increasing server efficiency through pre-fetching
● Determining which links are missing in Wikipedia pages (or other educational resources)
● Monitoring/controlling the propagation of computer viruses
● Fixing corrupted data
– You bought five books and one of the titles is lost; can we infer it?
● ...
Basic method
1) Assign a score to every possible link (u,v) that is not already in E
2) Sort the links by descending score
3) Predict the top-k links
– Or the links with scores above a threshold
4) Profit!
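The steps above can be sketched in Python as follows. This is a minimal sketch, assuming an undirected graph stored as a dict mapping each node to a set of neighbors, with sortable node labels; `predict_links` is a hypothetical helper name, and the common-neighbors scorer is only one example of a pluggable `score` function.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]          # node -> set of neighbors (undirected)
Scorer = Callable[[Graph, Hashable, Hashable], float]


def predict_links(adj: Graph, score: Scorer, k: Optional[int] = None,
                  threshold: Optional[float] = None) -> List[Tuple[Hashable, Hashable, float]]:
    """Steps 1-3: score every non-edge, sort by descending score, keep the top-k
    and/or the pairs whose score is above `threshold`."""
    # Step 1: score each unordered pair that is not already an edge
    candidates = [(u, v, score(adj, u, v))
                  for u, v in combinations(sorted(adj), 2)
                  if v not in adj[u]]
    # Step 2: sort by descending score (stable, so ties keep pair order)
    candidates.sort(key=lambda t: t[2], reverse=True)
    # Step 3: optional threshold, then optional top-k cut
    if threshold is not None:
        candidates = [c for c in candidates if c[2] > threshold]
    return candidates if k is None else candidates[:k]


def common_neighbors(adj: Graph, u: Hashable, v: Hashable) -> float:
    """Example scorer: number of shared neighbors."""
    return len(adj[u] & adj[v])


# Toy graph (made up for illustration)
g = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(predict_links(g, common_neighbors, k=2))  # [(1, 4, 2), (2, 5, 1)]
```

Enumerating all pairs is quadratic in |V|; on real graphs one would usually restrict candidates (for example to pairs at distance 2), which is a design choice outside the slide.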
Common neighbors
● Newman 2001: the probability that two scientists collaborate increases with the number of other collaborators they have in common.
● Reflects the tendency to close triangles (more on this later)
score(u,v) := |Γ(u) ∩ Γ(v)|, where Γ(x) is the set of neighbors of x
Jaccard similarity
● Corrects common neighbors by reducing the influence of nodes with many neighbors
score(u,v) := |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|
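A possible Python sketch of the Jaccard score, assuming the same dict-of-sets adjacency representation; returning 0 when both neighborhoods are empty is a convention chosen here, not something the slide specifies.

```python
def jaccard(adj, u, v) -> float:
    """Shared neighbors divided by the size of the union of both neighborhoods."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0                     # assumed convention for two isolated nodes
    return len(adj[u] & adj[v]) / len(union)


# Toy graph: a and b share two neighbors, e and f share their only neighbor
g = {"a": {"c", "d"}, "b": {"c", "d", "e", "f"}, "c": {"a", "b"},
     "d": {"a", "b"}, "e": {"b"}, "f": {"b"}}
print(jaccard(g, "a", "b"))  # 0.5
print(jaccard(g, "e", "f"))  # 1.0
```

Here (a, b) has more common neighbors than (e, f), yet a lower Jaccard score, because b has many other neighbors.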
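Adamic/Adar
● Count common neighbors, but weight down common neighbors that themselves have many neighbors
score(u,v) := Σ_{z ∈ Γ(u) ∩ Γ(v)} 1 / log|Γ(z)|
● The idea is to avoid giving much weight to hubs, which are common neighbors of many pairs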
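A minimal Python sketch of the Adamic/Adar score, assuming an undirected dict-of-sets graph and the natural logarithm (the base only rescales all scores by a constant). Every common neighbor z of two distinct nodes has at least 2 neighbors, so log|Γ(z)| is never zero.

```python
import math


def adamic_adar(adj, u, v) -> float:
    """Sum over common neighbors z of 1 / log(degree of z)."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


# Toy graph: u and v share a low-degree neighbor z and a hub h of degree 5
g = {"u": {"z", "h"}, "v": {"z", "h"}, "z": {"u", "v"},
     "h": {"u", "v", "x", "y", "w"}, "x": {"h"}, "y": {"h"}, "w": {"h"}}
print(adamic_adar(g, "u", "v"))  # 1/ln 2 + 1/ln 5, about 2.064
```

The low-degree neighbor z contributes about 1.44 and the hub h only about 0.62, although plain common neighbors would count both as 1.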
Understanding the Adamic/Adar heuristic
[Figure: Alice, Bob, and Charlie. A prolific common friend (1000 followers) is weaker evidence of a link; a less prolific common friend (8 followers) is stronger evidence.]
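Plugging the follower counts on this slide (8 and 1000) into the Adamic/Adar weight 1/log|Γ(z)| makes the effect concrete. This assumes the natural logarithm and treats the follower count as the common friend's number of neighbors, which is a simplification of the figure:

```latex
\frac{1}{\ln 8} \approx \frac{1}{2.079} \approx 0.481
\qquad\text{vs.}\qquad
\frac{1}{\ln 1000} \approx \frac{1}{6.908} \approx 0.145
```

The less prolific common friend therefore contributes about 3.3 times as much evidence. A different log base scales both weights by the same constant, so the ranking of candidate links does not change.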
Preferential attachment
● "Rich-get-richer"
● Newman 2001: the probability of two authors collaborating is proportional to the product of their numbers of collaborators
score(u,v) := |Γ(u)| · |Γ(v)|
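A short Python sketch of the preferential-attachment score, assuming the dict-of-sets representation. Unlike the previous heuristics it only needs the two degrees, so it can rank pairs that have no neighbors in common.

```python
def preferential_attachment(adj, u, v) -> int:
    """Product of the two nodes' numbers of neighbors."""
    return len(adj[u]) * len(adj[v])


# Toy graph: two hubs h1 (degree 3) and h2 (degree 2) with no shared neighbors
g = {"h1": {"a", "b", "c"}, "h2": {"d", "e"}, "a": {"h1"}, "b": {"h1"},
     "c": {"h1"}, "d": {"h2"}, "e": {"h2"}}
print(preferential_attachment(g, "h1", "h2"))  # 6
print(preferential_attachment(g, "a", "d"))    # 1
```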
Example: score(v5, v7)
Exercise: compute
● Number of common neighbors
● Jaccard coefficient
● Adamic/Adar score
● Preferential attachment score
Geodesic/shortest path distance
● Assumption: new social connections form by following existing edges to reach a new person, then connecting to that person directly
score(u,v) := -(length of shortest path from u to v)
● Limit case: triadic closure (a path of length 2)
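The shortest-path score can be sketched with a breadth-first search. This assumes an unweighted, undirected dict-of-sets graph; returning negative infinity for unreachable pairs is a convention chosen here.

```python
from collections import deque


def shortest_path_score(adj, u, v) -> float:
    """-(number of edges on a shortest u-v path), found by breadth-first search."""
    if u == v:
        return 0.0
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return -float(dist[y])
                queue.append(y)
    return float("-inf")               # v is not reachable from u


# Toy graph: path a-b-c-d plus an isolated node e
g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set()}
print(shortest_path_score(g, "a", "c"))  # -2.0
print(shortest_path_score(g, "a", "d"))  # -3.0
```

For a non-edge the best possible score is -2, which corresponds to closing a triangle; many candidate pairs tie at that value, so this score is coarse on its own.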
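Katz 1953 (related: "rooted PageRank")
● Score based on weighted counts of paths, with exponential decay on path length:
score(u,v) := Σ_{ℓ=1..∞} α^ℓ · |paths_ℓ(u,v)|, where paths_ℓ(u,v) are the walks of length ℓ from u to v
● The sum converges when α < 1/λ_max, where λ_max is the largest eigenvalue of the adjacency matrix (so in practice α < 1)
● A small α yields predictions similar to common neighbors, because for non-adjacent u and v the length-2 term, which counts common neighbors, dominates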
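A minimal sketch of a truncated Katz score in Python, assuming a dict-of-sets graph. It counts walks from u by repeatedly extending them one edge at a time and stops at `max_len`; the truncation and the default α = 0.05 are assumptions of this sketch (the full sum equals the entries of (I − αA)^-1 − I when it converges). `katz_from` is a hypothetical name.

```python
from collections import defaultdict


def katz_from(adj, u, alpha=0.05, max_len=10):
    """Truncated Katz scores from u to every node:
    sum over l = 1..max_len of alpha**l * (number of length-l walks from u)."""
    walks = {u: 1}                      # one walk of length 0, ending at u
    scores = defaultdict(float)
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for node, count in walks.items():
            for nbr in adj[node]:       # extend every walk by one edge
                nxt[nbr] += count
        walks = nxt
        weight = alpha ** length        # exponential decay on path length
        for node, count in walks.items():
            scores[node] += weight * count
    return dict(scores)                 # includes u itself (closed walks); ignore it


# Toy graph: path 0-1-2
g = {0: {1}, 1: {0, 2}, 2: {1}}
print(katz_from(g, 0, alpha=0.1, max_len=20)[2])  # about 0.010204 = 0.01 / 0.98
```

With α = 0.1 the length-2 walk (the single common neighbor) contributes 0.01 and all longer walks together add only about 0.0002, which illustrates why a small α behaves like common neighbors.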
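More on random walks
● Hitting time
– H(u,v) = expected number of steps for a random walk starting at u to reach v
– score(u,v) := -H(u,v)
● Well-connected targets have short hitting times from almost everywhere. To reduce their influence, multiply by the stationary probability π(v) of the target: score(u,v) := -H(u,v) · π(v)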
Symmetric hitting time (commute time)
● Hitting time is not symmetric, but it is easy to symmetrize:
C(u,v) := H(u,v) + H(v,u), and score(u,v) := -C(u,v)
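The random-walk scores can be sketched in Python as below. This is a sketch under stated assumptions: a connected, undirected dict-of-sets graph without isolated nodes, a simple random walk (uniform choice among neighbors), and hitting times obtained by fixed-point iteration of H(target) = 0, H(x) = 1 + average of H over x's neighbors. The function names are hypothetical; for large graphs one would solve the linear system or use sampling instead.

```python
def hitting_times_to(adj, target, tol=1e-10, max_iter=100_000):
    """Expected number of steps H(x, target) of a simple random walk, for every x."""
    h = {x: 0.0 for x in adj}
    for _ in range(max_iter):
        new = {x: 0.0 if x == target
               else 1.0 + sum(h[y] for y in adj[x]) / len(adj[x])
               for x in adj}
        delta = max(abs(new[x] - h[x]) for x in adj)
        h = new
        if delta < tol:
            break
    return h


def commute_time(adj, u, v):
    """Symmetrized hitting time C(u, v) = H(u, v) + H(v, u)."""
    return hitting_times_to(adj, v)[u] + hitting_times_to(adj, u)[v]


def stationary_normed_score(adj, u, v):
    """-H(u, v) * pi(v), where pi(v) = deg(v) / (2|E|) on an undirected graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return -hitting_times_to(adj, v)[u] * len(adj[v]) / two_m


# Toy graph: path 0-1-2
g = {0: {1}, 1: {0, 2}, 2: {1}}
print(round(hitting_times_to(g, 2)[0], 6))  # 4.0
print(round(commute_time(g, 0, 2), 6))      # 8.0
```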
Graph projections
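SimRank [Jeh and Widom 2002]
● For directed graphs; follows in-links: two nodes are similar if they are pointed to by similar nodes
simrank(u,u) := 1
simrank(u,v) := C / (|I(u)| · |I(v)|) · Σ_{a ∈ I(u)} Σ_{b ∈ I(v)} simrank(a,b), and 0 if I(u) or I(v) is empty, where I(x) is the set of in-neighbors of x and 0 < C < 1 is a decay constant
[Figure: example directed graph with nodes u, v, w, p, q, r, s]
Exercise: compute
● simrank(u,v)
● simrank(v,w)
● simrank(u,w)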
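An iterative SimRank computation might look like the Python below. It assumes the graph is given as a dict mapping every node to its set of in-neighbors, a decay constant C = 0.8, and a fixed number of iterations; these are assumptions of this sketch, and the toy graph is made up (it is not the graph from the exercise figure). The cost grows with n² times the squared in-degree per iteration, so this is only for small graphs.

```python
def simrank(in_nbrs, C=0.8, iters=10):
    """Iterative SimRank; in_nbrs maps every node to the set of nodes linking to it."""
    nodes = list(in_nbrs)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[(a, b)] = 0.0          # no in-links: no evidence of similarity
                else:
                    total = sum(sim[(x, y)] for x in in_nbrs[a] for y in in_nbrs[b])
                    new[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


# Toy directed graph: x -> a, x -> b, y -> b, y -> c
in_nbrs = {"x": set(), "y": set(), "a": {"x"}, "b": {"x", "y"}, "c": {"y"}}
s = simrank(in_nbrs)
print(s[("a", "b")], s[("a", "c")])  # 0.4 0.0
```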
Meta-method / pruning
● Compute score(u,v) for every existing edge, as if that edge did not exist
● Remove the k% of edges with the lowest scores
● Compute score(u,v) in the reduced graph
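The pruning step could be sketched as follows, assuming an undirected dict-of-sets graph with sortable node labels and Jaccard as an example scorer (any score function with the same signature would do). Each edge is temporarily removed before it is scored, which matters for Jaccard because the endpoints appear in each other's neighborhoods. `prune_low_scoring_edges` is a hypothetical name.

```python
def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def prune_low_scoring_edges(adj, score, k_percent):
    """Return a copy of the graph without the k% of edges that score lowest."""
    work = {x: set(nbrs) for x, nbrs in adj.items()}   # do not mutate the input
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    scored = []
    for u, v in edges:
        work[u].discard(v)              # pretend (u, v) does not exist
        work[v].discard(u)
        scored.append((score(work, u, v), u, v))
        work[u].add(v)                  # put the edge back
        work[v].add(u)
    scored.sort()                       # lowest scores first
    n_remove = int(len(scored) * k_percent / 100)
    for _, u, v in scored[:n_remove]:
        work[u].discard(v)
        work[v].discard(u)
    return work


# Toy graph: triangle 0-1-2 plus a pendant edge 2-3
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
pruned = prune_low_scoring_edges(g, jaccard, 25)
print(pruned)  # {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
```

The pendant edge has no supporting neighbors, so it is the one removed; link prediction scores are then computed on `pruned`.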
Evaluating link prediction methods
● After one of the aforementioned measures is selected, a list of the most similar pairs of nodes is produced.
● These pairs of nodes denote the edges predicted to be most likely to appear soon in the network.
● Performance (precision, recall, or accuracy) can be evaluated on the testing graph by counting how many of the testing graph's edges the link prediction algorithm successfully reveals.
● Performance is usually very low, because many edges are created for reasons that are not observable in the social network graph alone.
● So a common baseline is to compare performance against a random edge predictor and report the improvement factor over random prediction.
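A small Python sketch of this evaluation, assuming undirected pairs, that `n_candidates` counts the pairs not linked in the training graph, and that the new edges are among those candidates. A random predictor's expected precision is then the fraction of candidates that actually appear. The function name and the toy numbers are made up for illustration.

```python
def evaluate_against_random(predicted, new_edges, n_candidates):
    """Precision of the predicted pairs and its improvement factor over random guessing."""
    new = {frozenset(e) for e in new_edges}        # undirected: (u, v) == (v, u)
    hits = sum(1 for e in predicted if frozenset(e) in new)
    precision = hits / len(predicted)
    random_precision = len(new) / n_candidates     # expected precision of a random predictor
    return precision, precision / random_precision


p, factor = evaluate_against_random([(0, 1), (0, 2), (1, 3)], [(1, 0), (2, 3)], 10)
print(round(p, 3), round(factor, 3))  # 0.333 1.667
```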
Performance comparison
[Liben-Nowell et al. 2003]
Notes:
Effectiveness in general is very
low (challenging problem)
Adamic/Adar + content
Supervised learning
[Al Hasan et al. 2006]
● Input features are all available attributes, possibly including a node's links as attributes
● Learn, on a subset of the data, to predict whether a pair of nodes is connected or not
Example experimental results with supervised learning
● Data: co-authorship networks from DBLP and BIOBASE
● Split into two disjoint ranges of publication years (Ra, Rb)
– Example: DBLP, Ra = [1999, 2000], Rb = [2001, 2004]
● A training item is a pair of authors (u,v), both with a paper in Ra, with all their attributes computed in Ra
● Ground truth is whether u and v co-author a paper during Rb
– Positive = yes, Negative = no
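The labelling scheme described above can be sketched in Python. The input format (a list of (year, set of author names) records) and the function name are hypothetical; features for each pair should then be computed from Ra data only, so that nothing from Rb leaks into the inputs.

```python
from itertools import combinations


def build_pair_dataset(papers, ra, rb):
    """Pairs of authors who both published in Ra, labelled True if they
    co-author at least one paper in Rb. `papers` holds (year, authors) records."""
    def in_range(year, r):
        return r[0] <= year <= r[1]

    authors_ra = set()                 # authors with at least one paper in Ra
    for year, authors in papers:
        if in_range(year, ra):
            authors_ra |= authors
    coauthors_rb = set()               # unordered pairs that co-author in Rb
    for year, authors in papers:
        if in_range(year, rb):
            coauthors_rb |= {frozenset(p) for p in combinations(authors, 2)}
    return [((u, v), frozenset((u, v)) in coauthors_rb)
            for u, v in combinations(sorted(authors_ra), 2)]


# Made-up records; D only publishes in Rb, so D is not part of any training pair
papers = [(1999, {"A", "B"}), (2000, {"C"}), (2002, {"A", "C"}), (2003, {"B", "D"})]
print(build_pair_dataset(papers, (1999, 2000), (2001, 2004)))
# [(('A', 'B'), False), (('A', 'C'), True), (('B', 'C'), False)]
```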
Example features
● Content similarity
– Keywords in common, conferences in common, ...
● Aggregation features
– Sum of papers, Sum of neighbors, ...
● Topological distance
– Shortest-path length, ...
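One feature from each family listed above could be computed per candidate pair as in the sketch below. The per-author keyword sets are a hypothetical input, and "sum of papers" is omitted because it would need per-author paper counts; a classifier would then be trained on these vectors with the Rb labels.

```python
from collections import deque


def pair_features(adj, keywords, u, v):
    """[keywords in common, sum of neighbors, shortest-path length] for (u, v)."""
    kw_common = len(keywords[u] & keywords[v])      # content similarity
    sum_neighbors = len(adj[u]) + len(adj[v])       # aggregation feature
    dist = {u: 0}                                   # topological distance via BFS
    queue = deque([u])
    while queue and v not in dist:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    shortest = dist.get(v, float("inf"))            # inf if v is unreachable
    return [kw_common, sum_neighbors, shortest]


adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
keywords = {"A": {"graphs", "mining"}, "B": {"nlp"}, "C": {"mining", "nlp"}}
print(pair_features(adj, keywords, "A", "C"))  # [1, 2, 2]
```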
Performance results under various
learning schemes (same feature set)
Community prediction
Community membership prediction
● Why do users join communities?
– What factors affect the community-joining behavior of individuals?
● We can observe users who join communities and determine the factors that are common among them
● We require a population of users, a community C, and community membership information (i.e., which users are members of C)
– To distinguish between users who have already joined the community and those who are joining it now, we need community memberships at two different times t1 and t2, with t2 > t1.
– At t2, we identify the users who are members of the community but were not members at t1. These new members form the subpopulation that is analyzed for community-joining behavior.
Peer influence
Hypothesis: individuals are
inclined toward an activity when
their friends are engaged in the
same activity.
A factor that plays a role in users
joining a community is the
number of their friends who are
already members of the
community.
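The peer-influence factor can be measured with the two membership snapshots described earlier: for each user outside C at t1, count their friends who are already members, then compute the fraction who joined by t2 for each friend count. This is a minimal sketch assuming friendships as a dict of sets and memberships as sets of user ids; the data below is made up.

```python
def joining_table(friends, members_t1, members_t2):
    """(user, friends in C at t1, joined C by t2) for every non-member at t1."""
    rows = []
    for user in sorted(friends):
        if user in members_t1:
            continue                                  # already a member at t1
        k = len(friends[user] & members_t1)           # friends already in C
        rows.append((user, k, user in members_t2))
    return rows


def join_rate_by_friend_count(rows):
    """Fraction of users who joined, grouped by their number of member friends."""
    totals, joined = {}, {}
    for _, k, did_join in rows:
        totals[k] = totals.get(k, 0) + 1
        joined[k] = joined.get(k, 0) + int(did_join)
    return {k: joined[k] / totals[k] for k in sorted(totals)}


friends = {"a": {"m1", "m2"}, "b": {"m1"}, "c": {"m1"}, "d": set(),
           "m1": {"a", "b", "c"}, "m2": {"a"}}
rows = joining_table(friends, {"m1", "m2"}, {"m1", "m2", "a", "b"})
print(join_rate_by_friend_count(rows))  # {0: 0.0, 1: 0.5, 2: 1.0}
```

In real data each group needs enough users for these rates to be meaningful; the toy example only shows the bookkeeping.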
Supervised learning
Example regression tree
Beyond community membership
● Communities can be implicit: one can think of the individuals buying a product as a community, and of people buying the product for the first time as individuals joining the community
● Collective behavior: a group of individuals behaving in a similar way (first defined by sociologist Robert Park)
● It can be planned and coordinated, but it is often spontaneous and unplanned
● Examples:
– Individuals standing in line for a new product release
– Posting messages online to support a cause or to show
support for an individual
● Approach can be similar to community membership prediction

 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Link prediction

  • 1. Link Prediction Class Data Mining Technology for Business and Society Program M. Sc. Data Science University Sapienza University of Rome Semester Spring 2016 Lecturer Carlos Castillo http://chato.cl/ Sources: ● Chapter 10 of Zafarani, Abbasi, and Liu's book on Social Media Mining [slides] ● Sarkar, Chakrabarti, Moore: [slides] [slides]
  • 2. Problem definition ● Given a graph G=(V,E) at time t – Or a series of snapshots of the graph at times t_i ≤ t ● Describe the state of the graph at time t' > t ● Sometimes, assume V stays the same and E changes
  • 3. Applications Accelerating formation of connections in professional social networks
  • 4. Applications ● Helping find your offline friends online
  • 5. Applications ● Increase server efficiency through pre-fetching ● Determining which links are missing in Wikipedia pages (or other educational resources) ● Monitor/control propagation of computer viruses ● Fixing corrupted data – You bought five books, one of the titles is lost, can we infer it? ● ...
  • 6. Basic method 1) Assign a score to every possible link (u,v) 2) Sort links by descending score 3) Predict the top-k links, or the links with scores above a threshold 4) Profit!
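The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the course: the graph is undirected and stored as a dict mapping each node to its set of neighbors, and `predict_links`, `common_neighbors`, and the toy graph `adj` are hypothetical names chosen for the example. Any of the scores on the following slides could be passed in as `score`.

```python
from itertools import combinations
from typing import Callable, Optional

Graph = dict[str, set[str]]


def common_neighbors(adj: Graph, u: str, v: str) -> int:
    # Example scoring function: number of neighbors shared by u and v.
    return len(adj[u] & adj[v])


def predict_links(
    adj: Graph,
    score: Callable[[Graph, str, str], float],
    k: Optional[int] = None,
    threshold: Optional[float] = None,
) -> list[tuple[str, str]]:
    # 1) Score every possible link (u, v) that is not already an edge.
    scored = [
        (score(adj, u, v), (u, v))
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u]
    ]
    # 2) Sort by descending score (the sort is stable, so ties keep their order).
    scored.sort(key=lambda item: item[0], reverse=True)
    # 3) Keep the pairs above the threshold, or else the top-k pairs.
    if threshold is not None:
        return [pair for s, pair in scored if s > threshold]
    return [pair for _, pair in scored[:k]]


# Toy undirected graph with edges a-b, a-c, b-c, b-d, d-e.
adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}

print(predict_links(adj, common_neighbors, k=2))  # [('a', 'd'), ('b', 'e')]
```

Scoring every pair costs O(|V|²) score evaluations, so this exhaustive version only suits small graphs; restricting candidates (for example to nodes at distance 2) is a common way to cut that cost.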
  • 7. Common neighbors ● Newman 2001: The probability of scientists collaborating increases with the number of other collaborators they have in common. ● Tendency to close triangles, more on this later ...
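Writing Γ(x) for the set of neighbors of x, the common-neighbors score is score(u,v) = |Γ(u) ∩ Γ(v)|. A minimal sketch, assuming an undirected graph stored as a dict of neighbor sets; the toy graph `adj` and the function name are hypothetical:

```python
Graph = dict[str, set[str]]


def common_neighbors(adj: Graph, u: str, v: str) -> int:
    # score(u, v) = |Γ(u) ∩ Γ(v)|
    return len(adj[u] & adj[v])


# Toy undirected graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

# Linking a and d would close two triangles (a-b-d and a-c-d).
print(common_neighbors(adj, "a", "d"))  # 2
```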
  • 8. Jaccard similarity ● Correct common neighbors by reducing the influence of nodes with many neighbors
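The Jaccard coefficient divides the common-neighbor count by the size of the combined neighborhood: score(u,v) = |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|. A minimal sketch under the same assumptions (undirected graph as a dict of neighbor sets; `jaccard` and the toy graph are hypothetical):

```python
Graph = dict[str, set[str]]


def jaccard(adj: Graph, u: str, v: str) -> float:
    # score(u, v) = |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|; 0 when both nodes are isolated.
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


# Toy undirected graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(jaccard(adj, "a", "d"))  # 0.6666666666666666
```

Two nodes with large neighborhoods that overlap only slightly now get a low score, even if their raw common-neighbor count is high.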
  • 9. Adamic/Adar ● Count common neighbors, but weight down common neighbors that themselves have many neighbors
  • 10. Understanding the Adamic/Adar heuristic [Figure: users Alice, Bob, and Charlie] ● Prolific common friends (1000 followers): weaker evidence ● Less prolific common friends (8 followers): stronger evidence
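The usual form of the Adamic/Adar score gives each common neighbor z a weight of 1/log|Γ(z)|: score(u,v) = Σ over z in Γ(u) ∩ Γ(v) of 1/log|Γ(z)|. A minimal sketch, assuming an undirected graph stored as a dict of neighbor sets and using degree in place of the slide's follower counts; `adamic_adar` and the toy graph are hypothetical:

```python
import math

Graph = dict[str, set[str]]


def adamic_adar(adj: Graph, u: str, v: str) -> float:
    # score(u, v) = sum over common neighbors z of 1 / log|Γ(z)|.
    # A common neighbor of two distinct nodes has degree >= 2, so log|Γ(z)| > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


# Toy undirected graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(adamic_adar(adj, "a", "d"))  # two common neighbors of degree 3: 2 / ln 3
# Weight of one common friend with 8 vs. 1000 connections: about 0.481 vs. 0.145.
print(1 / math.log(8), 1 / math.log(1000))
```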
  • 11. Preferential attachment ● “Rich-get-richer” ● Newman 2001: the probability of two authors collaborating is proportional to the product of their number of collaborators
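Preferential attachment uses only the degrees of the two endpoints: score(u,v) = |Γ(u)| · |Γ(v)|. A minimal sketch, assuming an undirected graph stored as a dict of neighbor sets (`preferential_attachment` and the toy graph are hypothetical):

```python
Graph = dict[str, set[str]]


def preferential_attachment(adj: Graph, u: str, v: str) -> int:
    # score(u, v) = |Γ(u)| * |Γ(v)|; u and v need not share any neighbor.
    return len(adj[u]) * len(adj[v])


# Toy undirected graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(preferential_attachment(adj, "a", "d"))  # 2 * 3 = 6
```

Unlike the neighborhood-overlap scores, this one can rank two well-connected nodes highly even when they are far apart in the graph.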
  • 12. Example: score(v5, v7) Exercise, compute: ● Number of common neighbors ● Jaccard coefficient ● Adamic/Adar score ● Preferential attachment
  • 13. Geodesic/shortest path distance ● Assumption: social connections are formed by following edges, then finding a new person, then connecting directly ● score(u,v) := -(length of shortest path from u to v) ● Limit case: triadic closure
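The shortest-path score can be computed with a breadth-first search from u. A minimal sketch, assuming an unweighted, undirected graph stored as a dict of neighbor sets; `shortest_path_score` and the toy graph are hypothetical, and returning -inf for unreachable pairs is an assumed convention:

```python
import math
from collections import deque

Graph = dict[str, set[str]]


def shortest_path_score(adj: Graph, u: str, v: str) -> float:
    # score(u, v) = -(length of the shortest path from u to v), found by BFS;
    # -inf when v is unreachable from u.
    if u == v:
        return 0
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return -dist[y]
                queue.append(y)
    return -math.inf


# Toy undirected graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(shortest_path_score(adj, "a", "d"), shortest_path_score(adj, "a", "e"))  # -2 -3
```

Every non-adjacent pair at distance 2 gets the same score, so ties usually have to be broken somehow, for example with one of the neighborhood-based scores.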
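  • 14. Katz 1953 or “rooted PageRank” ● Score based on weighted counts of paths, with exponential decay on path length: score(u,v) = Σ_{l≥1} α^l · |paths of length l from u to v|, for 0 < α < 1 (the infinite sum converges only if α is smaller than 1/λ_max, where λ_max is the largest eigenvalue of the adjacency matrix) ● A small α yields predictions which are similar to common neighbors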
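A truncated version of the Katz score can be computed by propagating path counts outward from u. This sketch assumes the sum is cut off at a hypothetical `max_len` and counts length-l paths as the entries of the l-th power of the adjacency matrix (i.e., walks, which may revisit nodes); `katz`, its default parameters, and the toy graph are illustrative choices, not values from the slide.

```python
Graph = dict[str, set[str]]


def katz(adj: Graph, u: str, v: str, alpha: float = 0.05, max_len: int = 10) -> float:
    # Truncated Katz: sum for l = 1..max_len of alpha**l * (number of length-l walks u -> v),
    # i.e. alpha**l * (A**l)[u][v], computed by propagating walk counts out of u.
    walks = {x: 0 for x in adj}
    walks[u] = 1
    total = 0.0
    for length in range(1, max_len + 1):
        nxt = {x: 0 for x in adj}
        for x, count in walks.items():
            if count:
                for y in adj[x]:
                    nxt[y] += count
        walks = nxt
        total += alpha**length * walks[v]
    return total


# Toy undirected graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(katz(adj, "a", "d"), katz(adj, "a", "e"))
```

For a non-adjacent pair the first non-zero term is α² times the number of common neighbors, and with a small α the later terms are much smaller, which is why Katz then ranks pairs almost like common neighbors.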
  • 15. More on random walks ● Hitting time ● H(u,v) = expected number of steps for a random walk starting at u to reach v ● To reduce the influence of well-connected nodes, we can multiply the hitting time by the stationary probability π(v) of the target node v
  • 16. Symmetric hitting time (commute time) ● Hitting time is not symmetric, but it is easy to symmetrize: C(u,v) = H(u,v) + H(v,u)
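Hitting times satisfy H(v,v) = 0 and H(u,v) = 1 + the average of H(w,v) over the neighbors w of u, which the sketch below solves by fixed-point iteration. It assumes a connected, undirected graph stored as a dict of neighbor sets, uses π(v) = |Γ(v)| / (2|E|) as the stationary probability, and follows the convention that higher scores mean more likely links (hence the minus sign in `normalized_hitting_score`). All function names, the tolerance, and the path graph are hypothetical.

```python
Graph = dict[str, set[str]]


def hitting_times(adj: Graph, target: str, tol: float = 1e-12, max_iter: int = 100_000) -> dict[str, float]:
    # H[x] = expected number of random-walk steps from x until first reaching `target`.
    # Solves H[target] = 0 and H[x] = 1 + mean of H over Γ(x) by fixed-point iteration;
    # assumes a connected undirected graph with no isolated nodes.
    h = {x: 0.0 for x in adj}
    for _ in range(max_iter):
        new = {
            x: 0.0 if x == target else 1.0 + sum(h[y] for y in adj[x]) / len(adj[x])
            for x in adj
        }
        delta = max(abs(new[x] - h[x]) for x in adj)
        h = new
        if delta < tol:
            break
    return h


def commute_time(adj: Graph, u: str, v: str) -> float:
    # Symmetrized hitting time: C(u, v) = H(u, v) + H(v, u).
    return hitting_times(adj, v)[u] + hitting_times(adj, u)[v]


def normalized_hitting_score(adj: Graph, u: str, v: str) -> float:
    # score(u, v) = -H(u, v) * π(v), where π(v) = |Γ(v)| / (2|E|) is the
    # stationary probability of v for a random walk on an undirected graph.
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return -hitting_times(adj, v)[u] * len(adj[v]) / two_m


# Path graph a - b - c.
path: Graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(round(hitting_times(path, "c")["a"], 6))  # 4.0
print(round(commute_time(path, "a", "c"), 6))   # 8.0
```

On the path a - b - c, a walk from a always steps to b, and from b it reaches c or falls back to a with equal probability, giving H(a,c) = 4 and H(b,c) = 3.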
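  • 18. SimRank [Jeh and Widom 2002] ● For directed graphs; follows in-links: two nodes are similar if they are pointed to by similar nodes ● [Figure: example directed graph with nodes u, v, w, p, q, r, s] Exercise, compute: ● simrank(u,v) ● simrank(v,w) ● simrank(u,w)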
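A minimal iterative SimRank sketch, assuming the directed graph is given as in-neighbor sets I(x), a decay constant c = 0.8, and a fixed number of iterations (both assumed values). The graph below is a hypothetical one, not the exercise graph from the slide, which is not recoverable here.

```python
Graph = dict[str, set[str]]


def simrank(in_nbrs: Graph, c: float = 0.8, iterations: int = 10) -> dict[tuple[str, str], float]:
    # Iterative SimRank on a directed graph given as in-neighbor sets I(x):
    #   s(a, a) = 1
    #   s(a, b) = c / (|I(a)| |I(b)|) * sum over x in I(a), y in I(b) of s(x, y)
    #   s(a, b) = 0 if I(a) or I(b) is empty
    nodes = list(in_nbrs)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a, b] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[a, b] = 0.0
                else:
                    total = sum(sim[x, y] for x in in_nbrs[a] for y in in_nbrs[b])
                    new[a, b] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


# Hypothetical graph: h1 -> a, h1 -> b, h2 -> b, h2 -> c.
in_nbrs: Graph = {"h1": set(), "h2": set(), "a": {"h1"}, "b": {"h1", "h2"}, "c": {"h2"}}
scores = simrank(in_nbrs)
print(scores["a", "b"], scores["a", "c"])  # 0.4 0.0
```

Here a and b share the in-neighbor h1, so s(a,b) = 0.8 / (1 · 2) · (s(h1,h1) + s(h1,h2)) = 0.4, while a and c have dissimilar in-neighbors and score 0.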
  • 19. Meta-method / pruning ● Compute score(u,v) for all existing edges, assuming they do not exist ● Remove the k% of edges with the lowest scores ● Compute score(u,v) in the reduced graph
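The three pruning steps can be sketched as below, assuming an undirected graph stored as a dict of neighbor sets and common neighbors as a placeholder base score. `prune`, the toy graph, and the choice of base score are hypothetical; the slide does not fix any of them.

```python
from typing import Callable

Graph = dict[str, set[str]]


def common_neighbors(adj: Graph, u: str, v: str) -> int:
    # Placeholder base score; any score(adj, u, v) function can be used instead.
    return len(adj[u] & adj[v])


def prune(adj: Graph, score: Callable[[Graph, str, str], float], k_percent: float) -> Graph:
    work = {x: set(nbrs) for x, nbrs in adj.items()}  # copy, so the input stays untouched
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    # 1) Score every existing edge as if it were absent (matters for scores
    #    that depend on the edge itself, such as shortest-path distance).
    edge_scores = []
    for u, v in edges:
        work[u].discard(v)
        work[v].discard(u)
        edge_scores.append((score(work, u, v), (u, v)))
        work[u].add(v)
        work[v].add(u)
    # 2) Remove the k% of edges with the lowest scores.
    edge_scores.sort()
    for _, (u, v) in edge_scores[: int(len(edges) * k_percent / 100)]:
        work[u].discard(v)
        work[v].discard(u)
    # 3) The caller then computes score(u, v) on this reduced graph.
    return work


# Toy undirected graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

reduced = prune(adj, common_neighbors, k_percent=20)
print(sorted(reduced["d"]))  # ['b', 'c']: the edge d-e had the lowest score
```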
  • 20. Evaluating link prediction methods ● After choosing one of the measures above, select the top-ranked (most similar) pairs of nodes. ● These pairs of nodes are the edges predicted as most likely to appear soon in the network. ● Performance (precision, recall, or accuracy) is measured on the testing graph, by counting how many of the testing graph’s edges the link prediction algorithm successfully reveals. ● Performance is usually very low, since many edges are created for reasons that are not captured by the social network graph alone. ● A common baseline is therefore a random edge predictor, and results are reported as the factor of improvement over random prediction.
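The evaluation can be sketched as follows, assuming undirected pairs and a random baseline that picks uniformly among `n_candidates` pairs, so that its expected precision is (number of new edges) / (number of candidates). `evaluate` and the example inputs are hypothetical.

```python
def evaluate(predicted, new_edges, n_candidates):
    # predicted: pairs returned by the predictor; new_edges: edges that appear in
    # the testing graph; n_candidates: number of pairs the predictor chose from.
    pred = {frozenset(p) for p in predicted}  # undirected: (u, v) == (v, u)
    actual = {frozenset(e) for e in new_edges}
    hits = len(pred & actual)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(actual) if actual else 0.0
    # Expected precision of a uniformly random predictor.
    random_precision = len(actual) / n_candidates
    # Improvement factor over random; undefined when the test graph has no new edges.
    improvement = precision / random_precision if random_precision else float("nan")
    return precision, recall, improvement


print(evaluate([("a", "d"), ("b", "e")], [("d", "a")], n_candidates=5))
# (0.5, 1.0, 2.5)
```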
  • 21. Performance comparison [Liben-Nowell et al. 2003] Notes: Effectiveness in general is very low (challenging problem)
  • 23. Supervised learning [Hasan et al. 2006] ● Input features are all node and pair attributes, possibly including each node's links as attributes ● Train a classifier on a subset of the data to predict whether a pair of nodes is connected or not
  • 24. Example experimental results with supervised learning ● Data: co-authorship networks in DBLP and BIOBASE ● Split into two disjoint ranges of publication years (Ra, Rb) – Example: DBLP, Ra = [1999, 2000], Rb = [2001, 2004] ● Each training item is a pair of authors (u,v), both with a paper in Ra, with all their attributes computed over Ra ● Ground truth is whether u and v co-author a paper during Rb – Positive = yes, Negative = no
  • 25. Example features ● Content similarity – Keywords in common, conferences in common, ... ● Aggregation features – Sum of papers, Sum of neighbors, ... ● Topological distance – Shortest-path length, ...
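Building training items from two year ranges might look like the sketch below. It assumes papers are given as (year, authors) tuples and computes only a few of the listed features (one topological, two aggregation); content features such as shared keywords or venues are omitted because they need paper metadata. The data, `coauthor_graph`, and `make_examples` are hypothetical, and the resulting (features, label) pairs could be fed to any classifier.

```python
from itertools import combinations

Graph = dict[str, set[str]]


def coauthor_graph(papers: list[tuple[int, list[str]]], years: tuple[int, int]) -> tuple[Graph, dict[str, int]]:
    # Co-authorship graph and per-author paper counts for papers with first <= year <= last.
    adj: Graph = {}
    n_papers: dict[str, int] = {}
    for year, authors in papers:
        if not years[0] <= year <= years[1]:
            continue
        for a in authors:
            adj.setdefault(a, set())
            n_papers[a] = n_papers.get(a, 0) + 1
        for a, b in combinations(authors, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj, n_papers


def make_examples(papers, ra, rb):
    # One item per pair of authors active in Ra: features from Ra, label from Rb.
    g_a, papers_a = coauthor_graph(papers, ra)
    g_b, _ = coauthor_graph(papers, rb)
    examples = {}
    for u, v in combinations(sorted(g_a), 2):
        features = {
            "common_neighbors": len(g_a[u] & g_a[v]),  # topological
            "sum_of_neighbors": len(g_a[u]) + len(g_a[v]),  # aggregation
            "sum_of_papers": papers_a[u] + papers_a[v],  # aggregation
        }
        label = int(v in g_b.get(u, set()))  # 1 if u and v co-author during Rb
        examples[u, v] = (features, label)
    return examples


papers = [(1999, ["ann", "bob"]), (2000, ["bob", "cat"]), (2002, ["ann", "cat"])]
data = make_examples(papers, ra=(1999, 2000), rb=(2001, 2004))
print(data["ann", "cat"])
# ({'common_neighbors': 1, 'sum_of_neighbors': 2, 'sum_of_papers': 2}, 1)
```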
  • 26. Performance results under various learning schemes (same feature set)
  • 28. Community membership prediction ● Why do users join communities? – What factors affect the community-joining behavior of individuals? ● We can observe users who join communities and determine the factors that are common among them ● We require a population of users, a community C, and community membership information (i.e., users who are members of C). – To distinguish between users who have already joined the community and those who are now joining it, we need community memberships at two different times t1 and t2, with t2 > t1. – At t2, we determine users such as u who are currently members of the community, but were not members at t1. These new users form the subpopulation that is analyzed for community-joining behavior.
  • 29. Peer influence Hypothesis: individuals are inclined toward an activity when their friends are engaged in the same activity. A factor that plays a role in users joining a community is the number of their friends who are already members of the community.
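One way to examine the peer-influence hypothesis with memberships at t1 and t2 is to group non-members at t1 by how many of their friends were already members, and compute the fraction of each group that joined by t2. A minimal sketch, assuming an undirected friendship graph stored as a dict of neighbor sets; `joining_rate_by_friends` and the data are hypothetical. If the hypothesis holds, the rate should tend to increase with the number of member friends.

```python
from collections import defaultdict

Graph = dict[str, set[str]]


def joining_rate_by_friends(adj: Graph, members_t1: set[str], members_t2: set[str]) -> dict[int, float]:
    # For every user who was not in community C at t1, count how many of their
    # friends were members at t1; for each count k, return the fraction of those
    # users who had joined C by t2 (the new members are members_t2 - members_t1).
    total: dict[int, int] = defaultdict(int)
    joined: dict[int, int] = defaultdict(int)
    for user, friends in adj.items():
        if user in members_t1:
            continue
        k = len(friends & members_t1)
        total[k] += 1
        if user in members_t2:
            joined[k] += 1
    return {k: joined[k] / total[k] for k in sorted(total)}


friends: Graph = {
    "m1": {"u1", "u2"}, "m2": {"u1"},
    "u1": {"m1", "m2"}, "u2": {"m1"},
    "u3": {"u4"}, "u4": {"u3"},
}
print(joining_rate_by_friends(friends, {"m1", "m2"}, {"m1", "m2", "u1"}))
# {0: 0.0, 1: 0.0, 2: 1.0}
```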
  • 32. Beyond community membership ● Communities can be implicit: one can think of individuals buying a product as a community, and people buying the product for the first time as individuals joining the community ● Collective behavior: a group of individuals behaving in a similar way (first defined by sociologist Robert Park) ● It can be planned and coordinated, but is often spontaneous and unplanned ● Examples: – Individuals standing in line for a new product release – Posting messages online to support a cause or to show support for an individual ● The approach can be similar to community membership prediction