SlideShare a Scribd company logo
1 of 40
Download to read offline
Link-Based Ranking
Class Algorithmic Methods of Data Mining
Program M. Sc. Data Science
University Sapienza University of Rome
Semester Fall 2015
Lecturer Carlos Castillo http://chato.cl/
Sources:
● Fei Li's lecture on PageRank
● Evimaria Terzi's lecture on link analysis.
● Paolo Boldi, Francesco Bonchi, Carlos Castillo, and Sebastiano
Vigna. 2011. Viscous democracy for social networks. Commun.
ACM 54, 6 (June 2011), 129-137. [link]
2
Purpose of Link-Based Ranking
● Static (query-independent) ranking
● Dynamic (query-dependent) ranking
● Applications:
– Search in social networks
– Search on the web
3
Given a set of connected objects
4
Assign some weights
5
Alternatives
● Various centrality metrics
– Degree, betweenness, ...
● Classical algorithms
– HITS / Hubs and Authorities
– PageRank
6
HITS (Hubs and Authorities)
7
HITS
● Jon M. Kleinberg. 1999. Authoritative sources in
a hyperlinked environment. J. ACM 46, 5
(September 1999), 604-632. [DOI]
● Query-dependent algorithm
– Get pages matching the query
– Expand to 1-hop neighborhood
– Find pages with good out-links (“hubs”)
– Find pages with good in-links (“authorities”)
8
Root set = matches the query
9
Base set S = root set plus
1-hop neighbors
Base set S is expected to be small and topically focused.
10
Base graph S of n nodes
11
Bipartite graph of 2n nodes
12
Bipartite graph of 2n nodes
0) Initialization:
1) Iteration: 2) Normalization:
13
Try it!
H(1) A(1) Â(1) H(2) Ĥ(2) A(2) Â(2)
1 0
1 3
1 1
1 1
1 1
Complete the table. Which one is the biggest hub? Which the
biggest authority? Does it differ from ranking by degree?
14
What are we computing?
● Vector a is an eigenvector of ATA
● Conversely, vector h is an eigenvector of AAT
15
Tightly-knit communities
● Imagine a graph made of a 3,3 and a 2,3 clique
1
1
1
1
1
16
Tightly-knit communities
● Imagine a graph made of a 3,3 and a 2,3 clique
3
3
3
2
2
2
17
Tightly-knit communities
● Imagine a graph made of a 3,3 and a 2,3 clique
3x3
3x3
3x3
3x2
3x2
18
Tightly-knit communities
● Imagine a graph made of a 3,3 and a 2,3 clique
3x3x3
3x3x3
3x3x3
3x2x2
3x2x2
3x2x2
19
Tightly-knit communities
● HITS favors the largest dense sub-graph
…
...
...
After n iterations:
20
PageRank
21
PageRank
● The pagerank citation algorithm: bringing order
to the web by L Page, S Brin, R Motwani, T
Winograd - 7th World Wide Web Conference,
1998 [link].
● Designed by Page & Brin as part of a research
project that started in 1995 and ended in 1998
… with the creation of Google
22
A Simple Version of PageRank
●
Nj: the number of forward links of page j
●
c: normalization factor to ensure
||P||L1= |P1 + … + Pn| = 1
23
An example of Simplified PageRank
First iteration of calculation
24
An example of Simplified PageRank
Second iteration of calculation
25
An example of Simplified PageRank
Convergence after some iterations
26
A Problem with Simplified PageRank
A loop:
During each iteration, the loop accumulates rank but
never distributes rank to other pages!
27
An example of the Problem
First iteration
28
An example of the Problem
Second iteration … see what's happening?
29
An example of the Problem
Convergence
30
What are we computing?
● p is an eigenvector of A with eigenvalue 1
● This (power method) can be used if A is:
– Stochastic (each row adds up to one)
– Irreducible (represents a strongly connected graph)
– Aperiodic (does not represent a bipartite graph)
31
Markov Chains
● Discrete process over a set of states
● Next state determined by current state and
current state only (no memory of older states)
– Higher-order Markov chains can be defined
● Stationary distribution of Markov chain is a
probability distribution such that p = Ap
● Intuitively, p represents “the average time spent”
at each node if the process continues forever
32
Random Walks in Graphs
●
Random Surfer Model
– The simplified model: the standing probability distribution of a
random walk on the graph of the web. simply keeps clicking
successive links at random
●
Modified Random Surfer
– The modified model: the “random surfer” simply keeps clicking
successive links at random, but periodically “gets bored” and
jumps to a random page based on the distribution of E
– This guarantees irreducibility
– Pages without out-links (dangling nodes) are a row of zeros,
can be replaced by E, or by a row of 1/n
33
Modified Version of PageRank
E(i): web pages that “users” jump to when they “get bored”;
Uniform random jump => E(i) = 1/n
34
An example of Modified PageRank
35
Variant: personalized PageRank
● Modify vector E(i) according to users' tastes
(e.g. user interested in sports vs politics)
http://nlp.stanford.edu/IR-book/html/htmledition/topic-specific-pagerank-1.html
36
PageRank and internal linking
● A website has a maximum amount of Page Rank
that is distributed between its pages by internal
links [depends on internal links]
● The maximum amount of Page Rank in a site
increases as the number of pages in the site
increases.
● By linking poorly, it is possible to fail to reach the
site's maximum Page Rank, but it is not possible
to exceed it.
http://www.cs.sjsu.edu/faculty/pollett/masters/Semesters/Fall11/tanmayee/Deliverable3.pdf
37
PageRank as a form of actual voting
(liquid democracy)
● If alpha = 1, we can implement liquid
democracy
– In liquid democracy, people chose to either vote or
to delegate their vote to somebody else
● If alpha < 1, we have a sort of “viscous”
democracy where delegation is not total
38
PageRank as a form of liquid
democracy
39
One of these two
graphs has alpha = 0.9.
The other has alpha =
0.2.
Which one is which?
40
PageRank Implementation
●
Suppose there are n pages and m links
●
Trivial implementation of PageRank requires O(m+n) memory
●
Streaming implementation requires O(n) memory … how?
●
More on PageRank to follow in another lecture ...

More Related Content

What's hot

Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsSelman Bozkır
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introductionnimmyjans4
 
CS6010 Social Network Analysis Unit IV
CS6010 Social Network Analysis Unit IVCS6010 Social Network Analysis Unit IV
CS6010 Social Network Analysis Unit IVpkaviya
 
CS6010 Social Network Analysis Unit III
CS6010 Social Network Analysis   Unit IIICS6010 Social Network Analysis   Unit III
CS6010 Social Network Analysis Unit IIIpkaviya
 
WEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMWEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMSai Kumar Ale
 
Information retrieval concept, practice and challenge
Information retrieval   concept, practice and challengeInformation retrieval   concept, practice and challenge
Information retrieval concept, practice and challengeGan Keng Hoon
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysisBangalore
 
Information retrieval (introduction)
Information  retrieval (introduction) Information  retrieval (introduction)
Information retrieval (introduction) Primya Tamil
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial Salah Amean
 
Components of a search engine
Components of a search engineComponents of a search engine
Components of a search enginePrimya Tamil
 
Distributed database management systems
Distributed database management systemsDistributed database management systems
Distributed database management systemsUsman Tariq
 
CS6010 Social Network Analysis Unit II
CS6010 Social Network Analysis   Unit IICS6010 Social Network Analysis   Unit II
CS6010 Social Network Analysis Unit IIpkaviya
 

What's hot (20)

Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
 
Term weighting
Term weightingTerm weighting
Term weighting
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
 
Gaurav web mining
Gaurav web miningGaurav web mining
Gaurav web mining
 
CS6010 Social Network Analysis Unit IV
CS6010 Social Network Analysis Unit IVCS6010 Social Network Analysis Unit IV
CS6010 Social Network Analysis Unit IV
 
CS8080 IRT UNIT I NOTES.pdf
CS8080 IRT UNIT I  NOTES.pdfCS8080 IRT UNIT I  NOTES.pdf
CS8080 IRT UNIT I NOTES.pdf
 
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I PPT IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDFCS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I PPT IN PDF
 
CS6010 Social Network Analysis Unit III
CS6010 Social Network Analysis   Unit IIICS6010 Social Network Analysis   Unit III
CS6010 Social Network Analysis Unit III
 
Fraud and Risk in Big Data
Fraud and Risk in Big DataFraud and Risk in Big Data
Fraud and Risk in Big Data
 
WEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMWEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEM
 
Information retrieval concept, practice and challenge
Information retrieval   concept, practice and challengeInformation retrieval   concept, practice and challenge
Information retrieval concept, practice and challenge
 
Inverted index
Inverted indexInverted index
Inverted index
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysis
 
Information retrieval (introduction)
Information  retrieval (introduction) Information  retrieval (introduction)
Information retrieval (introduction)
 
Distributed information system
Distributed information systemDistributed information system
Distributed information system
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial
 
Web usage-mining
Web usage-miningWeb usage-mining
Web usage-mining
 
Components of a search engine
Components of a search engineComponents of a search engine
Components of a search engine
 
Distributed database management systems
Distributed database management systemsDistributed database management systems
Distributed database management systems
 
CS6010 Social Network Analysis Unit II
CS6010 Social Network Analysis   Unit IICS6010 Social Network Analysis   Unit II
CS6010 Social Network Analysis Unit II
 

Similar to Link-Based Ranking

Link analysis for web search
Link analysis for web searchLink analysis for web search
Link analysis for web searchEmrullah Delibas
 
Page rank by university of michagain.ppt
Page rank by university of michagain.pptPage rank by university of michagain.ppt
Page rank by university of michagain.pptrayyverma
 
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCHLINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCHDivyansh Verma
 
Page rank talk at NTU-EE
Page rank talk at NTU-EEPage rank talk at NTU-EE
Page rank talk at NTU-EEPing Yeh
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Margaret Wang
 
Discovering knowledge using web structure mining
Discovering knowledge using web structure miningDiscovering knowledge using web structure mining
Discovering knowledge using web structure miningAtul Khanna
 
Master defence 2020 - Maksym Opirskyi -Topological Approach to Wikipedia Arti...
Master defence 2020 - Maksym Opirskyi -Topological Approach to Wikipedia Arti...Master defence 2020 - Maksym Opirskyi -Topological Approach to Wikipedia Arti...
Master defence 2020 - Maksym Opirskyi -Topological Approach to Wikipedia Arti...Lviv Data Science Summer School
 
Internet 信息检索中的数学
Internet 信息检索中的数学Internet 信息检索中的数学
Internet 信息检索中的数学Xu jiakon
 
Link analysis : Comparative study of HITS and Page Rank Algorithm
Link analysis : Comparative study of HITS and Page Rank AlgorithmLink analysis : Comparative study of HITS and Page Rank Algorithm
Link analysis : Comparative study of HITS and Page Rank AlgorithmKavita Kushwah
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczIoan Toma
 

Similar to Link-Based Ranking (20)

Link analysis for web search
Link analysis for web searchLink analysis for web search
Link analysis for web search
 
Page rank by university of michagain.ppt
Page rank by university of michagain.pptPage rank by university of michagain.ppt
Page rank by university of michagain.ppt
 
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCHLINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
 
Page rank talk at NTU-EE
Page rank talk at NTU-EEPage rank talk at NTU-EE
Page rank talk at NTU-EE
 
Page rank algortihm
Page rank algortihmPage rank algortihm
Page rank algortihm
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461
 
Discovering knowledge using web structure mining
Discovering knowledge using web structure miningDiscovering knowledge using web structure mining
Discovering knowledge using web structure mining
 
CSE509 Lecture 3
CSE509 Lecture 3CSE509 Lecture 3
CSE509 Lecture 3
 
Master defence 2020 - Maksym Opirskyi -Topological Approach to Wikipedia Arti...
Master defence 2020 - Maksym Opirskyi -Topological Approach to Wikipedia Arti...Master defence 2020 - Maksym Opirskyi -Topological Approach to Wikipedia Arti...
Master defence 2020 - Maksym Opirskyi -Topological Approach to Wikipedia Arti...
 
Web mining
Web miningWeb mining
Web mining
 
Content-based link prediction
Content-based link predictionContent-based link prediction
Content-based link prediction
 
Q3
Q3Q3
Q3
 
Web Mining
Web MiningWeb Mining
Web Mining
 
Web mining
Web miningWeb mining
Web mining
 
Social (1)
Social (1)Social (1)
Social (1)
 
Mazhiming
MazhimingMazhiming
Mazhiming
 
Internet 信息检索中的数学
Internet 信息检索中的数学Internet 信息检索中的数学
Internet 信息检索中的数学
 
Link analysis : Comparative study of HITS and Page Rank Algorithm
Link analysis : Comparative study of HITS and Page Rank AlgorithmLink analysis : Comparative study of HITS and Page Rank Algorithm
Link analysis : Comparative study of HITS and Page Rank Algorithm
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 
Web mining
Web miningWeb mining
Web mining
 

More from Carlos Castillo (ChaTo)

Finding High Quality Content in Social Media
Finding High Quality Content in Social MediaFinding High Quality Content in Social Media
Finding High Quality Content in Social MediaCarlos Castillo (ChaTo)
 
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Carlos Castillo (ChaTo)
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Carlos Castillo (ChaTo)
 

More from Carlos Castillo (ChaTo) (20)

Finding High Quality Content in Social Media
Finding High Quality Content in Social MediaFinding High Quality Content in Social Media
Finding High Quality Content in Social Media
 
When no clicks are good news
When no clicks are good newsWhen no clicks are good news
When no clicks are good news
 
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)
 
Discrimination Discovery
Discrimination DiscoveryDiscrimination Discovery
Discrimination Discovery
 
Fairness-Aware Data Mining
Fairness-Aware Data MiningFairness-Aware Data Mining
Fairness-Aware Data Mining
 
Big Crisis Data for ISPC
Big Crisis Data for ISPCBig Crisis Data for ISPC
Big Crisis Data for ISPC
 
Databeers: Big Crisis Data
Databeers: Big Crisis DataDatabeers: Big Crisis Data
Databeers: Big Crisis Data
 
Observational studies in social media
Observational studies in social mediaObservational studies in social media
Observational studies in social media
 
Natural experiments
Natural experimentsNatural experiments
Natural experiments
 
Link prediction
Link predictionLink prediction
Link prediction
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Graph Partitioning and Spectral Methods
Graph Partitioning and Spectral MethodsGraph Partitioning and Spectral Methods
Graph Partitioning and Spectral Methods
 
Finding Dense Subgraphs
Finding Dense SubgraphsFinding Dense Subgraphs
Finding Dense Subgraphs
 
Graph Evolution Models
Graph Evolution ModelsGraph Evolution Models
Graph Evolution Models
 
Text Indexing / Inverted Indices
Text Indexing / Inverted IndicesText Indexing / Inverted Indices
Text Indexing / Inverted Indices
 
Indexing
IndexingIndexing
Indexing
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
K-Means Algorithm
K-Means AlgorithmK-Means Algorithm
K-Means Algorithm
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Link-Based Ranking

  • 1. Link-Based Ranking Class Algorithmic Methods of Data Mining Program M. Sc. Data Science University Sapienza University of Rome Semester Fall 2015 Lecturer Carlos Castillo http://chato.cl/ Sources: ● Fei Li's lecture on PageRank ● Evimaria Terzi's lecture on link analysis. ● Paolo Boldi, Francesco Bonchi, Carlos Castillo, and Sebastiano Vigna. 2011. Viscous democracy for social networks. Commun. ACM 54, 6 (June 2011), 129-137. [link]
  • 2. 2 Purpose of Link-Based Ranking ● Static (query-independent) ranking ● Dynamic (query-dependent) ranking ● Applications: – Search in social networks – Search on the web
  • 3. 3 Given a set of connected objects
  • 5. 5 Alternatives ● Various centrality metrics – Degree, betweenness, ... ● Classical algorithms – HITS / Hubs and Authorities – PageRank
  • 6. 6 HITS (Hubs and Authorities)
  • 7. 7 HITS ● Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5 (September 1999), 604-632. [DOI] ● Query-dependent algorithm – Get pages matching the query – Expand to 1-hop neighborhood – Find pages with good out-links (“hubs”) – Find pages with good in-links (“authorities”)
  • 8. 8 Root set = matches the query
  • 9. 9 Base set S = root set plus 1-hop neighbors Base set S is expected to be small and topically focused.
  • 10. 10 Base graph S of n nodes
  • 12. 12 Bipartite graph of 2n nodes 0) Initialization: 1) Iteration: 2) Normalization:
  • 13. 13 Try it! H(1) A(1) Â(1) H(2) Ĥ(2) A(2) Â(2) 1 0 1 3 1 1 1 1 1 1 Complete the table. Which one is the biggest hub? Which the biggest authority? Does it differ from ranking by degree?
  • 14. 14 What are we computing? ● Vector a is an eigenvector of ATA ● Conversely, vector h is an eigenvector of AAT
  • 15. 15 Tightly-knit communities ● Imagine a graph made of a 3,3 and a 2,3 clique 1 1 1 1 1
  • 16. 16 Tightly-knit communities ● Imagine a graph made of a 3,3 and a 2,3 clique 3 3 3 2 2 2
  • 17. 17 Tightly-knit communities ● Imagine a graph made of a 3,3 and a 2,3 clique 3x3 3x3 3x3 3x2 3x2
  • 18. 18 Tightly-knit communities ● Imagine a graph made of a 3,3 and a 2,3 clique 3x3x3 3x3x3 3x3x3 3x2x2 3x2x2 3x2x2
  • 19. 19 Tightly-knit communities ● HITS favors the largest dense sub-graph … ... ... After n iterations:
  • 21. 21 PageRank ● The pagerank citation algorithm: bringing order to the web by L Page, S Brin, R Motwani, T Winograd - 7th World Wide Web Conference, 1998 [link]. ● Designed by Page & Brin as part of a research project that started in 1995 and ended in 1998 … with the creation of Google
  • 22. 22 A Simple Version of PageRank ● Nj: the number of forward links of page j ● c: normalization factor to ensure ||P||L1= |P1 + … + Pn| = 1
  • 23. 23 An example of Simplified PageRank First iteration of calculation
  • 24. 24 An example of Simplified PageRank Second iteration of calculation
  • 25. 25 An example of Simplified PageRank Convergence after some iterations
  • 26. 26 A Problem with Simplified PageRank A loop: During each iteration, the loop accumulates rank but never distributes rank to other pages!
  • 27. 27 An example of the Problem First iteration
  • 28. 28 An example of the Problem Second iteration … see what's happening?
  • 29. 29 An example of the Problem Convergence
  • 30. 30 What are we computing? ● p is an eigenvector of A with eigenvalue 1 ● This (power method) can be used if A is: – Stochastic (each row adds up to one) – Irreducible (represents a strongly connected graph) – Aperiodic (does not represent a bipartite graph)
  • 31. 31 Markov Chains ● Discrete process over a set of states ● Next state determined by current state and current state only (no memory of older states) – Higher-order Markov chains can be defined ● Stationary distribution of Markov chain is a probability distribution such that p = Ap ● Intuitively, p represents “the average time spent” at each node if the process continues forever
  • 32. 32 Random Walks in Graphs ● Random Surfer Model – The simplified model: the standing probability distribution of a random walk on the graph of the web. simply keeps clicking successive links at random ● Modified Random Surfer – The modified model: the “random surfer” simply keeps clicking successive links at random, but periodically “gets bored” and jumps to a random page based on the distribution of E – This guarantees irreducibility – Pages without out-links (dangling nodes) are a row of zeros, can be replaced by E, or by a row of 1/n
  • 33. 33 Modified Version of PageRank E(i): web pages that “users” jump to when they “get bored”; Uniform random jump => E(i) = 1/n
  • 34. 34 An example of Modified PageRank
  • 35. 35 Variant: personalized PageRank ● Modify vector E(i) according to users' tastes (e.g. user interested in sports vs politics) http://nlp.stanford.edu/IR-book/html/htmledition/topic-specific-pagerank-1.html
  • 36. 36 PageRank and internal linking ● A website has a maximum amount of Page Rank that is distributed between its pages by internal links [depends on internal links] ● The maximum amount of Page Rank in a site increases as the number of pages in the site increases. ● By linking poorly, it is possible to fail to reach the site's maximum Page Rank, but it is not possible to exceed it. http://www.cs.sjsu.edu/faculty/pollett/masters/Semesters/Fall11/tanmayee/Deliverable3.pdf
  • 37. 37 PageRank as a form of actual voting (liquid democracy) ● If alpha = 1, we can implement liquid democracy – In liquid democracy, people chose to either vote or to delegate their vote to somebody else ● If alpha < 1, we have a sort of “viscous” democracy where delegation is not total
  • 38. 38 PageRank as a form of liquid democracy
  • 39. 39 One of these two graphs has alpha = 0.9. The other has alpha = 0.2. Which one is which?
  • 40. 40 PageRank Implementation ● Suppose there are n pages and m links ● Trivial implementation of PageRank requires O(m+n) memory ● Streaming implementation requires O(n) memory … how? ● More on PageRank to follow in another lecture ...