Google
Page Rank Algorithm
Abhav Luthra
7th Semester
Computer Science Engineering
Facts
• Developed by Larry Page and Sergey Brin in 1998
• Patented by Stanford University
• Trademark of Google
• Backbone of Google Search Engine Technology
• Research paper: http://infolab.stanford.edu/~backrub/google.html
What is PageRank
• Link Analysis Algorithm
• Ranks pages based on the number of other pages that link to them
• Gives an indication of the relative importance of a page
• Hence, an appropriate SERP (Search Engine Results Page) listing
• Calculated from the weight and number of back links
[Figure: back links, inbound links, and outbound links]
Definition
PageRank works by counting the number and quality of links to a page
to determine a rough estimate of how important the page is. The
underlying assumption is that more important pages are likely to receive
more links from other websites.
We assume page A has pages B, C, and D pointing to it. The parameter
d is a damping factor, which can be set between 0 and 1. We usually set d to
0.85. L(X) is the number of outbound links on page X.
The PageRank of page A is given as follows:
PR(A) = (1-d) + d(PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D))
PageRank forms a probability distribution over web pages, so the sum of
all web pages' PageRanks will be 1. Strictly, the sum is 1 only when the
(1-d) term is divided by the number of pages N; with the form above, the
values average 1 when every page has at least one outbound link.
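As a minimal sketch of how the formula above is evaluated for a single page, the following C function applies it once. The function name pagerank_of and its parameters are hypothetical, chosen only for illustration; pr_in holds the current PageRank of each page linking to A, and links_out holds the corresponding L values.

```c
#include <stddef.h>

/* One application of PR(A) = (1 - d) + d * sum(PR(T_i) / L(T_i)).
   pr_in[i]     : current PageRank of the i-th page linking to A
   links_out[i] : number of outbound links on that page (must be > 0)
   Names are illustrative, not taken from the slides. */
double pagerank_of(double d, const double *pr_in, const int *links_out, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += pr_in[i] / links_out[i];   /* each page shares its rank equally */
    }
    return (1.0 - d) + d * sum;
}
```

For example, assuming PR(B) = PR(C) = PR(D) = 1 with L(B) = 2, L(C) = 1, and L(D) = 3, calling the function with d = 0.85 gives 0.15 + 0.85 * (1/2 + 1 + 1/3), which is about 1.708.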
What is the damping factor?
• The theory is that an imaginary surfer who is randomly
clicking on links will eventually stop clicking. The
probability, at any step, that the person will continue
is the damping factor d.
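To get a feel for what d means, the sketch below estimates how many links the imaginary surfer follows on average before stopping, assuming each click is followed by another with probability d, independently of the previous clicks. The function name expected_clicks is hypothetical.

```c
/* Expected number of links followed when every click is followed by
   another one with probability d (independence is an assumption).
   Uses E[K] = sum over k >= 1 of P(K >= k), where P(K >= k) = d^k. */
double expected_clicks(double d, int max_steps)
{
    double expected = 0.0;
    double p_reach = 1.0;        /* probability of making at least k clicks */
    for (int k = 1; k <= max_steps; k++) {
        p_reach *= d;            /* now equals d^k */
        expected += p_reach;
    }
    return expected;             /* approaches d / (1 - d) as max_steps grows */
}
```

With d = 0.85 and a large max_steps, the result approaches 0.85 / 0.15, roughly 5.67 clicks.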
Observe A
• It has only inbound links and no outbound
links.
• The link from D to A is called a dangling link:
simply a link that points to a page with no
outgoing links.
• Dangling links affect the model because it is
not clear where their weight should be
distributed, and there are a large
number of them.
• Because dangling links do not affect the
ranking of any other page directly, we
simply remove them from the system until
all the PageRanks are calculated.
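Finding the pages that dangling links point to amounts to looking for pages with no outgoing links. A minimal sketch, assuming the link graph is stored as a small adjacency matrix (the layout, the size N, and the function name count_dangling are all assumptions for illustration):

```c
#define N 4   /* number of pages in this hypothetical graph */

/* links[i][j] != 0 means page i links to page j.
   Marks pages with no outgoing links in is_dangling and returns their count. */
int count_dangling(int links[N][N], int is_dangling[N])
{
    int count = 0;
    for (int i = 0; i < N; i++) {
        int out = 0;
        for (int j = 0; j < N; j++) {
            out += (links[i][j] != 0);   /* count outgoing links of page i */
        }
        is_dangling[i] = (out == 0);
        count += is_dangling[i];
    }
    return count;
}
```

Links pointing to the marked pages are the ones that would be set aside before iterating.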
Calculating PageRank
The normalized PageRank of a page, where N is the total number of pages, is as follows:
PR(A) = (1-d)/N + d(PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D))
• The PR of each page depends on the PR of
pages pointing to it.
• We don’t know what PR those pages have
until the pages pointing to them have their
PR calculated.
Solution
• PageRank can be calculated using a
simple iterative algorithm
• It means we can calculate one page’s PR
without knowing the final value of PR of
other pages
In this example each node has an
equal weight of 1 initially, which is
divided equally among its
outgoing links
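The iterative idea can be sketched for a general small graph as follows. This is only an illustration under stated assumptions: the graph is an adjacency matrix, every page has at least one outgoing link (dangling pages already removed), the normalized (1-d)/N form is used, and each pass computes all new values from the previous pass. The names N, DAMPING, and pagerank are hypothetical.

```c
#define N 4            /* number of pages (hypothetical) */
#define DAMPING 0.85   /* damping factor d */

/* links[i][j] != 0 means page i links to page j.
   Assumes every page has at least one outgoing link. */
void pagerank(int links[N][N], double pr[N], int iterations)
{
    int out[N];
    for (int i = 0; i < N; i++) {
        out[i] = 0;
        for (int j = 0; j < N; j++) out[i] += (links[i][j] != 0);
        pr[i] = 1.0 / N;                  /* start from a uniform guess */
    }
    for (int it = 0; it < iterations; it++) {
        double next[N];
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int i = 0; i < N; i++) {
                if (links[i][j]) sum += pr[i] / out[i];   /* share from page i */
            }
            next[j] = (1.0 - DAMPING) / N + DAMPING * sum;
        }
        for (int j = 0; j < N; j++) pr[j] = next[j];      /* commit this pass */
    }
}
```

Under these assumptions the values keep summing to 1 after every pass, and a page with no inbound links settles at (1-d)/N.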
So we got lucky with a starting value of 1. What if we start from PR = 0?
PR(A) = 0.15 + 0.85*0 = 0.15
PR(B) = 0.15 + 0.85*0.15 = 0.2775
AGAIN,
PR(A) = 0.15 + 0.85*0.2775 = 0.385875
PR(B) = 0.15 + 0.85*0.385875 = 0.47799375
AND AGAIN,
PR(A) = 0.15 + 0.85*0.47799375 = 0.5562946875
PR(B) = 0.15 + 0.85*0.5562946875 = 0.622850484375
TILL PR → 1
It really doesn’t matter whether the starting PR is 1, 0, or any other number; it will eventually settle at 1.0
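Let's run the code
#include <stdio.h>

int main(void)
{
    double d = 0.85;        /* damping factor */
    double a = 0, b = 0;    /* starting PageRank of pages A and B */
    int i = 40;             /* number of iterations */
    while (i-- > 0) {
        printf("a: %5f b: %5f\n", a, b);
        a = (1 - d) + d * b;    /* B is the only page linking to A */
        b = (1 - d) + d * a;    /* A is the only page linking to B */
    }
    printf("Average PageRank= %4f\n", (a + b) / 2);
    return 0;
}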
PageRank eventually settles at 1 in the long run
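The earlier claim that the starting value does not matter can be checked with a small variation of the two-page example. This is a sketch only; the function name settle and the chosen starting values are assumptions for illustration.

```c
/* Iterate the two-page example (A and B link only to each other)
   from an arbitrary starting value and return the average PageRank. */
double settle(double start, int iterations)
{
    const double d = 0.85;           /* damping factor */
    double a = start, b = start;
    for (int i = 0; i < iterations; i++) {
        a = (1 - d) + d * b;         /* B is the only page linking to A */
        b = (1 - d) + d * a;         /* A is the only page linking to B */
    }
    return (a + b) / 2;
}
```

Calling settle with starting values such as 0, 1, or 40 and around 100 iterations should return a value very close to 1 in each case, because each update multiplies the remaining distance from 1 by d.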
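Now let's try another example
#include <stdio.h>

int main(void)
{
    double d = 0.85;                    /* damping factor */
    double a = 0, b = 0, c = 0, e = 0;  /* starting PageRank of the four pages */
    int i = 40;                         /* number of iterations */
    while (i-- > 0) {
        printf("a: %5f b: %5f c: %5f e: %5f\n", a, b, c, e);
        /* coefficients as given on the slide; the link graph is not reproduced here */
        a = (1 - d) + d * ((b / 3) + (c / 3) + (e / 3));
        b = (1 - d) + d * ((c / 2) + (e / 2));
        c = (1 - d) + d * (a);
        e = (1 - d) + d * ((c / 2) + (a / 2));
    }
    printf("Average PageRank= %4f\n", (a + b + c + e) / 4);
    return 0;
}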
Issues with PageRank
• Prefers old documents over new ones.
• Pages redirecting to the main page itself, raising their rank -
spoofed PageRank
• Search optimizers selling high PageRank to
webmasters
• Cloaking - showing different content to Google than
to users
• Link exchange - "I’ll add you if you add me"
• Buying links - buying links to your website
• Keyword stuffing - links in white space
• Bot writing - automatically updating, editing, and copying
content
Some applications beyond Google
• Dynamic Price Setting
• Programmable Networks
• Stock market Trading
• Opinion polls
• Web Mining
• Theme based Ranking
• Reputation system for ecommerce
• Collaborative Filtering
• Business Intelligence