  1. 1. Google PageRank. By Abhijit Mondal, Software Engineer
  2. 2. What is PageRank? It is the algorithm developed by Google founders Larry Page and Sergey Brin to quantify the importance of a page or website in the complex network of the World Wide Web.
  3. 3. PageRank is only one criterion, not the only criterion, by which Google decides where your page or website will rank in the search results.
  4. 4. Since the inception of PageRank, search-result ranking algorithms have developed so much that at this moment nobody knows the exact algorithm (or algorithms) Google uses to rank search results. If it were known, there would be no need for SEO experts.
  5. 5. Assume that the World Wide Web is composed of only 4 pages, forming a directed graph in which each arrow indicates a hyperlink from one page to another.
  6. 6. Assuming that all the hyperlinks on a page have equal probability of being clicked (which is not true in practice), the weight of each edge is 1 divided by the total number of outgoing links from that page.
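The edge weights described above can be sketched in a few lines of Python. The slide's figure is not part of this transcript, so the link structure below is an assumption inferred from the equations given later in the slides (page 1 links to 2, 3 and 4; page 2 to 3 and 4; page 3 to 1; page 4 to 1 and 3). The names `links` and `edge_weights` are illustrative only.

```python
# Outgoing links of the 4-page example web.
# Assumption: inferred from the slides' equations, since the figure is missing.
links = {
    1: [2, 3, 4],
    2: [3, 4],
    3: [1],
    4: [1, 3],
}


def edge_weights(links):
    """Return {(source, target): weight} with weight = 1 / out-degree of source."""
    weights = {}
    for source, targets in links.items():
        for target in targets:
            # Every link on a page is assumed equally likely to be clicked.
            weights[(source, target)] = 1.0 / len(targets)
    return weights


weights = edge_weights(links)
for (source, target), weight in sorted(weights.items()):
    print(f"{source} -> {target}: {weight:.3f}")
```

Under this assumption the weights leaving any one page add up to 1, which is what lets them be read as click probabilities.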
  7. 7. Loosely speaking, the PageRank of a page A is a direct measure of the probability of visiting page A when a random user opens a browser and follows some hyperlinks to reach page A.
  8. 8. In the given graph, what is the probability of reaching page 3 when a random user opens the browser to surf the internet?
  9. 9. How can the user reach page 3? The user is on page 1 and clicks the link to page 3; or is on page 2 and clicks the link to page 3; or is on page 4 and clicks the link to page 3; or directly types the URL of page 3.
  10. 10. Denoting the probability of reaching page i as P(i):
      P(3) = (1-d) × (P(1)×(1/3) + P(2)×(1/2) + P(4)×(1/2)) + d × (1/4)
      This formula follows from the laws of probability. Here d is the probability that the user visits a page directly, spread equally over the 4 pages (hence the factor 1/4), and (1-d) is the probability that the user arrives through a link on another page.
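The formula for page 3 is one instance of a pattern that can be written once for any page. In the sketch below, the symbols B(i) (the set of pages linking to page i), L(j) (the number of outgoing links on page j) and N (the total number of pages) are notation introduced here for illustration and do not appear in the slides.

```latex
P(i) = (1-d) \sum_{j \in B(i)} \frac{P(j)}{L(j)} + \frac{d}{N}
```

For page 3 in the example, B(3) = {1, 2, 4}, L(1) = 3, L(2) = L(4) = 2 and N = 4, which reproduces the formula above. Note that d here is the probability of a direct visit; many other presentations use d for the damping factor, which corresponds to (1-d) in these slides.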
  11. 11. … Similarly, the equations for P(1), P(2) and P(4) are:
      P(1) = (1-d) × (P(3) + P(4)×(1/2)) + d × (1/4)
      P(2) = (1-d) × (P(1)×(1/3)) + d × (1/4)
      P(4) = (1-d) × (P(2)×(1/2) + P(1)×(1/3)) + d × (1/4)
      But we now have a problem: P(1) is needed to compute P(3), and P(3) is needed to compute P(1), so how can either be computed? These are coupled equations. They are solved using matrices (eigenvalues and eigenvectors) or, more simply, by repeated iteration until the values converge.
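Because the four equations are linear in P(1) to P(4), one way to handle the coupling is to move all unknowns to one side and solve the resulting linear system directly. The sketch below does this with a small hand-written Gaussian elimination; it is one possible approach, not necessarily the one the author had in mind. It assumes d = 0.15 (the value the slides use later), and the names `solve_linear` and `w` are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


d = 0.15  # assumed probability of a direct visit
n = 4
# w[i][j] is the weight of the link from page j+1 into page i+1,
# read off the four equations in the slides.
w = [
    [0.0, 0.0, 1.0, 0.5],      # into page 1: from 3 and 4
    [1 / 3, 0.0, 0.0, 0.0],    # into page 2: from 1
    [1 / 3, 0.5, 0.0, 0.5],    # into page 3: from 1, 2 and 4
    [1 / 3, 0.5, 0.0, 0.0],    # into page 4: from 1 and 2
]
# Rearranged system: (I - (1-d) W) P = d/n
a = [[(1.0 if i == j else 0.0) - (1 - d) * w[i][j] for j in range(n)]
     for i in range(n)]
b = [d / n] * n

p = solve_linear(a, b)
for page, value in enumerate(p, start=1):
    print(f"P({page}) = {value:.3f}")
```

A direct solve like this is only practical for tiny examples; for a graph the size of the web, repeated iteration is the realistic option.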
  12. 12. But why calculate probabilities when we want PageRank? Because the probability of reaching page i is the direct measure of the PageRank of i. Let PR(i) = P(i), where PR(i) is the PageRank of page i. Denote by PR_k(i) the PageRank computed using the earlier formula in the k-th iteration; then in the (k+1)-th iteration ...
  13. 13. PR_{k+1}(3) = (1-d) × (PR_k(1)×(1/3) + PR_k(2)×(1/2) + PR_k(4)×(1/2)) + d × (1/4)
      PR_{k+1}(1) = (1-d) × (PR_k(3) + PR_k(4)×(1/2)) + d × (1/4)
      PR_{k+1}(2) = (1-d) × (PR_k(1)×(1/3)) + d × (1/4)
      PR_{k+1}(4) = (1-d) × (PR_k(2)×(1/2) + PR_k(1)×(1/3)) + d × (1/4)
      Let d = 0.15 and PR_0(1) = PR_0(2) = PR_0(3) = PR_0(4) = 0.25. Compute PR_k(i) for each k until |PR_{k+1}(i) - PR_k(i)| < ε for all i = 1, 2, 3, 4, where ε is some very small real number.
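The iteration above can be sketched as a short Python loop. The link structure is taken from the four update equations; the function name `pagerank_iterate`, the tolerance `eps=1e-9` and the `max_iter` safety cap are choices made for this illustration, not values from the slides.

```python
def pagerank_iterate(in_links, out_degree, d=0.15, eps=1e-9, max_iter=1000):
    """Repeat the update rule until no PageRank changes by eps or more.

    in_links[p]   -- pages that link to page p
    out_degree[q] -- number of outgoing links on page q
    d             -- probability of visiting a page directly
    Returns (ranks, number of iterations performed).
    """
    pages = sorted(out_degree)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # PR_0(i) = 1/n, i.e. 0.25 for 4 pages
    for iteration in range(1, max_iter + 1):
        # All new values are computed from the previous iteration's values.
        new_pr = {
            p: (1 - d) * sum(pr[q] / out_degree[q] for q in in_links[p]) + d / n
            for p in pages
        }
        converged = max(abs(new_pr[p] - pr[p]) for p in pages) < eps
        pr = new_pr
        if converged:
            return pr, iteration
    return pr, max_iter


# Link structure read off the update equations in the slides.
in_links = {1: [3, 4], 2: [1], 3: [1, 2, 4], 4: [1, 2]}
out_degree = {1: 3, 2: 2, 3: 1, 4: 2}

ranks, iterations = pagerank_iterate(in_links, out_degree)
for page in sorted(ranks):
    print(f"PR({page}) = {ranks[page]:.3f}")
print(f"iterations: {iterations}")
```

Each pass builds a fresh dictionary from the previous one, matching the PR_k to PR_{k+1} form of the equations rather than updating values in place.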
  14. 14. Computing the PageRank of each page using the above formula gives: PR(1) = 0.368, PR(2) = 0.142, PR(3) = 0.288, PR(4) = 0.202. Thus page 1 is the page with the highest PageRank. This is surprising, since page 3 receives the most backlinks (from 1, 2 and 4). But page 1 receives a backlink from page 3, and page 3 links only to page 1, which signals that page 1 is really important.
  15. 15. What are the conclusions from the above results? More backlinks generally mean a better PageRank. Backlinks from pages that themselves have a high PageRank improve my PageRank more. If there are many backlinks from Wikipedia or a university website to my site, my PageRank can be expected to improve.