Google PageRank

By Abhijit Mondal
Software Engineer, HolidayIQ.com

What is PageRank?

It is the algorithm developed by Google founders Larry 'Page' and Sergey Brin to quantify the importance of a 'page' or website in the complex network of the World Wide Web.

PageRank is only one criterion, not the only criterion, by which Google decides where your page or website will rank in the search results.

Since the inception of PageRank, search ranking algorithms have developed so much that at this moment nobody knows the exact algorithm (or algorithms) Google uses to rank search results.

If it were known, there would be no need for SEO experts.

Assume that the World Wide Web is composed of only 4 pages, which form a directed graph where each arrow indicates a hyperlink from one page to another.

Assuming that all the hyperlinks on a page have an equal probability of being clicked (which is not true in practice), the weight of each edge is 1 divided by the total number of outgoing links from that page.

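
A minimal Python sketch of this weighting. The link structure in `links` is an assumption: the original figure is not reproduced in this text, so the edges are inferred from the equations given later in the deck. The names `links` and `edge_weights` are illustrative only.

```python
from fractions import Fraction

# Outgoing links per page. Assumption: inferred from the deck's equations,
# since the original graph figure is not available in this text.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}


def edge_weights(links):
    """Return {(source, target): weight}, treating every link on a page as equally likely."""
    weights = {}
    for source, targets in links.items():
        for target in targets:
            weights[(source, target)] = Fraction(1, len(targets))
    return weights


weights = edge_weights(links)
for (source, target), weight in sorted(weights.items()):
    print(f"{source} -> {target}: {weight}")
```

The weights leaving each page sum to 1, which is what lets them be read as click probabilities.
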
Loosely speaking, the PageRank of a page A is a direct measure of the probability of visiting page A when a random user opens a browser and follows some hyperlinks to reach page A.

In the given graph, what is the probability of reaching page 3 when a random user opens the browser to surf the internet?

How can the user reach page 3?

The user is on page 1 and clicks the link to page 3
Or
The user is on page 2 and clicks the link to page 3
Or
The user is on page 4 and clicks the link to page 3
Or
The user directly types the URL of page 3

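Denoting the probability of reaching page i as P(i):

P(3) = (1-d) x (P(1)x(1/3) + P(2)x(1/2) + P(4)x(1/2)) + d x (1/4)

This formula follows from the laws of probability.

Here d is the probability that the user visits a page directly, so (1-d) is the probability that the user arrives through a link on another page. The factor 1/4 appears because a direct visit is assumed to be equally likely to land on any of the 4 pages. Note that d as defined here is the complement of the damping factor used in many other descriptions of PageRank.
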
… Similarly, the equations for P(1), P(2) and P(4) are:

P(1) = (1-d) x (P(3) + P(4)x(1/2)) + d x (1/4)

P(2) = (1-d) x (P(1)x(1/3)) + d x (1/4)

P(4) = (1-d) x (P(2)x(1/2) + P(1)x(1/3)) + d x (1/4)

But now we have a problem: computing P(1) requires P(3), and computing P(3) requires P(1), so neither can be computed first.

These are coupled linear equations. They can be solved using matrices (eigenvalues and eigenvectors) or, more simply, by repeated iteration until the values converge.

But why calculate probabilities when we want PageRank? Because the probability of reaching page i is a direct measure of the PageRank of page i. So let PR(i) = P(i), where PR(i) is the PageRank of page i.


Denoting by PR_k(i) the PageRank computed using the earlier formula in the k-th iteration, in the (k+1)-th iteration ...

PR_{k+1}(3) = (1-d) x (PR_k(1)x(1/3) + PR_k(2)x(1/2) + PR_k(4)x(1/2)) + d x (1/4)

PR_{k+1}(1) = (1-d) x (PR_k(3) + PR_k(4)x(1/2)) + d x (1/4)

PR_{k+1}(2) = (1-d) x (PR_k(1)x(1/3)) + d x (1/4)

PR_{k+1}(4) = (1-d) x (PR_k(2)x(1/2) + PR_k(1)x(1/3)) + d x (1/4)

Letting d = 0.15 and
PR_0(1) = PR_0(2) = PR_0(3) = PR_0(4) = 0.25

Compute PR_k(i) for each k until |PR_{k+1}(i) - PR_k(i)| < ɛ for all i = 1, 2, 3, 4, where ɛ is some very small real number.

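
The iteration can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's code: the names `incoming` and `pagerank`, the tolerance `eps=1e-10` and the cap `max_iter=1000` are assumptions chosen for this example, while the weights, d = 0.15 and the starting value 0.25 come from the slides.

```python
# incoming[i] lists (source page, probability of clicking through to page i),
# read off the four update equations in the deck.
incoming = {
    1: [(3, 1.0), (4, 1 / 2)],
    2: [(1, 1 / 3)],
    3: [(1, 1 / 3), (2, 1 / 2), (4, 1 / 2)],
    4: [(2, 1 / 2), (1, 1 / 3)],
}


def pagerank(incoming, d=0.15, eps=1e-10, max_iter=1000):
    """Repeat the update until no page's value changes by eps or more."""
    n = len(incoming)
    pr = {page: 1.0 / n for page in incoming}  # PR_0(i) = 0.25 for 4 pages
    for _ in range(max_iter):
        new_pr = {
            page: (1 - d) * sum(pr[src] * w for src, w in sources) + d / n
            for page, sources in incoming.items()
        }
        if all(abs(new_pr[p] - pr[p]) < eps for p in pr):
            return new_pr
        pr = new_pr
    return pr  # not converged within max_iter (assumed cap)


ranks = pagerank(incoming)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

Every new value is computed from the previous iteration's values only, matching the PR_k to PR_{k+1} form of the equations, and the values keep summing to 1 at every step.
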
Computing the PageRank of each page using the above formula:

PR(1) = 0.368
PR(2) = 0.142
PR(3) = 0.288
PR(4) = 0.202

Thus page 1 is the page with the highest PageRank. This may seem surprising, since page 3 receives the most backlinks (from pages 1, 2 and 4). But page 1 receives a backlink from page 3, and page 3 links only to page 1, thus 'informing' us that page 1 is really important.

What are the conclusions from the above results?

More backlinks generally mean a better PageRank.

Backlinks from pages that themselves have a high PageRank improve my PageRank more.

If there are a good many backlinks from Wikipedia or a university website like iitk.ac.in to my site HolidayIQ.com, then my PageRank should improve.
