1.
Google PageRank
Yifan Li
GA DC Data Science, 19 April 2014
2.
Outline
What is PageRank
Why it is important
History of PageRank
Understand PageRank
Simplified PageRank Algorithm
Current state of the art
3.
What is PageRank
PageRank is a link analysis algorithm
which assigns a numerical weighting to
each Web page, with the purpose of
"measuring" relative importance.
It is based on the map of hyperlinks between pages,
and it is an effective way to prioritize the results of
web keyword searches.
4.
Why it is important
• At the time that Page and Brin met, search engines
typically ranked pages by keyword density, which
meant people could game the system by repeating
the same phrase over and over to climb higher in
the search results.
• PageRank gives the search engine a rough estimate
of how important a website is. The underlying
assumption is that more important websites are
likely to receive more links from other websites.
5.
History of PageRank
• PageRank was developed by Google founders Larry
Page and Sergey Brin at Stanford. PageRank is patented
by Stanford, and the name PageRank likely comes from
Larry Page.
• PageRank is now one of about 200 ranking factors that
Google uses to determine a page's popularity. Even though
PageRank is no longer directly important for SEO (Search
Engine Optimization) purposes, backlinks from more
popular websites continue to push a webpage higher up
in search rankings.
6.
Understand PageRank
PageRank is a probability distribution used to represent
the likelihood that a person randomly clicking on links will
arrive at any particular page.
7.
Understand PageRank (cont.)
Picture a "random surfer" who is given a web page at random
and keeps clicking on links, never hitting "back", but
eventually gets bored and starts on another random page.
The damping factor d is the probability, at any step, that
the surfer will continue surfing. (1 - d) is the probability
that, at each page, the "random surfer" will get bored and
request another random page. Google uses d = 0.85.
Without damping, all web surfers in the example graph would
eventually end up on pages A, B, or C, and all other pages
would have a PageRank of zero.
A page can have a high PageRank:
• if many pages point to it, or
• if the pages that point to it have a high PageRank
themselves.
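The random-surfer picture above can be tried out directly with a small simulation. The following is a minimal sketch, not Google's implementation: the four-page graph `links`, the function name `simulate_surfer`, and the step count are all assumptions made for illustration. The fraction of steps spent on each page approximates that page's PageRank.

```python
import random


def simulate_surfer(links, d=0.85, steps=100_000, seed=0):
    """Estimate PageRank by counting where a random surfer spends time.

    links maps each page to the list of pages it links to (hypothetical data).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)  # the surfer starts on a random page
    for _ in range(steps):
        visits[current] += 1
        if links[current] and rng.random() < d:
            # With probability d, follow a random outgoing link.
            current = rng.choice(links[current])
        else:
            # Bored (or at a dead end): jump to a random page.
            current = rng.choice(pages)
    return {page: count / steps for page, count in visits.items()}


# Hypothetical four-page graph, for illustration only.
links = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}
frequencies = simulate_surfer(links)
```

In this assumed graph every other page links to A, so the surfer should spend the most time there, while D, which no page links to, is reached only through random jumps.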
8.
Simplified PageRank algorithm
Assume four web pages: A, B, C and D. Let each page begin
with an estimated PageRank of 0.25.
L(A) is defined as the number of links going out of page A. The
PageRank of a page A is given as follows:
[Figure: link diagrams of pages A, B, C and D]
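As a sketch of the simplified rule, assume pages B, C and D all link to A (this link structure is an assumption for illustration). Each page passes its current PageRank in equal shares along its outgoing links, so:

$$PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}$$

For instance, with the starting estimate of 0.25 per page and hypothetical link counts L(B) = 2, L(C) = 1 and L(D) = 3:

$$PR(A) = \frac{0.25}{2} + \frac{0.25}{1} + \frac{0.25}{3} \approx 0.458$$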
9.
Simplified PageRank algorithm (cont.)
Assume pages B, C, D, ... point to page A. The
parameter d is a damping factor that can be
set between 0 and 1, and is usually set to
0.85. The PageRank of page A is given as
follows:
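One common way to write the damped update, assuming N is the total number of pages (the symbol N and the (1 - d)/N term are assumptions, chosen so that the ranks form a probability distribution):

$$PR(A) = \frac{1-d}{N} + d\left(\frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)} + \cdots\right)$$

A minimal Python sketch of applying this update repeatedly is shown below. The function name `pagerank`, the iteration count, and the four-page graph `links` are hypothetical, and the sketch assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the damped PageRank update on a small link graph.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # uniform starting estimate
    for _ in range(iterations):
        # Every page receives the "bored surfer" share (1 - d) / N.
        new_rank = {page: (1.0 - d) / n for page in pages}
        for page, outgoing in links.items():
            # Damped share d * PR(page) / L(page) sent along each link.
            share = d * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph, for illustration only.
links = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}
ranks = pagerank(links)
```

In this assumed graph no page links to D, so its rank settles at the baseline (1 - d)/N, while A, which every other page links to, ends up with the highest rank.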
10.
State of the art
• PageRank is now one of about 200 ranking
factors that Google uses to determine a page's
popularity. Google Panda is one of the other
strategies Google now relies on to rank the
popularity of pages. Even though PageRank is
no longer directly important for SEO (Search
Engine Optimization) purposes, backlinks
from more popular websites continue to push
a webpage higher up in search rankings.