Google PageRank

Corrected the previously wrong definition of the damping factor.

Transcript

  • 1. Google PageRank Yifan Li GA DC Data Science, 19 April 2014
  • 2. Outline: What is PageRank; Why it is important; History of PageRank; Understand PageRank; Simplified PageRank algorithm; Current state of the art
  • 3. What is PageRank: PageRank is a link analysis algorithm that assigns a numerical weight to each web page in order to measure its relative importance. It is based on the map of hyperlinks between pages and is an excellent way to prioritize the results of web keyword searches.
  • 4. Why it is important: At the time Page and Brin met, search engines typically favored pages with the highest keyword density, which meant people could game the system by repeating the same phrase over and over to rank higher in search results. PageRank provides a rough estimate of how important a website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
  • 5. History of PageRank: PageRank was developed by Google founders Larry Page and Sergey Brin at Stanford. PageRank is patented by Stanford, and the name PageRank likely comes from Larry Page. PageRank is now one of 200 ranking factors that Google uses to determine a page’s popularity. Even though PageRank is no longer directly important for SEO (Search Engine Optimization) purposes, backlinks from more popular websites continue to push a webpage higher up in search rankings.
  • 6. Understand PageRank: PageRank is a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at any particular page.
  • 7. Understand PageRank (cont.): Picture a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back", but eventually gets bored and starts on another random page. The damping factor d is the probability, at any step, that the surfer will continue surfing; 1 - d is the probability that, at each page, the surfer gets bored and requests another random page. Google uses d = 0.85. Without damping, all web surfers would eventually end up on Pages A, B, or C, and all other pages would have PageRank zero. A page can have a high PageRank if many pages point to it, or if some pages that point to it have a high PageRank themselves.
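The random-surfer picture can be tried out directly. The following is a minimal Python sketch, not taken from the slides: the four-page graph `links`, the function name `simulate_random_surfer`, and the step count are all illustrative assumptions. It counts how often a simulated surfer lands on each page and uses the visit fractions as a rough PageRank estimate.

```python
import random


def simulate_random_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate PageRank as the fraction of steps a random surfer spends on each page."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)  # the surfer starts on a random page
    for _ in range(steps):
        visits[current] += 1
        outgoing = links[current]
        # With probability d, follow a random outgoing link; otherwise
        # (or if the page has no outgoing links) jump to a random page.
        if outgoing and rng.random() < d:
            current = rng.choice(outgoing)
        else:
            current = rng.choice(pages)
    return {page: count / steps for page, count in visits.items()}


# Hypothetical link graph (an assumption for illustration, not the slide's example).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
surfer_ranks = simulate_random_surfer(links)
for page, rank in sorted(surfer_ranks.items()):
    print(page, round(rank, 3))
```

In this assumed graph no page links to D, so the surfer only reaches D through random jumps and D ends up with the smallest share of visits.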
  • 8. Simplified PageRank algorithm: Assume four web pages: A, B, C and D. Each page begins with an estimated PageRank of 0.25. L(A) is defined as the number of links going out of page A. The PageRank of page A is then computed from the PageRank of the pages that link to it and their outgoing link counts. [Figure: link graph of pages A, B, C and D]
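The slide's formula is not part of the transcript, so the following is only a sketch of the commonly cited simplified form: each page that links to A passes on its own PageRank divided by its number of outgoing links. The symbol In(A), for the set of pages linking to A, and the link structure in the worked example are assumptions made for illustration.

```latex
% Simplified PageRank: every page u that links to A contributes PR(u) / L(u).
PR(A) = \sum_{u \in \mathrm{In}(A)} \frac{PR(u)}{L(u)}

% Worked example under an assumed link structure:
% B links to A and C, C links to A, D links to A, B and C,
% so L(B) = 2, L(C) = 1, L(D) = 3, and every page starts at 0.25.
PR(A) = \frac{0.25}{2} + \frac{0.25}{1} + \frac{0.25}{3} \approx 0.458
```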
  • 9. Simplified PageRank algorithm (cont.): Assume pages B, C, D, ... point to page A. The parameter d is a damping factor that can be set between 0 and 1; it is usually set to 0.85. The PageRank of page A then depends on d and on the PageRank of the pages that point to it.
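The damped update can be sketched as a short iterative computation. This is an illustration under stated assumptions rather than the slide's own formula, which the transcript does not include. It uses the form PR(p) = (1 - d)/N + d * (sum of PR(q)/L(q) over pages q linking to p), where dividing by the page count N keeps the ranks summing to 1, in line with the "probability distribution" description earlier. Some presentations use (1 - d) without the division by N. The graph, the function name `pagerank`, and the handling of pages without outgoing links are hypothetical choices.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iteratively compute damped PageRank for a dict mapping page -> list of linked pages."""
    pages = sorted(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}  # every page starts with the same rank
    for _ in range(iterations):
        # Random-jump term: each page receives (1 - d) / N.
        new_ranks = {page: (1.0 - d) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its damped rank evenly across its outgoing links.
                share = d * ranks[page] / len(outgoing)
                for target in outgoing:
                    new_ranks[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank over all pages.
                share = d * ranks[page] / n
                for target in pages:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical link graph (an assumption for illustration, not the slide's example).
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
page_ranks = pagerank(graph)
for page, rank in sorted(page_ranks.items()):
    print(page, round(rank, 4))
```

With d = 0.85, page D, which has no incoming links in this assumed graph, keeps only the random-jump share (1 - d)/N = 0.0375, while C, which receives links from three pages, ranks highest.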
  • 10. State of the art: PageRank is now one of 200 ranking factors that Google uses to determine a page’s popularity. Google Panda is one of the other strategies Google now relies on to rank the popularity of pages. Even though PageRank is no longer directly important for SEO (Search Engine Optimization) purposes, backlinks from more popular websites continue to push a webpage higher up in search rankings.
  • 11. Thanks!