Your SlideShare is downloading. ×
Google PageRank
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Google PageRank

8,964
views

Published on

Explanation of Google's PageRank algorithm with some examples followed by a discussion of related Search engine optimisation (SEO) issues.

Explanation of Google's PageRank algorithm with some examples followed by a discussion of related Search engine optimisation (SEO) issues.

Published in: Technology

1 Comment
18 Likes
Statistics
Notes
No Downloads
Views
Total Views
8,964
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
289
Comments
1
Likes
18
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Google PageRankProf. Beat Signer<bsigner@vub.ac.be>Department of Computer ScienceVrije Universiteit Brusselhttp://www.beatsigner.com December 9, 2008
  • 2. Overview History of PageRank PageRank algorithm Examples Implications for website developmentDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 2
  • 3. History of PageRank Developed as part of an academic project at Stanford University  research platform to aid under- Larry Page Sergey Brin standing of large-scale web data and enable researches to easily experiment with new search technologies  Larry Page and Sergey Brin worked on the project about a new kind of search engine (1995-1998) which finally led to a functional prototype called GoogleDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 3
  • 4. Web Search Until 1998 Find all documents using a query term  use information retrieval (IR) solutions  ranking based on "on-page factors"  problem: poor quality of search results (order) Page and Brin proposed to compute the absolute qualtity of a page (PageRank)  based on the number and quality of pages linking to a page (votes)December 9, 2008 Beat Signer, signer@inf.ethz.ch 4
  • 5. PageRank P1 P5 P7 P8 R1 R5 R7 R8 P2 P3 P4 P6 R2 R3 R4 R6 A page has a high PageRank R if  there are many pages linking to it  or, if there are some pages with a high PageRank linking to it Total score = IR score x PageRankDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 5
  • 6. PageRank Algorithm R( Pj )R( Pi )   Pj Bi Lj P1 P2 1.5 1 1.5 1 where  Bi is the set of pages that link to page Pi P3  Lj is the number of 0.75 1 outgoing links for page PjDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 6
  • 7. Matrix Representation Let us define a hyperlink P1 P2 matrix H 1 L j if Pj  BiH ij    0 otherwise 0 1 2 1 and R  RPi  P3 H  1 0 0   R  HR 0 1 2 0   R is an eigenvector of Hwith eigenvalue 1December 9, 2008 Beat Signer, signer@inf.ethz.ch 7
  • 8. Matrix Representation ... We can use the power method to find R t 1R  HR t 0 1 2 1 For our example H  1 0 0   0 1 2 0   this results in R  2 2 1 or 0.4 0.4 0.2December 9, 2008 Beat Signer, signer@inf.ethz.ch 8
  • 9. Dangling Pages Problem with pages P1 P2 C that have no outbound links (P2) 0 0  CH  and R  0 0 1 0 0 1 2  0 1 2C  and S  H  C     0 1 2 1 1 2December 9, 2008 Beat Signer, signer@inf.ethz.ch 9
  • 10. Strongly Connected Pages (Graph) 1-d Add new transition P1 P2 probabilities between all pages  with probability d we follow P4 P3 the hyperlink structure S  with probability 1-d we choose a random page P5 1-d 1-dG  1  d  1  dS 1 R  GR nDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 10
  • 11. G  1  d  1  dS 1Examples n A2 0.37 A1 A3 0.26 0.37December 9, 2008 Beat Signer, signer@inf.ethz.ch 11
  • 12. G  1  d  1  dS 1Examples ... n A2 B2 0.185 0.185 A1 A3 B1 B3 0.13 0.185 0.13 0.185 P A  0.5 PB   0.5December 9, 2008 Beat Signer, signer@inf.ethz.ch 12
  • 13. G  1  d  1  dS 1Examples ... n A2 B2 0.14 0.20 A1 A3 B1 B3 0.10 0.14 0.22 0.20 P A  0.38 PB   0.62December 9, 2008 Beat Signer, signer@inf.ethz.ch 13
  • 14. G  1  d  1  dS 1Examples ... n A2 B2 0.23 0.095 A1 A3 B1 B3 0.3 0.18 0.10 0.095 P A  0.71 PB   0.29December 9, 2008 Beat Signer, signer@inf.ethz.ch 14
  • 15. G  1  d  1  dS 1Examples ... n A2 B2 0.24 0.07 A1 A3 B1 B3 0.35 0.18 0.09 0.07 P A  0.77 PB   0.23December 9, 2008 Beat Signer, signer@inf.ethz.ch 15
  • 16. G  1  d  1  dS 1Examples ... n A2 B2 0.17 0.06 A1 A3 B1 B3 0.33 0.175 0.08 0.06 A4 PB   0.20P A  0.80 0.125December 9, 2008 Beat Signer, signer@inf.ethz.ch 16
  • 17. Implications for Website Development First make sure that your page gets indexed  on-page factors Think about your sites internal link structure  create many internal links for important pages  be "careful" about where to put outgoing links Increase the number of pages Ensure that webpages are addressed consistently  http://www.vub.ac.be  http://www.vub.ac.be/index.php Make sure that you get links from good websitesDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 17
  • 18. Consistent Addressing of WebpagesDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 18
  • 19. Search Engine Optimisations (SEO) Internet marketing has become a big business  white hat and black hat optimisations Bad ranking or removal from index can cost a company a lot of money  e.g. supplemental index ("Google hell")December 9, 2008 Beat Signer, signer@inf.ethz.ch 19
  • 20. Black Hat Optimisations (Donts) Link farms Spamdexing in guestbooks, Wikipedia etc.  "solution": <a rel="nofollow" href="…">…</a> Doorway pages (cloaking)  e.g. BMW Germany and Ricoh Germany banned in February 2006 Selling/buying links ...December 9, 2008 Beat Signer, signer@inf.ethz.ch 20
  • 21. On-Page Factors (Speculative) It is assumed that there are over 200 on-page and off-page factors Positive factors  keyword in title tag  keyword in URL  keyword in domain name  quality of HTML code  page freshness (occasional changes)  …December 9, 2008 Beat Signer, signer@inf.ethz.ch
  • 22. On-Page Factors (Speculative) … Negative factors  links to "bad neighbourhood"  over optimisation penalty (keyword stuffing)  text with same colour as background (hidden content)  automatic redirects via the refresh meta tag  any copyright violations  …December 9, 2008 Beat Signer, signer@inf.ethz.ch
  • 23. Off-Page Factors (Speculative) Positive factors  high PageRank  anchor text of inbound links  links from authority sites (Hilltop algorithm)  listed in DMOZ (ODP) and Yahoo directories  site age (stability)  domain expiration date  …December 9, 2008 Beat Signer, signer@inf.ethz.ch
  • 24. Off-Page Factors (Speculative) … Negative factors  link buying (fast increasing number of inbound links)  link farms  cloaking (different pages for spider and user)  limited (temporal) availability of site  links from bad neighbourhood?  competitor attack (e.g. duplicate content)?  …December 9, 2008 Beat Signer, signer@inf.ethz.ch
  • 25. Tools Google toolbar  PageRank information not frequently updated Google webmaster tools  meta description issues  title tag issues  non-indexable content issues  number and URLs of indexed pages  number and URLs of inbound/outbound links  ...December 9, 2008 Beat Signer, signer@inf.ethz.ch 25
  • 26. Questions Is PageRank fair? What about Googles power and influence?December 9, 2008 Beat Signer, signer@inf.ethz.ch 26
  • 27. Conclusions PageRank algorithm  absolute quality of a page based on incoming links  random surfer model  computed as eigenvector of Google matrix G Implications for website development and SEO PageRank is just one (important) factorDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 27
  • 28. References The PageRank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani and T. Winograd, January 1998 The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin and L. Page, Computer Networks and ISDN Systems, 30(1-7), April 1998December 9, 2008 Beat Signer, signer@inf.ethz.ch
  • 29. References … PageRank Uncovered, C. Ridings and M. Shishigin, September 2002 PageRank Calculator, http://www.webworkshop.net/pagerank_ calculator.phpDecember 9, 2008 Beat Signer, signer@inf.ethz.ch 29