2. What is PageRank?
• PageRank is a ranking algorithm first devised by Google
co-founder Larry Page.
• One of the most influential algorithms of the previous decade.
• It revolutionized ranking, and its application in IR (information
retrieval) helped make Google the search engine giant it is today.
3. What does it do? How does it work?
• PageRank is a link analysis algorithm that operates on the
webgraph, in which pages are nodes and hyperlinks are directed edges.
• PageRank works by counting the number and quality of
links to a page to determine a rough estimate of how
important the website is.
• The underlying assumption is that more important websites
are likely to receive more links from other websites.
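As a minimal illustration of this idea (a sketch, not the project's code), the Python below computes PageRank by power iteration over an adjacency dict. The function name `pagerank`, the damping factor 0.85, the uniform handling of pages without out-links, and the four-page example graph are all assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Global PageRank by power iteration.

    graph maps every page to the list of pages it links to; every link
    target must also be a key. Pages with no out-links ("dangling" pages)
    spread their rank evenly over all pages (an assumption of this sketch).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform vector
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for v in nodes:
            for w in graph[v]:
                # A page passes on its rank, split evenly over its out-links,
                # so a link from a highly ranked page is worth more.
                new[w] += damping * rank[v] / len(graph[v])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

On this toy graph C ends up with the highest score because it collects links from all three other pages, and A ranks second because its single in-link comes from the top-ranked page C.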
4. Drawbacks of PageRank
• PageRank is a very compute-intensive algorithm.
• In essence, PageRank performs repeated matrix multiplication on a
huge matrix, which requires large amounts of computational resources.
• The standard way to compute it is power iteration (repeated
matrix-vector multiplication), but given the size of the webgraph (or even
a smaller corpus like Wikipedia), the compute time still runs into hours
depending on the workstation.
• Every time a new page/node is introduced into the graph, the entire
computation needs to be re-run to obtain the new exact PageRank.
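For reference, the commonly used random-surfer form of the power iteration is written out below. The notation (rank vector r, damping factor d, usually 0.85, link matrix M, N pages) is ours; the slides do not say which exact variant is used. Each step is one sparse matrix-vector product costing O(|E|) for |E| links, and adding a page changes both N and M, which is why the whole iteration has to be repeated.

```latex
\mathbf{r}^{(t+1)} = d\,M\,\mathbf{r}^{(t)} + \frac{1-d}{N}\,\mathbf{1},
\qquad
M_{ij} =
\begin{cases}
  1/\deg^{+}(j) & \text{if page } j \text{ links to page } i,\\
  0 & \text{otherwise,}
\end{cases}
\qquad
\text{repeat until } \lVert \mathbf{r}^{(t+1)} - \mathbf{r}^{(t)} \rVert_1 < \varepsilon .
```

Pages with no out-links (deg⁺(j) = 0) leave an all-zero column in M and need extra handling, for example spreading their rank uniformly over all N pages.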
5. What is the Local Approximation of PageRank? Why do we need it?
• Given these drawbacks of the original PageRank algorithm, a
number of improvements to it have been proposed over the years.
• We use a local approximation technique to drastically reduce the
computation time while keeping the results accurate and relevant
to a reasonable degree.
• Instead of computing the PageRank of a page over the entire
webgraph, we iterate over only a sub-graph around that page (the pages
within a certain radius of it) and approximate its PageRank from that.
This approximation is called the Local Approximation of the PageRank.
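One simple way to realise such a local approximation is sketched below; it is a generic scheme, not taken from the project's source code. Assumptions of the sketch: an in-link index is available, the total page count N and every page's out-degree are known, the radius is counted in in-link hops back from the target, pages outside the neighbourhood are frozen at the uniform prior 1/N, and the dangling-page correction is ignored. The names `build_in_links` and `local_pagerank` are hypothetical.

```python
from collections import deque


def build_in_links(graph):
    """Invert an out-link adjacency dict into an in-link index."""
    in_links = {v: [] for v in graph}
    for u, outs in graph.items():
        for w in outs:
            in_links[w].append(u)
    return in_links


def local_pagerank(graph, in_links, target, radius, damping=0.85, iterations=100):
    """Estimate PageRank(target) using only pages within `radius` in-link hops.

    Pages outside the neighbourhood keep the uniform prior 1/N, and the
    dangling-page correction is ignored (simplifying assumptions).
    """
    n = len(graph)  # total page count N, assumed known
    # Breadth-first search backwards along in-links, stopping at `radius`.
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue
        for u in in_links[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    # Run the PageRank update on the neighbourhood only.
    rank = {v: 1.0 / n for v in dist}
    for _ in range(iterations):
        rank = {
            v: (1.0 - damping) / n
            + damping * sum(rank.get(u, 1.0 / n) / len(graph[u]) for u in in_links[v])
            for v in dist
        }
    return rank[target]


# Hypothetical four-page web, same shape as a toy example: C is linked from A, B, D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
in_links = build_in_links(web)
for r in (0, 1):
    print(r, round(local_pagerank(web, in_links, "C", r), 4))
```

With radius 0 only C itself is iterated and its in-linking pages are stuck at the prior, which overestimates C's score; with radius 1 the neighbourhood already covers this tiny graph, so the estimate matches the global value.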
6. Advantages
• Drastically reduced computation time.
• Lower compute requirements mean it can be run on a regular
workstation and still generate respectable results.
• If and when a new node/page is introduced into the webgraph,
its PageRank can be quickly computed by a local approximation
rather than waiting for the entire PageRank computation to finish.
This allows for faster indexing and retrieval, and thus a fresher
index.
7. Drawbacks of the Local Approximation
• Results can vary drastically depending on the density of links in
the webgraph being processed.
• The computed PageRank is only an approximation, although its
accuracy can be improved by increasing the radius of the
sub-graph considered for the approximation.
• However, increasing the radius of the sub-graph yields diminishing
returns in accuracy relative to the extra compute it requires.
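To observe the accuracy/cost trade-off described above, the sketch below compares a local estimate against a full global computation at increasing radii. Everything in it is hypothetical: the synthetic random graph built by `make_graph` (2000 pages, 3 out-links each, fixed seed), the choice of page 0 as the target, and the simplified local scheme (pages outside the neighbourhood frozen at 1/N). No dangling pages are generated, so once the radius covers every page that can reach the target, the local and global updates coincide exactly.

```python
import random
from collections import deque


def make_graph(n, links_per_page, seed=0):
    """Hypothetical synthetic webgraph: each page links to `links_per_page`
    distinct random other pages, so no page is dangling."""
    rng = random.Random(seed)
    graph = {}
    for v in range(n):
        picks = rng.sample(range(n - 1), links_per_page)
        graph[v] = [w if w < v else w + 1 for w in picks]  # skip self-links
    return graph


def build_in_links(graph):
    in_links = {v: [] for v in graph}
    for u, outs in graph.items():
        for w in outs:
            in_links[w].append(u)
    return in_links


def update(graph, in_links, nodes, rank, n, d):
    """One PageRank update over `nodes`; pages absent from `rank` use the prior 1/n."""
    return {v: (1 - d) / n + d * sum(rank.get(u, 1 / n) / len(graph[u]) for u in in_links[v])
            for v in nodes}


def global_pagerank(graph, in_links, d=0.85, iterations=100):
    rank = {v: 1 / len(graph) for v in graph}
    for _ in range(iterations):
        rank = update(graph, in_links, graph, rank, len(graph), d)
    return rank


def local_pagerank(graph, in_links, target, radius, d=0.85, iterations=100):
    """Return (estimate, number of pages in the radius-limited neighbourhood)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        if dist[v] < radius:
            for u in in_links[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
    rank = {v: 1 / len(graph) for v in dist}
    for _ in range(iterations):
        rank = update(graph, in_links, dist, rank, len(graph), d)
    return rank[target], len(dist)


graph = make_graph(n=2000, links_per_page=3)
in_links = build_in_links(graph)
exact = global_pagerank(graph, in_links)
results = []
for radius in range(6):
    approx, touched = local_pagerank(graph, in_links, 0, radius)
    error = abs(approx - exact[0])
    results.append((radius, touched, error))
    print(f"radius={radius}  pages touched={touched}  abs error={error:.2e}")
```

The printed figures depend on the seed and graph size. Raising `links_per_page` makes the graph denser, so the number of pages touched grows faster with every extra unit of radius, which is one way to see how link density affects both cost and accuracy.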
10. Screenshots
The Local Approximation PageRank algorithm is a backend feature and thus is neither
directly visible to nor interacted with by the end user.
Our code is designed to be integration-friendly rather than for stand-alone use by
non-technical users.
Consequently, we provide a command-line interface (CLI) rather than a GUI that would be hard to integrate.
11. Documentation and Code
Source code: https://bitbucket.org/pagerank/lapagerank/
"Talk is cheap, show me the code" -- Linus Torvalds
Website Link: http://web.iiit.ac.in/~rishi.mittal/ire/
"Ink is better than the best memory." -- Chinese proverb