Comparative study of different ranking algorithms adopted by search engine
Under the guidance of Dr. Manoj Wadhwa
When searching for information on the WWW, a user submits a query
to a search engine. The engine returns, as the query’s result, a list of
Web sites, which is usually very large, so the ranking of these Web
sites is very important. Because much information is contained in the
link structure of the WWW, information such as which pages are
linked to others can be used to augment search algorithms.
A web search engine must rank pages so that those containing the most
useful information about the searched keyword are listed first.
To provide this ordering of web pages, a page ranking algorithm is
used: the technique that ranks websites in the search engine results.
Together with the development of the Internet and the popularity of
the World Wide Web, Web page ranking systems have drawn significant
attention.
Many Web search engines have been introduced so far, but they still
have difficulty providing completely relevant answers to general
queries.
The main reason is not a lack of data but rather an excess of data.
WHAT IS A SEARCH ENGINE?
A Web search engine is a tool that searches the Web for documents
matching specified keywords and returns a list of the documents
in which the keywords were found.
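The keyword lookup described above can be sketched with an inverted index, a structure that maps each word to the documents containing it. This is only a minimal illustration: the three-document collection, the whitespace tokenizer, and the function names build_index and search are assumptions made for the example, not part of any particular search engine.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query keyword."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical three-document collection used only for illustration.
docs = {
    "d1": "web search engine ranking",
    "d2": "link analysis of the web",
    "d3": "search engine optimization",
}
index = build_index(docs)
print(search(index, "search engine"))
```

The lookup only says which documents match. It gives no order among them, which is the gap the ranking algorithms in this study are meant to fill.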
Early search engines mainly compared the content similarity of the
query and the indexed pages.
From 1996, it became clear that content similarity alone was no
longer sufficient.
The number of pages grew rapidly in the mid-to-late 1990s.
Content similarity is easily spammed.
A page owner can repeat some words and add many related
words to boost the rankings of his pages and/or to make the
pages relevant to a large number of queries.
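A small sketch can show why content similarity is easy to spam. It assumes cosine similarity over raw term counts as the similarity measure, which is a choice made for this example (the slides do not name a specific measure), and the two page texts are invented.

```python
import math
from collections import Counter


def cosine_similarity(query, page):
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    q = Counter(query.lower().split())
    p = Counter(page.lower().split())
    dot = sum(q[w] * p[w] for w in q)
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    if norm_q == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_q * norm_p)


query = "cheap flights"
# Invented pages: one written normally, one stuffed with the query words.
honest = "compare cheap flights from many airlines and book online"
stuffed = "cheap flights cheap flights cheap flights cheap flights book"

honest_score = cosine_similarity(query, honest)
stuffed_score = cosine_similarity(query, stuffed)
print(round(honest_score, 3), round(stuffed_score, 3))
```

The stuffed page scores higher than the honest one although it carries less information, which is the weakness that pushed researchers toward link-based signals.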
Starting around 1996, researchers began to work on the problem.
They turned to hyperlinks.
Web pages are connected through hyperlinks,
which carry important information.
Some hyperlinks organize information within the same site.
Other hyperlinks point to pages on other Web sites.
Pages that are pointed to by many other pages are likely to
contain authoritative information.
During 1997-1998, the two most influential hyperlink-based search
algorithms, PageRank and HITS, were reported.
PageRank is an algorithm used by the Google web search
engine to rank websites in its search engine results.
PageRank works by counting the number and quality of
links to a page to determine a rough estimate of how
important the website is. The underlying assumption is
that more important websites are likely to receive more
links from other websites.
It is an effective way to prioritize the results of web
searches.
[Figure: example of the PageRank indicator as found on Google]
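The idea of counting the number and quality of links can be sketched as a power iteration, in which each page repeatedly shares its current rank among the pages it links to. The sketch below assumes a damping factor of 0.85, a normalised form in which the ranks sum to 1, and a graph in which every page has at least one outgoing link. The four-page graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to. This sketch
    assumes every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share from the random jump.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            # A page divides its damped rank evenly among its outlinks.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: every other page links to "home".
links = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": ["home"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this graph "home" ends up with the highest rank because it receives links from all three other pages, while "blog" and "contact" receive none and keep only the base share.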
HITS stands for “Hypertext Induced Topic Selection” (also known as
Hyperlink-Induced Topic Search) and is used for rating and ranking
websites based on link information when identifying topic areas.
Unlike PageRank, which is a static ranking algorithm, HITS is search
query dependent.
It is a very popular and effective algorithm for ranking documents based
on the link information among a set of documents.
An authority value is computed as the sum of the scaled hub values
of the pages that point to that page.
A hub value is the sum of the scaled authority values of the pages it
points to.
When the user issues a search query,
HITS first expands the list of relevant pages returned by a
search engine and then produces two rankings of the
expanded set of pages: an authority ranking and a hub ranking.
Authority: Roughly, an authority is a page with many in-links.
The idea is that the page may have good or authoritative
content on some topic, and
thus many people trust it and link to it.
Hub: A hub is a page with many out-links.
The page serves as an organizer of the information on a
particular topic and
points to many good authority pages on the topic.
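The mutual reinforcement between hubs and authorities can be sketched as a simple iteration over the expanded page set. The graph below is a hypothetical expanded set, the number of iterations is arbitrary, and scaling each vector to unit length is one common way to keep the values bounded. The step that expands the search results is not shown.

```python
import math


def hits(links, iterations=50):
    """Iterative HITS on a dict mapping each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub values of the pages linking to p.
        auth = {p: 0.0 for p in pages}
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        # Hub update: sum the authority values of the pages p links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Scale both vectors to unit length so the values stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return auth, hub


# Hypothetical expanded page set for one query.
links = {
    "portal": ["paper_a", "paper_b", "paper_c"],
    "directory": ["paper_a", "paper_b"],
    "paper_a": [],
    "paper_b": [],
    "paper_c": ["paper_a"],
}
authority, hub = hits(links)
print("best authority:", max(authority, key=authority.get))
print("best hub:", max(hub, key=hub.get))
```

Here "paper_a" has the most in-links and becomes the top authority, while "portal" points to every authority and becomes the top hub.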
SALSA: The Stochastic Approach for Link-Structure
Analysis (Lempel, Moran 2001)
A probabilistic extension of the HITS algorithm
Combines ideas from both HITS and PageRank
A random walk is carried out by following hyperlinks
in both the forward and the backward direction
SALSA uses authority and hub scores
SALSA creates a neighborhood graph from the authority and
hub pages and the links between them
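The forward and backward random walk can be sketched for the authority side as follows. One step moves backward over a randomly chosen in-link to a hub and then forward over a randomly chosen out-link of that hub. The uniform choice among links, the iteration count, and the small graph are assumptions of this sketch. The hub-side walk and the construction of the neighborhood graph are omitted.

```python
def salsa_authority(links, iterations=100):
    """Authority scores from the SALSA two-step random walk.

    links maps each page to the pages it links to. One step of the walk
    goes backward over a random in-link and then forward over a random
    out-link of the page reached.
    """
    in_links = {}
    for page, targets in links.items():
        for target in targets:
            in_links.setdefault(target, []).append(page)
    # Only pages that have in-links can act as authorities.
    authorities = list(in_links)
    score = {a: 1.0 / len(authorities) for a in authorities}
    for _ in range(iterations):
        new_score = {a: 0.0 for a in authorities}
        for a, mass in score.items():
            back = mass / len(in_links[a])       # backward step to a hub
            for h in in_links[a]:
                forward = back / len(links[h])   # forward step to an authority
                for a2 in links[h]:
                    new_score[a2] += forward
        score = new_score
    return score


# Hypothetical neighborhood graph: three hubs pointing at three authorities.
links = {"h1": ["a", "b"], "h2": ["a", "b", "c"], "h3": ["a"]}
scores = salsa_authority(links)
for page, value in sorted(scores.items()):
    print(page, round(value, 3))
```

On a connected graph such as this one, the walk settles on authority scores proportional to the number of in-links (3, 2 and 1 for "a", "b" and "c"), a property reported by Lempel and Moran.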
The Weighted PageRank (WPR) algorithm is an extension of the PageRank algorithm.
This algorithm allocates higher rank values to the more
significant pages rather than dividing the rank value of a
page evenly among its outgoing linked web pages.
Each outlinked page gets a value proportional to its
popularity.
WPR takes into account the importance of both the inlinks
and outlinks of the pages and distributes rank scores based
on the popularity of the pages.
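A possible sketch of this weighting is given below. It assumes the formulation usually attributed to Xing and Ghorbani: the rank that page v passes to a target u is scaled by u's share of in-links and u's share of out-links among all the pages that v links to. The weight definitions, the damping factor of 0.85, and the example graph are assumptions of the sketch, since the slides do not give a formula.

```python
def weighted_pagerank(links, damping=0.85, iterations=100):
    """Weighted PageRank sketch.

    links maps every page to the list of pages it links to. The rank a
    page v passes to a target u is scaled by u's share of in-links and
    out-links among all the pages that v links to.
    """
    pages = list(links)
    in_count = {p: 0 for p in pages}
    for targets in links.values():
        for t in targets:
            in_count[t] += 1
    out_count = {p: len(links[p]) for p in pages}

    def share(count, v, u):
        # Popularity of u relative to all pages that v links to.
        total = sum(count[p] for p in links[v])
        return count[u] / total if total else 0.0

    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for u in pages:
            incoming = sum(
                rank[v] * share(in_count, v, u) * share(out_count, v, u)
                for v in pages if u in links[v]
            )
            new_rank[u] = (1.0 - damping) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical graph: "a" links to both "b" (1 in-link) and "c" (3 in-links).
links = {"a": ["b", "c"], "b": ["c", "a"], "c": ["a"], "d": ["c"]}
ranks = weighted_pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "a" no longer splits its rank evenly: the more popular target "c" receives a larger share than "b".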
DISTANCE RANK ALGORITHM
The distance between pages is considered as a factor.
The algorithm calculates the minimum average distance
between two or more web pages.
It adopts the PageRank properties, i.e. the rank of each
page is computed as the weighted sum of the ranks of all
pages linking to that particular page.
Thus, a page has a high rank value if it has many
incoming links.
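The role of distance can be illustrated with a deliberately simplified sketch. It is not the published Distance Rank algorithm: it takes the distance between two pages to be the minimum number of links to follow, found by breadth-first search, and scores each page by its average distance from the other pages, so that a lower value means a better rank. The graph and the handling of unreachable pages are assumptions of the example.

```python
from collections import deque


def average_distance(links):
    """Average shortest-path length, in clicks, from the other pages to each page.

    Unreachable pairs are ignored. Pages that no other page can reach
    get infinity.
    """
    pages = list(links)
    totals = {p: 0 for p in pages}
    counts = {p: 0 for p in pages}
    for source in pages:
        # Breadth-first search gives the minimum number of links to follow.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            page = queue.popleft()
            for target in links[page]:
                if target not in dist:
                    dist[target] = dist[page] + 1
                    queue.append(target)
        for page, d in dist.items():
            if page != source:
                totals[page] += d
                counts[page] += 1
    return {
        p: totals[p] / counts[p] if counts[p] else float("inf")
        for p in pages
    }


# Hypothetical four-page graph; nothing links to "d".
links = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["a"]}
distances = average_distance(links)
for page, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(page, round(d, 3))
```

Pages that are only a few clicks away from most other pages come first, which matches the intuition that pages with many incoming links tend to be close to everything else.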
TOPIC-SENSITIVE PAGERANK
This algorithm computes the score of a web page according
to the importance of the content available on that page.
Pages receiving only a few incoming links, but from closely
related web sites, are given much more consideration for
that topic. The result is a higher Topic-Sensitive
PageRank for that site, for that specific search query, despite a
lower PageRank under the current system.
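One way to sketch the topic bias is to restrict the random jump of PageRank to pages known to belong to the topic, so that rank flows out from topical pages only. The sketch assumes a damping factor of 0.85, a graph in which every page has at least one outgoing link, and hand-picked topic sets. Combining several topic vectors at query time is not shown.

```python
def topic_sensitive_pagerank(links, topic_pages, damping=0.85, iterations=100):
    """PageRank whose random jumps land only on pages of the given topic.

    Assumes every page has at least one outgoing link and that
    topic_pages is a non-empty subset of the pages.
    """
    pages = list(links)
    jump = {
        p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
        for p in pages
    }
    rank = dict(jump)
    for _ in range(iterations):
        # The jump share goes to topic pages only, not to every page.
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph with a sports cluster and a politics cluster.
links = {
    "sports_portal": ["match_report", "league_table"],
    "match_report": ["sports_portal"],
    "league_table": ["sports_portal"],
    "news_portal": ["politics_story", "sports_portal"],
    "politics_story": ["news_portal"],
}
sports = {"sports_portal", "match_report", "league_table"}
politics = {"news_portal", "politics_story"}

sports_rank = topic_sensitive_pagerank(links, sports)
politics_rank = topic_sensitive_pagerank(links, politics)
print(round(sports_rank["politics_story"], 3),
      round(politics_rank["politics_story"], 3))
```

The page "politics_story" has a single in-link, yet it scores well for the politics topic because that link comes from a topical page, and it scores nothing for the sports topic.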
DIFFERENT SEARCH ENGINES
[Table: PageRank and HITS compared by criteria, including year of introduction (1998).]
The proposed work on the PageRank algorithm includes
an implementation that solves the dangling page problem.
Dangling pages are pages that have no outbound
links, that is, pages that provide no reference to
other pages. Dangling pages create many problems for
efficiently calculating the PageRank of the different pages of a
website.
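One common way to handle dangling pages, shown here only as an illustration and not as the method of the proposed work, is to treat a dangling page as if it linked to every page, so that its rank is spread evenly instead of being lost. The damping factor and the three-page graph are assumptions of the sketch.

```python
def pagerank_with_dangling(links, damping=0.85, iterations=100):
    """PageRank that spreads the rank of dangling pages over all pages."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outbound links would otherwise be lost.
        dangling_mass = sum(rank[p] for p in pages if not links[p])
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = {p: base for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph in which "pdf" is a dangling page.
links = {"home": ["about", "pdf"], "about": ["home"], "pdf": []}
ranks = pagerank_with_dangling(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

With this correction the ranks still sum to 1, up to floating-point error. Without it, the share held by "pdf" would drain away in every iteration.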
Mridula Batra, Sachin Sharma, “Comparative Study of Page Rank Algorithm with Different
Ranking Algorithms Adopted by Search Engine for Website Ranking”, Int. J. Computer
Technology & Applications, Vol. 4 (1), 8-18, Jan-Feb 2013.
Ankur Gupta, Rajni Jindal, “An Overview of Ranking Algorithm for Search Engines”,
INDIAcom-2008CFND, Feb 08-09, 2008.
Alessio Signorini, “A Survey of Ranking Algorithms”, Department of Computer Science,
University of Iowa, September 11, 2005.
Mitali Desai, Sanjaysinh Parmar, Nitesh Shah, Jitendra Upadhyay, “A Study of Different
Page Rank Algorithms: Issues”, International Journal of Computer Science Research &
Technology (IJCSRT), ISSN: 2321-8827, www.ijcsrt.org, IJCSRTV1IS040089, Vol. 1,
Issue 4, September 2013.
Sergey Brin and Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search
Engine”.
Marc Najork, “Comparing the Effectiveness of HITS and SALSA”, Microsoft Research,
1065 La Avenida, Mountain View, CA 94043, USA.
Dilip Kumar Sharma, A. K. Sharma, “A Comparative Analysis of Web Page Ranking
Algorithms”, International Journal on Computer Science and
Engineering, Vol. 02, No. 08, 2010, 2670-2676.
R. Lempel and S. Moran, “SALSA: The Stochastic Approach for Link-Structure Analysis”.
Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, and Panayiotis Tsaparas, “Link
Analysis Ranking: Algorithms, Theory, and Experiments”.