This presentation covers the ranking of web pages, focusing mainly on the PageRank and HITS algorithms. It gives a brief introduction to calculating page rank from the links between pages, and it describes different search engine optimization techniques.
[DESCRIBES PAGE RANKING AND HITS ALGORITHM]
BY ANKIT RAJ
SEARCH ENGINE OPTIMIZATION [SEO]
TECHNIQUES OF SEO
TYPES OF RANKING ALGORITHM
PRECISION AND RECALL
The Internet is the global system of interconnected mainframe, personal,
and wireless computer networks that use the Internet protocol
suite (TCP/IP) to link billions of devices worldwide.
It is a network of networks that consists of millions of private, public,
academic, business, and government networks of local to global scope.
The Web has also enabled individuals and organizations to publish ideas
and information to a potentially large audience online at greatly reduced
expense and time delay.
What is searching? Trying to find something by looking.
When it comes to searching on the web, we cannot find a specific
thing by simply looking.
This is because a huge volume of data, files, directories, and
content is present on the web.
So we need a tool to search for the required content on the web. That tool is a search engine.
A search engine is a software system that is designed to search for
information on the World Wide Web.
Examples are Google, Bing, Yahoo, etc.
SEARCH ENGINE OPTIMIZATION
[HOW ONE SEARCH ENGINE DIFFERS FROM OTHER OF ITS KIND]
Search engine optimization (SEO) is the process of affecting the visibility of
a website or a web page in a search engine.
Optimization techniques differ from one search
engine to another.
The better a search engine's optimization techniques, the more visitors it attracts,
and the better it is considered to be.
TECHNIQUE OF SEO
The efficiency and effectiveness of a search engine depend on many
parameters, but the basic ones are the following:
What is rank? A position in a hierarchy or scale.
Searching for anything on the web with a search engine is a hectic task
without a proper ranking technique.
It is very important for any search engine to use an algorithm that ranks the
retrieved pages according to the user's requirements.
Simply returning the search results does not please the
user as much as well-ranked results do.
TYPES OF RANKING ALGORITHMS
Text-based ranking algorithm: The ranking scheme used in
conventional search engines is purely text-based, i.e. pages are ranked
based on their textual content and the number of terms that match the
query string, which seems logical.
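As a minimal sketch of text-based ranking, the following Python snippet scores each page by how often the query terms occur in its text. The function name `text_based_rank` and the sample pages are hypothetical and serve only as an illustration; real search engines use more elaborate text scoring.

```python
def text_based_rank(query, pages):
    """Rank pages by how many times the query terms occur in their text."""
    terms = query.lower().split()
    scores = {}
    for name, text in pages.items():
        words = text.lower().split()
        # Score = total number of occurrences of all query terms
        scores[name] = sum(words.count(term) for term in terms)
    # Highest score first; ties are broken by page name
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample pages
pages = {
    "p1": "page rank algorithm for web search",
    "p2": "cooking recipes and kitchen tips",
    "p3": "web search engine ranking and web crawling",
}
ranking = text_based_rank("web search", pages)
print(ranking)
```

Note that this scheme looks only at the text of each page and ignores the links between pages, which is what the link-based algorithms address.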
HITS (Hyperlink Induced Topic Search)
SALSA: The Stochastic Approach for Link-Structure Analysis. A probabilistic
extension of the HITS algorithm.
1st rank … 2nd rank … 3rd rank … 10th rank …
Weighted PageRank algorithm: The Weighted PageRank algorithm is an
extension of the PageRank algorithm. It allocates higher
rank values to the more significant pages rather than dividing the rank
value of a page evenly among the pages it links to.
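The difference between an even split and a weighted split can be sketched as follows. This is a simplified illustration, not the full Weighted PageRank algorithm: it assumes that the "significance" of a linked page is measured only by its number of incoming links, and the names `in_link_counts` and `weighted_shares` are hypothetical.

```python
def in_link_counts(links):
    """Count how many links point to each page."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def weighted_shares(page, rank, links):
    """Split `rank` among the pages that `page` links to, in proportion
    to the in-link count of each target (assumed significance measure)."""
    targets = links[page]
    counts = in_link_counts(links)
    total = sum(counts[t] for t in targets)
    return {t: rank * counts[t] / total for t in targets}


# Hypothetical link graph: page -> pages it links to
links = {"A": ["B", "C"], "B": ["C"], "D": ["C"], "C": []}
shares = weighted_shares("A", 1.0, links)
print(shares)
```

With an even split, B and C would each receive 0.5 of A's rank; here C receives the larger share because more pages link to it.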
Distance Rank Algorithm: The distance between pages is considered as a
factor. The algorithm calculates the minimum average distance between
two or more web pages.
Topic-Sensitive Rank Algorithm: This algorithm computes the score of a
web page according to the importance of the content available on that page.
In “PageRank”, the word “Page” does not refer to a web page; it refers to Larry Page.
The PageRank algorithm was originally developed at Stanford University by
Larry Page in 1996 as part of a research project about a new search
engine, so it got its name from Larry Page.
PageRank is an algorithm used by the Google web search engine to rank
websites in its search results.
The PageRank algorithm does not rank a whole website; the rank is
determined for each page individually.
Formula for calculating the rank of a web page:
PR(A) = (1 - d) + d * (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
PR(A) = PageRank of page A
T1 … Tn = all pages that link to page A
PR(Ti) = PageRank of page Ti
C(Ti) = the number of pages to which Ti links
d = damping factor, which can be set between 0 and 1
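As a small worked example, the snippet below evaluates the rank of a single page from the quantities defined above, assuming the commonly cited form PR(A) = (1 - d) + d * (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)). The function name `pagerank_of_page`, the value d = 0.85, and the input numbers are assumptions chosen for illustration.

```python
def pagerank_of_page(incoming, d=0.85):
    """Compute PR(A) from a list of (PR(Ti), C(Ti)) pairs,
    one pair for each page Ti that links to page A."""
    return (1 - d) + d * sum(pr / c for pr, c in incoming)


# Hypothetical case: two pages link to A.
# T1 has rank 1.0 and 2 outgoing links; T2 has rank 1.0 and 1 outgoing link.
pr_a = pagerank_of_page([(1.0, 2), (1.0, 1)])
print(round(pr_a, 3))  # 0.15 + 0.85 * (0.5 + 1.0) = 1.425
```

Each linking page passes on its own rank divided by its number of outgoing links, so a link from a page with few outgoing links counts for more.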
Now let's take a look at how it works: http://www.math.cornell.edu/~mec/Winter2009/R
Looking at the 7th and 8th iterations, the values seem to become constant, so
these are the final rank values of the algorithm.
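The repeated calculation can be sketched as a loop that stops once the values no longer change noticeably between iterations. This is a minimal sketch under stated assumptions: it uses the update PR(A) = (1 - d) + d * sum(PR(Ti)/C(Ti)), starts every page at rank 1.0, and assumes every page has at least one outgoing link. The function name, the tolerance, and the sample graph are hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=1000):
    """Iterate the PageRank update until the largest change is below tol.

    links maps each page to the list of pages it links to.
    Returns the rank of each page and the number of iterations used.
    """
    pages = list(links)
    ranks = {p: 1.0 for p in pages}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_ranks = {}
        for page in pages:
            # Sum PR(Ti) / C(Ti) over every page Ti that links to `page`
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        change = max(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if change < tol:
            break  # values have become (nearly) constant
    return ranks, iteration


# Hypothetical three-page graph
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks, iterations = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

How many iterations are needed depends on the graph and on the chosen tolerance, so the count will generally differ from the 7 or 8 iterations seen in the slide example.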
HITS stands for “Hyperlink-Induced Topic Search” (also expanded as “Hypertext Induced Topic Selection”). It is used
for rating and ranking web pages based on link information.
Clever builds on the HITS (Hyperlink-Induced Topic Search) algorithm
developed at IBM’s Almaden Research Lab in San Jose, CA.
Unlike PageRank, which is a static ranking algorithm, HITS is query
dependent: the ranking of a web page is decided by analysing its textual
contents against a given query.
The algorithm produces two types of pages:
Authority: pages that provide important information.
Hub: pages that contain links to authorities
In this algorithm, a web page is called an authority if it is
pointed to by many hyperlinks, and a web page is called a hub if it
points to many other pages.
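The hub and authority definitions above can be turned into an iterative computation. The following is a minimal sketch of the usual mutual-reinforcement scheme: a page's authority score is the sum of the hub scores of the pages pointing to it, and its hub score is the sum of the authority scores of the pages it points to. The function name `hits`, the fixed iteration count, and the sample graph are assumptions for illustration.

```python
import math


def normalize(scores):
    """Scale scores so that the sum of their squares is 1."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(links, iterations=50):
    """Compute hub and authority scores for a link graph.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that point to the page
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for target in targets:
                auth[target] += hub[src]
        auth = normalize(auth)
        # Hub: sum of authority scores of the pages the page points to
        hub = normalize(
            {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        )
    return hub, auth


# Hypothetical graph: h1 and h2 act as hubs, a1 and a2 as authorities
links = {"h1": ["a1", "a2"], "h2": ["a1"]}
hub, auth = hits(links)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

In this example a1 is pointed to by both hubs, so it ends up with the highest authority score, and h1 points to both authorities, so it ends up with the highest hub score.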
HITS is a topic-specific search. First, a subset of web pages containing
good hub and authority pages with respect to a query is created. This is
done by first running the query and getting an initial set of documents
relevant to the query. This is called the root set for the query.
[Source: International Journal of Engineering Research & Technology (IJERT), Vol. 1, Issue 8, October 2012, ISSN: 2278-0181]
PRECISION AND RECALL
[TO CHECK EFFICIENCY OF RANKING ALGORITHM]
Precision (also called positive predictive value) is the fraction of retrieved instances
that are relevant, while recall (also known as sensitivity) is the fraction of relevant
instances that are retrieved.
Both precision and recall are therefore based on an understanding and measure of relevance.
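The two definitions above can be computed directly from the set of retrieved pages and the set of relevant pages. The function name `precision_recall` and the page identifiers below are hypothetical and used only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved results."""
    retrieved, relevant = set(retrieved), set(relevant)
    relevant_retrieved = retrieved & relevant
    # Precision: fraction of retrieved instances that are relevant
    precision = len(relevant_retrieved) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant instances that are retrieved
    recall = len(relevant_retrieved) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 pages retrieved, 3 pages are actually relevant
precision, recall = precision_recall(
    retrieved=["p1", "p2", "p3", "p4"],
    relevant=["p2", "p4", "p5"],
)
print(precision, round(recall, 3))  # 2 of 4 retrieved are relevant; 2 of 3 relevant are retrieved
```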
Comparison between SVM (space vector model) and PageRank:
Comparison between HITS and SVM:
To optimise search, we require a better ranking algorithm.
On the basis of this study, we conclude that PageRank and HITS are
different link analysis algorithms that employ different models to calculate web
page rank.
Page Rank is a more popular algorithm used as the basis for the very popular
Google search engine.
This popularity is due to features such as efficiency, feasibility, lower query-time cost,
and lower susceptibility to localized links, which are absent in the HITS algorithm.
Although the HITS algorithm itself has not been very popular, different
extensions of it have been employed on a number of web sites.
Proposed work on the PageRank algorithm includes solving
the dangling page problem. Dangling pages are pages that have no
outbound links, that is, pages that do not reference any other
page. Dangling pages complicate the calculation of accurate page ranks for the
pages of a website.
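Dangling pages can be detected directly from the link graph. The sketch below also shows one common remedy, which is an assumption here and not necessarily the approach proposed in the cited work: treating a dangling page as if it linked to every other page, so that its rank is redistributed instead of being lost. The function names and the sample graph are hypothetical.

```python
def find_dangling_pages(links):
    """Return the pages that have no outbound links."""
    return sorted(page for page, targets in links.items() if not targets)


def patch_dangling_pages(links):
    """Return a copy of the graph in which every dangling page
    links to all other pages (one common workaround)."""
    pages = list(links)
    patched = {}
    for page, targets in links.items():
        if targets:
            patched[page] = list(targets)
        else:
            patched[page] = [p for p in pages if p != page]
    return patched


# Hypothetical graph: C has no outbound links
links = {"A": ["B", "C"], "B": ["C"], "C": []}
print(find_dangling_pages(links))
print(patch_dangling_pages(links))
```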
Work is also under way to remove circular references so that proper ranking
can be done.
International Journal of Engineering Research & Technology (IJERT), Vol. 1, Issue 8, October 2012. ISSN: 2278-0181.
International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 2, February 2014. ISSN (Online): 2278-1021. ISSN (Print):