
Ranking algorithms

This presentation covers the ranking of web pages, mainly the PageRank and HITS algorithms. It gives a brief introduction to calculating PageRank from the links between pages, and it describes different techniques of search engine optimization.

Published in: Internet

  1. 1. RANKING ALGORITHMS [DESCRIBES PAGE RANKING AND HITS ALGORITHM] BY ANKIT RAJ 1309113012 [IT-1]
  2. 2. CONTENT • INTRODUCTION • SEARCHING • SEARCH ENGINE OPTIMIZATION [SEO] • TECHNIQUES OF SEO • RANKING • TYPES OF RANKING ALGORITHM • PAGERANK ALGORITHM • HITS ALGORITHM • PRECISION AND RECALL • CONCLUSION • FUTURE ASPECTS • REFERENCES
  3. 3. INTRODUCTION • The Internet is the global system of interconnected mainframe, personal, and wireless computer networks that use the Internet protocol suite (TCP/IP) to link billions of devices worldwide. • It is a network of networks that consists of millions of private, public, academic, business, and government networks of local to global scope. • The Web has also enabled individuals and organizations to publish ideas and information to a potentially large audience online at greatly reduced expense and time delay.
  4. 4. SEARCHING [SEARCH ENGINES] • What is searching? Trying to find something by looking for it. • On the web, we cannot find a specific item simply by looking, because the web holds a huge volume of data, files, directories, and other content. • So we need a tool to find the required content on the web. That tool is a search engine. • A search engine is a software system designed to search for information on the World Wide Web. • Examples are Google, Bing, and Yahoo.
  5. 5. SEARCH ENGINE OPTIMIZATION [HOW ONE SEARCH ENGINE DIFFERS FROM OTHERS OF ITS KIND] • Search engine optimization (SEO) is the process of affecting the visibility of a website or a web page in a search engine's results. • The optimization techniques differ from one search engine to another. • The better a search engine's optimization technique, the more visitors it attracts, and the better it is considered to be. [Sources: http://www.oshup.com/3-defining-parameters-for-search-engine-marketing/]
  6. 6. TECHNIQUES OF SEO There are many parameters on which search engine efficiency and effectiveness depend, but the basic ones are the following: links, page updates, rank, content, keywords, crawling, and indexing.
  7. 7. RANKING • What is rank? A position in a hierarchy or scale. • Searching the web with a search engine would be a tedious task without a proper ranking technique. • It is very important for any search engine to use an algorithm that ranks the retrieved pages according to the user's requirements. • Simply returning unordered search results will not satisfy the user as much as well-ranked results. [Sources: http://www.shutterstock.com/s/angry+person+computer/search.html]
  8. 8. TYPES OF RANKING ALGORITHMS • Text-based ranking algorithm: The ranking scheme used in conventional search engines is purely text-based, i.e. pages are ranked by their textual content and by the number of terms that match the query string, which seems logical. • HITS (Hyperlink-Induced Topic Search) • SALSA (Stochastic Approach for Link-Structure Analysis): a probabilistic extension of the HITS algorithm. • PageRank algorithm
  9. 9. • Weighted PageRank algorithm: an extension of the PageRank algorithm. It allocates larger rank values to more significant pages rather than dividing the rank value of a page evenly among the pages it links to. • Distance Rank algorithm: The distance between pages is considered as a factor. The algorithm calculates the minimum average distance between two or more web pages. • Topic-sensitive Rank algorithm: This algorithm computes the score of a web page according to the importance of the content available on the page.
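As a small illustration of the text-based scheme listed on slide 8, the sketch below ranks pages by how many distinct query terms they contain. The corpus, the page names, and the helpers `text_score` and `rank_by_text` are hypothetical and only serve the example; real engines use far richer text models.

```python
# Hypothetical mini-corpus: page names and text are invented for illustration.
pages = {
    "page_a": "pagerank is a link analysis algorithm used for ranking web pages",
    "page_b": "cooking recipes for pasta and pizza",
    "page_c": "ranking algorithm comparison pagerank and hits algorithm",
}


def text_score(query: str, text: str) -> int:
    """Count how many distinct query terms occur in the page text."""
    words = set(text.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in words)


def rank_by_text(query: str, corpus: dict) -> list:
    """Return (page, score) pairs, best match first."""
    scored = [(name, text_score(query, text)) for name, text in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


results = rank_by_text("hits ranking algorithm", pages)
for name, score in results:
    print(name, score)
```

Because the score only counts matching terms, a page can rank highly without being authoritative, which is the gap the link-based algorithms try to close.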
  10. 10. PAGERANK ALGORITHM • The “Page” in “PageRank” does not refer to web pages, even though the algorithm is used to rank them. • The PageRank algorithm was originally developed at Stanford University by Larry Page and Sergey Brin in 1996 as part of a research project about a new search engine. It is named after Larry Page. • PageRank is an algorithm used by the Google web search engine to rank pages in its search results. • PageRank does not rank a whole website; it is determined for each page individually.
  11. 11. • Formula for calculating the rank of a web page: PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) • Where: PR(A) = PageRank of page A; T1 ... Tn = all pages that link to page A; PR(Ti) = PageRank of page Ti; C(Ti) = the number of pages to which Ti links; d = damping factor, which can be set between 0 and 1.
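The formula can be applied repeatedly until the values settle. The following is a minimal sketch, not a definitive implementation: the three-page link graph, the function name `pagerank`, the starting value of 1.0 per page, and the choice d = 0.85 (a commonly used value) are all assumptions made for the example.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)) over all pages.

    links maps each page to the list of pages it links to.
    Every page is assumed to have at least one outbound link.
    """
    pr = {page: 1.0 for page in links}  # assumed starting value
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # T1..Tn: all pages that link to this page
            inbound = [t for t in links if page in links[t]]
            new_pr[page] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in inbound)
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(value, 3))
```

With this form of the formula the ranks sum to the number of pages, so the average PageRank is 1.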
  12. 12. Now let's take a look at how it works: http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html
  13. 13. [Figures: Step 1 and Step 2 of the worked example]
  14. 14. Matrix A is built from the link graph of the pages. The initial vector V gives every page the value 1/(number of pages).
      A = | 0    0    1    1/2 |
          | 1/3  0    0    0   |
          | 1/3  1/2  0    1/2 |
          | 1/3  1/2  0    0   |
      V = [0.25  0.25  0.25  0.25] (as a column vector)
  15. 15. Each iteration multiplies matrix A by the current rank vector. [Figures: results of the 1st, 2nd, and 3rd to 5th iterations]
  16. 16. By the 7th and 8th iterations the values have become nearly constant, so they are taken as the final rank values. RANK: 1: page 1; 2: page 3; 3: page 4; 4: page 2
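The iterations of this worked example can be reproduced with a few lines of code. This is a sketch under one stated assumption: the link matrix is taken from the cited Cornell lecture, in which page 3 links only to page 1, so every column sums to 1. The helper name `step` is hypothetical.

```python
from fractions import Fraction as F

# Column j holds the outbound links of page j+1, each weighted 1/(number of links).
# Assumed to match the four-page example of the cited Cornell lecture.
A = [
    [F(0), F(0), F(1), F(1, 2)],
    [F(1, 3), F(0), F(0), F(0)],
    [F(1, 3), F(1, 2), F(0), F(1, 2)],
    [F(1, 3), F(1, 2), F(0), F(0)],
]
v = [F(1, 4)] * 4  # initial vector: 1/(number of pages) for every page


def step(matrix, vector):
    """One iteration: multiply the matrix by the current rank vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


for iteration in range(1, 9):
    v = step(A, v)
    print(iteration, [round(float(x), 3) for x in v])

# Page numbers ordered from highest to lowest rank value.
ranking = sorted(range(1, 5), key=lambda page: v[page - 1], reverse=True)
print(ranking)
```

Exact fractions are used so that the vector keeps summing to 1 without rounding drift. After eight iterations the order of the pages is 1, 3, 4, 2, as stated on the slide.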
  17. 17. HITS ALGORITHM • HITS stands for “Hyperlink-Induced Topic Search”; it rates and ranks web pages for a topic based on their link information. • The Clever project builds on the HITS algorithm, which was developed at IBM's Almaden Research Lab in San Jose, CA. • Unlike PageRank, which is a static ranking algorithm, HITS is query dependent: the pages to be ranked are first selected by matching their textual content against the given query. • The algorithm distinguishes two types of pages: Authority: pages that provide important information on the topic. Hub: pages that contain links to authorities.
  18. 18. • In this algorithm a web page is called an authority if many hyperlinks point to it, and a hub if it points to many other pages. • HITS is a topic-specific search. First, a subset of web pages containing good hub and authority pages with respect to a query is created. This is done by running the query and collecting an initial set of documents relevant to it, called the root set for the query. [Sources: International Journal of Engineering Research & Technology (IJERT), Vol. 1, Issue 8, October 2012, ISSN: 2278-0181]
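The hub and authority idea can be sketched as a mutual-reinforcement loop: a page's authority score is the sum of the hub scores of the pages pointing to it, and its hub score is the sum of the authority scores of the pages it points to. The code below is an illustrative sketch only. The small link graph stands in for the query-specific set of pages, and the function name `hits`, the page names, and the fixed iteration count are assumptions.

```python
import math


def normalize(scores):
    """Scale scores so that the sum of their squares is 1."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {page: s / norm for page, s in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) scores for a small link graph.

    links maps a page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that point to p.
        auth = normalize(
            {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        )
        # Hub: sum of authority scores of the pages that p points to.
        hub = normalize({p: sum(auth[t] for t in links.get(p, ())) for p in pages})
    return hub, auth


# Hypothetical pages for one query: h1..h3 act as hubs, a1 and a2 as authorities.
links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Scores are normalized after every update so that they stay bounded; only their relative order matters for ranking.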
  19. 19. PRECISION AND RECALL [TO CHECK EFFICIENCY OF RANKING ALGORITHM] • Precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. • Both precision and recall are therefore based on an understanding and measure of relevance. [Sources: www2.hawaii.edu/~donnab/lis670/]
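The two definitions translate directly into set arithmetic. In this short sketch the page identifiers and the function name `precision_recall` are hypothetical, and relevance judgments are assumed to be given.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    relevant_retrieved = retrieved & relevant
    # Precision: share of retrieved pages that are relevant.
    precision = len(relevant_retrieved) / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant pages that were retrieved.
    recall = len(relevant_retrieved) / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["p1", "p2", "p3", "p4"]  # pages returned by the search engine
relevant = ["p2", "p4", "p5"]         # pages judged relevant to the query
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved pages are relevant (precision 0.5), and two of the three relevant pages were found (recall about 0.67).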
  20. 20. Comparison between VSM [vector space model] and PageRank: [figure] [Sources: http://www.webology.org/2007/v4n3/a44.html]
  21. 21. Comparison between HITS and VSM: [figure] [Sources: http://www.webology.org/2007/v4n3/a44.html]
  22. 22. CONCLUSION • To optimise search we require a better ranking algorithm. • On the basis of this study we conclude that PageRank and HITS are different link analysis algorithms that employ different models to calculate web page rank. • PageRank is the more popular algorithm and is the basis of the very popular Google search engine. • Its popularity is due to features such as efficiency, feasibility, lower query-time cost, and lower susceptibility to localized links, which are absent in the HITS algorithm. • Although the HITS algorithm itself has not been very popular, extensions of it have been employed in a number of different websites.
  23. 23. FUTURE ASPECTS • Proposed work on the PageRank algorithm includes solving the problem of dangling pages. Dangling pages are pages that have no outbound links, i.e. pages that do not reference any other page. They make it difficult to calculate PageRank efficiently for the pages of a website. • Work is also under way to remove circular references so that proper ranking can be done.
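To make the dangling-page problem concrete, the sketch below shows one common workaround, which is not necessarily the approach proposed in the work referred to here: the rank held by pages without outbound links is shared evenly among all pages on each iteration, so that no rank leaks out of the system. The graph, the function name `pagerank_with_dangling`, and d = 0.85 are assumptions.

```python
def pagerank_with_dangling(links, d=0.85, iterations=100):
    """PageRank iteration that spreads the rank of dangling pages evenly.

    links maps each page to the list of pages it links to; an empty
    list marks a dangling page. All link targets must be keys of links.
    """
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Total rank currently held by pages with no outbound links.
        dangling_mass = sum(pr[p] for p in pages if not links[p])
        new_pr = {}
        for p in pages:
            inbound = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) + d * (inbound + dangling_mass / n)
        pr = new_pr
    return pr


# Hypothetical graph in which page C is dangling.
links = {"A": ["B", "C"], "B": ["C"], "C": []}
ranks = pagerank_with_dangling(links)
print({page: round(value, 3) for page, value in ranks.items()})
```

Without the `dangling_mass` term, the rank that flows into page C would be lost on every iteration and the total rank would shrink.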
  24. 24. REFERENCES • http://www.webology.org/2007/v4n3/a44.html • www2.hawaii.edu/~donnab/lis670/ • International Journal of Engineering Research & Technology (IJERT), Vol. 1, Issue 8, October 2012, ISSN: 2278-0181 • http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html • International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 2, February 2014, ISSN (Online): 2278-1021, ISSN (Print): 2319-5940