Motivation
When searching for information on the WWW, a user submits a query to a search engine. The engine returns, as the query's result, a list of Web sites, which is usually a huge set. So the ranking of these Web sites is very important. Because much information is contained in the link structure of the WWW, information such as which pages link to which others can be used to augment search algorithms.

The Stochastic Approach for Link-Structure Analysis (SALSA) and the TKC Effect
The PageRank Citation Ranking: Bringing Order to the Web

Paper 1----SALSA
authorities: web pages that are pointed to by many hubs (many inlinks)
hubs: web pages that point to many authoritative sites
Hubs and authorities form communities; the most prominent community is called the principal community.

SALSA----Idea
SALSA is based upon the theory of Markov chains, and relies on the stochastic properties of random walks performed on our collection of sites.
The input to our scheme consists of a collection of sites C which is built around a topic t. Intuition suggests that authoritative sites on topic t should be visible from many sites in the subgraph induced by C. Thus, a random walk on this subgraph will visit t-authorities with high probability.

SALSA----Idea
Combine the theory of random walks with the notion of the two distinct types of Web sites, hubs and authorities, and analyze two different Markov chains: a chain of hubs and a chain of authorities. Analyzing both chains allows our approach to give each Web site two distinct scores, a hub score and an authority score.

SALSA----Computing
Now define two stochastic matrices, which are the transition matrices of the two Markov chains of interest: the hub matrix H and the authority matrix Ã.
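The matrix entries themselves did not survive on this slide. The following is a hedged reconstruction, assuming each chain takes a two-step walk: an authority steps backward along a uniformly chosen inlink to a hub and then forward along a uniformly chosen outlink (the hub chain does the reverse). The notation i → k ("page i links to page k"), deg_in and deg_out is introduced here for illustration.

```latex
h_{i,j} = \sum_{k \,:\, i \to k,\; j \to k}
          \frac{1}{\deg_{\mathrm{out}}(i)} \cdot \frac{1}{\deg_{\mathrm{in}}(k)},
\qquad
\tilde{a}_{i,j} = \sum_{k \,:\, k \to i,\; k \to j}
          \frac{1}{\deg_{\mathrm{in}}(i)} \cdot \frac{1}{\deg_{\mathrm{out}}(k)}
```

Under this reading every row of each matrix sums to 1, so both are valid transition matrices.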

SALSA
The principal community of authorities (hubs) found by SALSA is composed of the sites whose entries in the principal eigenvector of Ã (H) are the highest.
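A minimal sketch of how the two chains might be iterated to approximate those principal eigenvectors. It assumes the two-step walk reading of SALSA (authority to hub to authority, and hub to authority to hub); the function name `salsa_scores`, the edge-list input and the fixed iteration count are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def salsa_scores(edges, iterations=100):
    """Return (authority_scores, hub_scores) for a list of (source, target) links."""
    out_links = defaultdict(set)  # hub side: page -> pages it points to
    in_links = defaultdict(set)   # authority side: page -> pages pointing to it
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)

    def walk(first, second):
        # One chain: `first` gives the neighbours for the first half-step,
        # `second` the neighbours for the second half-step.
        nodes = list(first)
        scores = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iterations):
            nxt = dict.fromkeys(nodes, 0.0)
            for i, mass in scores.items():
                for k in first[i]:
                    # Probability of i -> k -> j is 1/deg(i) * 1/deg(k).
                    share = mass / len(first[i]) / len(second[k])
                    for j in second[k]:
                        nxt[j] += share
            scores = nxt
        return scores

    authority_scores = walk(in_links, out_links)
    hub_scores = walk(out_links, in_links)
    return authority_scores, hub_scores


if __name__ == "__main__":
    links = [("h1", "a1"), ("h2", "a1"), ("h3", "a1"), ("h1", "a2")]
    authorities, hubs = salsa_scores(links)
    print(sorted(authorities.items(), key=lambda kv: -kv[1]))
    print(sorted(hubs.items(), key=lambda kv: -kv[1]))
```

Repeatedly applying the transition step to a uniform start vector is the power method; on a connected graph like the small example it settles on the stationary distribution, and the highest entries mark the principal community.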

SALSA----Conclusion
SALSA is a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link structure. The principal community of authorities (hubs) corresponds to the sites that are most frequently visited by the random walk defined by the authority (hub) Markov chain.

The PageRank Citation Ranking: Bringing Order to the Web
Larry Page et al.
Stanford University

PageRank----Idea
Every page has some number of forward links (outedges) and backlinks (inedges).

PageRank----Idea
Two cases where PageRank is interesting:
Web pages vary greatly in the number of backlinks they have. For example, the Netscape home page has 62,804 backlinks, whereas most pages have just a few. Generally, highly linked pages are more "important" than pages with few links.

PageRank----Idea
Backlinks coming from important pages convey more importance to a page. For example, if a web page is linked from the Yahoo home page, it may be just one link, but it is a very important one.
A page has high rank if the sum of the ranks of its backlinks is high. This covers both the case when a page has many backlinks and the case when a page has a few highly ranked backlinks.

PageRank----Definition
u: a web page
F_u: the set of pages u points to
B_u: the set of pages that point to u
N_u = |F_u|: the number of links from u
c: a factor used for normalization
The rank R(u) of page u is defined as:
R(u) = c \sum_{v \in B_u} R(v) / N_v
The equation is recursive, but it may be computed by starting with any set of ranks and iterating the computation until it converges.
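The iteration can be sketched as follows, assuming the rank of u is c times the sum of R(v)/N_v over the pages v in B_u. The function name `simple_pagerank`, the dictionary representation (every page is a key mapping to the list of pages it points to) and the fixed iteration count are assumptions made for illustration.

```python
def simple_pagerank(forward_links, iterations=100):
    """Iterate R(u) = c * sum(R(v) / N_v for v in B_u) from a uniform start."""
    pages = list(forward_links)
    ranks = {u: 1.0 / len(pages) for u in pages}
    for _ in range(iterations):
        new_ranks = dict.fromkeys(pages, 0.0)
        for v, targets in forward_links.items():
            for u in targets:
                # Page v splits its rank evenly over its N_v forward links.
                new_ranks[u] += ranks[v] / len(targets)
        # The factor c rescales the ranks so that they sum to 1.
        total = sum(new_ranks.values())
        ranks = {u: r / total for u, r in new_ranks.items()}
    return ranks


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(simple_pagerank(web))
```

In this small graph, page b receives only half of a's rank, while c collects the other half plus all of b's, so a and c end up with equal rank and b with half as much.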

PageRank----Definition
A problem with the above definition: rank sink
If two web pages point to each other but to no other page, and some other page points to one of them, then during the iteration this loop will accumulate rank but never distribute any rank.
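A small sketch of the rank sink effect, assuming the simple rule in which every page splits its rank evenly over its forward links. The four-page graph and the name `iterate_ranks` are made up for illustration: pages 2 and 3 point only to each other, while pages 0 and 1 feed rank into the loop.

```python
def iterate_ranks(forward_links, iterations):
    """Apply the simple rank-propagation step a fixed number of times."""
    pages = list(forward_links)
    ranks = {u: 1.0 / len(pages) for u in pages}
    for _ in range(iterations):
        new_ranks = dict.fromkeys(pages, 0.0)
        for v, targets in forward_links.items():
            for u in targets:
                new_ranks[u] += ranks[v] / len(targets)
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    # Pages 2 and 3 form the loop; nothing links back out of it.
    web = {0: [1, 2], 1: [2], 2: [3], 3: [2]}
    for steps in (0, 1, 2, 10):
        print(steps, iterate_ranks(web, steps))
```

After two steps the pages outside the loop hold no rank at all, and the loop keeps passing the entire rank back and forth between its two pages.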

PageRank----Definition
Definition modified:
R'(u) = c \sum_{v \in B_u} R'(v) / N_v + c E(u)
E(u) is some vector over the web pages (for example uniform, or concentrated on a favorite page) that corresponds to a source of rank. E(u) is a user-designed parameter.
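A hedged sketch of the modified iteration, assuming the rank source E(u) is added to the propagated rank before the ranks are rescaled to sum to 1 (the rescaling plays the role of c). The name `pagerank_with_source`, the uniform source with total weight 0.15 and the example graph are illustrative assumptions.

```python
def pagerank_with_source(forward_links, source, iterations=100):
    """Iterate R'(u) = c * (sum(R'(v) / N_v for v in B_u) + E(u))."""
    pages = list(forward_links)
    ranks = {u: 1.0 / len(pages) for u in pages}
    for _ in range(iterations):
        # Every page starts each round with its share of the rank source E.
        new_ranks = {u: source[u] for u in pages}
        for v, targets in forward_links.items():
            for u in targets:
                new_ranks[u] += ranks[v] / len(targets)
        # Rescale so the ranks sum to 1; this is the normalization factor c.
        total = sum(new_ranks.values())
        ranks = {u: r / total for u, r in new_ranks.items()}
    return ranks


if __name__ == "__main__":
    # Pages 2 and 3 point only to each other (a rank sink).
    web = {0: [1, 2], 1: [2], 2: [3], 3: [2]}
    uniform_source = {u: 0.15 / len(web) for u in web}
    print(pagerank_with_source(web, uniform_source))
```

With the source term, pages 0 and 1 keep a small positive rank instead of draining to zero, although the loop formed by pages 2 and 3 still holds most of the rank.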

PageRank----Random Surfer Model
The definition corresponds to the probability distribution of a random walk on the web graph.
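The random walk reading can be illustrated with a short simulation. This is only a sketch under stated assumptions: the surfer follows a uniformly chosen forward link, and with a hypothetical `jump_probability` of 0.15 (or on a page without forward links) jumps to a uniformly chosen page instead. The function name, the parameters and the example graph are not from the paper.

```python
import random


def simulate_surfer(forward_links, steps=100_000, jump_probability=0.15, seed=0):
    """Return the fraction of steps a simulated surfer spends on each page."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pages = list(forward_links)
    visits = dict.fromkeys(pages, 0)
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        targets = forward_links[current]
        if not targets or rng.random() < jump_probability:
            # Jump to a page chosen uniformly at random.
            current = rng.choice(pages)
        else:
            # Otherwise follow one of the current page's forward links.
            current = rng.choice(targets)
    return {u: n / steps for u, n in visits.items()}


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(simulate_surfer(web))
```

The visit frequencies approximate a rank vector: pages that many walks pass through are visited more often, which is the sense in which rank is a probability distribution over pages.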
E(u) can be thought of as modeling a random surfer who periodically gets bored and jumps to a different page, so the surfer is not kept in a loop forever.

PageRank----Conclusion
PageRank is a global ranking based on the web's graph structure.
