Link analysis

Carlos Castillo

Center for Web Research
Computer Science Department
University of Chile
www.cwr.cl

Outline

Motivation
Bibliometrics
    Bibliometric laws
    Measures of similarity
Ranking
    Pagerank
    HITS
Web graph characterization
    Scale-free networks
    Macroscopic structure
Summary
References

Motivation

Hyperlinks are not placed at random; they provide valuable information for:

    Link-based ranking
    Structure analysis
    Detection of communities
    Spam detection
    ...

Topical locality

    “We found that pages are significantly more likely to be related topically to pages to which they are linked, as opposed to other pages selected at random or other nearby pages.” [Davison, 2000]

What type of relationship?

A link from $P'$ to $P$ means that the content of $P$ is endorsed by the author of $P'$, but links can also mean:

    Disagreement
    Self-citation
    Citation to popular document
    Citation to methodological document

Bibliometrics

Quantitative analysis and statistics to describe patterns of publication.

Example 1: Lotka’s Law

Fraction of authors with $n$ papers $\propto 1/n^2$

    1 publication: 0.60
    2 publications: $(1/2)^2 \times 0.60 = 0.15$
    3 publications: $(1/3)^2 \times 0.60 = 0.07$
    7 publications: $(1/7)^2 \times 0.60 = 0.01$

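The arithmetic above can be reproduced with a short sketch. The function name `lotka_fraction` and the parameter `single_paper_fraction` are illustrative names, not part of the slides; the 0.60 constant is the value given for authors with one publication.

```python
def lotka_fraction(n, single_paper_fraction=0.60):
    """Estimated fraction of authors with exactly n papers under Lotka's Law.

    single_paper_fraction is the fraction of authors with one paper
    (0.60 on the slide); the fraction for n papers scales as 1/n^2.
    """
    return single_paper_fraction / n ** 2


for n in (1, 2, 3, 7):
    print(f"{n} publication(s): {lotka_fraction(n):.2f}")
```

Rounded to two decimals, this gives the same four values as listed on the slide.
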
Example 2: Bradford’s Law

Journals in a field, sorted by productivity, can be divided into 3 parts, each containing about the same number of papers. The number of journals in each part will be $\propto 1 : n : n^2$.

Counting the number of citations

Problems:

    Quantity, not quality
    Self-citations are frequent
    In some fields there are many publications, in others there are fewer
    Citations go from newer to older articles
    New documents have few citations
    1/3 of the citations in a paper are not relevant

Measures of similarity

Studying the citations of papers:

    Bibliographic coupling: two documents share a significant portion of their bibliographies
    Co-citation: two documents are cited simultaneously by a significant number of other documents

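Both measures reduce to simple counts over a citation table. The sketch below assumes a dictionary mapping each document to the list of documents it cites; the names `references`, `bibliographic_coupling`, and `co_citation` are illustrative, and raw counts are used in place of any normalized similarity score.

```python
def bibliographic_coupling(references, a, b):
    """Number of bibliography entries shared by documents a and b."""
    return len(set(references.get(a, ())) & set(references.get(b, ())))


def co_citation(references, a, b):
    """Number of other documents that cite both a and b."""
    return sum(
        1
        for doc, refs in references.items()
        if doc not in (a, b) and a in refs and b in refs
    )


# Toy citation table (hypothetical data): document -> documents it cites.
references = {
    "d1": ["r1", "r2", "r3"],
    "d2": ["r2", "r3", "r4"],
    "d3": ["d1", "d2"],
    "d4": ["d1", "d2", "r1"],
}

print(bibliographic_coupling(references, "d1", "d2"))  # d1 and d2 share r2 and r3
print(co_citation(references, "d1", "d2"))             # d3 and d4 cite both
```
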
Bibliographic coupling

Co-citation

Types of ranking

    Query-independent ranking
    Query-dependent ranking

Adversarial-IR in link-based ranking

    “With a simple program, huge numbers of pages can be created easily, artificially inflating citation counts. Because the Web environment contains profit seeking ventures, attention getting strategies evolve in response to search engine algorithms. For this reason, any evaluation strategy which counts replicable features of web pages is prone to manipulation” [Page et al., 1998].

Notation

    Symbol           Meaning
    $p$              A Web page
    $\Gamma^+(p)$    Links pointing from page $p$
    $\Gamma^-(p)$    Links pointing to page $p$

Hyperlink vector voting

Each link counts as a “vote” on certain keywords [Li, 1998]:

    $h(P', P, w) = 1 \iff$ there is a link from $P'$ to $P$ with anchor text $w$
    $HVV(P, w) = \sum_{P' \in \Gamma^-(P)} h(P', P, w)$
    Only the count of links is used ⇒ easy to manipulate

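A minimal sketch of this vote count, assuming links are available as `(source, target, anchor_text)` tuples. The tuple layout, the function name `hvv`, and the choice to match a keyword against whitespace-separated, lowercased anchor words are all assumptions made for illustration; the sample URLs are hypothetical.

```python
def hvv(links, page, keyword):
    """Count distinct pages linking to `page` whose anchor text contains `keyword`.

    Each source page votes at most once, mirroring the 0/1 indicator h(P', P, w).
    """
    keyword = keyword.lower()
    voters = {
        source
        for source, target, anchor in links
        if target == page and keyword in anchor.lower().split()
    }
    return len(voters)


# Hypothetical link data: (source page, target page, anchor text).
links = [
    ("a.html", "p.html", "cheap flights"),
    ("b.html", "p.html", "Flights to Chile"),
    ("c.html", "p.html", "home page"),
    ("a.html", "q.html", "flights"),
]

print(hvv(links, "p.html", "flights"))  # votes from a.html and b.html
```

Because the score is a plain count, adding more pages that link with the same anchor text raises it directly, which is the manipulation risk noted above.
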
Pagerank

Measure of importance of a page, defined recursively [Page et al., 1998]: a page with high Pagerank is a page referenced by many pages with high Pagerank.

This is a simplified version:

    $Pagerank(P) = \sum_{x \in \Gamma^-(P)} \frac{Pagerank(x)}{|\Gamma^+(x)|}$

Iterations with pseudo-Pagerank

Note: we should normalize after each iteration

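The iteration with per-step normalization can be sketched as follows. This is an illustration under stated assumptions, not the slides' own code: the graph is a dictionary from each page to the pages it links to, every page has at least one out-link, and the names `simplified_pagerank` and `out_links` are made up for the example.

```python
def simplified_pagerank(out_links, iterations=100):
    """Iterate the simplified Pagerank update, normalizing after each pass.

    out_links maps every page to the list of pages it links to.
    Assumes every page has at least one out-link.
    """
    pages = list(out_links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for x in pages:
            share = rank[x] / len(out_links[x])  # x splits its score evenly
            for p in out_links[x]:
                new_rank[p] += share
        total = sum(new_rank.values())
        rank = {p: score / total for p, score in new_rank.items()}  # normalize
    return rank


# Small strongly connected example graph (hypothetical).
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = simplified_pagerank(graph)
print(rank)
```

On this three-page graph the scores settle near A = 0.4, B = 0.2, C = 0.4, which is the solution of the simplified equations with the scores summing to 1.
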
Convergence with pseudo-Pagerank

Random jumps

    So far, so good, but ...
    The Web includes many pages with no out-links; these will accumulate all of the score
    We would like Web pages to accumulate ranking
    We add random jumps (teleportation)

Random jumps

Random jumps to single node

Pagerank with random jumps

    $Pagerank(P) = \frac{\epsilon}{N} + (1 - \epsilon) \sum_{x \in \Gamma^-(P)} \frac{Pagerank(x)}{|\Gamma^+(x)|}$

Typically $\epsilon \approx 0.15$

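A minimal sketch of Pagerank with random jumps, using a jump probability of 0.15. Several choices here are assumptions rather than something the slides specify: the parameter is called `epsilon`, the graph is a dictionary listing every page as a key, and the score of pages without out-links is spread uniformly over all pages so that the total stays at 1.

```python
def pagerank(out_links, epsilon=0.15, iterations=50):
    """Pagerank with random jumps (teleportation).

    out_links maps every page to the list of pages it links to; every
    link target must also appear as a key. epsilon is the random-jump
    probability.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Assumption: score held by dangling pages is shared by all pages.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        new_rank = {p: epsilon / n + (1 - epsilon) * dangling / n for p in pages}
        for x in pages:
            for p in out_links[x]:
                new_rank[p] += (1 - epsilon) * rank[x] / len(out_links[x])
        rank = new_rank
    return rank


# Hypothetical graph; D has no in-links, so it only receives jump score.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(graph)
print(max(rank, key=rank.get), rank)
```

A page with no in-links, such as D here, ends up with exactly `epsilon / N`, the share it receives from random jumps alone.
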
Pagerank calculation

    Iterative calculation
    Exploit blocks in the matrix (locality of links)
    The extra node can be added after the last iteration
    Dangling nodes can be added after the last iteration [Eiron et al., 2004]

Alternatives

    Ranking as a vector: topic-sensitive Pagerank [Haveliwala, 2002]
    Network flows: TrafficRank [Tomlin, 2003]
    Dynamic absorbing model [Amati et al., 2003]

Dynamic absorbing model

Scores are the probabilities of the “clone” nodes

Hubs and authorities

[Kleinberg, 1999]

Algorithm

    1  Obtain a results set using text search
    2  Expand the results set with in- and out-links
    3  Calculate hubs and authorities

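Step 2 can be sketched as a set expansion over a link table. The function name `expand_result_set`, the dictionary layout, and the sample pages are hypothetical; the sketch adds every in- and out-neighbour of the results set, with no cap on how many neighbours are added per page.

```python
def expand_result_set(root_set, out_links):
    """Return the results set plus pages it links to and pages linking to it.

    root_set: pages returned by the text search (step 1).
    out_links: maps each known page to the list of pages it links to.
    """
    root = set(root_set)
    base_set = set(root)
    for source, targets in out_links.items():
        if source in root:
            base_set.update(targets)      # out-links of a result page
        elif any(t in root for t in targets):
            base_set.add(source)          # page with a link into the results
    return base_set


# Hypothetical link table: only r1 was returned by the text search.
out_links = {"r1": ["x"], "y": ["r1"], "z": ["w"], "x": [], "w": []}
base_set = expand_result_set({"r1"}, out_links)
print(sorted(base_set))
```

Hub and authority scores (step 3) are then computed only over the pages in the expanded set.
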
Expansion phase

Calculation phase

Initialize:

    $hub(p, 0) = auth(p, 0) = 1$

Iterate:

    $hub(p, t) = \sum_{x \in \Gamma^+(p)} \frac{auth(x, t-1)}{|\Gamma^-(x)|}$

    $auth(p, t) = \sum_{x \in \Gamma^-(p)} \frac{hub(x, t-1)}{|\Gamma^+(x)|}$

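The calculation phase can be sketched as below. This follows Kleinberg's definition of the two roles: a page's hub score is collected from the pages it links to, and its authority score from the pages that link to it. Dividing each contribution by the contributor's in-degree or out-degree is an assumption about the intended normalization, and the names `hits`, `out_links`, and `in_links` are illustrative.

```python
def hits(out_links, iterations=10):
    """Iterate hub and authority scores over a link table.

    out_links maps every page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = list(out_links)
    in_links = {p: [] for p in pages}
    for x in pages:
        for p in out_links[x]:
            in_links[p].append(x)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Both updates read the scores from the previous iteration.
        new_hub = {
            p: sum(auth[x] / len(in_links[x]) for x in out_links[p])
            for p in pages
        }
        new_auth = {
            p: sum(hub[x] / len(out_links[x]) for x in in_links[p])
            for p in pages
        }
        hub, auth = new_hub, new_auth
    return hub, auth


# Hypothetical graph: h1 and h2 act as hubs, a1..a3 as authorities.
graph = {"h1": ["a1", "a2", "a3"], "h2": ["a1", "a2"], "a1": [], "a2": [], "a3": []}
hub, auth = hits(graph)
print(hub)
print(auth)
```

In this example h1 ends with the higher hub score because it links to more authorities, while a1 and a2 outrank a3 because both hubs point to them.
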
Web graph characterization

    “While entirely of human design, the emerging network appears to have more in common with a cell or an ecological system than with a Swiss watch.” [Barabási, 2001]

The Web is not a random network

Scale-free network [Barabási, 2002]

Relationship of Web links with goods imports

[To be available]

Other scale-free networks                        Link analysis

                                                Carlos Castillo

                                            Outline

                                            Motivation

                                            Bibliometrics
                                            Bibliometric laws
                                            Measures of similarity

                                            Ranking
    Power grid designs                      Pagerank
                                            HITS
    Sexual partners in humans               Web graph
                                            characterization
    Collaboration of movie actors in films   Scale-free networks
                                            Macroscopic structure
    Citations in scientific publications     Summary

    Protein interactions                    References
Distribution of degree        Link analysis

                             Carlos Castillo

                         Outline

                         Motivation

                         Bibliometrics
                         Bibliometric laws
                         Measures of similarity

                         Ranking
                         Pagerank
                         HITS

                         Web graph
                         characterization
                         Scale-free networks
                         Macroscopic structure

                         Summary

                         References
Giant strongly-connected component                            Link analysis

                                                             Carlos Castillo

                                                         Outline

                                                         Motivation

                                                         Bibliometrics
                                                         Bibliometric laws
                                                         Measures of similarity
[Broder et al., 2000]
                                                         Ranking
    The core of the Web is a strongly-connected          Pagerank
                                                         HITS

    component (MAIN)                                     Web graph
                                                         characterization
    Nodes reachable from this component are in the OUT   Scale-free networks
                                                         Macroscopic structure
    component                                            Summary

    Nodes that reach this component are in the IN        References

    component
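
The three definitions above can be sketched in code. This is a minimal sketch in Python, assuming the graph is given as a list of (source, target) link pairs and that one seed page known to lie in MAIN is available. The function names `reachable` and `bow_tie`, the `OTHER` bucket for everything else, and the toy graph are illustrative assumptions, not taken from [Broder et al., 2000].

```python
from collections import deque


def reachable(start, adjacency):
    """Return the set of nodes reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def bow_tie(edges, seed):
    """Split a directed graph into MAIN, IN, OUT and OTHER relative to `seed`.

    `seed` is assumed to be a page known to lie in the core (MAIN).
    """
    forward, backward, nodes = {}, {}, set()
    for source, target in edges:
        forward.setdefault(source, []).append(target)
        backward.setdefault(target, []).append(source)
        nodes.update((source, target))
    nodes.add(seed)

    descendants = reachable(seed, forward)   # pages the seed can reach
    ancestors = reachable(seed, backward)    # pages that can reach the seed

    main = descendants & ancestors           # mutually reachable with the seed
    return {
        "MAIN": main,
        "OUT": descendants - main,           # reachable from MAIN, cannot return
        "IN": ancestors - main,              # reach MAIN, not reachable from it
        "OTHER": nodes - descendants - ancestors,
    }


# Hypothetical toy graph: a -> b <-> c -> d, plus a separate pair e -> f.
toy_edges = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d"), ("e", "f")]
parts = bow_tie(toy_edges, seed="b")
```

On the toy graph, `b` and `c` link to each other and form MAIN, `a` falls in IN, `d` falls in OUT, and `e` and `f` land in OTHER. Two breadth-first searches from a single seed are enough here because every page in a strongly-connected component reaches, and is reached by, every other page in it.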

Bow-tie structure

Bow-tie structure and details

[Baeza-Yates and Castillo, 2001]

Bow-tie and search behavior

Summary

    Hyperlinks should be exploited for extracting information
    Link analysis should exploit global properties

References

Amati, G., Ounis, I., and V., P. (2003).
The dynamic absorbing model for the web.
Technical Report TR-2003-137, Department of Computing Science, University of Glasgow.

Baeza-Yates, R. and Castillo, C. (2001).
Relating Web characteristics with link based Web page ranking.
In Proceedings of String Processing and Information Retrieval, pages 21–32, Laguna San Rafael, Chile. IEEE CS Press.

Barabási, A.-L. (2001).
The physics of the web.
PhysicsWeb.ORG, online journal.

Barabási, A.-L. (2002).
Linked: the new science of networks.
Perseus Publishing.

Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. (2000).
Graph structure in the web: Experiments and models.
In Proceedings of the Ninth Conference on World Wide Web, pages 309–320, Amsterdam, Netherlands.

Davison, B. D. (2000).
Topical locality in the web.
In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pages 272–279. ACM Press.

Eiron, N., McCurley, K. S., and Tomlin, J. A. (2004).
Ranking the web frontier.
In Proceedings of the 13th international conference on World Wide Web, pages 309–318. ACM Press.

Haveliwala, T. H. (2002).
Topic-sensitive pagerank.
In Proceedings of the Eleventh World Wide Web Conference, pages 517–526, Honolulu, Hawaii, USA. ACM Press.

Kleinberg, J. M. (1999).
Authoritative sources in a hyperlinked environment.
Journal of the ACM, 46(5):604–632.

Li, Y. (1998).
Toward a qualitative search engine.
IEEE Internet Computing, pages 24–29.

Page, L., Brin, S., Motwani, R., and Winograd, T. (1998).
The Pagerank citation algorithm: bringing order to the web.
In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia.

Tomlin, J. A. (2003).
A new paradigm for ranking pages on the world wide web.
In Proceedings of the Twelfth Conference on World Wide Web, pages 350–355, Budapest, Hungary. ACM Press.
Link Analysis

  • 1. Link analysis. Carlos Castillo, Center for Web Research, Computer Science Department, University of Chile, www.cwr.cl
  • 2. Outline: Motivation; Bibliometrics (Bibliometric laws, Measures of similarity); Ranking (Pagerank, HITS); Web graph characterization (Scale-free networks, Macroscopic structure); Summary; References
  • 3. Motivation: Hyperlinks are not at random, they provide valuable information for: Link-based ranking; Structure analysis; Detection of communities; Spam detection; ...
  • 4. Topical locality: “We found that pages are significantly more likely to be related topically to pages to which they are linked, as opposed to other pages selected at random or other nearby pages.” [Davison, 2000]
  • 5. What type of relationship? Link from $P'$ to $P$ means that the content of $P$ is endorsed by author of $P'$, but links can also mean: Disagreement; Self-citation; Citation to popular document; Citation to methodological document
  • 6. Bibliometrics: Quantitative analysis and statistics to describe patterns of publication.
  • 7. Example 1: Lotka’s Law: Fraction of authors with $n$ papers $\propto 1/n^2$. 1 publication = 0.60; 2 publications = $(1/2)^2 \times 0.60 = 0.15$; 3 publications = $(1/3)^2 \times 0.60 = 0.07$; 7 publications = $(1/7)^2 \times 0.60 = 0.01$
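The arithmetic on this slide can be reproduced with a short sketch. The function name `lotka_fraction` and its parameter `single_paper_fraction` are illustrative names, not from the slides; the value 0.60 for single-paper authors is taken from the slide.

```python
def lotka_fraction(n: int, single_paper_fraction: float = 0.60) -> float:
    """Fraction of authors with exactly n papers under an inverse-square law.

    single_paper_fraction is the fraction of authors with one paper
    (0.60 on the slide); the name is hypothetical.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return single_paper_fraction / n ** 2


# Reproduce the four cases listed on the slide, rounded to two decimals.
for n in (1, 2, 3, 7):
    print(n, round(lotka_fraction(n), 2))
```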
  • 8. Example 2: Bradford’s Law: Journals in a field can be divided in 3 parts, each part accounting for the same number of papers. The number of journals in each part will be $\propto 1 : n : n^2$.
  • 9. Counting the number of citations. Problems: Quantity, not quality; Self-citations are frequent; In some fields there are many publications, in others there are fewer; Citations go from newer to older articles; New documents have few citations; 1/3 of the citations in a paper are not relevant
  • 10. Measures of similarity. Studying the citations of papers: Bibliographic coupling: two documents share a significant portion of their bibliographies; Co-citation: two documents are cited simultaneously by a significant number of other documents
  • 11. Bibliographic coupling (figure)
  • 12. Co-citation (figure)
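The two similarity measures can be sketched as simple set operations over a citation map. The data in `cites` and the function names are hypothetical, and raw counts are used here instead of any threshold for "significant", which the slides do not specify.

```python
# Hypothetical citation data: each paper maps to the set of papers it cites.
cites = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z"},
    "X": set(),
    "Y": set(),
    "Z": set(),
}


def bibliographic_coupling(cites, a, b):
    """Number of references shared by the bibliographies of a and b."""
    return len(cites[a] & cites[b])


def co_citation(cites, a, b):
    """Number of documents whose bibliography contains both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


print(bibliographic_coupling(cites, "A", "B"))  # A and B both cite X and Y
print(co_citation(cites, "X", "Y"))             # X and Y are both cited by A and B
```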
  • 13. Types of ranking: Query-independent ranking; Query-dependent ranking
  • 14. Adversarial-IR in link-based ranking: “With a simple program, huge numbers of pages can be created easily, artificially inflating citation counts. Because the Web environment contains profit seeking ventures, attention getting strategies evolve in response to search engine algorithms. For this reason, any evaluation strategy which counts replicable features of web pages is prone to manipulation” [Page et al., 1998].
  • 15. Notation (Symbol: Meaning): $p$: A Web page; $\Gamma^+(p)$: Links pointing from page $p$; $\Gamma^-(p)$: Links pointing to page $p$
  • 16. Hyperlink vector voting: Each link counts as a “vote” on certain keywords [Li, 1998]: $h(P', P, w) = 1 \iff$ there is a link from $P'$ to $P$ with anchor text $w$; $\mathrm{HVV}(P, w) = \sum_{P' \in \Gamma^-(P)} h(P', P, w)$. Only the count of links is used ⇒ easy to manipulate
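A minimal sketch of the vote count, assuming links are available as (source, target, anchor text) triples and that a vote requires the anchor text to equal the keyword exactly. Both the data layout and the exact-match rule are assumptions for illustration; the slide does not say how anchor text is matched.

```python
# Hypothetical link data: (source page, target page, anchor text) triples.
links = [
    ("p1", "target", "cheap flights"),
    ("p2", "target", "cheap flights"),
    ("p3", "target", "travel blog"),
    ("p1", "other", "cheap flights"),
]


def hvv(links, page, keyword):
    """Count in-links of `page` whose anchor text equals `keyword`.

    Exact string equality is an assumption made for this sketch.
    """
    return sum(1 for _src, dst, anchor in links
               if dst == page and anchor == keyword)


print(hvv(links, "target", "cheap flights"))
```

Because the score is a plain count, adding more pages that link to `target` with the same anchor text raises it directly, which is the manipulation risk the slide points out.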
  • 17. Pagerank: Measure of importance of a page, defined recursively [Page et al., 1998]: A page with high Pagerank is a page referenced by many pages with high Pagerank. This is a simplified version: $\mathrm{Pagerank}(P) = \sum_{x \in \Gamma^-(P)} \frac{\mathrm{Pagerank}(x)}{|\Gamma^+(x)|}$
  • 18. Iterations with pseudo-Pagerank (figure). Note: we should normalize after each iteration
  • 19. Convergence with pseudo-Pagerank (figure)
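The iteration of the simplified formula, with normalization after each step, might be sketched as follows. The function name, the uniform starting vector, and the fixed iteration count are assumptions; the sketch also assumes every link target appears as a key of `out_links` and that at least one link exists.

```python
def pseudo_pagerank(out_links, iterations=50):
    """Iterate Pagerank(P) = sum over in-links x of Pagerank(x) / |out-links of x|.

    out_links maps each page to the list of pages it links to.
    Scores are rescaled to sum to 1 after every iteration.
    """
    pages = list(out_links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for x, targets in out_links.items():
            for p in targets:
                # Page x passes an equal share of its score to each out-link.
                new_rank[p] += rank[x] / len(targets)
        total = sum(new_rank.values())
        rank = {p: score / total for p, score in new_rank.items()}
    return rank


# Hypothetical three-page graph in which every page has an out-link.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pseudo_pagerank(graph)
print(ranks)
```

On this small graph the scores settle at 0.4 for `a` and `c` and 0.2 for `b`, since `b` receives only half of the score of `a`.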
  • 20. Random jumps: So far, so good, but ... The Web includes many pages with no out-links; these will accumulate all of the score. We would like Web pages to accumulate ranking. We add random jumps (teleportation)
  • 21. Random jumps (figure)
  • 22. Random jumps to single node (figure)
  • 23. Pagerank with random jumps: $\mathrm{Pagerank}(P) = \frac{\epsilon}{N} + (1 - \epsilon) \sum_{x \in \Gamma^-(P)} \frac{\mathrm{Pagerank}(x)}{|\Gamma^+(x)|}$. Typically $\epsilon \approx 0.15$
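A minimal sketch of the computation with random jumps, assuming N is the total number of pages and the jump probability is 0.15 (called `epsilon` here). The handling of pages without out-links, which spread their score uniformly over all pages, is an assumption of this sketch and not something the slide formula states.

```python
def pagerank(out_links, epsilon=0.15, iterations=100):
    """Pagerank with random jumps taken with probability epsilon.

    out_links maps each page to the list of pages it links to; every
    link target is assumed to appear as a key.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share epsilon / N.
        new_rank = {p: epsilon / n for p in pages}
        for x, targets in out_links.items():
            if targets:
                share = (1 - epsilon) * rank[x] / len(targets)
                for p in targets:
                    new_rank[p] += share
            else:
                # Assumption: a page with no out-links spreads its score uniformly.
                share = (1 - epsilon) * rank[x] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph with one page ("d") that has no out-links.
web = {"a": ["b"], "b": ["a"], "c": ["a"], "d": []}
scores = pagerank(web)
print(scores)
```

With this treatment of dangling pages the scores keep summing to 1, so no separate normalization step is needed.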
  • 24. Pagerank calculation: Iterative calculation; Exploit blocks in the matrix – locality of links; The extra node can be added after the last iteration; Dangling nodes can be added after the last iteration [Eiron et al., 2004]
  • 25. Alternatives: Ranking as a vector: topic-sensitive Pagerank [Haveliwala, 2002]; Network flows: TrafficRank [Tomlin, 2003]; Dynamic absorbing model [Amati et al., 2003]
  • 26. Dynamic absorbing model (figure). Scores are the probabilities of the “clone” nodes
  • 27. Hubs and authorities [Kleinberg, 1999] (figure)
  • 28. Algorithm: 1. Obtain a results set using text search; 2. Expand the results set with in- and out-links; 3. Calculate hubs and authorities
  • 29. Expansion phase (figure)
  • 30. Calculation phase. Initialize: $\mathrm{hub}(p, 0) = \mathrm{auth}(p, 0) = 1$. Iterate: $\mathrm{hub}(p, t) = \sum_{x \in \Gamma^+(p)} \frac{\mathrm{auth}(x, t-1)}{|\Gamma^-(x)|}$; $\mathrm{auth}(p, t) = \sum_{x \in \Gamma^-(p)} \frac{\mathrm{hub}(x, t-1)}{|\Gamma^+(x)|}$
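The calculation phase might be sketched as below. This sketch assumes that hub scores are summed over the pages a page links to and authority scores over the pages linking to it, as in Kleinberg's formulation, with each contribution divided by the contributor's in-degree or out-degree respectively. The function name, the fixed iteration count, and the example graph are hypothetical, and the graph passed in would be the expanded results set.

```python
def hits(out_links, iterations=20):
    """Degree-normalized hub and authority scores.

    out_links maps each page to the list of pages it links to; every
    link target is assumed to appear as a key.
    """
    pages = list(out_links)
    in_links = {p: [] for p in pages}
    for x, targets in out_links.items():
        for p in targets:
            in_links[p].append(x)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Both updates read the scores from the previous iteration.
        new_hub = {p: sum(auth[x] / len(in_links[x]) for x in out_links[p])
                   for p in pages}
        new_auth = {p: sum(hub[x] / len(out_links[x]) for x in in_links[p])
                    for p in pages}
        hub, auth = new_hub, new_auth
    return hub, auth


# Hypothetical expanded results set: two hub pages and two authority pages.
result_graph = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub, auth = hits(result_graph)
print(hub, auth)
```

In this example `h1` ends up as the stronger hub because it points to both authorities, and `a1` as the stronger authority because both hubs point to it.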
  • 31. Web graph characterization: “While entirely of human design, the emerging network appears to have more in common with a cell or an ecological system than with a Swiss watch.” [Barabási, 2001]
  • 32. The Web is not a random network: Scale-free network [Barabási, 2002] (figure)
  • 33. Relationship of Web links with goods imports [To be available]
  • 34. Other scale-free networks: Power grid designs; Sexual partners in humans; Collaboration of movie actors in films; Citations in scientific publications; Protein interactions
  • 35. Distribution of degree (figure)
  • 36. Giant strongly-connected component [Broder et al., 2000]: The core of the Web is a strongly-connected component (MAIN); Nodes reachable from this component are in the OUT component; Nodes that reach this component are in the IN component
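The three components can be sketched with two breadth-first searches from a page that is assumed to belong to MAIN. The `seed` parameter, the function names, and the example graph are hypothetical; finding such a seed on a real crawl is outside this sketch.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of nodes reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(out_links, seed):
    """Split pages into MAIN, OUT and IN relative to `seed` (assumed in MAIN)."""
    reverse = {p: [] for p in out_links}
    for x, targets in out_links.items():
        for p in targets:
            reverse.setdefault(p, []).append(x)
    forward = reachable(out_links, seed)   # pages the seed can reach
    backward = reachable(reverse, seed)    # pages that can reach the seed
    main = forward & backward
    return {"MAIN": main, "OUT": forward - main, "IN": backward - main}


# Hypothetical graph: i -> m1 <-> m2 -> o, plus an isolated page x.
example = {"i": ["m1"], "m1": ["m2"], "m2": ["m1", "o"], "o": [], "x": []}
parts = bow_tie(example, "m1")
print(parts)
```

Pages that fall in none of the three sets, such as `x` here, are neither reachable from MAIN nor able to reach it.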
  • 37. Bow-tie structure (figure)
  • 38. Bow-tie structure and details [Baeza-Yates and Castillo, 2001] (figure)
  • 39. Bow-tie and search behavior (figure)
  • 40. Summary: Hyperlinks should be exploited for extracting information; Link analysis should exploit global properties
  • 41–44. References:
    Amati, G., Ounis, I., and Plachouras, V. (2003). The dynamic absorbing model for the web. Technical Report TR-2003-137, Department of Computing Science, University of Glasgow.
    Baeza-Yates, R. and Castillo, C. (2001). Relating Web characteristics with link based Web page ranking. In Proceedings of String Processing and Information Retrieval, pages 21–32, Laguna San Rafael, Chile. IEEE CS Press.
    Barabási, A.-L. (2001). The physics of the web. PhysicsWeb.ORG, online journal.
    Barabási, A.-L. (2002). Linked: the new science of networks. Perseus Publishing.
    Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. (2000). Graph structure in the web: Experiments and models. In Proceedings of the Ninth Conference on World Wide Web, pages 309–320, Amsterdam, Netherlands.
    Davison, B. D. (2000). Topical locality in the web. In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pages 272–279. ACM Press.
    Eiron, N., McCurley, K. S., and Tomlin, J. A. (2004). Ranking the web frontier. In Proceedings of the 13th international conference on World Wide Web, pages 309–318. ACM Press.
    Haveliwala, T. H. (2002). Topic-sensitive pagerank. In Proceedings of the Eleventh World Wide Web Conference, pages 517–526, Honolulu, Hawaii, USA. ACM Press.
    Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632.
    Li, Y. (1998). Toward a qualitative search engine. IEEE Internet Computing, pages 24–29.
    Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The Pagerank citation algorithm: bringing order to the web. In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia.
    Tomlin, J. A. (2003). A new paradigm for ranking pages on the world wide web. In Proceedings of the Twelfth Conference on World Wide Web, pages 350–355, Budapest, Hungary. ACM Press.