Link analysis

Carlos Castillo
Center for Web Research
Computer Science Department
University of Chile
www.cwr.cl

Outline

    Motivation
    Bibliometrics
        Bibliometric laws
        Measures of similarity
    Ranking
        Pagerank
        HITS
    Web graph characterization
        Scale-free networks
        Macroscopic structure
    Summary
    References

Motivation

Hyperlinks are not random; they provide valuable information for:

    Link-based ranking
    Structure analysis
    Detection of communities
    Spam detection
    ...

Topical locality

    “We found that pages are significantly more likely to be related topically to pages to which they are linked, as opposed to other pages selected at random or other nearby pages.”
    [Davison, 2000]

What type of relationship?

A link from P' to P means that the content of P is endorsed by the author of P', but links can also mean:

    Disagreement
    Self-citation
    Citation to a popular document
    Citation to a methodological document

Bibliometrics

Quantitative analysis and statistics to describe patterns of publication.

Example 1: Lotka’s Law

Fraction of authors with $n$ papers $\propto 1/n^2$

    1 publication: $0.60$
    2 publications: $(1/2)^2 \times 0.60 = 0.15$
    3 publications: $(1/3)^2 \times 0.60 \approx 0.07$
    7 publications: $(1/7)^2 \times 0.60 \approx 0.01$

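The arithmetic above can be sketched in a few lines of Python. The sketch assumes Lotka's law with exponent 2 and uses the slide's constant 0.60 as the fraction of authors who have a single paper. The function name `lotka_fraction` is hypothetical.

```python
import math


def lotka_fraction(n: int, c: float = 0.60) -> float:
    """Fraction of authors with exactly n papers under Lotka's law (exponent 2).

    c is the fraction of authors with a single paper (0.60 on the slide).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return c / n ** 2


# Since 1 + 1/4 + 1/9 + ... = pi^2 / 6, the constant that makes the
# fractions over all n sum to exactly 1 is 6 / pi^2 (about 0.608),
# which is close to the slide's 0.60.
NORMALIZING_C = 6 / math.pi ** 2

if __name__ == "__main__":
    for n in (1, 2, 3, 7):
        print(f"{n} publication(s): {lotka_fraction(n):.2f}")
```

Running it prints 0.60, 0.15, 0.07 and 0.01 for 1, 2, 3 and 7 publications, which matches the slide.
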
Example 2: Bradford’s Law

Rank the journals in a field by the number of papers they publish. Then divide them into 3 zones, each containing the same number of papers. The number of journals in each zone will be $\propto 1 : n : n^2$.

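Bradford's law can be checked on data with a short sketch. It assumes the classic formulation:

- rank journals by number of papers;
- cut the ranking into zones that hold roughly equal numbers of papers;
- count the journals in each zone.

The function `bradford_zones` and the toy counts are hypothetical.

```python
def bradford_zones(papers_per_journal, zones=3):
    """Count journals per zone when zones hold roughly equal numbers of papers.

    Journals are ranked from most to least productive; a journal is placed
    in the zone in which its first paper falls.
    """
    counts = sorted(papers_per_journal, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one paper")
    target = total / zones              # papers per zone
    journals_in_zone = [0] * zones
    cumulative = 0                      # papers in higher-ranked journals
    for c in counts:
        zone = min(int(cumulative // target), zones - 1)
        journals_in_zone[zone] += 1
        cumulative += c
    return journals_in_zone


if __name__ == "__main__":
    # Toy field with n = 2: each zone holds 8 papers
    print(bradford_zones([8, 4, 4, 2, 2, 2, 2]))
```

In this toy field the zones contain 1, 2 and 4 journals, i.e. $1 : n : n^2$ with $n = 2$. Real bibliographies only approximate this ratio.
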
Counting the number of citations

Problems:

    Quantity, not quality
    Self-citations are frequent
    Some fields have many publications, others have fewer
    Citations go from newer to older articles
    New documents have few citations
    1/3 of the citations in a paper are not relevant

Measures of similarity

Studying the citations of papers:

    Bibliographic coupling: two documents share a significant portion of their bibliographies
    Co-citation: two documents are cited simultaneously by a significant number of other documents

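Both measures can be counted from a citation list with a small sketch. It assumes `cites` maps each document to the set of documents it cites. The functions return raw counts; what counts as a "significant" overlap is left to a threshold that the slide does not specify. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def bibliographic_coupling(cites):
    """(A, B) -> number of references that documents A and B both cite."""
    cited_by = defaultdict(set)         # reference -> documents citing it
    for doc, refs in cites.items():
        for ref in refs:
            cited_by[ref].add(doc)
    strength = defaultdict(int)
    for citing_docs in cited_by.values():
        for a, b in combinations(sorted(citing_docs), 2):
            strength[(a, b)] += 1
    return dict(strength)


def co_citation(cites):
    """(A, B) -> number of documents that cite both A and B."""
    strength = defaultdict(int)
    for refs in cites.values():
        for a, b in combinations(sorted(refs), 2):
            strength[(a, b)] += 1
    return dict(strength)


if __name__ == "__main__":
    cites = {"P1": {"A", "B", "C"}, "P2": {"A", "B"}, "P3": {"B", "C"}}
    print(bibliographic_coupling(cites))
    print(co_citation(cites))
```

In this toy data, P1 and P2 share two references (A and B), so their coupling is 2. Likewise, A and B are co-cited by two papers (P1 and P2).
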
Bibliographic coupling        Link analysis

Co-citation        Link analysis

Types of ranking

    Query-independent ranking
    Query-dependent ranking

Adversarial-IR in link-based ranking

    “With a simple program, huge numbers of pages can be created easily, artificially inflating citation counts. Because the Web environment contains profit seeking ventures, attention getting strategies evolve in response to search engine algorithms. For this reason, any evaluation strategy which counts replicable features of web pages is prone to manipulation” [Page et al., 1998].

Notation

    Symbol            Meaning
    $p$               A Web page
    $\Gamma^{+}(p)$   Links pointing from page p
    $\Gamma^{-}(p)$   Links pointing to page p

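In code, the two neighbour sets are usually built once from a list of links. A minimal sketch, assuming links are given as (source, target) pairs; `build_link_index` is a hypothetical name.

```python
from collections import defaultdict


def build_link_index(links):
    """Return (gamma_plus, gamma_minus) from (source, target) link pairs.

    gamma_plus[p]  = pages that p links to   (out-links)
    gamma_minus[p] = pages that link to p    (in-links)
    """
    gamma_plus = defaultdict(set)
    gamma_minus = defaultdict(set)
    for src, dst in links:
        gamma_plus[src].add(dst)
        gamma_minus[dst].add(src)
    return gamma_plus, gamma_minus


if __name__ == "__main__":
    plus, minus = build_link_index([("a", "b"), ("a", "c"), ("c", "b")])
    print(len(plus["a"]), len(minus["b"]))   # out-degree of a, in-degree of b
```

With this index, the out-degree $|\Gamma^{+}(x)|$ is `len(gamma_plus[x])`. Because these are `defaultdict`s, looking up a page with no links returns an empty set instead of raising `KeyError`.
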
Hyperlink vector voting

Each link counts as a “vote” on certain keywords [Li, 1998]:

    $h(P', P, w) = 1 \iff$ there is a link from P' to P with anchor text w (and 0 otherwise)
    $\mathrm{HVV}(P, w) = \sum_{P' \in \Gamma^{-}(P)} h(P', P, w)$
    Only the count of links is used ⇒ easy to manipulate

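The vote count can be sketched as follows. The sketch assumes links come as (source, target, anchor text) triples. It also assumes that "anchor text w" means w appears as a lowercase, whitespace-separated token of the anchor; that tokenization is an illustrative choice. `hvv` is a hypothetical name.

```python
from collections import defaultdict


def hvv(links):
    """HVV(P, w) from (source, target, anchor_text) triples.

    h(P', P, w) is 0 or 1, so each (P', P, w) combination is counted once,
    no matter how many links from P' to P use the keyword w.
    """
    seen = set()
    votes = defaultdict(int)
    for src, dst, anchor in links:
        for w in set(anchor.lower().split()):
            if (src, dst, w) not in seen:
                seen.add((src, dst, w))
                votes[(dst, w)] += 1
    return dict(votes)


if __name__ == "__main__":
    links = [
        ("a", "x", "search engine"),
        ("b", "x", "Search"),
        ("a", "x", "search"),      # same P', P and w as the first link: no new vote
        ("c", "y", "engine"),
    ]
    print(hvv(links))
```

Each additional page that links to x with "search" in its anchor text adds one more vote. This is why a measure that only counts links is easy to manipulate.
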
Pagerank

Measure of importance of a page, defined recursively [Page et al., 1998]: a page with high Pagerank is a page referenced by many pages with high Pagerank.

This is a simplified version:

$$\mathrm{Pagerank}(P) = \sum_{x \in \Gamma^{-}(P)} \frac{\mathrm{Pagerank}(x)}{|\Gamma^{+}(x)|}$$

Iterations with pseudo-Pagerank

Note: we should normalize after each iteration.

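A minimal sketch of the simplified (pseudo-)Pagerank iteration, including a normalization step after each iteration. It assumes `out_links` lists every page as a key, even pages with no out-links. The function name and the fixed number of iterations are hypothetical choices.

```python
def pseudo_pagerank(out_links, iterations=50):
    """Simplified Pagerank: every page splits its score evenly among its out-links.

    out_links must list every page as a key (use [] for no out-links).
    Scores are renormalized to sum to 1 after each iteration.
    """
    pages = list(out_links)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for x in pages:
            targets = out_links[x]
            for p in targets:
                new[p] += score[x] / len(targets)
        total = sum(new.values())
        if total == 0:
            # All of the score reached pages without out-links and was lost
            raise ValueError("all score leaked out of the graph")
        score = {p: s / total for p, s in new.items()}
    return score


if __name__ == "__main__":
    print(pseudo_pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
```

On this small graph, the scores settle near 0.2, 0.4 and 0.4 for a, b and c. On graphs with pages that have no out-links, score drains into those pages instead.
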
Convergence with pseudo-Pagerank

Random jumps

    So far, so good, but ...
    The Web includes many pages with no out-links; these will accumulate all of the score
    We would like all Web pages to accumulate some ranking
    We add random jumps (teleportation)

Random jumps        Link analysis

Random jumps to single node        Link analysis

Pagerank with random jumps

$$\mathrm{Pagerank}(P) = \frac{\epsilon}{N} + (1 - \epsilon) \sum_{x \in \Gamma^{-}(P)} \frac{\mathrm{Pagerank}(x)}{|\Gamma^{+}(x)|}$$

where $N$ is the number of pages. Typically $\epsilon \approx 0.15$.

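A sketch of the iteration for the formula with random jumps, using $\epsilon = 0.15$ and $N$ equal to the number of pages. The formula itself does not say what happens to the score of pages without out-links. This sketch assumes that this score is spread uniformly over all pages, so the scores keep summing to 1; that is a common choice, but it is an assumption here. Names and the stopping tolerance are hypothetical.

```python
def pagerank(out_links, epsilon=0.15, tol=1e-10, max_iter=1000):
    """Pagerank with random jumps (teleportation probability epsilon).

    out_links must list every page as a key (use [] for no out-links).
    """
    pages = list(out_links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumption: score of pages without out-links is spread over all pages
        dangling = sum(score[p] for p in pages if not out_links[p])
        base = epsilon / n + (1 - epsilon) * dangling / n
        new = {p: base for p in pages}
        for x in pages:
            for p in out_links[x]:
                new[p] += (1 - epsilon) * score[x] / len(out_links[x])
        delta = sum(abs(new[p] - score[p]) for p in pages)   # L1 change
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    print(pagerank({"a": ["b"], "b": []}))
```

On a simple cycle every page ends up with $1/N$. With the random jumps, no page drains all of the score.
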
Pagerank calculation

    Iterative calculation
    Exploit blocks in the matrix (locality of links)
    The extra node can be added after the last iteration
    Dangling nodes can be added after the last iteration
    [Eiron et al., 2004]

Alternatives

    Ranking as a vector: topic-sensitive Pagerank [Haveliwala, 2002]
    Network flows: TrafficRank [Tomlin, 2003]
    Dynamic absorbing model [Amati et al., 2003]

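Topic-sensitive Pagerank changes where the random jump lands. A minimal sketch, assuming a jump vector that is uniform over a given set of topic pages. Haveliwala's method computes one such vector per topic and combines them at query time; that step is not shown. Sending the score of pages without out-links to the jump vector is also an assumption. Names are hypothetical.

```python
def topic_sensitive_pagerank(out_links, topic_pages, epsilon=0.15, iterations=200):
    """Pagerank whose random jumps land only on pages of one topic.

    out_links must list every page as a key (use [] for no out-links).
    """
    pages = list(out_links)
    topic = [p for p in pages if p in topic_pages]
    if not topic:
        raise ValueError("topic_pages must include at least one page of the graph")
    # Teleportation vector: uniform over the topic pages, zero elsewhere
    jump = {p: (1.0 / len(topic) if p in topic_pages else 0.0) for p in pages}
    score = dict(jump)
    for _ in range(iterations):
        # Assumption: score of pages without out-links also follows the jump vector
        dangling = sum(score[p] for p in pages if not out_links[p])
        new = {p: (epsilon + (1 - epsilon) * dangling) * jump[p] for p in pages}
        for x in pages:
            for p in out_links[x]:
                new[p] += (1 - epsilon) * score[x] / len(out_links[x])
        score = new
    return score


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
    print(topic_sensitive_pagerank(graph, {"c"}))
```

Here c has no in-links, so its whole score comes from jumps: it receives exactly $\epsilon$. With uniform jumps it would receive only $\epsilon / 3$.
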
Dynamic absorbing model

Scores are the probabilities of the “clone” nodes.

Hubs and authorities

[Kleinberg, 1999]

Algorithm

    1. Obtain a results set using text search
    2. Expand the results set with in- and out-links
    3. Calculate hubs and authorities

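Step 2 can be sketched as follows. The sketch assumes the whole link graph is available as a dictionary from each page to the pages it links to; a real system would query a link database instead. Popular pages can have very many in-links, so the number added per page is usually capped. Here `max_in_links` is a hypothetical parameter for that cap. `expand_root_set` is also a hypothetical name.

```python
def expand_root_set(root, out_links, max_in_links=None):
    """Base set = root pages + pages they link to + pages linking to them.

    max_in_links (hypothetical) caps how many in-linking pages are added
    per root page; which ones are kept here (alphabetical) is arbitrary.
    """
    in_links = {}
    for src, targets in out_links.items():
        for dst in targets:
            in_links.setdefault(dst, set()).add(src)

    base = set(root)
    for p in root:
        base.update(out_links.get(p, ()))        # out-links of p
        preds = sorted(in_links.get(p, ()))      # in-links of p
        if max_in_links is not None:
            preds = preds[:max_in_links]
        base.update(preds)
    return base


if __name__ == "__main__":
    links = {"a": ["b"], "c": ["a"], "d": ["e"], "f": ["a"]}
    print(sorted(expand_root_set({"a"}, links)))
```

Starting from the root set {a}, the base set adds b (an out-link of a), and c and f (its in-links). It leaves out d and e, which are not linked to a.
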
Expansion phase        Link analysis

Calculation phase

Initialize:

$$\mathrm{hub}(p, 0) = \mathrm{auth}(p, 0) = 1$$

Iterate:

$$\mathrm{hub}(p, t) = \sum_{x \in \Gamma^{+}(p)} \frac{\mathrm{auth}(x, t-1)}{|\Gamma^{-}(x)|}$$

$$\mathrm{auth}(p, t) = \sum_{x \in \Gamma^{-}(p)} \frac{\mathrm{hub}(x, t-1)}{|\Gamma^{+}(x)|}$$

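A compact sketch of the calculation phase, run on the expanded link graph. It follows Kleinberg's original update: each score is the plain sum of the neighbours' scores, without the division by a degree that appears in the formulas of this slide. The scores are then normalized to unit Euclidean length after each iteration. `hits` and `_normalize` are hypothetical names.

```python
import math


def _normalize(scores):
    """Scale scores so that their squared values sum to 1."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: (v / norm if norm else 0.0) for p, v in scores.items()}


def hits(out_links, iterations=50):
    """Kleinberg-style hub and authority scores for a (base-set) link graph."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    in_links = {p: [] for p in pages}
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)

    hub = {p: 1.0 for p in pages}       # hub(p, 0) = 1
    auth = {p: 1.0 for p in pages}      # auth(p, 0) = 1
    for _ in range(iterations):
        # Both updates read the scores of iteration t - 1
        new_auth = {p: sum(hub[x] for x in in_links[p]) for p in pages}
        new_hub = {p: sum(auth[x] for x in out_links.get(p, ())) for p in pages}
        auth, hub = _normalize(new_auth), _normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    hub, auth = hits({"h1": ["a1", "a2"], "h2": ["a1"]})
    print(hub, auth)
```

In this example, a1 is a better authority than a2 because both hubs point to it. For the same reason, h1 is a better hub than h2: it points to both authorities.
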
Web graph characterization

    “While entirely of human design, the emerging network appears to have more in common with a cell or an ecological system than with a Swiss watch.” [Barabási, 2001]

The Web is not a random network

Scale-free network [Barabási, 2002]

Relationship of Web links with goods imports

[To be available]

Other scale-free networks

    Power grid designs
    Sexual partners in humans
    Collaboration of movie actors in films
    Citations in scientific publications
    Protein interactions
Distribution of degree
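A degree distribution like the one on this slide can be computed directly from an edge list. The sketch below is a minimal illustration that assumes a directed graph given as (source, target) pairs. It computes the in-degree distribution and a rough log-log slope. The function names are hypothetical. The choice of in-degree is also an assumption, because the slide's text does not say whether in-degree, out-degree, or both were plotted.

```python
import math
from collections import Counter


def in_degree_distribution(edges):
    """Return {k: fraction of nodes whose in-degree is k}, sorted by k.

    `edges` is a list of directed (source, target) pairs; nodes that only
    appear as sources are counted with in-degree 0.
    """
    nodes = {x for edge in edges for x in edge}
    indeg = Counter(target for _, target in edges)
    hist = Counter(indeg[x] for x in nodes)  # Counter returns 0 for missing keys
    return {k: c / len(nodes) for k, c in sorted(hist.items())}


def loglog_slope(dist):
    """Least-squares slope of log P(k) against log k, skipping k = 0.

    If P(k) is roughly proportional to k**-gamma, the slope is close to
    -gamma. This is only a quick check: least squares on a log-log plot
    is a biased estimator of power-law exponents.
    """
    pts = [(math.log(k), math.log(p)) for k, p in dist.items() if k > 0 and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx
```

On a real crawl, the usual check is to plot `dist` on log-log axes and look for the straight-line tail that signals a power law.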
Giant strongly-connected component

[Broder et al., 2000]
    The core of the Web is a strongly-connected component (MAIN)
    Nodes reachable from MAIN, but not in it, are in the OUT component
    Nodes that can reach MAIN, but are not in it, are in the IN component
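The MAIN/IN/OUT split can be computed with one strongly-connected-components pass plus two graph searches. The sketch below assumes the graph fits in memory as a list of (source, target) edges. It uses Kosaraju's algorithm for the components, which is one standard choice and not necessarily the method [Broder et al., 2000] used on their crawl. The function names and the catch-all OTHER bucket are hypothetical. OTHER lumps together every node outside MAIN, IN and OUT.

```python
from collections import defaultdict


def strongly_connected_components(nodes, fwd, rev):
    """Kosaraju's algorithm with explicit stacks (no recursion limit)."""
    # Pass 1: depth-first search on the forward graph, recording finish order.
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, children = stack[-1]
            for v in children:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(fwd[v])))
                    break
            else:  # all children of u explored
                stack.pop()
                order.append(u)
    # Pass 2: on the reversed graph, in reverse finish order; each search
    # collects exactly one strongly-connected component.
    comps, assigned = [], set()
    for s in reversed(order):
        if s in assigned:
            continue
        assigned.add(s)
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in rev[u]:
                if v not in assigned:
                    assigned.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def reach(start, adj):
    """All nodes reachable from the set `start`, including `start` itself."""
    seen, stack = set(start), list(start)
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bow_tie(edges):
    """Split a directed graph into MAIN, IN, OUT and everything else."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    nodes = list(dict.fromkeys(x for edge in edges for x in edge))
    main = max(strongly_connected_components(nodes, fwd, rev), key=len)
    out = reach(main, fwd) - main  # reachable from MAIN
    in_ = reach(main, rev) - main  # can reach MAIN
    other = set(nodes) - main - in_ - out
    return {"MAIN": main, "IN": in_, "OUT": out, "OTHER": other}
```

IN and OUT cannot overlap. A node that both reaches MAIN and is reachable from it belongs to MAIN itself. If two components tie for the largest, `max` picks one of them arbitrarily.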
Bow-tie structure
Bow-tie structure and details

[Baeza-Yates and Castillo, 2001]
Bow-tie and search behavior
Summary

   Hyperlinks should be exploited for extracting information
   Link analysis should exploit global properties
References

Amati, G., Ounis, I., and Plachouras, V. (2003).
The dynamic absorbing model for the web.
Technical Report TR-2003-137, Department of Computing Science, University of Glasgow.
Baeza-Yates, R. and Castillo, C. (2001).
Relating Web characteristics with link based Web page ranking.
In Proceedings of String Processing and Information Retrieval, pages 21–32, Laguna San Rafael, Chile. IEEE CS Press.
Barabási, A.-L. (2001).
The physics of the web.
PhysicsWeb.ORG, online journal.
Barabási, A.-L. (2002).
Linked: the new science of networks.
Perseus Publishing.
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. (2000).
Graph structure in the web: Experiments and models.
In Proceedings of the Ninth Conference on World Wide Web, pages 309–320, Amsterdam, Netherlands.
Davison, B. D. (2000).
Topical locality in the web.
In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pages 272–279. ACM Press.
Eiron, N., McCurley, K. S., and Tomlin, J. A. (2004).
Ranking the web frontier.
In Proceedings of the 13th international conference
on World Wide Web, pages 309–318. ACM Press.
Haveliwala, T. H. (2002).
Topic-sensitive pagerank.
In Proceedings of the Eleventh World Wide Web Conference, pages 517–526, Honolulu, Hawaii, USA. ACM Press.
Kleinberg, J. M. (1999).
Authoritative sources in a hyperlinked environment.
Journal of the ACM, 46(5):604–632.
Li, Y. (1998).
Toward a qualitative search engine.
IEEE Internet Computing, pages 24–29.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1998).
The Pagerank citation ranking: bringing order to the web.
In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia.
Tomlin, J. A. (2003).
A new paradigm for ranking pages on the world wide web.
In Proceedings of the Twelfth Conference on World Wide Web, pages 350–355, Budapest, Hungary. ACM Press.


Analyzing Hyperlinks Using Link Analysis and Graph Theory

  • 1. Link analysis Carlos Castillo Outline Motivation Bibliometrics Bibliometric laws Measures of similarity Ranking Link analysis Pagerank HITS Web graph characterization Scale-free networks Carlos Castillo Macroscopic structure Summary Center for Web Research References Computer Science Department University of Chile www.cwr.cl
  • 2. Link analysis Motivation Carlos Castillo Outline Bibliometrics Motivation Bibliometric laws Bibliometrics Measures of similarity Bibliometric laws Measures of similarity Ranking Ranking Pagerank HITS Pagerank Web graph characterization HITS Scale-free networks Macroscopic structure Summary Web graph characterization References Scale-free networks Macroscopic structure Summary References
  • 3. Motivation Link analysis Carlos Castillo Outline Motivation Bibliometrics Bibliometric laws Hyperlinks are not at random, they provide valuable Measures of similarity Ranking information for: Pagerank HITS Link-based ranking Web graph characterization Structure analysis Scale-free networks Macroscopic structure Detection of communities Summary Spam detection References ...
  • 4. Topical locality
    “We found that pages are significantly more likely to be related topically to pages to which they are linked, as opposed to other pages selected at random or other nearby pages.” [Davison, 2000]
  • 5. What type of relationship?
    A link from $P'$ to $P$ means that the content of $P$ is endorsed by the author of $P'$, but links can also mean:
    - disagreement
    - self-citation
    - citation to a popular document
    - citation to a methodological document
  • 6. Bibliometrics
    Quantitative analysis and statistics to describe patterns of publication.
  • 7. Example 1: Lotka’s Law
    Fraction of authors with $n$ papers $\propto 1/n^2$
    1 publication: $0.60$
    2 publications: $(1/2)^2 \times 0.60 = 0.15$
    3 publications: $(1/3)^2 \times 0.60 \approx 0.07$
    7 publications: $(1/7)^2 \times 0.60 \approx 0.01$
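A minimal Python sketch of the arithmetic above, assuming the slide's empirical constant of 0.60 for single-paper authors and an exponent of exactly 2; `lotka_fraction` is a hypothetical helper name, not something from the slides.

```python
def lotka_fraction(n: int, c: float = 0.60) -> float:
    """Fraction of authors with exactly n papers under Lotka's law.

    Assumes the inverse-square form c / n**2, where c is the fraction of
    authors with a single paper (0.60 on the slide).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return c / n ** 2


for n in (1, 2, 3, 7):
    print(f"{n} publication(s): {lotka_fraction(n):.2f}")
```

Running it prints 0.60, 0.15, 0.07 and 0.01, matching the slide. Since $\sum_{n \ge 1} 1/n^2 = \pi^2/6$, a constant of $6/\pi^2 \approx 0.608$ would make the fractions sum exactly to 1, which is consistent with values close to 0.6 appearing in this form of the law.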
  • 8. Example 2: Bradford’s Law
    The journals in a field can be divided into 3 parts, each containing the same number of papers. The number of journals in each part will be $\propto 1 : n : n^2$.
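The zone construction can be sketched as follows, assuming per-journal article counts are available. The greedy cut (rank journals by productivity, close a zone once it holds its third of the articles) and the name `bradford_zones` are illustrative choices, not a procedure given in the slides.

```python
from typing import List


def bradford_zones(articles_per_journal: List[int], zones: int = 3) -> List[int]:
    """Count the journals in each Bradford zone.

    Journals are ranked by productivity (most articles first) and cut into
    `zones` consecutive groups, each closing once the running article total
    reaches its share (1/zones) of all articles.
    """
    counts = sorted(articles_per_journal, reverse=True)
    target = sum(counts) / zones
    sizes = [0] * zones
    zone, accumulated = 0, 0
    for count in counts:
        sizes[zone] += 1
        accumulated += count
        # Close the current zone once it has collected its share of articles.
        if zone < zones - 1 and accumulated >= target * (zone + 1):
            zone += 1
    return sizes


# Hypothetical counts: 1 journal with 90 papers, 3 with 30, 9 with 10.
print(bradford_zones([90] + [30] * 3 + [10] * 9))
```

With these hypothetical counts, built to follow the law for $n = 3$, each zone holds 90 papers and the journal counts come out as `[1, 3, 9]`. Real bibliographies fit the law only approximately.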
  • 9. Counting the number of citations
    Problems:
    - Quantity, not quality
    - Self-citations are frequent
    - In some fields there are many publications, in others there are fewer
    - Citations go from newer to older articles
    - New documents have few citations
    - 1/3 of the citations in a paper are not relevant
  • 10. Measures of similarity
    Studying the citations of papers:
    - Bibliographic coupling: two documents share a significant portion of their bibliographies
    - Co-citation: two documents are cited simultaneously by a significant number of other documents
  • 11. Bibliographic coupling [figure]
  • 12. Co-citation [figure]
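Both similarity measures reduce to counting shared neighbours in the citation graph. The sketch below, with hypothetical function names and a toy graph, returns raw counts per document pair; deciding what counts as "significant" (a threshold, or a normalization by bibliography size) is left to the caller.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Citation graph: each document maps to the set of documents it cites.
Citations = Dict[str, Set[str]]


def bibliographic_coupling(cites: Citations) -> Dict[Tuple[str, str], int]:
    """Number of references shared by each pair of citing documents."""
    result: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(cites), 2):
        shared = len(cites[a] & cites[b])
        if shared:
            result[(a, b)] = shared
    return result


def co_citation(cites: Citations) -> Dict[Tuple[str, str], int]:
    """Number of documents that cite both members of each pair."""
    result: Dict[Tuple[str, str], int] = {}
    for refs in cites.values():
        # Every pair inside one bibliography is co-cited once by this document.
        for a, b in combinations(sorted(refs), 2):
            result[(a, b)] = result.get((a, b), 0) + 1
    return result


cites = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Y", "Z"},
}
print(bibliographic_coupling(cites))
print(co_citation(cites))
```

In the toy graph, A and B are coupled through two shared references (X and Y), while X and Y are co-cited by two documents (A and B).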
  • 13. Types of ranking
    - Query-independent ranking
    - Query-dependent ranking
  • 14. Adversarial IR in link-based ranking
    “With a simple program, huge numbers of pages can be created easily, artificially inflating citation counts. Because the Web environment contains profit seeking ventures, attention getting strategies evolve in response to search engine algorithms. For this reason, any evaluation strategy which counts replicable features of web pages is prone to manipulation” [Page et al., 1998].
  • 15. Notation
    Symbol           Meaning
    $p$              A Web page
    $\Gamma^+(p)$    Links pointing from page $p$ (out-links)
    $\Gamma^-(p)$    Links pointing to page $p$ (in-links)
  • 16. Hyperlink vector voting
    Each link counts as a “vote” on certain keywords [Li, 1998]:
    $h(P', P, w) = 1 \iff$ there is a link from $P'$ to $P$ with anchor text $w$
    $$HVV(P, w) = \sum_{P' \in \Gamma^-(P)} h(P', P, w)$$
    Only the count of links is used ⇒ easy to manipulate
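A sketch of $HVV(P, w)$, assuming links are available as (source, target, anchor text) triples and that "anchor text $w$" means the keyword appears as a case-insensitive, whitespace-separated token of the anchor; both the triple format and the tokenization are assumptions. Because $h$ is 0 or 1 per linking page, repeated links from the same $P'$ count once.

```python
from typing import Iterable, Tuple

# A link from page P' (source) to page P (target) with its anchor text.
Link = Tuple[str, str, str]


def hvv(links: Iterable[Link], page: str, word: str) -> int:
    """HVV(P, w): number of distinct pages P' linking to `page` with `word` in the anchor.

    h(P', P, w) is 0 or 1, so several links from the same P' count once.
    Matching `word` as a case-insensitive whitespace token is an assumption.
    """
    word = word.lower()
    voters = {
        source
        for source, target, anchor in links
        if target == page and word in anchor.lower().split()
    }
    return len(voters)


links = [
    ("a", "p", "cheap flights"),
    ("b", "p", "Flights to Chile"),
    ("b", "p", "flights"),
    ("c", "q", "flights"),
]
print(hvv(links, "p", "flights"))  # 2: pages a and b

# Manipulation: 100 generated pages voting for p with the same keyword.
spam = [(f"spam{i}", "p", "flights") for i in range(100)]
print(hvv(links + spam, "p", "flights"))  # 102
```

The jump from 2 to 102 after adding machine-generated pages is the weakness the slide points out: the score counts links and nothing else.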
  • 17. Pagerank
    Measure of importance of a page, defined recursively [Page et al., 1998]: a page with high Pagerank is a page referenced by many pages with high Pagerank.
    This is a simplified version:
    $$\mathrm{Pagerank}(P) = \sum_{x \in \Gamma^-(P)} \frac{\mathrm{Pagerank}(x)}{|\Gamma^+(x)|}$$
  • 18. Iterations with pseudo-Pagerank [figure]
    Note: we should normalize after each iteration.
  • 19. Convergence with pseudo-Pagerank [figure]
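The simplified Pagerank recursion can be iterated directly. This sketch assumes the graph is a dict from each page to its list of distinct out-links, starts from a uniform vector, and renormalizes to sum 1 after each iteration as the note on slide 18 suggests; the function name and the fixed iteration count are illustrative.

```python
from typing import Dict, List

# Each page maps to the list of distinct pages it links to, i.e. Γ+(page).
Graph = Dict[str, List[str]]


def pseudo_pagerank(out_links: Graph, iterations: int = 50) -> Dict[str, float]:
    """Simplified Pagerank (no random jumps), renormalized after each iteration."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    rank = {p: 1.0 / len(nodes) for p in nodes}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in nodes}
        for x, targets in out_links.items():
            for p in targets:
                # x splits its current score evenly among the pages it links to.
                new_rank[p] += rank[x] / len(targets)
        total = sum(new_rank.values())
        if total == 0:
            # Nothing was passed on: all score sits on pages without out-links.
            break
        rank = {p: r / total for p, r in new_rank.items()}
    return rank


print(pseudo_pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
print(pseudo_pagerank({"a": ["b"], "b": []}))
```

On the strongly connected graph a→b, b→c, c→a, c→b the scores settle at 0.2, 0.4 and 0.4. On a→b, where b has no out-links, all of the score ends up on b, which is the problem that motivates random jumps.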
  • 20. Random jumps
    So far, so good, but...
    - The Web includes many pages with no out-links; these will accumulate all of the score
    - We would like all Web pages to accumulate ranking
    - We add random jumps (teleportation)
  • 21. Random jumps [figure]
  • 22. Random jumps to a single node [figure]
  • 23. Pagerank with random jumps
    $$\mathrm{Pagerank}(P) = \frac{\epsilon}{N} + (1 - \epsilon) \sum_{x \in \Gamma^-(P)} \frac{\mathrm{Pagerank}(x)}{|\Gamma^+(x)|}$$
    where $N$ is the number of pages and $\epsilon$ is the probability of a random jump; typically $\epsilon \approx 0.15$.
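A power-iteration sketch of Pagerank with random jumps, using $\epsilon = 0.15$. The formula does not say how to treat pages without out-links; here their score is simply not passed on and the vector is renormalized to sum 1 after each iteration, which is one common choice among several (another is to spread that score uniformly). Function and parameter names are hypothetical.

```python
from typing import Dict, List

# Each page maps to the list of distinct pages it links to, i.e. Γ+(page).
Graph = Dict[str, List[str]]


def pagerank(out_links: Graph, eps: float = 0.15, tol: float = 1e-12,
             max_iter: int = 1000) -> Dict[str, float]:
    """Iterate Pagerank(P) = eps/N + (1 - eps) * sum(Pagerank(x) / |Γ+(x)|)."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Random-jump term: every page receives eps / N.
        new_rank = {p: eps / n for p in nodes}
        for x, targets in out_links.items():
            for p in targets:
                new_rank[p] += (1 - eps) * rank[x] / len(targets)
        # Score held by pages without out-links is not passed on; renormalize.
        total = sum(new_rank.values())
        new_rank = {p: r / total for p, r in new_rank.items()}
        delta = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

For the graph a→c, b→c, c→a the result is roughly c ≈ 0.49, a ≈ 0.46, b = 0.05: b has no in-links, so it keeps only its random-jump share $\epsilon/N$.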
  • 24. Pagerank calculation
    - Iterative calculation
    - Exploit blocks in the matrix (locality of links)
    - The extra node can be added after the last iteration
    - Dangling nodes can be added after the last iteration [Eiron et al., 2004]
  • 25. Alternatives
    - Ranking as a vector: topic-sensitive Pagerank [Haveliwala, 2002]
    - Network flows: TrafficRank [Tomlin, 2003]
    - Dynamic absorbing model [Amati et al., 2003]
  • 26. Dynamic absorbing model [figure]
    Scores are the probabilities of the “clone” nodes.
  • 27. Hubs and authorities [Kleinberg, 1999] [figure]
  • 28. Algorithm
    1. Obtain a results set using text search
    2. Expand the results set with in- and out-links
    3. Calculate hubs and authorities
  • 29. Expansion phase [figure]
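Step 2 of the algorithm (the expansion phase) can be sketched as a set union over the neighbours of the root set. Kleinberg's paper caps how many in-links each root page may contribute, since popular pages can have very many; the optional `max_in` parameter mimics that cap, and taking the alphabetically first in-links is an arbitrary choice made here for reproducibility. The function name and the adjacency format are assumptions.

```python
from typing import Dict, Iterable, Optional, Set

Adjacency = Dict[str, Set[str]]


def base_set(root: Iterable[str], out_links: Adjacency, in_links: Adjacency,
             max_in: Optional[int] = None) -> Set[str]:
    """Expand the root set (text-search results) with out-links and in-links.

    If max_in is given, each root page contributes at most that many in-links
    (the alphabetically first ones, an arbitrary but reproducible choice).
    """
    roots = list(root)
    base = set(roots)
    for p in roots:
        base.update(out_links.get(p, set()))
        parents = sorted(in_links.get(p, set()))
        if max_in is not None:
            parents = parents[:max_in]
        base.update(parents)
    return base


out_links = {"p": {"a"}, "a": {"z"}}
in_links = {"p": {"b", "c"}}
print(sorted(base_set(["p"], out_links, in_links)))            # ['a', 'b', 'c', 'p']
print(sorted(base_set(["p"], out_links, in_links, max_in=1)))  # ['a', 'b', 'p']
```

Only one step of expansion is taken: z, reachable from the root only through a, stays outside the base set.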
  • 30. Calculation phase
    Initialize:
    $$\mathrm{hub}(p, 0) = \mathrm{auth}(p, 0) = 1$$
    Iterate:
    $$\mathrm{hub}(p, t) = \sum_{x \in \Gamma^+(p)} \mathrm{auth}(x, t-1)$$
    $$\mathrm{auth}(p, t) = \sum_{x \in \Gamma^-(p)} \mathrm{hub}(x, t-1)$$
    normalizing the hub and authority vectors after each iteration.
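The hub and authority update rules, computed simultaneously from the scores of iteration $t-1$, can be sketched as follows. Normalizing each vector to unit Euclidean length, the fixed number of iterations, and the function name are assumptions of this sketch; the graph is assumed to be already restricted to the expanded base set.

```python
import math
from typing import Dict, List, Tuple

# Base-set graph: each page maps to the list of pages it links to, i.e. Γ+(page).
Graph = Dict[str, List[str]]


def hits(out_links: Graph, iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, auth) scores after a fixed number of HITS iterations."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    hub = {p: 1.0 for p in nodes}   # hub(p, 0) = 1
    auth = {p: 1.0 for p in nodes}  # auth(p, 0) = 1
    for _ in range(iterations):
        # auth(p, t): sum of hub(x, t-1) over pages x that link to p.
        new_auth = {p: 0.0 for p in nodes}
        for x, targets in out_links.items():
            for p in targets:
                new_auth[p] += hub[x]
        # hub(p, t): sum of auth(x, t-1) over pages x that p links to.
        new_hub = {p: sum(auth[x] for x in out_links.get(p, [])) for p in nodes}
        # Scale each vector to unit Euclidean length (guarding against all zeros).
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return hub, auth


hub, auth = hits({"h1": ["a1", "a2"], "h2": ["a1"]})
print(max(auth, key=auth.get), max(hub, key=hub.get))  # a1 h1
```

With h1→a1, h1→a2, h2→a1, page a1 gets the highest authority score and h1 the highest hub score, while pages with no in-links (h1, h2) get authority 0.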
  • 31. Web graph characterization
    “While entirely of human design, the emerging network appears to have more in common with a cell or an ecological system than with a Swiss watch.” [Barabási, 2001]
  • 32. The Web is not a random network
    Scale-free network [Barabási, 2002] [figure]
  • 33. Relationship of Web links with goods imports [To be available]
  • 34. Other scale-free networks
    - Power grid designs
    - Sexual partners in humans
    - Collaboration of movie actors in films
    - Citations in scientific publications
    - Protein interactions
  • 35. Distribution of degree [figure]
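A degree distribution is a histogram of how many pages have each in-degree. The sketch below computes it from an adjacency dict and, as a rough check for scale-free behaviour $P(k) \propto k^{-\gamma}$, fits a least-squares line in log-log space. That fit is only a quick illustration: it is known to be biased, and maximum-likelihood estimators are preferred for measuring exponents seriously. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def in_degree_distribution(out_links: Dict[str, List[str]]) -> Dict[int, int]:
    """Map each in-degree k to the number of pages with exactly k in-links."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    in_degree = {p: 0 for p in nodes}
    for targets in out_links.values():
        for p in set(targets):  # count each linking page once per target
            in_degree[p] += 1
    return dict(sorted(Counter(in_degree.values()).items()))


def loglog_slope(histogram: Dict[int, int]) -> float:
    """Least-squares slope of log(count) versus log(k); k = 0 is skipped."""
    points = [(math.log(k), math.log(c)) for k, c in histogram.items() if k > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive degrees")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


print(in_degree_distribution({"a": ["d"], "b": ["d"], "c": ["d"], "d": ["a"]}))
print(loglog_slope({1: 400, 2: 100, 4: 25}))
```

The first call gives `{0: 2, 1: 1, 3: 1}`. The hypothetical histogram with 400, 100 and 25 pages at degrees 1, 2 and 4 yields a slope of $-2$, i.e. $\gamma = 2$.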
  • 36. Giant strongly-connected component
    [Broder et al., 2000]
    - The core of the Web is a strongly-connected component (MAIN)
    - Nodes reachable from this component are in the OUT component
    - Nodes that reach this component are in the IN component
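Given one page known to lie in MAIN, the three components can be found with two breadth-first searches: pages reachable from it (forward) and pages that reach it (backward, on reversed links). Their intersection is MAIN, and the remainder of each search gives OUT and IN. Choosing the seed (for example, from the largest strongly connected component found with Tarjan's algorithm) is assumed to happen beforehand; pages found by neither search (tendrils, tubes, disconnected pieces) are left unclassified here. Function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # page -> pages it links to


def reachable(start: str, adjacency: Graph) -> Set[str]:
    """Breadth-first search: all nodes reachable from start (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bow_tie(out_links: Graph, seed: str) -> Dict[str, Set[str]]:
    """Split pages into MAIN, IN and OUT around a seed page assumed to be in MAIN."""
    in_links: Graph = {}
    for u, targets in out_links.items():
        for v in targets:
            in_links.setdefault(v, []).append(u)
    forward = reachable(seed, out_links)  # pages the seed can reach
    backward = reachable(seed, in_links)  # pages that can reach the seed
    main = forward & backward             # the strongly connected component of seed
    return {"MAIN": main, "IN": backward - main, "OUT": forward - main}


web = {"i": ["m1"], "m1": ["m2"], "m2": ["m1", "o"], "x": []}
print(bow_tie(web, "m1"))
```

In this toy graph MAIN is {m1, m2}, IN is {i}, OUT is {o}, and the isolated page x belongs to none of the three.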
  • 37. Bow-tie structure [figure]
  • 38. Bow-tie structure and details [Baeza-Yates and Castillo, 2001] [figure]
  • 39. Bow-tie and search behavior [figure]
  • 40. Summary
    - Hyperlinks should be exploited for extracting information
    - Link analysis should exploit global properties
  • 41–44. References
    Amati, G., Ounis, I., and Plachouras, V. (2003). The dynamic absorbing model for the web. Technical Report TR-2003-137, Department of Computing Science, University of Glasgow.
    Baeza-Yates, R. and Castillo, C. (2001). Relating Web characteristics with link based Web page ranking. In Proceedings of String Processing and Information Retrieval, pages 21–32, Laguna San Rafael, Chile. IEEE CS Press.
    Barabási, A.-L. (2001). The physics of the web. PhysicsWeb.ORG, online journal.
    Barabási, A.-L. (2002). Linked: the new science of networks. Perseus Publishing.
    Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. (2000). Graph structure in the web: Experiments and models. In Proceedings of the Ninth Conference on World Wide Web, pages 309–320, Amsterdam, Netherlands.
    Davison, B. D. (2000). Topical locality in the web. In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pages 272–279. ACM Press.
    Eiron, N., McCurley, K. S., and Tomlin, J. A. (2004). Ranking the web frontier. In Proceedings of the 13th international conference on World Wide Web, pages 309–318. ACM Press.
    Haveliwala, T. H. (2002). Topic-sensitive pagerank. In Proceedings of the Eleventh World Wide Web Conference, pages 517–526, Honolulu, Hawaii, USA. ACM Press.
    Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632.
    Li, Y. (1998). Toward a qualitative search engine. IEEE Internet Computing, pages 24–29.
    Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The Pagerank citation ranking: bringing order to the web. In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia.
    Tomlin, J. A. (2003). A new paradigm for ranking pages on the world wide web. In Proceedings of the Twelfth Conference on World Wide Web, pages 350–355, Budapest, Hungary. ACM Press.