Link Analysis
Transcript

  • 1. Link analysis
      Carlos Castillo
      Center for Web Research, Computer Science Department, University of Chile
      www.cwr.cl
  • 2. Outline
      • Motivation
      • Bibliometrics: Bibliometric laws; Measures of similarity
      • Ranking: Pagerank; HITS
      • Web graph characterization: Scale-free networks; Macroscopic structure
      • Summary
      • References
  • 3. Motivation
      Hyperlinks are not placed at random; they provide valuable information for:
      • Link-based ranking
      • Structure analysis
      • Detection of communities
      • Spam detection
      • ...
  • 4. Topical locality
      “We found that pages are significantly more likely to be related topically to pages to which they are linked, as opposed to other pages selected at random or other nearby pages.” [Davison, 2000]
  • 5. What type of relationship?
      A link from $P'$ to $P$ means that the content of $P$ is endorsed by the author of $P'$, but links can also mean:
      • Disagreement
      • Self-citation
      • Citation to a popular document
      • Citation to a methodological document
  • 6. Bibliometrics
      Quantitative analysis and statistics to describe patterns of publication.
  • 7. Example 1: Lotka’s Law
      Fraction of authors with $n$ papers $\propto 1/n^2$
      • 1 publication = 0.60
      • 2 publications = $(1/2)^2 \times 0.60 = 0.15$
      • 3 publications = $(1/3)^2 \times 0.60 = 0.07$
      • 7 publications = $(1/7)^2 \times 0.60 = 0.01$
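The arithmetic on the Lotka's Law slide can be reproduced with a few lines of Python. This is only a sketch: the function name `lotka_fraction` and its parameter `single_paper_fraction` are illustrative names, not part of the slides, and the exponent is fixed at 2 as in the slide.

```python
def lotka_fraction(n: int, single_paper_fraction: float = 0.60) -> float:
    """Fraction of authors with exactly n papers, assuming an exponent of 2.

    single_paper_fraction is the fraction of authors with one paper
    (0.60 in the slide's example).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return single_paper_fraction / n ** 2


# Reproduce the rows of the slide, rounded to two decimals.
for n in (1, 2, 3, 7):
    print(n, round(lotka_fraction(n), 2))
```

The constant 0.60 is close to $6/\pi^2 \approx 0.61$, the value for which the fractions over all $n \geq 1$ sum to 1.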
  • 8. Example 2: Bradford’s Law
      Journals in a field, sorted by the number of papers they publish, can be divided into 3 parts, each part containing the same number of papers. The number of journals in each part will be $\propto 1 : n : n^2$.
  • 9. Counting the number of citations
      Problems:
      • Quantity, not quality
      • Self-citations are frequent
      • In some fields there are many publications, in others there are fewer
      • Citations go from newer to older articles
      • New documents have few citations
      • 1/3 of the citations in a paper are not relevant
  • 10. Measures of similarity
      Studying the citations of papers:
      • Bibliographic coupling: two documents share a significant portion of their bibliographies
      • Co-citation: two documents are cited simultaneously by a significant number of other documents
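Both similarity measures can be computed directly from a citation table. The sketch below uses a made-up toy table (`cites`) and hypothetical function names. It returns raw counts, and what counts as a "significant" overlap is left to the reader.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each paper maps to the set of papers it cites.
cites = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z"},
}


def bibliographic_coupling(cites, a, b):
    """Number of references that the bibliographies of a and b share."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def cocitation_counts(cites):
    """For every pair of cited documents, count the papers citing both."""
    counts = Counter()
    for refs in cites.values():
        # Every pair inside one bibliography is co-cited once by that paper.
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts


print(bibliographic_coupling(cites, "A", "B"))
print(cocitation_counts(cites)[("X", "Y")])
```

Bibliographic coupling looks at what two papers cite, so it is fixed once both are published. Co-citation looks at who cites them, so it can keep growing as new papers appear.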
  • 11. Bibliographic coupling (figure)
  • 12. Co-citation (figure)
  • 13. Types of ranking
      • Query-independent ranking
      • Query-dependent ranking
  • 14. Adversarial-IR in link-based ranking
      “With a simple program, huge numbers of pages can be created easily, artificially inflating citation counts. Because the Web environment contains profit seeking ventures, attention getting strategies evolve in response to search engine algorithms. For this reason, any evaluation strategy which counts replicable features of web pages is prone to manipulation” [Page et al., 1998].
  • 15. Notation
      Symbol             Meaning
      $p$                A Web page
      $\Gamma^{+}(p)$    Links pointing from page $p$
      $\Gamma^{-}(p)$    Links pointing to page $p$
  • 16. Hyperlink vector voting
      Each link counts as a “vote” on certain keywords [Li, 1998]:
      • $h(P', P, w) = 1 \iff$ there is a link from $P'$ to $P$ with anchor text $w$
      • $HVV(P, w) = \sum_{P' \in \Gamma^{-}(P)} h(P', P, w)$
      Only the count of links is used ⇒ easy to manipulate
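The vote count can be sketched as follows. The `Link` record, the sample URLs and the rule that a keyword matches when it appears as a word of the anchor text are all assumptions made for illustration. The slide only says the link carries anchor text $w$.

```python
from collections import namedtuple

# Hypothetical link record: source page, target page, anchor text.
Link = namedtuple("Link", ["source", "target", "anchor_text"])

links = [
    Link("a.html", "p.html", "web crawler"),
    Link("b.html", "p.html", "crawler"),
    Link("c.html", "p.html", "search engine"),
    Link("a.html", "q.html", "crawler"),
]


def h(link, target, keyword):
    """1 if the link points to target and its anchor text contains keyword."""
    words = link.anchor_text.lower().split()
    return 1 if link.target == target and keyword.lower() in words else 0


def hvv(links, target, keyword):
    """Hyperlink vector voting score: the number of matching in-links."""
    return sum(h(link, target, keyword) for link in links)


print(hvv(links, "p.html", "crawler"))
```

Because every in-link counts equally, adding more pages that link to `p.html` with the right anchor text raises the score. That is the manipulation risk the slide points out.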
  • 17. Pagerank
      Measure of importance of a page, defined recursively [Page et al., 1998]: a page with high Pagerank is a page referenced by many pages with high Pagerank.
      This is a simplified version:
      $$\mathrm{Pagerank}(P) = \sum_{x \in \Gamma^{-}(P)} \frac{\mathrm{Pagerank}(x)}{|\Gamma^{+}(x)|}$$
  • 18. Iterations with pseudo-Pagerank (figure)
      Note: we should normalize after each iteration
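The iteration with normalization can be sketched in Python. The function name, the dictionary-of-lists graph format and the toy graph are assumptions for illustration. Each page splits its current score evenly among the pages it links to, and the scores are rescaled to sum to 1 after every round.

```python
def pseudo_pagerank(out_links, iterations=50):
    """Simplified Pagerank (no random jumps), normalized after each iteration.

    out_links maps every page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = sorted(out_links)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for x in pages:
            targets = out_links[x]
            for p in targets:
                # x gives an equal share of its score to each out-link.
                new[p] += score[x] / len(targets)
        total = sum(new.values())
        if total == 0:  # no links at all: nothing to propagate
            break
        score = {p: v / total for p, v in new.items()}
    return score


# Toy graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pseudo_pagerank(graph))
```

On this toy graph the scores settle near 0.4, 0.2 and 0.4 for `a`, `b` and `c`. Without the normalization step, any page with no out-links would drain score from the total at every iteration.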
  • 19. Convergence with pseudo-Pagerank (figure)
  • 20. Random jumps
      So far, so good, but ...
      • The Web includes many pages with no out-links; these will accumulate all of the score
      • We would like Web pages to accumulate ranking
      • We add random jumps (teleportation)
  • 21. Random jumps (figure)
  • 22. Random jumps to single node (figure)
  • 23. Pagerank with random jumps
      $$\mathrm{Pagerank}(P) = \frac{\epsilon}{N} + (1 - \epsilon) \sum_{x \in \Gamma^{-}(P)} \frac{\mathrm{Pagerank}(x)}{|\Gamma^{+}(x)|}$$
      Typically $\epsilon \approx 0.15$
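A minimal sketch of Pagerank with random jumps follows. Here `epsilon` is the random-jump probability (the value that is typically about 0.15) and `N` is the number of pages. One choice below is an assumption and not taken from the slides: the score held by pages without out-links is spread uniformly over all pages, which keeps the scores summing to 1.

```python
def pagerank(out_links, epsilon=0.15, iterations=50):
    """Pagerank with random jumps on a dictionary-of-lists graph.

    out_links maps every page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = sorted(out_links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Assumption: dangling pages share their score with all pages.
        dangling = sum(score[x] for x in pages if not out_links[x])
        new = {p: epsilon / n + (1 - epsilon) * dangling / n for p in pages}
        for x in pages:
            for p in out_links[x]:
                new[p] += (1 - epsilon) * score[x] / len(out_links[x])
        score = new
    return score


# Toy chain: a -> b -> c, where c has no out-links.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": []})
print(ranks)
```

In the toy chain, `c` ends with the highest score and `a` with the lowest, but `a` still receives a share through the random jumps.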
  • 24. Pagerank calculation
      • Iterative calculation
      • Exploit blocks in the matrix (locality of links)
      • The extra node can be added after the last iteration
      • Dangling nodes can be added after the last iteration [Eiron et al., 2004]
  • 25. Alternatives
      • Ranking as a vector: topic-sensitive Pagerank [Haveliwala, 2002]
      • Network flows: TrafficRank [Tomlin, 2003]
      • Dynamic absorbing model [Amati et al., 2003]
  • 26. Dynamic absorbing model (figure)
      Scores are the probabilities of the “clone” nodes
  • 27. Hubs and authorities (figure)
      [Kleinberg, 1999]
  • 28. Algorithm
      1. Obtain a results set using text search
      2. Expand the results set with in- and out-links
      3. Calculate hubs and authorities
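Step 2 of the algorithm can be sketched as a set expansion. The function name, the dictionary inputs and the optional `max_in_links` cap are assumptions for illustration. The cap is included because popular pages can have very many in-links.

```python
def expand_result_set(root_set, out_links, in_links, max_in_links=None):
    """Add to the text-search results the pages they link to and the
    pages that link to them.

    out_links / in_links map a page to the pages it points to / that
    point to it. max_in_links optionally caps how many in-linking
    pages are taken per result (hypothetical parameter).
    """
    base = set(root_set)
    for page in root_set:
        base.update(out_links.get(page, ()))
        sources = sorted(in_links.get(page, ()))
        if max_in_links is not None:
            sources = sources[:max_in_links]
        base.update(sources)
    return base


# Toy example: result page r links to x and is linked from y and z.
print(sorted(expand_result_set({"r"}, {"r": ["x"]}, {"r": ["y", "z"]})))
```

Hub and authority scores (step 3) are then calculated only on the links among the pages of the expanded set.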
  • 29. Expansion phase (figure)
  • 30. Calculation phase
      Initialize:
      $$\mathrm{hub}(p, 0) = \mathrm{auth}(p, 0) = 1$$
      Iterate:
      $$\mathrm{hub}(p, t) = \sum_{x \in \Gamma^{+}(p)} \frac{\mathrm{auth}(x, t - 1)}{|\Gamma^{-}(x)|}$$
      $$\mathrm{auth}(p, t) = \sum_{x \in \Gamma^{-}(p)} \frac{\mathrm{hub}(x, t - 1)}{|\Gamma^{+}(x)|}$$
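The calculation phase can be sketched as below. Following [Kleinberg, 1999], the hub score of a page is summed over the pages it links to, and its authority score over the pages that link to it. Two things are assumptions of this sketch: each term is divided by the in-degree (for authorities) or out-degree (for hubs) of the contributing page, and the scores are rescaled to sum to 1 after each round. The function names and the toy graph are illustrative.

```python
def normalize(scores):
    """Rescale scores to sum to 1 (left unchanged if all are zero)."""
    total = sum(scores.values())
    return {p: v / total for p, v in scores.items()} if total else scores


def hits(out_links, iterations=50):
    """Hub and authority scores on a dictionary-of-lists graph.

    out_links maps every page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = sorted(out_links)
    in_links = {p: [] for p in pages}
    for x in pages:
        for p in out_links[x]:
            in_links[p].append(x)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Both updates read the scores from the previous iteration.
        new_hub = {p: sum(auth[x] / len(in_links[x]) for x in out_links[p])
                   for p in pages}
        new_auth = {p: sum(hub[x] / len(out_links[x]) for x in in_links[p])
                    for p in pages}
        hub, auth = normalize(new_hub), normalize(new_auth)
    return hub, auth


# Toy graph: h1 links to a1 and a2, h2 links only to a1.
hub, auth = hits({"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []})
print(hub, auth)
```

In the toy graph `h1` comes out as the better hub and `a1` as the better authority. The divisors are never zero, because a page that is linked to has at least one in-link and a page that links out has at least one out-link.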
  • 31. Web graph characterization
      “While entirely of human design, the emerging network appears to have more in common with a cell or an ecological system than with a Swiss watch.” [Barabási, 2001]
  • 32. The Web is not a random network (figure)
      Scale-free network [Barabási, 2002]
  • 33. Relationship of Web links with goods imports
      [To be available]
  • 34. Other scale-free networks
      • Power grid designs
      • Sexual partners in humans
      • Collaboration of movie actors in films
      • Citations in scientific publications
      • Protein interactions
  • 35. Distribution of degree (figure)
  • 36. Giant strongly-connected component
      [Broder et al., 2000]
      • The core of the Web is a strongly-connected component (MAIN)
      • Nodes reachable from this component are in the OUT component
      • Nodes that reach this component are in the IN component
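The three components can be sketched with two breadth-first searches. The sketch assumes a `seed` page already known to belong to MAIN, which avoids a full strongly-connected-components computation. The function names and the toy graph are illustrative.

```python
from collections import deque


def reachable(start, neighbors):
    """All nodes reachable from start by following the neighbors mapping."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(out_links, seed):
    """Split pages into MAIN, OUT and IN relative to a seed page in MAIN."""
    in_links = {}
    for src, targets in out_links.items():
        for dst in targets:
            in_links.setdefault(dst, []).append(src)
    forward = reachable(seed, out_links)   # pages the seed can reach
    backward = reachable(seed, in_links)   # pages that can reach the seed
    main = forward & backward              # strongly connected with the seed
    return {"MAIN": main, "OUT": forward - main, "IN": backward - main}


# Toy graph: i -> m1 <-> m2 -> o, plus an isolated page t.
web = {"i": ["m1"], "m1": ["m2"], "m2": ["m1", "o"], "o": [], "t": []}
print(bow_tie(web, "m1"))
```

Pages that are in none of the three sets, such as `t` in the toy graph, are neither reachable from the core nor able to reach it.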
  • 37. Bow-tie structure (figure)
  • 38. Bow-tie structure and details (figure)
      [Baeza-Yates and Castillo, 2001]
  • 39. Bow-tie and search behavior (figure)
  • 40. Summary
      • Hyperlinks should be exploited for extracting information
      • Link analysis should exploit global properties
  • 41-44. References
      Amati, G., Ounis, I., and Plachouras, V. (2003). The dynamic absorbing model for the web. Technical Report TR-2003-137, Department of Computing Science, University of Glasgow.
      Baeza-Yates, R. and Castillo, C. (2001). Relating Web characteristics with link based Web page ranking. In Proceedings of String Processing and Information Retrieval, pages 21–32, Laguna San Rafael, Chile. IEEE CS Press.
      Barabási, A.-L. (2001). The physics of the web. PhysicsWeb.ORG, online journal.
      Barabási, A.-L. (2002). Linked: the new science of networks. Perseus Publishing.
      Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. (2000). Graph structure in the web: Experiments and models. In Proceedings of the Ninth Conference on World Wide Web, pages 309–320, Amsterdam, Netherlands.
      Davison, B. D. (2000). Topical locality in the web. In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pages 272–279. ACM Press.
      Eiron, N., McCurley, K. S., and Tomlin, J. A. (2004). Ranking the web frontier. In Proceedings of the 13th international conference on World Wide Web, pages 309–318. ACM Press.
      Haveliwala, T. H. (2002). Topic-sensitive pagerank. In Proceedings of the Eleventh World Wide Web Conference, pages 517–526, Honolulu, Hawaii, USA. ACM Press.
      Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632.
      Li, Y. (1998). Toward a qualitative search engine. IEEE Internet Computing, pages 24–29.
      Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The Pagerank citation algorithm: bringing order to the web. In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia.
      Tomlin, J. A. (2003). A new paradigm for ranking pages on the world wide web. In Proceedings of the Twelfth Conference on World Wide Web, pages 350–355, Budapest, Hungary. ACM Press.