Link Analysis in National Web Domains (OSWIR 2005 Compiegne)


  1. Link Analysis in National Web Domains. Ricardo Baeza-Yates and Carlos Castillo, ICREA / Cátedra Telefónica, Universitat Pompeu Fabra, Barcelona, Spain. http://www.upf.edu/dtecn/ OSWIR 2005, Compiegne, France, September 19, 2005.
  2. Outline: 1 Motivation, 2 Results, 3 Conclusions.
  3. Motivation: Sampling the Web
     ✗ We don't have access to a global-scale collection
     ✗ A set of Web sites in the same organization is not diverse enough
     ✗ A set of Web sites on the same topic might not be representative
     ✗ A set of random Web sites might not be connected
     ✓ A national domain has a good balance between diversity and completeness
  8. Collections used
     ✓ Different economic, historical, linguistic, and geographical contexts
     Collections (year): Brazil (2005), Chile (2004), Greece (2004), Indochina (2004), Italy (2004), South Korea (2004), Spain (2004), U.K. (2002)
  9. Collections used

     Collection    Year   Available hosts [mill] (rank)   Pages [mill]
     Brazil        2005   3.9 (11th)                      4.7
     Chile         2004   0.3 (42nd)                      3.3
     Greece        2004   0.3 (40th)                      3.7
     Indochina     2004   0.5 (38th)                      7.4
     Italy         2004   9.3 (4th)                       41.3
     South Korea   2004   0.2 (47th)                      8.9
     Spain         2004   1.3 (25th)                      16.2
     U.K.          2002   4.4 (10th)                      18.5
  10. Scale-free topology
     If we sort pages by number of in-links, the $k$-th page has in-degree proportional to $k^{-\alpha}$ (Zipf's law).
     Equivalently, the fraction of pages with $x$ in-links is proportional to $x^{-\theta}$ (power law).
     Experimentally, $\theta \approx 2.1$ on the Web.
     Partial explanation: a multiplicative process; if $d_t$ is the number of links at time $t$, then $d_{t+1} = C \times d_t$.
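The power-law statement above can be checked on a link list with a few lines of Python. This is a minimal sketch, not the authors' procedure: the slides do not say how the exponents were fitted, so the least-squares fit on log-log axes, the function names `indegree_fractions` and `fit_exponent`, and the `x_min` cutoff are all assumptions made for illustration. A log-log least-squares fit is a simple choice and can be biased on noisy tails.

```python
import math
from collections import Counter


def indegree_fractions(edges):
    """Map x -> fraction of pages with exactly x in-links.

    Assumption: only pages that receive at least one link are counted,
    because an edge list alone does not reveal pages with zero in-links.
    """
    indegree = Counter(dst for _, dst in edges)
    counts = Counter(indegree.values())
    total = sum(counts.values())
    return {x: c / total for x, c in sorted(counts.items())}


def fit_exponent(fractions, x_min=1):
    """Estimate theta in fraction(x) ~ x^(-theta).

    Uses an ordinary least-squares line through (log x, log fraction);
    theta is the negated slope. x_min is a hypothetical cutoff that
    drops the smallest degrees from the fit.
    """
    points = [(math.log(x), math.log(f))
              for x, f in fractions.items() if x >= x_min and f > 0]
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    return -sxy / sxx
```

A typical use would be `fit_exponent(indegree_fractions(edges), x_min=10)`, where `edges` is a list of `(source_page, target_page)` pairs taken from a crawl.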
  13. In-degree
     [Figure: log-log plots of the in-degree distribution, one panel each for Brazil, Chile, Greece, Italy, Korea, Spain and U.K.; horizontal axis from 10^0 to 10^4, vertical axis from 10^-7 to 10^-1.]
  14. Out-degree
     [Figure: log-log plots of the out-degree distribution, one panel each for Brazil, Chile, Greece, Italy, Korea, Spain and U.K.; horizontal axis from 10^0 to 10^3, vertical axis from 10^-6 to 10^-1.]
  15. Link scores (PageRank, Hubs, Authorities)
     [Figure: log-log plots of the link-score distributions, three rows of panels with Brazil, Chile, Greece and Korea in each row.]
  16. Power-law exponents

     Collection    In-degree
     Brazil        1.9
     Chile         2.0
     Greece        1.9
     Indochina     1.6
     Italy         1.8
     South Korea   1.9
     Spain         2.1
     U.K.          1.8

     Previous work: (Broder... 2000) 2.1; (Dill... 2002) 2.1; (Kleinberg... 1999) ≈2
  17. Power-law exponents

     Collection    In-degree   Out-degree (small)   Out-degree (large)   PageRank   HITS Hubs   HITS Auth.
     Brazil        1.9         0.7                  2.7                  1.8        2.9         1.8
     Chile         2.0         0.7                  2.6                  1.9        2.7         1.9
     Greece        1.9         0.6                  1.9                  1.8        2.6         1.8
     Indochina     1.6         0.7                  2.6                  -          -           -
     Italy         1.8         0.7                  2.5                  -          -           -
     South Korea   1.9         0.3                  2.0                  1.8        3.7         1.8
     Spain         2.1         0.9                  4.2                  2.0        -           -
     U.K.          1.8         0.7                  3.4                  -          -           -

     Previous work: (Broder... 2000) in-degree 2.1, out-degree 2.7; (Dill... 2002) in-degree 2.1, out-degree 2.2; (Pandurangan... 2002) PageRank 2.1; (Kleinberg... 1999) in-degree ≈2
  18. Hostgraph
     [Figure: the hosts www.example1.com, www.example2.com and www.example3.com shown as the nodes S1, S2 and S3.]
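One way to read the hostgraph slide is that every page is replaced by the host that serves it, so page-level links become host-level edges. The sketch below shows that collapse under stated assumptions: the function names `build_hostgraph` and `host_degrees` are hypothetical, links within the same host are dropped, and each edge keeps a count of the page links behind it. The slides do not say which of these choices the authors made.

```python
from collections import Counter
from urllib.parse import urlsplit


def build_hostgraph(page_links):
    """Collapse (source_url, target_url) pairs into host-level edges.

    Returns a Counter mapping (source_host, target_host) to the number
    of page links between the two hosts. Intra-host links are skipped
    (an assumption, not something stated in the slides).
    """
    host_edges = Counter()
    for src, dst in page_links:
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        if src_host and dst_host and src_host != dst_host:
            host_edges[(src_host, dst_host)] += 1
    return host_edges


def host_degrees(host_edges):
    """In- and out-degree of each host, counting distinct neighbours."""
    in_degree, out_degree = Counter(), Counter()
    for src_host, dst_host in host_edges:
        out_degree[src_host] += 1
        in_degree[dst_host] += 1
    return in_degree, out_degree
```

The degree counters returned by `host_degrees` are the quantities whose distribution would be examined for a power law at the host level.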
  19. Hostgraph also exhibits a power-law

     Hostgraph degree exponents
     Collection          In        Out
     Brazil              1.9       1.9
     Chile               2.0       1.7
     Greece              2.0       1.6
     South Korea         1.2       1.4
     Spain               1.8       1.3
     (Bharat... 2001)    1.6-1.7   1.7-1.8

     (Dill... 2002): 2.3
  20. Web structure: connected components
     “Normal” vs “Giant” strongly connected components
     [Figure: log-log plots, one panel each for Brazil, Chile, Greece, Korea and Spain; horizontal axis from 10^0 to 10^5, vertical axis from 10^-6 to 10^0.]
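The sizes of strongly connected components can be computed from an edge list as sketched below. This is an illustration only: it uses Kosaraju's two-pass algorithm with explicit stacks, the name `scc_sizes` is hypothetical, and the slides do not describe the authors' implementation. In such output a "giant" component would show up as one size far larger than all the others.

```python
from collections import defaultdict


def scc_sizes(edges):
    """Sizes of all strongly connected components, largest first."""
    forward, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        forward[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: depth-first search on the forward graph, recording each
    # node when all of its successors have been explored.
    finish_order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(forward[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                finish_order.append(node)
                stack.pop()

    # Pass 2: flood-fill the reversed graph in decreasing finish order;
    # each fill visits exactly one strongly connected component.
    sizes, assigned = [], set()
    for root in reversed(finish_order):
        if root in assigned:
            continue
        assigned.add(root)
        size, stack = 0, [root]
        while stack:
            node = stack.pop()
            size += 1
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

Counting how often each size occurs, for example with `collections.Counter(scc_sizes(edges))`, gives a size distribution of the kind the figure plots.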
  21. Conclusions
     ✓ Consistent results across collections
     ✓ Differences in the amount of spam
     ✓ Comparison of other aspects [to be available soon]
     Thank you