Scheduling Algorithms for Web Crawling

  1. Scheduling Algorithms for Web Crawling. C. Castillo, M. Marin, A. Rodríguez and R. Baeza-Yates, Center for Web Research (www.cwr.cl). LA-WEB 2004.
  2. Outline: Motivation, Algorithms, Experiments, Summary, References.
  3. Motivation
     • Web search generates more than 13% of the traffic to Web sites [StatMarket, 2003].
     • No search engine indexes more than one third of the publicly available Web [Lawrence and Giles, 1998].
     • If we cannot download all of the pages, we should at least download the most “important” ones.
  6. The problem of Web crawling
     • We must download pages with sizes given by $P_i$ over a connection of bandwidth $B$.
     • Trivial solution: we download all the pages simultaneously, each at a speed proportional to its size: $B_i = \frac{P_i}{T^*}$
     • $T^*$ is the optimal time to use all the available bandwidth: $T^* = \frac{\sum_i P_i}{B}$
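
As a small worked example of the trivial solution, the sketch below computes the optimal time T* and the per-page speeds B_i for a handful of pages. The function name, the page sizes and the bandwidth value are illustrative assumptions, not figures from the talk.

```python
def optimal_schedule(page_sizes, bandwidth):
    """Return (t_star, speeds) for the idealized scenario.

    t_star is the total size divided by the bandwidth, and each page i
    is downloaded at speed page_sizes[i] / t_star, so that all pages
    finish together and the speeds add up to the full bandwidth.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not page_sizes:
        raise ValueError("at least one page is required")
    t_star = sum(page_sizes) / bandwidth
    speeds = [size / t_star for size in page_sizes]
    return t_star, speeds


# Hypothetical input: three pages (sizes in KB) over a 50 KB/s connection.
sizes_kb = [100.0, 300.0, 600.0]
bandwidth_kb_s = 50.0
t_star, speeds = optimal_schedule(sizes_kb, bandwidth_kb_s)
print(t_star)   # seconds needed when the bandwidth is fully used
print(speeds)   # per-page download speeds, which sum to the bandwidth
```

Because every B_i is proportional to P_i, all downloads end at the same instant T*, which is what makes this schedule optimal in the absence of restrictions.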
  7. Optimal scenario [figure]
  8. Restrictions
     • Robot exclusion protocol [Koster, 1995]
     • Waiting time ≈ 10 to 30 seconds
     • Web sites' bandwidth $B_i^{MAX}$ is lower than the crawler bandwidth $B$
     • Distribution of Web site sizes is very skewed
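
To make the first two restrictions concrete, here is a minimal sketch that uses Python's standard `urllib.robotparser` module to check whether a URL may be fetched and to read a requested delay. The robots.txt content, the user-agent name and the example.cl URLs are made up for illustration; `Crawl-delay` is a widely used but non-standard directive, and `crawl_delay()` requires Python 3.6 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 15
"""


def load_rules(robots_txt):
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
agent = "examplebot"  # assumed crawler name

allowed = rules.can_fetch(agent, "http://www.example.cl/index.html")
blocked = rules.can_fetch(agent, "http://www.example.cl/private/x.html")
delay = rules.crawl_delay(agent)  # seconds to wait, or None if not given
print(allowed, blocked, delay)
```

A crawler would typically fall back to its own default waiting time when a site does not publish a delay.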
  12. Distribution of site sizes [figure]
  13. Realistic scenario [figure]
  14. Number of active robots in a batch [figure]
  15. Goal
      • If each page has a certain score, capture most of the total value of this score by downloading just a fraction of the pages.
      • We will use the total Pagerank of the downloaded set vs. the fraction of downloaded pages as a measure of quality.
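
The quality measure described in the Goal slide can be sketched as a curve of captured score against the fraction of pages downloaded. The function name and the toy scores below are assumptions for illustration; in the talk the score is Pagerank.

```python
def cumulative_score_curve(download_order, score):
    """Return a list of (fraction_of_pages, fraction_of_total_score).

    download_order: pages in the order a strategy downloads them.
    score: dict mapping every page to its score (e.g. its Pagerank).
    """
    total = sum(score.values())
    n = len(score)
    curve = []
    captured = 0.0
    for i, page in enumerate(download_order, start=1):
        captured += score[page]
        curve.append((i / n, captured / total))
    return curve


# Toy scores (hypothetical), compared under two download orders.
score = {"a": 0.5, "b": 0.3, "c": 0.2}
good = cumulative_score_curve(["a", "b", "c"], score)
bad = cumulative_score_curve(["c", "b", "a"], score)
print(good)
print(bad)
```

A better strategy is one whose curve rises faster: after one page, the first order has already captured half of the total score while the second has captured a fifth.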
  16. Algorithms
      • Algorithms are based on a scheduler with two levels of queues:
        • Queue of Web sites
        • Queue of Web pages in each Web site
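
The two-level structure can be sketched as a dictionary of per-site priority queues. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the rule used at the site level (pick the site whose best page has the highest priority) is one possible choice, not necessarily the one used by the authors.

```python
import heapq
from collections import defaultdict


class TwoLevelScheduler:
    """Queue of Web sites, each holding its own queue of Web pages."""

    def __init__(self):
        # site -> heap of (-priority, url); negating the priority turns
        # Python's min-heap into a max-heap.
        self._page_queues = defaultdict(list)

    def add_page(self, site, url, priority):
        heapq.heappush(self._page_queues[site], (-priority, url))

    def pop_next(self):
        """Return (site, url) for the next download, or None if empty."""
        if not self._page_queues:
            return None
        # Site level: choose the site whose top page has the highest
        # priority. A real crawler would also skip sites that are still
        # inside their waiting time (not modeled here).
        site = min(self._page_queues, key=lambda s: self._page_queues[s][0])
        # Page level: take the best page of that site.
        _, url = heapq.heappop(self._page_queues[site])
        if not self._page_queues[site]:
            del self._page_queues[site]
        return site, url


scheduler = TwoLevelScheduler()
scheduler.add_page("a.cl", "/1", 0.2)
scheduler.add_page("a.cl", "/2", 0.9)
scheduler.add_page("b.cl", "/x", 0.5)
order = [scheduler.pop_next() for _ in range(4)]
print(order)
```

The site-level scan is linear in the number of sites, which keeps the sketch short; a production scheduler would likely keep the sites in a heap as well.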
  19. Queues used for the scheduling [figure]
  20. Algorithms based on Pagerank
      • Optimal/Oracle: the crawler asks an “Oracle” for the Pagerank value of each page in the frontier. This is not available in a real crawl, as we do not have the entire graph.
      • The average relative error for estimating the Pagerank four months ahead is about 78% [Cho and Adams, 2004], so historical information from previous crawls is not too useful.
      • Batch-Pagerank: Pagerank calculations are executed over the subset of known pages [Cho et al., 1998].
      • Partial-Pagerank: a “temporary” Pagerank value is assigned to pages in between Batch-Pagerank calculations.
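
A Batch-Pagerank step can be sketched as ordinary power iteration restricted to the pages known so far. The function name, the damping factor of 0.85, the fixed iteration count and the uniform treatment of pages without known out-links are assumptions of this sketch; the slides do not specify these details.

```python
def batch_pagerank(links, damping=0.85, iterations=50):
    """Pagerank by power iteration over the known subgraph.

    links: dict mapping a known page to the list of pages it links to.
    Pages that appear only as link targets (the frontier) are included
    and treated as having no known out-links.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without known out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical): every page links back to "home".
links = {"home": ["a", "b"], "a": ["home"], "b": ["home"], "c": ["home"]}
rank = batch_pagerank(links)
best_first = sorted(rank, key=rank.get, reverse=True)
print(best_first)
```

Between two such batch runs, newly discovered pages have no value yet, which is the gap the Partial-Pagerank strategy fills with a temporary estimate.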
  23. Algorithms not based on Pagerank
      • Depth: pages are given a priority based on their depths. This is graph traversal in breadth-first ordering [Najork and Wiener, 2001].
      • Length: pages from the Web sites which seem to be bigger are crawled first. We do not know which are really the bigger Web sites until the end of the crawl, so we use partial information.
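
Both strategies reduce to simple ordering rules, sketched below. The helper names and the sample URLs are hypothetical, and measuring a site's apparent size by the number of its URLs seen so far is an assumption about what "partial information" means here.

```python
from collections import Counter
from urllib.parse import urlsplit


def order_by_depth(pages_with_depth):
    """Depth strategy: shallower pages first (breadth-first order).

    pages_with_depth: list of (url, depth) pairs, depth 0 = home page.
    """
    return [url for url, depth in sorted(pages_with_depth, key=lambda p: p[1])]


def order_by_length(known_urls):
    """Length strategy: pages from apparently bigger sites first.

    The size of a site is estimated from the URLs discovered so far,
    since the true size is unknown until the crawl ends.
    """
    site_size = Counter(urlsplit(u).netloc for u in known_urls)
    # sorted() is stable, so URLs of the same site keep their order.
    return sorted(known_urls, key=lambda u: -site_size[urlsplit(u).netloc])


by_depth = order_by_depth([("/x/y", 2), ("/", 0), ("/x", 1)])
by_length = order_by_length([
    "http://small.cl/",
    "http://big.cl/a",
    "http://big.cl/b",
    "http://big.cl/c",
])
print(by_depth)
print(by_length)
```

Neither rule needs the link graph to be analyzed, which makes both much cheaper to maintain during a crawl than a Pagerank-based ordering.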
  25. Experiments
      • Download a sample of pages using the WIRE crawler [Baeza-Yates and Castillo, 2002]
      • 3.5 million pages from over 50,000 Web sites in .CL
      • At most 25,000 pages from each Web site
      • Strategies are simulated on a graph built using actual data
      • Simulation includes: bandwidth saturation, network speed of different Web sites, page sizes, waiting time, latency, etc.
  30. Simulation parameters
      • Algorithm
      • Waiting time between pages from the same Web site, w
      • Number of pages downloaded per connection when re-using the HTTP connection, k
      • Number of robots, r
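
The parameters can be grouped in a small configuration object, and a rough calculation shows why the waiting time matters so much. This is a sketch under a stated assumption: the crawler waits w seconds between consecutive connections to the same site and fetches up to k pages per connection. The class, the function and the numeric values of w and k are illustrative, not taken from the slides.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParams:
    algorithm: str  # strategy label, e.g. "length" (labels are illustrative)
    w: float        # waiting time in seconds
    k: int          # pages downloaded per HTTP connection
    r: int          # number of robots


def min_waiting_seconds(pages_on_site, params):
    """Lower bound on the time spent waiting on a single Web site.

    Assumes one wait of w seconds between consecutive connections to
    the same site; transfer time and latency are ignored.
    """
    connections = math.ceil(pages_on_site / params.k)
    return max(connections - 1, 0) * params.w


# Hypothetical settings for a site at the 25,000-page limit.
one_page = SimulationParams("length", w=15.0, k=1, r=1)
reuse = SimulationParams("length", w=15.0, k=100, r=1)
print(min_waiting_seconds(25_000, one_page))  # waiting time without re-use
print(min_waiting_seconds(25_000, reuse))     # waiting time with k = 100
```

Under these assumptions the waiting time for a large site is independent of the number of robots r, because extra robots cannot visit the same site in parallel without breaking the waiting rule.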
  34. Results with one robot [figure]
  35. Results with many robots [figure]
  36. Speed-ups with the “Length” strategy [figure]
  37. Crawling the real Web using the “Length” strategy [figure]
  38. Pagerank vs day of crawl [figure]
  39. Depth is not correlated with Pagerank when depth is ≥ 2 links from the home page [figure]
  40. Summary
      • The restrictions, especially waiting time, create a difficult problem for scheduling.
      • A strategy with an “oracle” was too greedy.
      • We try to keep Web sites in the frontier for as long as possible, so we always have several Web sites to choose from.
      • Simulation ensures that all strategies are compared under the same conditions, which is critical because the Web is very dynamic.
  44. Open problems
      • Scheduling using historical information
      • Exploiting the Web’s structure
      • Adversarial IR: spam detection before downloading the pages
  47. References
      • Baeza-Yates, R. and Castillo, C. (2002). Balancing volume, quality and freshness in web crawling. In Soft Computing Systems - Design, Management and Applications, pages 565–572, Santiago, Chile. IOS Press Amsterdam.
      • Cho, J. and Adams, R. (2004). Page quality: In search of an unbiased Web ranking. Technical report, UCLA Computer Science.
      • Cho, J., García-Molina, H., and Page, L. (1998). Efficient crawling through URL ordering. In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia.
      • Koster, M. (1995). Robots in the web: threat or treat? ConneXions, 9(4).
  48. References (continued)
      • Lawrence, S. and Giles, C. L. (1998). Searching the World Wide Web. Science, 280(5360):98–100.
      • Najork, M. and Wiener, J. L. (2001). Breadth-first crawling yields high-quality pages. In Proceedings of the Tenth Conference on World Wide Web, pages 114–118, Hong Kong. Elsevier Science.
      • StatMarket (2003). Search engine referrals nearly double worldwide. http://websidestory.com/pressroom/pressreleases.html?id=181.