



Published in: Technology, Design


  1. Chapter 19: Web Crawler
  2. Chapter Objectives
       • Provide a case study example from problem statement through implementation
       • Demonstrate how hash tables and graphs can be used to solve a problem
  3. Web Crawler
       • A web crawler is a system that searches the web, beginning with a user-designated web page, looking for a designated target string
       • A web crawler follows all of the links on each page that it encounters until there are no more pages or until it reaches a designated limit
  4. Web Crawler
       • For this case study, we will create a graphical web crawler with the following requirements:
           • Enter a designated starting web page
           • Enter a target string for which to search
           • Limit the search to 50 pages
           • Display the results when done
  5. Web Crawler - Design
       • Our web crawler system consists of three high-level components:
           • The driver
           • The graphical user interface
           • The web crawler implementation
               • Makes use of graphs and hash tables
  6. Web Crawler - Design
       • The algorithm for the web crawler is as follows:
           • Add the starting page to a HashSet of pages to be searched and to our graph
           • Remove a page from the set of pages to be searched
           • Search the page for the target string
               • If the string is found, add the page to the list of results
           • Search the page for links
               • If a link has not already been searched, add it to the set of pages to be searched and to our graph
           • Repeat the three previous steps until our limit is reached or the set is empty
  7. FIGURE 19.1 User interface design
  8. FIGURE 19.2 UML description
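
The algorithm on slide 6 can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the chapter's implementation. The class name `CrawlerSketch`, the `fetch` function (a stand-in for downloading a page), and the regex-based link matcher are all hypothetical. The graph is approximated here with a `HashMap` of adjacency sets rather than whatever graph class the chapter uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the slide 6 algorithm; all names are illustrative.
class CrawlerSketch {
    // Naive matcher for href="..." attributes; a real crawler would use an HTML parser.
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");

    // Pages whose content contains the target string.
    final List<String> results = new ArrayList<>();
    // Directed graph as adjacency sets: page -> pages it links to.
    final Map<String, Set<String>> graph = new HashMap<>();

    // fetch is an assumed stand-in for retrieving page content;
    // it should return null when a page cannot be retrieved.
    void crawl(String start, String target, int limit, Function<String, String> fetch) {
        // Add the starting page to the set of pages to be searched and to the graph.
        Set<String> toSearch = new HashSet<>();
        toSearch.add(start);
        graph.put(start, new HashSet<>());
        int searched = 0;

        // Repeat until the limit is reached or the set is empty.
        while (!toSearch.isEmpty() && searched < limit) {
            // Remove an arbitrary page from the set of pages to be searched.
            Iterator<String> it = toSearch.iterator();
            String page = it.next();
            it.remove();
            searched++; // unavailable pages still count toward the limit in this sketch

            String content = fetch.apply(page);
            if (content == null) {
                continue;
            }

            // Search the page for the target string.
            if (content.contains(target)) {
                results.add(page);
            }

            // Search the page for links.
            Matcher m = LINK.matcher(content);
            while (m.find()) {
                String link = m.group(1);
                graph.get(page).add(link);
                // A link already in the graph has been searched or is already queued.
                if (!graph.containsKey(link)) {
                    graph.put(link, new HashSet<>());
                    toSearch.add(link);
                }
            }
        }
    }
}
```

Passing the page source as a function keeps the sketch testable without network access: for example, `crawl("a", "hello", 50, pages::get)` can run against an in-memory `Map<String, String>` of page contents. Because a `HashSet` has no defined iteration order, the order in which pages are visited, and therefore the order of `results`, is not guaranteed.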