Ch19

Published in: Technology, Design

  1. Chapter 19: Web Crawler
  2. Chapter Objectives
       • Provide a case study example from problem statement through implementation
       • Demonstrate how hash tables and graphs can be used to solve a problem
  3. Web Crawler
       • A web crawler is a system that searches the web, beginning with a user-designated web page, looking for a designated target string
       • A web crawler follows all of the links on each page that it encounters until there are no more pages or until it reaches a designated limit
  4. Web Crawler
       • For this case study, we will create a graphical web crawler with the following requirements:
           • Enter a designated starting web page
           • Enter a target string for which to search
           • Limit the search to 50 pages
           • Display the results when done
  5. Web Crawler - Design
       • Our web crawler system consists of three high-level components:
           • The driver
           • The graphical user interface
           • The web crawler implementation
               • Makes use of graphs and hash tables
  6. Web Crawler - Design
       • The algorithm for the web crawler is as follows:
           • Add the starting page to a HashSet of pages to be searched and to our graph
           • Remove a page from the set of pages to be searched
           • Search the page for the target string
               • If the string exists, add the page to the list of results
           • Search the page for links
               • If a link has not already been searched, add it to the set of pages to be searched and to our graph
           • Repeat the three previous steps until the page limit is reached or the set of pages to be searched is empty
  7. FIGURE 19.1 User interface design
  8. FIGURE 19.2 UML description
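
The algorithm on slide 6 can be sketched in Java roughly as follows. This is a minimal illustration, not the chapter's actual implementation: the names `WebCrawlerSketch`, `CrawlResult`, `extractLinks`, and the `fetch` parameter are all hypothetical, the graph is assumed to be an adjacency map built on hash tables, and link extraction is assumed to be a simplified regular expression that only recognizes double-quoted absolute `http`/`https` links. Page retrieval is passed in as a function so the sketch can be tried without network access.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WebCrawlerSketch {
    // Assumption: only double-quoted absolute http(s) hrefs count as links.
    private static final Pattern LINK =
        Pattern.compile("href=\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Hypothetical result holder: matching pages plus the link graph. */
    static final class CrawlResult {
        final List<String> matches = new ArrayList<>();
        // Adjacency map: each discovered page maps to the pages it links to.
        final Map<String, Set<String>> graph = new HashMap<>();
    }

    /** Returns every link found in the given page text. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Crawls from start, searching at most limit pages for target.
     * fetch maps a URL to its page text, or null if the page is unavailable.
     */
    static CrawlResult crawl(String start, String target, int limit,
                             Function<String, String> fetch) {
        CrawlResult result = new CrawlResult();
        Set<String> toSearch = new HashSet<>();

        // Add the starting page to the set of pages to be searched and to the graph.
        toSearch.add(start);
        result.graph.put(start, new HashSet<>());

        int searched = 0;
        while (!toSearch.isEmpty() && searched < limit) {
            // Remove a page from the set (HashSet order is arbitrary).
            Iterator<String> it = toSearch.iterator();
            String page = it.next();
            it.remove();
            searched++;

            String html = fetch.apply(page);
            if (html == null) {
                continue; // unreachable page still counts toward the limit
            }

            // Search the page for the target string.
            if (html.contains(target)) {
                result.matches.add(page);
            }

            // Search the page for links; queue only pages not yet in the graph.
            for (String link : extractLinks(html)) {
                result.graph.get(page).add(link);
                if (!result.graph.containsKey(link)) {
                    result.graph.put(link, new HashSet<>());
                    toSearch.add(link);
                }
            }
        }
        return result;
    }
}
```

To try it, one could pass a `Map<String, String>` of URL to page text and use `pages::get` as the `fetch` function, with 50 as the limit to match the stated requirement. Because every discovered page becomes a vertex in the graph as soon as it is queued, the graph's key set doubles as the record of pages already encountered, so no separate "visited" set is needed in this sketch.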