A spider is a program designed to automatically gather webpages.
If, for example, you want to automatically download all of the speeches delivered in Congress today – without manually clicking on every one, cutting and pasting, etc. – you might want to build a spider.
Type 1 Requester
Requests a few items with known URLs from a website.
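A Type 1 requester can be sketched in a few lines of Python. Everything here is illustrative: the `fetch` helper and the URLs are placeholders, not real speech archives.

```python
import urllib.request

def fetch(url):
    """Download one page and return its text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

# A Type 1 spider is just a loop over a known, finite list of URLs.
# These addresses are hypothetical placeholders.
KNOWN_URLS = [
    "https://example.com/speeches/2024-01-15.html",
    "https://example.com/speeches/2024-01-16.html",
]

if __name__ == "__main__":
    pages = {url: fetch(url) for url in KNOWN_URLS}
```

Because the list of URLs is fixed in advance, this kind of spider cannot wander: its scope is exactly as large as the list you give it.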
Type 2 Requester
Requests a few items, then requests (some set of) pages to which those items link.
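The extra step a Type 2 requester needs is link extraction: parse each downloaded page, collect the hrefs, and then fetch (a chosen subset of) those. A minimal sketch using only Python's standard library; the class and function names are my own, not a standard API:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Feeding it `<a href="/x">x</a>` with base `https://example.com/` yields `["https://example.com/x"]`. A Type 2 spider then fetches some filtered subset of these links and stops.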
Type 3 Requester
Starts at a given URL, then requests everything it links to, everything those pages link to, and so on, staying on the same host server. The idea here is usually to download an entire website.
Type 4 Requester
Starts at a given URL, requests everything linked anywhere, everything linked by that, etc., until it has, perhaps, visited the entire web.
YOU – I am talking to YOU – in all likelihood have no business writing Type 3 or Type 4 spiders. These can easily go seriously awry, causing mayhem of many sorts. Write only spiders with known finite scope.
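One way to keep a spider's scope finite is a guard that every candidate URL must pass before it is fetched. The sketch below is one possible set of guard rails, not a standard recipe; the page cap is an arbitrary illustrative number:

```python
from urllib.parse import urlparse

MAX_PAGES = 50  # hypothetical hard cap on total fetches

def in_scope(url, allowed_host, visited):
    """True only if url is on the allowed host, unvisited, and under the cap."""
    parsed = urlparse(url)
    return (parsed.scheme in ("http", "https")
            and parsed.netloc == allowed_host
            and url not in visited
            and len(visited) < MAX_PAGES)
```

Checking each link against a function like this before queueing it keeps even a link-following spider from drifting off its host or running forever.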