A web crawler starts from a seed URL and builds a crawl frontier: the queue of URLs still waiting to be visited. For each URL pulled from the frontier, it fetches the page (skipping URLs that are unreachable or invalid), parses the HTML to extract new links, and appends any links it has not already seen to the frontier. Tracking which URLs have been visited prevents the crawler from fetching the same page twice. The process repeats to a bounded depth, commonly around five levels, which is usually enough to capture most of a site's content while keeping the crawler from getting trapped in infinite chains of links.
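A minimal sketch of this loop might look like the following, using only the Python standard library. The names here (`crawl`, `LinkExtractor`, `max_depth`) and the seed URL are illustrative assumptions, not a reference to any particular crawler; it uses a breadth-first queue for the frontier, a visited set for deduplication, and a depth cap of five to mirror the description above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags on a fetched page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed_url, max_depth=5):
    """Breadth-first crawl from seed_url, stopping at max_depth levels."""
    frontier = deque([(seed_url, 0)])  # crawl frontier: (url, depth) pairs
    visited = {seed_url}               # dedupe so link loops can't trap us

    while frontier:
        url, depth = frontier.popleft()
        try:
            with urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable or invalid URL: skip and move on

        print(f"{'  ' * depth}{url}")

        if depth < max_depth:
            parser = LinkExtractor(url)
            parser.feed(html)
            for link in parser.links:
                if link not in visited:
                    visited.add(link)
                    frontier.append((link, depth + 1))


crawl("https://example.com")  # hypothetical seed URL
```

Using a queue rather than recursion makes the traversal breadth-first and keeps memory bounded by the frontier size; the visited set, not the depth cap alone, is what breaks cycles, while the depth cap limits how far the crawl strays from the seed.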