The Parts of a Search Engine
• Spider (or “crawler”)
• Indexer
• Search software (an algorithm)
The “spider” or “crawler”
The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled"; it is also known as “harvesting”. The spider returns to the site on a regular basis, such as every month or two, to look for changes.
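The visit-read-follow loop described above can be sketched in a few lines. This is a minimal illustration, not a production crawler: the `PAGES` dictionary is a hypothetical in-memory stand-in for real HTTP fetches, and all names are illustrative.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML, standing in for real HTTP fetches.
PAGES = {
    "/index.html": '<a href="/about.html">About</a> welcome page',
    "/about.html": '<a href="/index.html">Home</a> about us',
}

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag seen while parsing a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def crawl(start_url):
    """Visit a page, read it, then follow its links to other pages in the site."""
    seen, queue = set(), [start_url]
    while queue:
        url = queue.pop()
        if url in seen or url not in PAGES:
            continue
        seen.add(url)                 # remember this page was spidered
        parser = LinkExtractor()
        parser.feed(PAGES[url])       # "read" the page
        queue.extend(parser.links)    # follow links to other pages
    return seen
```

A real spider would also record the fetched text for the indexer and schedule periodic revisits to detect changes, as the slide notes.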
The indexer
Everything the spider finds goes into the second part of a search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, this book is updated with the new information.
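The "giant book" is typically an inverted index mapping each word to the pages containing it. A minimal sketch, assuming a word-to-URLs dictionary; note how re-indexing a changed page first removes its old entries, mirroring how the catalog is updated:

```python
def index_page(index, url, text):
    """Add (or re-add) a page's words to the inverted index.

    Re-indexing first discards the page's old postings, so a changed
    page replaces its stale entries with the new information.
    """
    for postings in index.values():
        postings.discard(url)
    for word in text.lower().split():
        index.setdefault(word, set()).add(url)

index = {}
index_page(index, "/index.html", "welcome to the home page")
index_page(index, "/about.html", "about the search engine")
# The page changed; re-indexing updates the "book":
index_page(index, "/index.html", "welcome to the new home page")
```

Real engines store richer postings (positions, frequencies) to support phrase queries and ranking, but the word-to-documents mapping is the core structure.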
UCB SIMS 202, Sept. 2004 — Avi Rappoport, Search Tools Consulting

But It's Not
Index ahead of time
• Find files or records
• Open each one and read it
• Store each word in a searchable index
Provide search forms
• Match the query terms with words in the index
• Sort documents by relevance
Display results
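The query-time half of the slide — match query terms against the index, then sort by relevance — can be sketched as below. The scoring here (count of query terms matched) is a deliberately simple illustration; real engines use weighted schemes such as TF-IDF.

```python
def search(index, query):
    """Match query terms against an inverted index and sort documents
    by relevance, scored here as the number of query terms matched."""
    scores = {}
    for term in query.lower().split():
        for url in index.get(term, set()):
            scores[url] = scores.get(url, 0) + 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Tiny hand-built index (illustrative data, not from the lecture):
tiny_index = {"search": {"/a", "/b"}, "engine": {"/a"}}
results = search(tiny_index, "search engine")
```

Here `/a` matches both terms and is ranked ahead of `/b`, which matches only one; the sorted list is what the results page would display.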
Search is Mostly Invisible
Like an iceberg, 2/3 below water.
[Figure: layers of a search system — user interface, search functionality, content]
How Search Engines Work
1) They collect information from selected web sites.
2) They employ special software robots, called spiders, to crawl web pages.
3) Spiders build lists of the words found on web sites; when a spider is building its lists, it is Web crawling.
4) Spiders store the lists in the engine's database.
5) The engine's indexing software builds an index of words.
6) Information is matched against query input and retrieved (the processing algorithm).