2. EVERY SEARCH ENGINE HAS THREE MAIN FUNCTIONS
1. Crawling (to discover content)
2. Indexing (to track and store content)
3. Retrieval (to fetch relevant content when users query
the search engine).
3. CRAWLING - ACQUISITION OF DATA ABOUT A WEBSITE
An automated bot (called a “spider” or “crawler”) visits page
after page at high speed, following the links on each page to
find where to go next. Even in its earliest days, Google’s
crawler could fetch over a hundred pages per second. Nowadays,
the rate is in the thousands.
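The link-following loop can be sketched as a breadth-first crawl. This is a minimal sketch, not Google’s crawler: it assumes a caller-supplied `fetch` function (hypothetical) that returns a page’s HTML, and a small in-memory dict stands in for the web so the example runs offline. A real crawler also obeys robots.txt, limits its request rate per host, and handles network errors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url.

    `fetch` (hypothetical) takes a URL and returns its HTML, or None on failure.
    Returns a dict of URL -> HTML in the order pages were visited.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link: skip it
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links like "/a"
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# A tiny in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/missing">?</a>',
    "https://example.com/b": '<a href="/">Home</a>',
}

pages = crawl("https://example.com/", FAKE_WEB.get)
print(list(pages))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other, and `max_pages` caps how far one run can wander.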
4. INDEXING - DATA FROM A CRAWL IS PROCESSED
Imagine making a list of all the books you own, with their
publishers, authors, genres, page counts, and so on.
Crawling is going through each book; indexing is recording
its details in your list.
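In software, that “list” is commonly an inverted index: for each word, the set of pages that contain it. The sketch below is a minimal illustration that assumes plain-text pages and a naive lowercase tokenizer (`tokenize` and `build_index` are hypothetical names); production indexes also store word positions, page metadata, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


docs = {
    "page1": "Search engines crawl the web",
    "page2": "An index stores what the crawl found",
    "page3": "Ranking orders the results",
}
index = build_index(docs)
print(sorted(index["crawl"]))  # ['page1', 'page2']
```

Looking up a word is then a single dictionary access instead of a scan over every stored page.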
5. RETRIEVAL AND RANKING
Retrieval is when the search engine processes your query
and returns the pages from its index that best match it.
Ranking algorithms then score those candidates, drawn from
billions of indexed pages, to order them by relevance.
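Retrieval and ranking can be illustrated with TF-IDF, a textbook relevance score swapped in here purely for illustration; Google’s actual ranking combines many signals and is not shown here. In this minimal sketch, retrieval collects every page that contains a query word, and ranking weights each match by how often the word appears on the page (TF) and how rare it is across all pages (IDF). The function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: term -> {doc_id: how often the term occurs in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(query, index, n_docs, top_k=10):
    """Retrieve docs containing any query term, then rank them by TF-IDF."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears nowhere: contributes nothing
        # Smoothed IDF: rarer terms weigh more, and the weight stays positive.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


docs = {
    "page1": "cheap flights to paris",
    "page2": "paris travel guide and paris hotels",
    "page3": "cheap hotels in rome",
}
index = build_index(docs)
for doc_id, score in search("paris hotels", index, len(docs)):
    print(doc_id, round(score, 3))
```

Here `page2` ranks first because it contains both query words and mentions “paris” twice; `page1` and `page3` each match one word and tie.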
6. GOOGLE SEARCH SHORTCUTS
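• related:example.com (sites similar to example.com)
• filetype: (restrict results to one file type, e.g. filetype:pdf)
• Boolean operators (e.g. OR)
• “ ” (exact-phrase match)
• * (wildcard for an unknown word)
• cache: (Google’s stored copy of a page; retired in 2024)
• Search by image
• site:example.com (results from one site only)
• site:.edu (results from one top-level domain)
• Searching by date (Tools > Any time)
• allintitle: (every word must appear in the page title)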
• 100254789 = English
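Several of these shortcuts can be combined into one query string. The sketch below assembles plain terms with the exact-phrase, site:, and filetype: operators, then encodes the result as the `q` parameter of a standard Google search URL. `build_query` and `search_url` are hypothetical helper names for illustration only.

```python
from urllib.parse import urlencode


def build_query(terms="", phrase=None, site=None, filetype=None):
    """Combine plain terms with a few of the search operators listed above."""
    parts = []
    if terms:
        parts.append(terms)
    if phrase:
        parts.append(f'"{phrase}"')           # exact-phrase match
    if site:
        parts.append(f"site:{site}")          # one site or domain, e.g. ".edu"
    if filetype:
        parts.append(f"filetype:{filetype}")  # e.g. "pdf"
    return " ".join(parts)


def search_url(query):
    """Encode the query as the q parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("climate report", site=".edu", filetype="pdf")
print(query)              # climate report site:.edu filetype:pdf
print(search_url(query))  # https://www.google.com/search?q=climate+report+site%3A.edu+filetype%3Apdf
```

`urlencode` turns spaces into `+` and escapes characters such as `:` and `"`, so the operators survive the trip into the URL intact.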