3. Search Engines
What is a Search Engine?
• Software that enables users to search the Internet or an intranet using keywords.
• A program that acts as a catalogue for the Internet or an intranet.
• A web search tool that automatically visits websites (using crawlers), records and indexes them in its
database, and generates results based on a user's search criteria.
Popular Search Engines
How do they work?
• Web Crawling
An automated “spider” follows links from page to page and analyses each page's content to determine what
should be indexed. Metadata is very important in this phase.
• Indexing
Data about web pages are stored in an index database for use in later queries. Some search engines, such
as Google, store all or part of the source page (referred to as a cache) as well as information about the
web pages.
• Searching & Results
When a user submits a query, the engine looks it up in the index and returns a list of the best-matching web
pages, ranked according to its own criteria. A search engine's usefulness depends on how relevant the results
it returns are.
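The three phases above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not how SharePoint or Google work: the "web" is an in-memory dictionary (`WEB`, with made-up page paths) instead of real HTTP fetches, the index is a simple inverted index (word to pages), and results are ranked only by how often the query words appear, standing in for the far richer criteria real engines use.

```python
from collections import deque, defaultdict
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and extract links from the HTML.
WEB = {
    "/": ("Welcome to the university home page", ["/library", "/study"]),
    "/library": ("Library opening hours and catalogue search", ["/"]),
    "/study": ("Study at the university: programs and courses", ["/library", "/missing"]),
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(start):
    """Web crawling: follow links breadth-first from `start`, visiting each page once."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        if url not in WEB:  # broken link: nothing to fetch
            continue
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Indexing: map each word to {url: number of occurrences} (an inverted index)."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Searching & results: score pages by summed query-word counts, best match first."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda url: (-scores[url], url))
```

For example, `search(build_index(crawl("/")), "library catalogue")` returns `["/library"]`. Splitting crawling, indexing and searching into separate functions mirrors the slide: the crawl and index can run on a schedule, while searches only read the finished index.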
University Search Engine Statistics
• The University uses Microsoft SharePoint Portal Server 2003
• ~88,000 web pages are crawled every day
• The crawl starts at 6:00am every day
• The crawl takes ~20 minutes to complete
• The search engine is currently crawling:
– www.unisa.edu.au
– www.unisanet.unisa.edu.au
– www.library.unisa.edu.au
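As a rough worked example, the figures above imply the crawl's average throughput. Both numbers are approximate ("~"), and the calculation assumes pages are processed at an even rate across the whole crawl, so treat the result as a ballpark only.

```python
# Approximate figures from the slide: ~88,000 pages crawled in ~20 minutes.
PAGES_PER_CRAWL = 88_000
CRAWL_MINUTES = 20

def crawl_rate(pages, minutes):
    """Return (pages per minute, pages per second) averaged over one crawl."""
    per_minute = pages / minutes
    return per_minute, per_minute / 60

per_minute, per_second = crawl_rate(PAGES_PER_CRAWL, CRAWL_MINUTES)
print(f"{per_minute:,.0f} pages/minute, about {per_second:.0f} pages/second")
```

This prints `4,400 pages/minute, about 73 pages/second`; with a 6:00am start, the crawl would finish at roughly 6:20am.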
When I search, I don’t get good results …
Four possible causes:
• No Content
There is no content that matches your search query.
• Bad search phrases
You are putting in really bad search phrases!
• Bad search engine
The search engine isn’t very good, or isn’t searching the content that it should be.
• Bad content
The content you want exists, but its metadata and page text are not optimised for a search engine.
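One way to spot "bad content" is to check whether a page carries basic metadata at all. The sketch below uses Python's standard `html.parser` to flag a missing `<title>` or a missing `<meta name="description">`. Which metadata a given search engine actually weighs (including SharePoint Portal Server 2003) is an assumption here; `MetadataChecker` and `metadata_problems` are hypothetical helpers that show only two common checks.

```python
from html.parser import HTMLParser

class MetadataChecker(HTMLParser):
    """Collect the <title> text and named <meta> tags (e.g. description) of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # Store meta content by lower-cased name; attributes without a value arrive as None.
            self.meta[attrs["name"].lower()] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def metadata_problems(html):
    """Return a list of basic metadata problems found in an HTML page (hypothetical checks)."""
    checker = MetadataChecker()
    checker.feed(html)
    checker.close()
    problems = []
    if not checker.title.strip():
        problems.append("missing <title>")
    if not checker.meta.get("description"):
        problems.append("missing meta description")
    return problems
```

A page with `<title></title>` and no description meta tag would return both problems; a page with a meaningful title and description returns an empty list.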