What is a Search Engine?
A tool that enables users to locate information on the World Wide Web. Search engines use keywords entered by users to find web sites that contain the information sought. Some search engines are specifically designed to find web sites intended for children.
How a Search Engine Works
Create an index
Receive a query -- a set of search terms and
Look in the index file for matches
Gather the matching page entries and rank them by relevance
Format the results
Return the result page in HTML to the searcher’s web browser
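The six steps above can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the page collection (`pages`), the helper names (`tokenize`, `create_index`, `search`) and the ranking rule (count how often the query words occur) are hypothetical stand-ins, not how any real engine scores pages.

```python
import html
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def create_index(pages):
    """Step 1: map every word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, pages, query):
    """Steps 2-6: take a query, look it up, rank the matches, return HTML."""
    terms = tokenize(query)                       # receive the query
    if not terms:
        return "<ul></ul>"
    # Look in the index: keep only pages that contain every term.
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    def score(url):
        # Toy relevance (hypothetical): how often the query words occur on the page.
        words = tokenize(pages[url])
        return sum(words.count(t) for t in terms)

    ranked = sorted(matches, key=lambda u: (-score(u), u))
    # Format the results as an HTML list for the searcher's browser.
    items = "".join(
        f'<li><a href="{html.escape(u)}">{html.escape(u)}</a></li>' for u in ranked
    )
    return f"<ul>{items}</ul>"
```

The index is built once, ahead of time; each query then only touches the index entries for its own words rather than rereading every page.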
[Figure: Search Engine Diagram. The Indexer builds the Index from Indexed Pages; the user sends a search query, the search engine looks in the index, gets a list of results and returns formatted results on a Results Page; the user then opens a found page.]
TYPES OF SEARCH ENGINES:
CRAWLER-BASED SEARCH ENGINES
HUMAN-POWERED DIRECTORIES
HYBRID SEARCH ENGINES
CRAWLER-BASED SEARCH ENGINES
Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, and then people search through what they have found. If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
HUMAN-POWERED DIRECTORIES
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description of your entire site to the directory, or editors write one for sites they review. A search looks for matches only in the descriptions submitted, so changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
HYBRID SEARCH ENGINES
In the web's early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over another. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.
PARTS OF A CRAWLER-BASED ENGINE
Crawler-based search engines have three major elements:
SPIDER OR CRAWLER
INDEX
SEARCH ENGINE SOFTWARE
THE SPIDER, ALSO CALLED THE CRAWLER, VISITS A WEB PAGE, READS IT, AND THEN FOLLOWS LINKS TO OTHER PAGES WITHIN THE SITE.
THIS IS WHAT IT MEANS WHEN SOMEONE REFERS TO A SITE BEING “SPIDERED” OR “CRAWLED”.
THE SPIDER RETURNS TO THE SITE ON A REGULAR BASIS, SUCH AS EVERY MONTH OR TWO, TO LOOK FOR CHANGES.
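A minimal sketch of the spider's visit, read and follow loop. It assumes a caller-supplied `fetch(url)` function (hypothetical; in practice it would make an HTTP request) and only follows links whose host matches the starting page's host. The function and parameter names are illustrative, and the periodic revisit described above is left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that stays on start_url's host.

    fetch(url) returns the page's HTML, or None if it cannot be read.
    Returns a dict of url -> html for every page the spider read.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = {}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        page = fetch(url)                     # visit the page and read it
        if page is None:
            continue
        visited[url] = page
        parser = LinkExtractor()
        parser.feed(page)
        for href in parser.links:             # follow its links...
            link = urljoin(url, href).split("#")[0]
            # ...but only to pages within the same site
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing a dictionary's `get` method as `fetch` lets the crawler run offline, for example `crawl("http://example.com/", pages.get)`.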
THE INDEX IS LIKE A GIANT BOOK CONTAINING A COPY OF EVERY WEB PAGE THAT THE SPIDER FINDS.
IF A WEB PAGE CHANGES, THEN THIS BOOK IS UPDATED WITH NEW INFORMATION.
SOMETIMES IT CAN TAKE A WHILE FOR NEW PAGES OR CHANGES THAT THE SPIDER FINDS TO BE ADDED TO THE INDEX.
THUS A WEB PAGE MAY HAVE BEEN “SPIDERED” BUT NOT YET “INDEXED”.
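The gap between "spidered" and "indexed" can be modelled as a pending queue that a batch job later folds into the index. The class and method names below (`PageIndex`, `add_spidered_page`, `commit`, `lookup`) are hypothetical, chosen only to illustrate the idea.

```python
import re
from collections import defaultdict


def words_of(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageIndex:
    """Toy index that lags behind the spider, like the slide's "giant book"."""

    def __init__(self):
        self.pending = {}                 # spidered but not yet indexed
        self.pages = {}                   # indexed copy of each page
        self.postings = defaultdict(set)  # word -> urls containing it

    def add_spidered_page(self, url, text):
        """Record a fetched page; it is not searchable until commit()."""
        self.pending[url] = text

    def commit(self):
        """Fold pending pages into the index, replacing any older copies."""
        for url, text in self.pending.items():
            for word in words_of(self.pages.get(url, "")):
                self.postings[word].discard(url)   # forget the old version
            self.pages[url] = text
            for word in words_of(text):
                self.postings[word].add(url)
        self.pending.clear()

    def lookup(self, word):
        """Return the sorted urls whose indexed copy contains word."""
        return sorted(self.postings.get(word.lower(), set()))
```

Until `commit()` runs, a freshly spidered page is invisible to `lookup()`, which is exactly the "spidered but not yet indexed" state.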
SEARCH ENGINE SOFTWARE IS THE THIRD PART OF A SEARCH ENGINE.
THIS IS THE PROGRAM THAT SIFTS THROUGH THE MILLIONS OF PAGES RECORDED IN THE INDEX TO FIND MATCHES TO A SEARCH AND RANK THEM IN ORDER OF WHAT IT BELIEVES IS MOST RELEVANT.
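One common way for such software to order matches is TF-IDF weighting, where a word that occurs often on a page but rarely across the index counts for more. The sketch below uses that scheme purely as an illustration, with a smoothed IDF of log(1 + N/df); the slides do not say which ranking formula any particular engine uses, and real engines combine many more signals. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(pages, query):
    """Return URLs that match any query term, best TF-IDF score first."""
    docs = {url: Counter(tokenize(text)) for url, text in pages.items()}
    n = len(docs)
    terms = tokenize(query)
    scores = {}
    for url, counts in docs.items():
        score = 0.0
        for term in terms:
            df = sum(1 for c in docs.values() if term in c)  # pages containing term
            if counts[term] and df:
                idf = math.log(1 + n / df)   # rarer terms weigh more
                score += counts[term] * idf  # term frequency x rarity
        if score > 0:
            scores[url] = score
    return sorted(scores, key=lambda u: (-scores[u], u))
```

In this sketch a page that repeats the rare word "bugs" outranks one that only mentions the common word "windows".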
SEARCH ENGINE MATH
Forget power searching. Don't worry about learning to do a "Boolean" search. All most people need is a little basic "search engine math" to improve their results: you can add, subtract and multiply your way into better searches at your favorite search engine. The information below works for nearly all of the major search engines.
KEY TO REMEMBER: Be Specific
IF YOU WANT INFORMATION ABOUT WINDOWS 98 BUGS, SEARCH FOR:
“WINDOWS 98 BUGS”
TYPE EXACTLY WHAT THE PROBLEM IS, SUCH AS THIS:
“I CAN’T INSTALL A USB DEVICE IN
Using The + Symbol to Add
Imagine you want to find pages that have references to both President Clinton and Kenneth Starr on the same page. You could search this way: +Clinton +Starr
SECOND EXAMPLE: +windows +98 +bugs
Using The - Symbol to Subtract
Imagine you want information about President Clinton but don't want to be overwhelmed by pages relating to the Monica Lewinsky scandal. You could search this way: Clinton -Lewinsky
Using Quotation Marks To Multiply
You can tell a search engine to give you pages where the terms appear in exactly the order you specify. You do this by putting quotation marks around the phrase, like this: "windows 98 bugs"
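The three operators can be modelled as required words (+), excluded words (-) and exact phrases (quotes). This sketch assumes, like Google's default AND behaviour, that unmarked words are also required; the function names are hypothetical.

```python
import re


def words_of(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Split a query into required words, excluded words and quoted phrases."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]+"', " ", query)          # drop phrases, keep the rest
    required, excluded = [], []
    for token in rest.split():
        if token.startswith("-"):
            excluded.extend(words_of(token[1:]))   # subtract
        else:
            # "+word" adds; a plain "word" is treated as required (assumption)
            required.extend(words_of(token.lstrip("+")))
    return required, excluded, phrases


def matches(text, query):
    """True if text has every required word, no excluded word, every phrase."""
    required, excluded, phrases = parse_query(query)
    words = words_of(text)
    padded = " " + " ".join(words) + " "

    def has_phrase(phrase):
        # Words must appear side by side and in the same order.
        return " " + " ".join(words_of(phrase)) + " " in padded

    return (all(w in words for w in required)
            and not any(w in words for w in excluded)
            and all(has_phrase(p) for p in phrases))
```

With this model, "Bugs in Windows 98" matches +windows +98 +bugs but not "windows 98 bugs", because the quoted form also fixes the word order.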
Major Search Features
Internet Query (+, -, "quotes")
Boolean (AND, OR, NOT, parentheses)
Radio buttons or menus
Search Synonyms (Thesaurus)
Search for alternate terms
Numbers: 40 / forty
Alternate spellings: color / colour
Spacing issues: Super Bowl / Superbowl
Technical terms: hives / urticaria
True synonyms: shears / clippers, doctor / physician
Exact match option
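A site can offer the alternate-term matching listed above with a small thesaurus consulted at query time, plus a flag for the exact match option. The synonym groups below are just the slide's examples and the function names are hypothetical; a real site would maintain its own list.

```python
import re

# Hypothetical thesaurus built only from the examples above.
SYNONYM_GROUPS = [
    {"40", "forty"},
    {"color", "colour"},
    {"super bowl", "superbowl"},
    {"hives", "urticaria"},
    {"shears", "clippers"},
    {"doctor", "physician"},
]


def expand(term):
    """Return the term plus any alternates from its synonym group."""
    term = term.lower()
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}


def matches_any(text, term, exact=False):
    """True if text contains term or, unless exact is set, one of its alternates."""
    # Pad with spaces so " super bowl " only matches whole words.
    normalized = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
    candidates = {term.lower()} if exact else expand(term)
    return any(f" {c} " in normalized for c in candidates)
```

A search for "super bowl" then also finds pages that spell it "Superbowl", while `exact=True` turns the expansion off.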
Possibly Useful Features
Handle typos and misspellings
Tend to return way too many results
Hard to determine "aboutness"
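Typo handling can be approximated by comparing each unknown query word against the words already in the index. This sketch uses Python's standard `difflib.get_close_matches`, a general string-similarity helper rather than a purpose-built spelling corrector; the 0.75 cutoff and the function names are assumptions to tune against real queries.

```python
import difflib
import re


def build_vocabulary(pages):
    """Collect every distinct word that appears in the indexed pages."""
    vocab = set()
    for text in pages.values():
        vocab.update(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(vocab)


def suggest(query, vocabulary):
    """Replace each unknown word with its closest vocabulary word, if any."""
    known = set(vocabulary)
    fixed = []
    for word in query.lower().split():
        if word in known:
            fixed.append(word)
        else:
            close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.75)
            fixed.append(close[0] if close else word)   # keep it if nothing is close
    return " ".join(fixed)
```

The corrected query can be offered as a "Did you mean ..." link rather than silently replacing what the user typed.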
What Sites Need Search?
Sites with support materials
Search Form Interfaces
Basic search field everywhere!
Site home page
Simple search page
A few of the most useful options
Zones can be very helpful
Include help and/or tips on search pages
Search form on the Help page
Simple Search Page
Advanced Search Forms
Provide all available options
Graphic interface for multiple query elements
Standard field searching (title, URL, text)
Modification date (often inaccurate)
Metadata, XML tags or Catalog Fields
Keywords and descriptions
Size, materials, color, etc.
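Field searching, whether on standard fields (title, URL, text) or catalog fields such as color and material, amounts to checking each requested word against one named field instead of the whole record. The record layout and function names in this sketch are hypothetical.

```python
import re


def words(value):
    """Lowercase a field value and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", str(value).lower()))


def fielded_search(records, criteria):
    """Return ids of records whose named fields contain every requested word.

    records:  {id: {"title": ..., "url": ..., "color": ..., ...}}
    criteria: {field_name: "words to find in that field"}
    """
    return [
        rec_id
        for rec_id, fields in records.items()
        # A missing field is treated as empty, so it cannot match.
        if all(words(wanted) <= words(fields.get(field, ""))
               for field, wanted in criteria.items())
    ]
```

Each criterion narrows the result set, which is what an advanced search form with several filled-in fields asks for.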
Advanced Search Page
Commerce Search Issues
Product records tend to lack detail
Index all text fields
Product name, description, color, material
Terms need synonyms
Pictures in results are very useful
Better to find too much than too little
How well a page answers the underlying question
How well the words on a page match the query
Algorithms vary: test with real data
Use search engine weighting options
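Weighting options usually let a match in one field, such as the title, count for more than a match in the body text. The weights below (title 5, description 2, body 1) are arbitrary assumptions meant to be tested with real data, as the slide advises; the function names are hypothetical.

```python
import re

# Hypothetical weights: a title match counts five times a body match.
FIELD_WEIGHTS = {"title": 5.0, "description": 2.0, "body": 1.0}


def score(record, query, weights=FIELD_WEIGHTS):
    """Sum weighted occurrences of the query words across the record's fields."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    total = 0.0
    for field, weight in weights.items():
        words = re.findall(r"[a-z0-9]+", record.get(field, "").lower())
        total += weight * sum(words.count(t) for t in terms)
    return total


def rank(records, query, weights=FIELD_WEIGHTS):
    """Return ids of matching records, highest weighted score first."""
    scored = {rid: score(rec, query, weights) for rid, rec in records.items()}
    return sorted((rid for rid, s in scored.items() if s > 0),
                  key=lambda rid: (-scored[rid], rid))
```

Passing equal weights, such as `{"title": 1.0, "body": 1.0}`, can reverse the order of the same records, which is why the weights are worth testing.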
Recommendations for special searches
Conform to web conventions
Provide site context
Use site colors and images
Include site navigation options
Show search metadata
Search box with options
Number of results
Location with results list
Results Items - Basic
For each page or record
Title or product name
Link and possibly URL
Emphasize match terms
Results Items - Description
Meta description tag
Properties description field
Top of page (gets navigation)
Or first <P> tag
Snippet shows text around word match
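A snippet can be built by cutting a window of text around the first match and wrapping each matched term in `<b>` tags. This sketch assumes plain-text page content and a 30-character window on each side; the function name and parameters are hypothetical. It escapes the page text before adding the tags, so the snippet cannot inject markup into the results page.

```python
import html
import re


def snippet(text, terms, width=30):
    """Return text around the first match, with matched terms in <b>...</b>."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    first = pattern.search(text)
    if not first:
        return html.escape(text[: 2 * width])      # fall back to top of page
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    pieces, pos = [], start
    for m in pattern.finditer(text):
        if m.start() < start or m.end() > end:
            continue                               # match lies outside the window
        pieces.append(html.escape(text[pos:m.start()]))
        pieces.append("<b>" + html.escape(m.group(0)) + "</b>")
        pos = m.end()
    pieces.append(html.escape(text[pos:end]))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(pieces) + suffix
```

Matching on the original text and escaping each piece separately keeps the highlighting from splitting HTML entities.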
Results Page - Commerce
Results Page Commerce with Extras
Don’t just search products
People look for general information and return policies.
Results Page - Not Enough Info
Results Page - Too Much Info
Too Many Results
Common when searching large sites
Track common searches, show recommended pages
Sort phrase & all matches at the top
Display matched terms in context
Allow search in zones
Consider clustering results in categories
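Sorting phrase matches and all-term matches to the top can be done by giving each result a tier and sorting on it. In this sketch (hypothetical function names), tier 0 is an exact phrase match, tier 1 contains all the words and tier 2 contains only some; ties are broken alphabetically here, where a real engine would use its relevance score.

```python
import re


def words_of(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tier(text, query):
    """0 = exact phrase, 1 = all words, 2 = some words, None = no match."""
    q, w = words_of(query), words_of(text)
    if not q:
        return None
    if " " + " ".join(q) + " " in " " + " ".join(w) + " ":
        return 0
    present = set(w)
    if all(t in present for t in q):
        return 1
    if any(t in present for t in q):
        return 2
    return None


def sort_results(pages, query):
    """Return matching URLs, phrase matches first, then all-word matches."""
    tiers = {url: tier(text, query) for url, text in pages.items()}
    return sorted((u for u, t in tiers.items() if t is not None),
                  key=lambda u: (tiers[u], u))
```

The tier could also be shown as a divider in the results list, which gives searchers a visible cue where the strong matches end.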
Why Searches Fail
topic out of scope for site
vocabulary mismatch (car vs. auto)
misspellings or typos
complex search requirements not met
search syntax error
server errors (should be rare!)
Provide good no-matches pages
Include site context & navigation
Zero Matches Page Bad Example
Zero Matches Page Good Example
How Search Engines Rank Web Pages?
Search Toolbars & Utilities
Search Toolbars from Major Search Engines:
Provides access to AltaVista web, news and multimedia search, page translation, term highlighting and pop-up blocking.
Provides the ability to search with Google from the taskbar within Windows. In other words, you can search without having to be in your browser.
Ask Jeeves Toolbar:
In addition to searching Ask.com, the Jeeves toolbar lets you limit your search to sources such as news, dictionary, stock market, weather, events, maps and the Ask Jeeves Kids web sites.
Share of visits
Google Strengths:
* Size and scope: it is now the largest, and includes PDF, DOC, PS and many other file types
* Relevance based on sites' linkages and authority
* Cached archive of Web pages as they looked when they were indexed
* Additional databases: Google Groups, News, Directory, etc.
Weaknesses:
* Limited search features: no nesting, no truncation, does not support full Boolean
* Link searches must be exact and are incomplete
* Only indexes the first 101 KB of a Web page and about 120 KB of PDFs
* May search for plural/singular forms, synonyms and grammatical variants without telling you
Default Operation: Multiple search terms are processed as an AND operation by default.
Boolean Searching: Google always searches for pages containing all the words in your query, so you need not use a + sign in front of the words.
Proximity Searching: Google also detects phrase matches even when the quotes are not used, and usually ranks phrase matches higher.
Truncation: No truncation is available. Some automatic plural and word stemming occurs for English words; it can be turned off by putting a + sign in front of each term. You can use wildcards within phrases. For example, to find “A little neglect may breed mischief” when you are not sure of the second-to-last word (i.e. ‘breed’), search like this: "a little neglect may * mischief".
Case Sensitivity: No. Using either lower or upper case gives the same hits.
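The wildcard example can be imitated with a regular expression. This sketch assumes that each `*` stands for exactly one word, which is enough for the "breed" example but may be narrower than how Google actually treats the wildcard; matching is case-insensitive, mirroring the behaviour described above. The function names are hypothetical.

```python
import re


def phrase_pattern(phrase):
    """Compile a phrase in which each * stands for exactly one word (assumption)."""
    parts = [r"\w+" if token == "*" else re.escape(token)
             for token in phrase.lower().split()]
    # Words separated by any whitespace; \b keeps the match on word edges.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def phrase_match(text, phrase):
    """True if text contains the phrase, with * filling one unknown word."""
    return phrase_pattern(phrase).search(text) is not None
```

Under this one-word assumption, "a little neglect may * mischief" finds the proverb but not "may cause great mischief", where two words sit in the gap.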