Search Enginesv2

The ultimate technology

Search Enginesv2 Presentation Transcript

  • Yahoo!, Google, Metacrawler, Ask Jeeves, Khoj, MSN, Gigablast, Teoma, HotBot, WiseNut, AltaVista, AllTheWeb
  • PRESENTING… SEARCH ENGINES
  • What is a Search Engine? A tool that enables users to locate information on the World Wide Web. Search engines use keywords entered by users to find web sites that contain the information sought. Some search engines are specifically designed to find web sites intended for children.
  • How a Search Engine Works
    • Create an index
    • Receive a query: a set of search terms and commands
    • Look in the index file for matches
    • Gather the matching page entries and rank them by relevance
    • Format the results
    • Return the result page in HTML to the searcher’s web browser
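
The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-page collection `PAGES`, the function names, and the ranking rule (count how often the query terms occur) are all hypothetical and are not taken from any real search engine.

```python
from collections import defaultdict
from html import escape

# Hypothetical three-page collection standing in for pages found on the web.
PAGES = {
    "a.html": "search engines index the web",
    "b.html": "a crawler visits web pages and follows links",
    "c.html": "engines rank pages by relevance to the search",
}

def build_index(pages):
    """Step 1: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, pages, query):
    """Steps 2-4: receive the query, look in the index, rank the matches."""
    terms = query.lower().split()
    matches = set()
    for term in terms:
        matches |= index.get(term, set())

    # Assumed ranking rule: more occurrences of the query terms ranks higher.
    def score(url):
        words = pages[url].lower().split()
        return sum(words.count(term) for term in terms)

    return sorted(matches, key=lambda url: (-score(url), url))

def format_results(urls):
    """Steps 5-6: format the results as HTML for the searcher's browser."""
    items = "".join(
        f'<li><a href="{escape(url)}">{escape(url)}</a></li>' for url in urls
    )
    return f"<ol>{items}</ol>"

INDEX = build_index(PAGES)
RESULTS = search(INDEX, PAGES, "search engines web")
HTML = format_results(RESULTS)
```

Here `a.html` contains all three query words, `c.html` two and `b.html` one, so the pages come back in that order.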
  • Search Engine Diagram: the Indexer builds the Index from the Indexed Pages. The user sends a search query; the Search Engine looks in the Index, gets the list of results, and returns the formatted results as a Results Page; the user then opens a found page.
  • TYPES OF SEARCH ENGINES:
    • CRAWLER-BASED SEARCH ENGINES
    • HUMAN-POWERED DIRECTORIES
    • HYBRID SEARCH ENGINES
  • CRAWLER-BASED SEARCH ENGINES: Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found. If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
  • HUMAN-POWERED DIRECTORIES: A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted. Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
  • HYBRID SEARCH ENGINES: In the web's early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.
  • PARTS OF A CRAWLER-BASED ENGINE
    • Crawler-based search engines have three major elements:
    • SPIDER OR CRAWLER
    • INDEX
    • SEARCH ENGINE SOFTWARE
  • CONTD…
    • THE SPIDER, ALSO CALLED THE CRAWLER, VISITS A WEB PAGE, READS IT, AND THEN FOLLOWS LINKS TO OTHER PAGES WITHIN THE SITE.
    • THIS IS WHAT IT MEANS WHEN SOMEONE REFERS TO A SITE BEING “SPIDERED” OR “CRAWLED”.
    • THE SPIDER RETURNS TO THE SITE ON A REGULAR BASIS, SUCH AS EVERY MONTH OR TWO, TO LOOK FOR CHANGES.
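
The visit, read, follow-links loop described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory `SITE` dictionary in place of real HTTP fetches; a production spider would also need politeness delays, robots.txt handling and a revisit schedule, none of which are shown.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory site: URL -> HTML. A real spider would fetch over HTTP.
SITE = {
    "/index.html": '<a href="/about.html">About</a> <a href="/faq.html">FAQ</a>',
    "/about.html": '<a href="/index.html">Home</a>',
    "/faq.html": '<a href="/returns.html">Returns</a> <a href="http://example.com/">Ext</a>',
    "/returns.html": "No links here.",
}

class LinkCollector(HTMLParser):
    """Reads a page and records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(site, start):
    """Visit pages breadth-first, following only links within the site."""
    visited = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkCollector()
        parser.feed(site[url])
        for link in parser.links:
            if link in site and link not in visited:
                visited.append(link)
                queue.append(link)
    return visited

VISITED = crawl(SITE, "/index.html")
```

The external link to `http://example.com/` is skipped because it is not part of the site, and `/index.html` is not visited twice even though `/about.html` links back to it.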
  • CONTD…
    • THE INDEX IS LIKE A GIANT BOOK CONTAINING A COPY OF EVERY WEB PAGE THAT THE SPIDER FINDS.
    • IF A WEB PAGE CHANGES, THEN THIS BOOK IS UPDATED WITH THE NEW INFORMATION.
    • SOMETIMES IT CAN TAKE A WHILE FOR NEW PAGES OR CHANGES THAT THE SPIDER FINDS TO BE ADDED TO THE INDEX.
    • THUS A WEB PAGE MAY HAVE BEEN “SPIDERED” BUT NOT YET “INDEXED”.
  • CONTD…
    • SEARCH ENGINE SOFTWARE IS THE THIRD PART OF A SEARCH ENGINE.
    • THIS IS THE PROGRAM THAT SIFTS THROUGH THE MILLIONS OF PAGES RECORDED IN THE INDEX TO FIND MATCHES TO A SEARCH AND RANK THEM IN ORDER OF WHAT IT BELIEVES IS MOST RELEVANT.
  • SEARCH ENGINE MATH: INTRODUCTION. Forget power searching. Don't worry about learning to do a "Boolean" search. All most people need to know is a little basic "search engine math" in order to improve their results. Come learn how to easily add, subtract and multiply your way into better searches at your favorite search engine. The information below works for nearly all of the major search engines. HERE WE GO…
  • KEY TO REMEMBER: Be Specific
    • IF YOU WANT INFORMATION ABOUT WINDOWS 98 BUGS, SEARCH FOR:
    • “WINDOWS 98 BUGS”
    • OR TYPE EXACTLY WHAT THE PROBLEM IS, SUCH AS THIS:
    • “I CAN’T INSTALL A USB DEVICE IN WINDOWS 98”
  • Using The + Symbol to Add: Imagine you want to find pages that have references to both President Clinton and Kenneth Starr on the same page. You could search this way: +clinton +starr. Second example: +windows +98 +bugs
  • Using The - Symbol to Subtract
    • Imagine you want information about President Clinton but don't want to be overwhelmed by pages relating to the Monica Lewinsky scandal. You could search this way:
    • clinton -lewinsky
  • Using Quotation Marks To Multiply: You tell a search engine to give you pages where the terms appear in exactly the order you specify. You do this by putting quotation marks around the phrase, like this: "windows 98 bugs"
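
The three "search engine math" operators can be illustrated with a small query parser. This is a sketch, not how any particular engine parses queries: the function names are hypothetical, and it assumes that a plain term with no symbol is treated as required, just like a term prefixed with +.

```python
import re

def parse_query(query):
    """Split a query into required terms, excluded terms and exact phrases."""
    phrases = [p.strip().lower() for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]*"', " ", query)  # drop the quoted parts
    required, excluded = [], []
    for token in rest.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])      # subtract: must not appear
        else:
            term = token.lstrip("+")        # add: must appear
            if term:
                required.append(term)
    return required, excluded, phrases

def matches(text, query):
    """Check one page's text against a parsed query."""
    required, excluded, phrases = parse_query(query)
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    return (
        all(term in words for term in required)
        and not any(term in words for term in excluded)
        and all(phrase in lowered for phrase in phrases)
    )
```

With this sketch, `matches("Clinton and Starr met", "+clinton +starr")` is true, `matches("Clinton and Lewinsky", "clinton -lewinsky")` is false, and the quoted query `"windows 98 bugs"` matches "Known Windows 98 bugs" but not "bugs in Windows 98", because the words are not in the specified order.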
  • Major Search Features
    • Search Operators
      • Internet Query (+, -, "quotes")
      • Boolean (AND, OR, NOT, parentheses)
      • Radio buttons or menus
    • Field Searching
      • File info
      • Meta tags
      • Database fields
  • Search Synonyms (Thesaurus)
    • Search for alternate terms
      • Numbers: 40 / forty
      • Alternate spellings: color / colour
      • Spacing issues: Super Bowl / Superbowl
      • Technical terms: hives / urticaria
      • True synonyms: shears / clippers, doctor / physician
    • Exact match option
    • Stemming (language-based)
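
A thesaurus lookup of this kind can be sketched as a simple query expansion step. The table below is hypothetical and contains only the examples from this slide; the `exact` flag stands in for the exact match option. Stemming is not shown.

```python
# Hypothetical thesaurus built from the examples on the slide.
SYNONYM_GROUPS = [
    {"40", "forty"},
    {"color", "colour"},
    {"super bowl", "superbowl"},
    {"hives", "urticaria"},
    {"shears", "clippers"},
    {"doctor", "physician"},
]

def expand(term, exact=False):
    """Return every alternate form to search for; only the term itself if exact."""
    term = term.lower()
    if exact:
        return {term}
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}
```

For instance, `expand("Colour")` returns both spellings, while `expand("colour", exact=True)` returns only the form that was typed.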
  • Possibly Useful Features
    • Spell checking
    • Fuzzy Matching
      • Handle typos and misspelling
      • Tend to return way too many results
    • Concept Search
      • Hard to determine "aboutness"
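
Fuzzy matching for typos can be sketched with `difflib.get_close_matches` from the Python standard library. The vocabulary and the 0.75 similarity cutoff are assumptions chosen for illustration; a lower cutoff catches more typos but, as noted above, tends to return far too many results.

```python
from difflib import get_close_matches

# Hypothetical vocabulary, e.g. the words that occur in the site's index.
VOCABULARY = ["search", "engine", "crawler", "index", "spider"]

def suggest(word, cutoff=0.75):
    """Return the closest known word, or None if nothing is similar enough."""
    candidates = get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return candidates[0] if candidates else None
```

A misspelling such as `suggest("serach")` maps back to "search", while an unrelated string such as `suggest("zzzz")` yields no suggestion.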
  • What Sites Need Search?
    • Informational sites
    • Commerce sites
    • Sites with support materials
      • Documentation
      • FAQs
      • Message boards
      • Return policies
  • Search Form Interfaces
    • Basic search field everywhere!
      • Site home page
      • Navigation area
    • Simple search page
      • Few most useful options
      • Zones can be very helpful
      • Include help and/or tips on search pages
    • Search form on the Help page
  • Simple Search Page
  • Advanced Search Forms
    • Provide all available options
      • Graphic interface for multiple query elements
      • Standard field searching (title, URL, text)
      • Modification date (often inaccurate)
      • File types
    • Metadata, XML tags or Catalog Fields
      • Keywords and descriptions
      • Size, materials, color, etc.
  • Advanced Search Page
  • Commerce Search Issues
    • Product records tend to lack detail
    • Index all text fields
      • Product name, description, color, material
    • Terms need synonyms
    • Pictures in results are very useful
    • Better to find too much than too little
  • Relevance Ranking
    • Real Relevance
      • How well a page answers the underlying question
    • Search Relevance
      • How well the words on a page match the query
    • Algorithms vary: test with real data
    • Use search engine weighting options
    • Recommendations for special searches
  • Results Pages
    • Conform to web conventions
    • Provide site context
      • use site colors and images
      • include site navigation options
    • Show search metadata
      • Search box with options
      • Number of results
      • Location with results list
    • Language localization
  • Results Items - Basic
    • For each page or record
      • Title or product name
      • Link and possibly URL
    • Optional
      • Size
      • Date modified
      • File Format
    • Hit highlighting
      • Emphasize match terms
  • Results Items - Description
    • Page Description
      • Meta description tag
      • Properties description field
    • Text Extract
      • Top of page (gets navigation)
      • Or first <P> tag
    • Context
      • Snippet shows text around word match
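
A context snippet with hit highlighting can be sketched like this. The function name, the `width` parameter and the use of `<b>` tags for emphasis are assumptions for illustration; when the term is not found, the sketch falls back to a text extract from the top of the page.

```python
import re

def snippet(text, term, width=30):
    """Return the text around the first match, with the match emphasized."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return text[: 2 * width]  # fall back to the top of the page
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    body = (
        text[start:match.start()]
        + "<b>" + match.group(0) + "</b>"
        + text[match.end():end]
    )
    return prefix + body + suffix

EXAMPLE = snippet("The spider visits a page", "spider", width=4)
```

With a `width` of 4, the example keeps four characters on each side of the match and marks the cut at the end with an ellipsis.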
  • Results Page - Commerce
  • Results Page Commerce with Extras
    • Don’t just search products
      • People look for general information and return policies.
  • Results Page - Not Enough Info
  • Results Page - Too Much Info
  • Too Many Results
    • Common when searching large sites
    • Track common searches, show recommended pages
    • Sort phrase matches and all-terms matches at the top
    • Display matched terms in context
    • Allow search in zones
    • Consider clustering results in categories
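
Two of these ideas, sorting phrase matches to the top and clustering results by category, can be sketched together. The result records, zone names and the four-level tier rule are hypothetical; real engines use more elaborate scoring.

```python
from collections import defaultdict

def tier(text, query):
    """0 = exact phrase match, 1 = all terms present, 2 = some terms, 3 = none."""
    lowered = text.lower()
    terms = query.lower().split()
    words = lowered.split()
    if " ".join(terms) in lowered:
        return 0
    hits = sum(1 for term in terms if term in words)
    if hits == len(terms):
        return 1
    return 2 if hits else 3

def order_results(results, query):
    """Stable sort: phrase matches first, then all-terms matches, then the rest."""
    return sorted(results, key=lambda record: tier(record[0], query))

def cluster(results):
    """Group result titles by the zone (category) they belong to."""
    clusters = defaultdict(list)
    for title, zone in results:
        clusters[zone].append(title)
    return dict(clusters)

# Hypothetical (title, zone) result records.
RESULTS = [
    ("Driver for the printer", "Documentation"),
    ("Printer driver download", "Documentation"),
    ("Printer paper returns", "Return policies"),
    ("Printer driver FAQ", "FAQs"),
]

ORDERED = order_results(RESULTS, "printer driver")
CLUSTERS = cluster(ORDERED)
```

The two titles containing the exact phrase "printer driver" move to the top, followed by the title that contains both words separately, and the clusters keep that order within each zone.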
  • Clustered Results
  • Why Searches Fail
    • Common reasons
      • topic out of scope for site
      • vocabulary mismatch (car vs. auto)
      • misspellings or typos
      • complex search requirements not met
      • search syntax error
      • server errors (should be rare!)
    • Provide good no-matches pages
      • Include site context & navigation
  • Zero Matches Page: Bad Example
  • Zero Matches Page: Good Example
  • How Do Search Engines Rank Web Pages?
    • LOCATION: WHERE THE SEARCH TERMS APPEAR ON THE PAGE
    • FREQUENCY: HOW OFTEN THE SEARCH TERMS APPEAR ON THE PAGE
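
Ranking by location and frequency can be sketched with a toy scoring function. The weights (3 points for a term in the title, 1 point for a term near the top of the page) and the ten-word "top of page" window are invented for illustration; real engines keep their formulas private and use many more signals.

```python
def score(title, body, query):
    """Toy relevance score combining term location and term frequency."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_words = body.lower().split()
    total = 0.0
    for term in terms:
        if term in title_words:
            total += 3.0  # location: term appears in the page title
        if term in body_words[:10]:
            total += 1.0  # location: term appears near the top of the page
        if body_words:
            # frequency: share of the body's words that are this term
            total += body_words.count(term) / len(body_words)
    return total

PAGE_A = score("Search engines", "search engines crawl the web", "search")
PAGE_B = score("Cooking", "how to search for recipes", "search")
```

Both pages mention "search" once in five words, so the frequency part is the same; `PAGE_A` scores higher only because the term also appears in its title.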
  • Search Toolbars & Utilities
    • Search Toolbars From Major Search Engines:
    • AltaVista Toolbar: Provides access to AltaVista web, news and multimedia search, page translation, term highlighting and pop-up blocking.
    • Google Deskbar: Provides the ability to search with Google from the taskbar within Windows. In other words, you can search without having to be in your browser.
    • Ask Jeeves Toolbar: In addition to searching Ask.com, the Jeeves toolbar lets you limit your search to news, dictionary, stock market, weather, events, maps, the Ask Jeeves Kids web site, and more.
  • Share of visits
  • Google Strengths:
    • Size and scope: It is now the largest, and includes PDF, DOC, PS, and many other file types
    • Relevance based on sites' linkages and authority
    • Cached archive of Web pages as they looked when they were indexed
    • Additional databases: Google Groups, News, Directory, etc.
  • Weaknesses:
    • Limited search features: no nesting, no truncation, does not support full Boolean
    • Link searches must be exact and are incomplete
    • Only indexes first 101 KB of a Web page and about 120 KB of PDFs
    • May search for plural/singular, synonyms, and grammatical variants without telling you
  • Default Operation: Multiple search terms are processed as an AND operation by default. Boolean Searching: Google always searches for pages containing all the words in your query, so you need not use a + sign in front of the words. Proximity Searching: Google also detects phrase matches even when quotes are not used, and usually ranks phrase matches higher.
  • Truncation: No truncation is available. Some automatic plural and word stemming occurs for English words and can be turned off by using a + sign in front of each term. You can use wildcards within phrases. For example, to find "A little neglect may breed mischief" when you are not sure of the second-to-last word (i.e. "breed"), search like this: "a little neglect may * mischief". Case Sensitivity: No. Using either lower or upper case results in the same hits.
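
The wildcard-within-a-phrase behavior can be imitated with a regular expression. This sketch only illustrates the idea and is not how Google implements it; `phrase_to_regex` is a hypothetical helper in which each `*` stands for exactly one word and matching ignores case.

```python
import re

def phrase_to_regex(phrase):
    """Turn a phrase with * wildcards into a case-insensitive pattern."""
    parts = [r"\w+" if word == "*" else re.escape(word) for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

PATTERN = phrase_to_regex("a little neglect may * mischief")
```

The pattern finds "A little neglect may breed mischief" regardless of case, but not "a little neglect may mischief", where the wildcard position has no word to match.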
  • THANK YOU