Search engine

Presentation on "How the Search Engine Works"

© All Rights Reserved

    Presentation Transcript

    • Assignment topic: How do search engines work? Submitted to: Al Imtiaz (Lecturer, CSE & IT). Prepared by: Sheikh Mohammad Shahnoor, Dept. of IT, ID 12410224.
    • Purpose of Search Engines
        – Helping people find what they're looking for
            – Starts with an "information need"
            – Converts it to a query
            – Gets results
        – In the materials available
            – Web pages
            – Other formats
            – Deep Web
    • Search Will Never Be Perfect
        – Search engines can't read minds
            – User queries are short and ambiguous
        – Some things will help
            – Design a usable interface
            – Show matched words in context
            – Keep the index current and complete
            – Adjust heuristic weighting
            – Maintain suggestions and synonyms
            – Consider faceted metadata search
    • Search is Not a Panacea
        – Search can't find what's not there
            – The content is hugely important
        – Information architecture is vital
        – Usable sites have good navigation and structure
    • Names of some popular search engines: Google, Bing, Yahoo, Ask, AOL, MyWebSearch
    • Search Looks Simple
    • But It's Not
        – Index ahead of time
            – Find files or records
            – Open each one and read it
            – Store each word in a searchable index
        – Provide search forms
            – Match the query terms with words in the index
            – Sort documents by relevance
        – Display results
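
The index-then-match-then-sort steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine works: the function names (`tokenize`, `build_index`, `search`), the sample documents, and the scoring rule (counting query-term occurrences) are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Index ahead of time: map each word to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, documents, query):
    """Match query terms against the index, then sort matches by a crude score.

    The score is the total number of query-term occurrences in the document
    (an assumption for this sketch; real engines use richer signals).
    """
    terms = tokenize(query)
    if not terms:
        return []
    # Keep only documents that contain every query term.
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    def score(doc_id):
        words = tokenize(documents[doc_id])
        return sum(words.count(t) for t in terms)

    # Highest score first; ties broken by document ID for a stable order.
    return sorted(matches, key=lambda d: (-score(d), d))


# Hypothetical sample documents.
documents = {
    "a.html": "Search engines index web pages",
    "b.html": "A search index stores each word; search uses the index",
    "c.html": "Cooking pasta at home",
}
index = build_index(documents)
results = search(index, documents, "search index")
```

Here `results` lists `b.html` before `a.html`, because the query words occur more often in `b.html`.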
    • Search is Mostly Invisible
        – Like an iceberg, 2/3 below water
        – Diagram labels: user interface, content, search functionality
    • Text Search vs. Database Query
        – Text search works for structured content
        – Keyword search vs. SQL queries
        – Approximate vs. exact match
        – Multiple sources of content
        – Response time and database resources
        – Relevance ranking, very important
        – Works in the real world (e.g. eBay)
    • Search is Only as Good as the Content
        – Users blame the search engine
            – Even when the content is unavailable
        – Understand the scope of the site or intranet
            – Kinds of information
            – Divided sites: products / corporate info
            – Dates
            – Languages
            – Sources and data silos: CMSs, databases...
            – Update processes
    • Making a Searchable Index
        – Store text to search it later
        – Many ways to gather text
            – Crawl (spider) via HTTP
            – Read files on file servers
            – Access databases (HTTP or API)
            – Data silos via local APIs
            – Applications, CMSs, via Web Services
        – Security and access control
    • Robot Indexing Diagram
    • What the Index Needs
        – Basic information for document or record
            – File name / URL / record ID
            – Title or equivalent
            – Size, date, MIME type
        – Full text of item
        – More metadata
            – Product name, picture ID
            – Category, topic, or subject
            – Other attributes, for relevance ranking and display
    • Simple Index Diagram
    • Search Query Processing
        – What happens after you click the search button, and before retrieval starts
        – Usually in this order
            – Handle character set, maybe language
            – Look for operators and organize the query
            – Look for field names or metadata
            – Extract words (just like the indexer)
            – Deal with letter casing
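
As a rough sketch of these query-processing steps, the function below normalizes the character set, pulls out operators, field names, and quoted phrases, then extracts lowercase words. The operator set, the field names (`title`, `author`), the `field:value` syntax, and the name `parse_query` are assumptions for illustration; real engines define their own query syntax. The word extractor only handles ASCII letters and digits.

```python
import re
import unicodedata

OPERATORS = {"AND", "OR", "NOT"}    # assumed operator keywords
FIELD_NAMES = {"title", "author"}   # hypothetical searchable fields


def parse_query(raw):
    """Split a raw query string into operators, field filters, words, and phrases."""
    # 1. Handle character set: fold compatibility forms (e.g. full-width letters).
    text = unicodedata.normalize("NFKC", raw)

    # Quoted phrases are set aside before the text is split into tokens.
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', text)]
    text = re.sub(r'"[^"]+"', " ", text)

    operators, fields, terms = [], {}, []
    for token in text.split():
        # 2. Look for operators (uppercase only, so "and" stays a plain word).
        if token in OPERATORS:
            operators.append(token)
            continue
        # 3. Look for field names written as field:value.
        name, sep, value = token.partition(":")
        if sep and value and name.lower() in FIELD_NAMES:
            fields[name.lower()] = value.lower()
            continue
        # 4 and 5. Extract words the same way an indexer would, lowercased.
        terms.extend(re.findall(r"[a-z0-9]+", token.lower()))

    return {"operators": operators, "fields": fields,
            "terms": terms, "phrases": phrases}


parsed = parse_query('title:Search Engines AND "deep web"')
```

For this input, `parsed` holds the operator `AND`, the field filter `title` = `search`, the word `engines`, and the phrase `deep web`.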
    • Search and Retrieval
        – Retrieval: find files with query terms
        – Not the same as relevance ranking
        – Recall: find all relevant items
        – Precision: find only relevant items
        – Increasing one usually decreases the other
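
Recall and precision are commonly computed as ratios: recall is the share of relevant items that were retrieved, and precision is the share of retrieved items that are relevant. The sketch below uses made-up document IDs and a hypothetical helper name, `precision_recall`, to show the trade-off between a narrow and a broad result set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid dividing by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents are actually relevant.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow search returns few items, all of them relevant.
narrow = precision_recall({"d1", "d2"}, relevant)

# A broad search finds every relevant item, plus four irrelevant ones.
broad = precision_recall({"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}, relevant)
```

The narrow search scores precision 1.0 and recall 0.5; the broad search reverses that, with precision 0.5 and recall 1.0.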
    • Retrieval = Matching
        – Single-word queries
            – Find items containing that word
        – Multi-word queries: combine lists
            – Any: every item with any query word
            – All: only items with every word
            – Phrases: find only items with all words in order
        – Boolean and complex queries
            – Use algorithm to combine lists
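
One way to picture "combine lists" is with set operations over a positional index: union for Any, intersection for All, and a position check for Phrases. This is a simplified sketch; the function names and sample documents are assumptions, and production engines use more efficient list-merging algorithms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(documents):
    """Map word -> document ID -> list of positions where the word occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def match_any(index, query):
    """Any: every item containing at least one query word (set union)."""
    result = set()
    for term in tokenize(query):
        result |= set(index.get(term, {}))
    return result


def match_all(index, query):
    """All: only items containing every query word (set intersection)."""
    lists = [set(index.get(term, {})) for term in tokenize(query)]
    return set.intersection(*lists) if lists else set()


def match_phrase(index, query):
    """Phrases: items where the query words appear consecutively, in order."""
    terms = tokenize(query)
    result = set()
    for doc_id in match_all(index, query):
        # The phrase matches if some occurrence of the first word is followed
        # by the remaining words at the next positions.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                result.add(doc_id)
                break
    return result


# Hypothetical sample documents.
documents = {
    "d1": "the deep web is large",
    "d2": "web pages go deep",
    "d3": "cooking at home",
}
index = build_positional_index(documents)
```

With these documents, `match_all(index, "deep web")` returns both `d1` and `d2`, while `match_phrase(index, "deep web")` returns only `d1`, the one document where the words appear in that order.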
    • Why Searches Fail
        – Empty search
        – Nothing on the site on that topic (scope)
        – Misspelling or typing mistakes
        – Vocabulary differences
        – Restrictive search defaults
        – Restrictive search choices
        – Software failure
    • Relevance Ranking
        – Theory: sort the matching items, so the most relevant ones appear first
        – Can't really know what the user wants
            – Relevance is hard to define and situational
            – Short queries tend to be deeply ambiguous
            – What do people mean when they type "bank"?
        – First 10 results are the most important
        – The more transparent, the better
    • Relevance Processing
        – Sorting documents on various criteria
        – Start with words matching query terms
        – Citation and link analysis
            – Like old library citation indexes
            – Ted Nelson: not only hypertext, but the links
            – Google PageRank: incoming links, authority of linkers
        – Taxonomies and external metadata
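
The idea behind link analysis, that a page gains rank from incoming links and more so when the linking pages are themselves authoritative, can be sketched with a simplified PageRank-style iteration. This is a textbook-style approximation, not Google's production algorithm; the damping factor 0.85, the iteration count, and the tiny link graph are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a, b, and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(links)
best = max(rank, key=rank.get)
```

In this graph `c` ends up with the highest rank because it has the most incoming links, and `a` outranks `b` and `d` because its single incoming link comes from the authoritative page `c`.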
    • Search Results Interface
        – What users see after they click the Search button
        – The most visible part of search
        – Elements of the results page
            – Page layout and navigation
            – Results header
            – List of results items
            – Results footer
    • Many Experiments in Interface
    • Back to Simplicity
    • Search Suggestions (aka Best Bets)
        – Human judgment beats algorithms
        – Great for frequent, ambiguous searches
            – Use search log to identify best candidates
        – Recommend good starting pages
        – Product information, FAQs, etc.
        – Requires human resources
            – That means money and time
        – More static than algorithmic search
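
A best-bets feature can be as simple as a hand-maintained lookup table consulted before the algorithmic results, with the search log used to pick which queries deserve an entry. The sketch below assumes a hypothetical table (`BEST_BETS`), hypothetical page paths, and a caller-supplied `algorithmic_search` function; none of these come from the presentation.

```python
from collections import Counter

# Hand-curated table: normalized query -> recommended starting pages.
# Both the queries and the paths are hypothetical.
BEST_BETS = {
    "returns": ["/help/returns-policy"],
    "jobs": ["/careers"],
}


def normalize(query):
    """Lowercase the query and collapse extra whitespace."""
    return " ".join(query.lower().split())


def top_queries(search_log, count):
    """Use the search log to find the most frequent queries (best-bet candidates)."""
    return Counter(normalize(q) for q in search_log).most_common(count)


def search_with_best_bets(query, algorithmic_search):
    """Show human-chosen pages first, followed by the algorithmic results."""
    bets = BEST_BETS.get(normalize(query), [])
    results = [r for r in algorithmic_search(query) if r not in bets]
    return bets + results


def fake_search(query):
    """Stand-in for a real search backend, used only for this example."""
    return ["/blog/returns-story", "/help/returns-policy"]


combined = search_with_best_bets("  Returns ", fake_search)
candidates = top_queries(["bank", "Bank ", "jobs"], 1)
```

Because the table is edited by people, it stays fixed until someone updates it, which is the sense in which best bets are more static than algorithmic search.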
    • Salon.com Results