Introduction to Search Engines

Gives a brief introduction to how a search engine works.

  1. Describes how a basic search engine works.<br />How a Search Engine Works<br />Reehaz Soobhany (0920302)<br />Strategic e-Marketing<br />University of Mauritius 2010<br />
  2. Search Engines Introduction<br />Almost everyone who uses the Internet today uses a search engine.<br />There are several types of search engines:<br />Crawler-based (Google, Yahoo)<br />Human directories (Open Directory, Yahoo! Directory)<br />Hybrid<br />Meta search engine (<br />
  3. Crawler-Based Search Engine<br />Core operations:<br />Web crawling (the crawler, aka the spider): recursively follows every link on a page and downloads each linked page<br />Indexing: creates the inverted file<br />Searching: searches the inverted (indexed) file according to the user's query<br />
  4. Indexing<br />Normalize documents<br />Delete stop words<br />Stem words<br />Create index entries<br />Calculate weights<br />Update the inverted file<br />
  5. Document Normalization<br /><H1><br />This is a Heading Level One<br /></H1><br />Case folding<br /><h1><br />this is a heading level one<br /></h1><br />Extract the core document text from the file<br />this is a heading level one<br />
  6. Delete Stop Words<br />Stop words are words that have little value in finding a relevant document. Examples of stop words:<br />a, are, is, when, how…<br />Removing them saves resources and avoids creating overly large, irrelevant indexes<br />heading level one<br />
  7. Word Stemming & Index Entries<br />Word stemming removes suffixes from words<br />Makes the index file more efficient (e.g. "walk", "walks" and "walked" share one entry)<br />Matches the meaning rather than the exact word<br />Inflectional suffixes (-s, -es, -ed)<br />Derivational suffixes (-ing, -able, -aciousness, -ability)<br />head level one<br />
  8. Calculate Weights<br />Usually a secret (proprietary) algorithm of each search engine<br />Some typical factors used:<br />Placement in the document (a word in a level-1 heading gets a greater weight than one in a level-2 heading or in normal text)<br />The number of other documents that refer to this document<br />Whether the document is written by an authoritative source<br />
  9. Create or Update the Inverted File<br />
  10. Query Processor<br />When the user types a query into the search engine, it recognises the terms and operators<br />Runs the query against the inverted file<br />Ranks the results, again using the search engine's secret algorithm, based on the weight of each word<br />Returns the results to the user<br />Voilà!<br />
  11. Thank You<br />
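Slide 3 describes the crawler (spider) that follows every link and downloads each page. The following is a minimal sketch, assuming a hypothetical in-memory `WEB` dictionary (URL mapped to page text and outgoing links) in place of real HTTP downloads and HTML link extraction. The `crawl` function and its `max_pages` limit are illustrative, not taken from the slides.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
# A real crawler would download pages over HTTP and parse links from the HTML.
WEB = {
    "a.html": ("home page", ["b.html", "c.html"]),
    "b.html": ("about page", ["a.html", "d.html"]),
    "c.html": ("contact page", ["d.html"]),
    "d.html": ("news page", []),
}


def crawl(seed, web=WEB, max_pages=100):
    """Breadth-first crawl: follow each link once and 'download' each page."""
    seen = {seed}
    queue = deque([seed])
    downloaded = {}
    while queue and len(downloaded) < max_pages:
        url = queue.popleft()
        if url not in web:          # broken link: nothing to download
            continue
        text, links = web[url]
        downloaded[url] = text      # store the page for the indexer
        for link in links:
            if link not in seen:    # skip pages already queued (a <-> b cycle)
                seen.add(link)
                queue.append(link)
    return downloaded


pages = crawl("a.html")
print(list(pages))  # ['a.html', 'b.html', 'c.html', 'd.html']
```

The slide says the spider works "recursively"; an explicit queue gives the same coverage without deep recursion, and the `seen` set keeps the crawler from looping forever on pages that link to each other.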
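Slide 5's normalization step (case folding, then extracting the core text) can be sketched with Python's standard `html.parser` module. This is a minimal sketch, not a production text extractor: the `normalize` helper is hypothetical, and it does not skip `<script>` or `<style>` contents as a real extractor would.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def normalize(html):
    """Case-fold a document and extract its core text."""
    folded = html.lower()          # case folding: <H1> -> <h1>, This -> this
    parser = _TextExtractor()
    parser.feed(folded)
    parser.close()                 # flush any buffered text
    # Collapse the whitespace left behind by tags and line breaks.
    return " ".join(" ".join(parser.chunks).split())


print(normalize("<H1>\nThis is a Heading Level One\n</H1>"))
# this is a heading level one
```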
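Slide 6's stop-word removal can be expressed as a simple filter. The sketch below assumes the text has already been case-folded, and `STOP_WORDS` is a small hypothetical list; real engines use much longer, language-specific lists.

```python
# A tiny, hypothetical stop-word list; real engines use much longer lists.
STOP_WORDS = {"a", "an", "and", "are", "is", "the", "this", "when", "how", "of", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop words that carry little value for finding relevant documents."""
    return [word for word in text.split() if word not in stop_words]


print(remove_stop_words("this is a heading level one"))
# ['heading', 'level', 'one']
```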
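Slide 7's stemming step can be illustrated with a naive suffix stripper that uses the suffixes listed on the slide. This is a deliberately simplified stand-in, not the stemmer any particular search engine uses: it will over-stem some words (for example "class" becomes "clas"), which is why production systems typically rely on an established algorithm such as the Porter stemmer. `SUFFIXES`, `MIN_STEM` and `stem` are hypothetical names.

```python
# Suffixes from the slide, tried longest first so "-es" wins over "-s".
SUFFIXES = sorted(
    ["s", "es", "ed", "ing", "able", "aciousness", "ability"],
    key=len,
    reverse=True,
)
MIN_STEM = 3  # never cut a word down to fewer than 3 letters


def stem(word):
    """Strip the first (longest) matching suffix, keeping a minimal stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


# Index entries for the running example from the slides.
print([stem(w) for w in ["heading", "level", "one"]])
# ['head', 'level', 'one']
```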
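Slides 8 and 9 (calculating weights, then creating or updating the inverted file) can be combined into one small sketch. It covers only the placement factor from slide 8; link counts and authority are left out. The weights in `PLACEMENT_WEIGHT` (3 for a level-1 heading, 2 for a level-2 heading, 1 for body text) are assumed values chosen for illustration, and the document format (a list of `(placement, terms)` pairs with terms already normalized, filtered and stemmed) is hypothetical.

```python
from collections import defaultdict

# Assumed placement weights: level-1 heading > level-2 heading > body text.
PLACEMENT_WEIGHT = {"h1": 3.0, "h2": 2.0, "body": 1.0}


def build_inverted_file(documents):
    """Map each term to {document id: weight} for the documents containing it.

    `documents` maps a document id to a list of (placement, terms) pairs.
    A term's weight in a document is the sum of the placement weights of
    all its occurrences there.
    """
    inverted = defaultdict(dict)
    for doc_id, sections in documents.items():
        for placement, terms in sections:
            weight = PLACEMENT_WEIGHT.get(placement, 1.0)
            for term in terms:
                postings = inverted[term]
                postings[doc_id] = postings.get(doc_id, 0.0) + weight
    return dict(inverted)


def update_document(inverted, doc_id, sections):
    """Re-index one document: drop its old postings, then add the new ones."""
    for postings in inverted.values():
        postings.pop(doc_id, None)
    for term, postings in build_inverted_file({doc_id: sections}).items():
        inverted.setdefault(term, {}).update(postings)
    # Remove terms that no longer appear in any document.
    for term in [t for t, p in inverted.items() if not p]:
        del inverted[term]


docs = {
    "d1": [("h1", ["head", "level", "one"]), ("body", ["level", "two"])],
    "d2": [("h2", ["level", "two"]), ("body", ["head"])],
}
inverted_file = build_inverted_file(docs)
print(inverted_file["level"])  # {'d1': 4.0, 'd2': 2.0}
```

Storing weights in the postings at indexing time means the query processor only has to look them up and combine them, rather than re-reading documents for every search.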
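Slide 10's query processor (recognise terms and operators, run the query against the inverted file, rank, return) can be sketched as follows. The slides say real ranking is a secret algorithm; here documents are simply ranked by the sum of their term weights, which is an assumed stand-in. `INVERTED_FILE`, `parse_query` and `search` are hypothetical, only `AND`/`OR` are recognised as operators, and query terms are assumed to be already stemmed the same way as the indexed documents.

```python
# A small, hypothetical inverted file: term -> {document id: weight}.
INVERTED_FILE = {
    "head": {"d1": 3.0, "d2": 1.0},
    "level": {"d1": 4.0, "d2": 2.0},
    "one": {"d1": 3.0},
    "two": {"d1": 1.0, "d2": 2.0},
}
OPERATORS = {"AND", "OR"}


def parse_query(query):
    """Separate search terms from operators; default to OR (match any term)."""
    tokens = query.split()
    operator = "AND" if "AND" in tokens else "OR"
    terms = [t.lower() for t in tokens if t not in OPERATORS]
    return terms, operator


def search(query, inverted=INVERTED_FILE):
    """Return matching document ids, highest total weight first."""
    terms, operator = parse_query(query)
    postings = [inverted.get(term, {}) for term in terms]
    if not postings:
        return []
    if operator == "AND":
        matches = set(postings[0]).intersection(*postings[1:])
    else:
        matches = set().union(*postings)
    scores = {doc: sum(p.get(doc, 0.0) for p in postings) for doc in matches}
    # Sort by descending score; break ties by document id for stable output.
    return sorted(matches, key=lambda doc: (-scores[doc], doc))


print(search("level two"))  # ['d1', 'd2']
```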