    Notes for “Search Engine” Project

    Common popular search engines: www.google.com, www.bing.com, www.yahoo.com, www.ask.com

    Other search engines: Wolfram Alpha, Dogpile, Swagbucks

    Crawler-Based Search Engines
    Crawler-based search engines, such as Google, create their listings automatically. They 'crawl' or 'spider' the web, then people search through what they have found.
    If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
    Human-Powered Directories
    A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.
    Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
    The Parts Of A Crawler-Based Search Engine
    Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being 'spidered' or 'crawled.' The spider returns to the site on a regular basis, such as every month or two, to look for changes.
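    The visit-read-follow-links loop described above can be sketched as a toy crawler. The page names, texts, and links below are invented stand-ins for real HTTP fetching, used only to illustrate the traversal:

    ```python
    from collections import deque

    # A made-up link graph standing in for the live web:
    # page name -> (page text, outgoing links)
    PAGES = {
        "home": ("welcome to the site", ["about", "products"]),
        "about": ("about our company", ["home"]),
        "products": ("our product catalog", ["home", "about"]),
    }

    def crawl(start):
        """Visit a page, read it, then follow its links (breadth-first)."""
        seen = set()
        frontier = deque([start])
        visited_order = []
        while frontier:
            page = frontier.popleft()
            if page in seen:
                continue                    # never re-spider a page this pass
            seen.add(page)
            text, links = PAGES[page]
            visited_order.append(page)      # "read" the page
            frontier.extend(links)          # follow links to other pages
        return visited_order

    print(crawl("home"))  # -> ['home', 'about', 'products']
    ```

    A real spider would fetch pages over HTTP and re-run this loop periodically, which is what "returning to the site every month or two" amounts to.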
    Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.
    Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been 'spidered' but not yet 'indexed.' Until it is indexed -- added to the index -- it is not available to those searching with the search engine.
    Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant. You can learn more about how search engine software ranks web pages on the aptly-named How Search Engines Rank Web Pages page.
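    The second and third parts — the index and the search software — can be sketched together as a small inverted index. The page names and texts here are invented for illustration:

    ```python
    # The index ("giant book"/catalog): word -> set of pages containing it
    index = {}

    def add_to_index(page, text):
        """What the indexer does with everything the spider finds."""
        for word in text.lower().split():
            index.setdefault(word, set()).add(page)

    def search(query):
        """The search software: find pages matching every query word."""
        words = query.lower().split()
        if not words:
            return set()
        results = index.get(words[0], set()).copy()
        for word in words[1:]:
            results &= index.get(word, set())  # keep only pages with all words
        return results

    add_to_index("page1", "search engines crawl the web")
    add_to_index("page2", "the web is big")

    print(sorted(search("the web")))  # -> ['page1', 'page2']
    print(sorted(search("crawl")))    # -> ['page1']
    ```

    Until `add_to_index` has run for a page, `search` cannot return it — which is exactly the "spidered but not yet indexed" situation described above.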
    All crawler-based search engines have the basic parts described above, but there are differences in how these parts are tuned. That is why the same search on different search engines often produces different results.
    Without search engines it would be very difficult to find information on the internet. Imagine having to write a research paper on the Great Depression without being able to get information quickly: you would have to spend hours gathering material from across the internet before you could write the paper. With search engines, it becomes very easy to find information on your topic.
    What is a Search Engine?
    By definition, an Internet search engine is an information retrieval system that helps us find information on the World Wide Web. The World Wide Web is the universe of network-accessible information, and it facilitates global sharing of information. But the WWW can be seen as an unstructured database, growing exponentially into an enormous store of information. Searching the web for information is therefore a difficult task, and there is a need for a tool to manage, filter and retrieve this ocean of information. A search engine serves this purpose.
    How does a Search Engine Work?
    • Internet search engines are web search engines that search and retrieve information on the web. Most of them use a crawler-indexer architecture and depend on their crawler modules. Crawlers, also referred to as spiders, are small programs that browse the web.
    • Crawlers are given an initial set of URLs whose pages they retrieve. They extract the URLs that appear on the crawled pages and give this information to the crawler control module. The crawler control module decides which pages to visit next and gives their URLs back to the crawlers.
    • The topics covered by different search engines vary according to the algorithms they use. Some search engines are programmed to search sites on a particular topic while the crawlers in others may be visiting as many sites as possible.
    • The crawl control module may use the link graph of a previous crawl or may use usage patterns to help in its crawling strategy.
    • The indexer module extracts the words from each page it visits and records their URLs. The result is a large lookup table that gives, for each word, a list of URLs pointing to pages where that word occurs. The table lists only those pages that were covered in the crawling process.
    • A collection analysis module is another important part of the search engine architecture. It creates a utility index. A utility index may provide access to pages of a given length or pages containing a certain number of pictures on them.
    • During the process of crawling and indexing, a search engine stores the pages it retrieves. They are temporarily stored in a page repository. Search engines maintain a cache of the pages they visit so that already-visited pages can be retrieved more quickly.
    • The query module of a search engine receives search requests from users in the form of keywords. The ranking module sorts the results.
    • The crawler-indexer architecture has many variants; it is modified in the distributed architecture of a search engine. These architectures consist of gatherers and brokers: gatherers collect indexing information from web servers, while brokers provide the indexing mechanism and the query interface. Brokers update their indices on the basis of information received from gatherers and other brokers, and they can filter information. Many of today's search engines use this type of architecture.
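    The query module and ranking module from the bullets above can be sketched together. The page repository contents are invented, and the scoring rule (counting occurrences of the query terms) is our own simplification — real engines combine far more ranking signals:

    ```python
    from collections import Counter

    # Invented page repository: page -> text stored during crawling
    REPOSITORY = {
        "p1": "search engine search index ranking",
        "p2": "web crawler spider index",
        "p3": "cooking recipes and food",
    }

    def query(keywords):
        """Query module: take keywords, score pages, return ranked results."""
        terms = keywords.lower().split()
        scores = Counter()
        for page, text in REPOSITORY.items():
            words = text.split()
            # Ranking module: score = how often the query terms occur
            score = sum(words.count(t) for t in terms)
            if score:
                scores[page] = score
        return [page for page, _ in scores.most_common()]

    print(query("search index"))  # -> ['p1', 'p2']
    ```

    Pages with more occurrences of the query terms rank first; pages with no matches are dropped, which mirrors the query-then-rank split described above.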
    Sources:
    http://searchenginewatch.com/2168031
    http://www.buzzle.com/articles/how-does-a-search-engine-work.html