Working of search engine

Working of a “Search Engine”
Nikhil
D-1
14BTCSERS033
Maths Assignment

What is a Search Engine?
“A web search engine is a software system that
is designed to search for information on the
World Wide Web.”

Purpose of Search Engines
Helping people find what they’re looking for:
• Starts with an “information need”
• Converts the need into a query
• Gets results

Types of Search Engines
• Search by Keywords
(e.g. AltaVista, Google)
• Search by categories
(e.g. Yahoo)

The Parts of a Search Engine
Spider (or “crawler”)
Index
Search software (an algorithm)

The “spider” or “crawler”
The spider visits a web page, reads it, and
then follows links to other pages within the
site. This is what it means when someone
refers to a site being "spidered" or
"crawled". This is also known as
“harvesting”. The spider returns to the site
on a regular basis, such as every month or
two, to look for changes.
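
The visit, read, and follow-links loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `SITE` is a hypothetical in-memory stand-in for a website (no network access), and `LinkParser` and `crawl` are names invented for this sketch, not part of any real search engine.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start):
    """Visit pages breadth-first from `start`, following links within `site`.

    `site` is a hypothetical dict mapping a path to that page's HTML.
    Returns a dict of every page reached, ready to hand to an index.
    """
    visited = {}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if path in visited or path not in site:
            continue  # already read, or a broken link
        html = site[path]
        visited[path] = html
        parser = LinkParser()
        parser.feed(html)  # read the page and collect its links
        queue.extend(parser.links)
    return visited


# Hypothetical four-page site; "/orphan" has no inbound links.
SITE = {
    "/": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/">Home</a>',
    "/news": '<a href="/about">About</a> <a href="/missing">Gone</a>',
    "/orphan": "No page links here.",
}

pages = crawl(SITE, "/")
print(sorted(pages))
```

A page that nothing links to ("/orphan" here) is never reached, which is why a spider can only find pages that are linked from somewhere it has already visited.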

The Indexer
Everything the spider finds goes
into the second part of a search
engine, the index. The index,
sometimes called the catalog, is like
a giant book containing a copy of
every web page that the spider
finds. If a web page changes, then
this book is updated with the new
information.
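
One possible way to picture the index is shown below. It keeps a copy of each page, as the slide describes, and additionally keeps a word-to-pages map (an "inverted index"), which is an assumption added here because it is a common way to make lookups fast. The class name `Index` and its methods are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Index:
    def __init__(self):
        self.copies = {}                  # the "book": one copy per page
        self.postings = defaultdict(set)  # word -> set of URLs containing it

    def add(self, url, text):
        """Store a page; if it was already indexed, replace the old copy."""
        if url in self.copies:
            for word in tokenize(self.copies[url]):
                self.postings[word].discard(url)
        self.copies[url] = text
        for word in tokenize(text):
            self.postings[word].add(url)

    def lookup(self, word):
        """Return the URLs of all pages that contain `word`."""
        return sorted(self.postings.get(word.lower(), set()))


index = Index()
index.add("/a", "Search engines crawl the web")
index.add("/b", "The web is large")
index.add("/a", "Search engines rank pages")  # the page changed

print(index.lookup("web"))
print(index.lookup("rank"))
```

Re-adding "/a" models the spider returning and finding a changed page: the old words are dropped from the map before the new copy is stored.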

Search engine software
It is the third part of a search
engine. This is the program that
sifts through the millions of pages
recorded in the index to find
matches to a search and rank them
in order of what it believes is most
relevant.

TF-IDF Ranking Algorithm
Term frequency–inverse document
frequency (tf–idf) is a numerical statistic that is
intended to reflect how important a
word is to a document in a collection.
Variations of the tf–idf weighting
scheme are often used by search
engines as a central tool in scoring and
ranking a document's relevance given a
user query.
A common form of the weight is:
wij = tfij × log(N / n)
wij = weight of Term Tj in Document Di
tfij = frequency of Term Tj in Document Di
N = number of Documents in collection
n = number of Documents where term Tj occurs at least once
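
As a rough worked example, the sketch below computes weights using the common form wij = tfij × log(N / n), with the symbols defined above. The natural logarithm, the raw-count term frequency, the three toy documents, and the function names `tfidf_weights` and `rank` are all assumptions made for illustration; real engines use many variations.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """Return one {term: weight} dict per document.

    `documents` is a list of token lists. Weight = tf * log(N / n).
    """
    N = len(documents)
    doc_freq = Counter()  # n: number of documents containing each term
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        tf = Counter(tokens)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(N / doc_freq[t]) for t in tf})
    return weights


def rank(query, documents):
    """Order document positions by the summed weight of the query terms."""
    weights = tfidf_weights(documents)
    scores = [sum(w.get(t, 0.0) for t in query) for w in weights]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


docs = [
    "the spider crawls the web".split(),
    "the index stores the web pages".split(),
    "the spider returns to the site".split(),
]

weights = tfidf_weights(docs)
order = rank(["spider", "crawls"], docs)
print(order)
```

Note that a word appearing in every document (such as "the") gets weight 0, because log(N / N) = 0; this is how the scheme discounts words that do not help distinguish one document from another.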

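PageRank
• The equation:
PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))
where t1 … tn are the pages that link to page A, C(t) is the
number of links going out of page t, and d is a damping
factor between 0 and 1
• Used by WebQuery and Google
• The model simulates a user who follows links from page to
page at random, and uses the result to rank documents.
• Google uses the citation (link) graph of the web (518 million links)
• Google computes PageRank for 26 million pages in a few hours.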
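
The equation PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn)) can be evaluated by repeated substitution, as in the minimal sketch below. The damping value d = 0.85, the fixed iteration count, the three-page graph, and the function name `pagerank` are assumptions chosen for illustration; the sketch also assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(t)/C(t)) over pages t linking to A.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # start every page at the same value
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # PR(t)/C(t) summed over every page t that links to `page`
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C ends up ranked highest because two pages link to it, and A comes next because it receives all of C's rank through C's single outgoing link.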

PageRank works by counting
the number and quality of
links to a page to determine a
rough estimate of how
important the website is. The
underlying assumption is that
more important websites are
likely to receive more links
from other websites.

The End
Thank you for listening patiently.
