4. • A web crawler (also known as a web spider or
web robot) is a program or automated script
which browses (crawls) the World Wide Web
in a methodical, automated manner. This
process is called web crawling or spidering.
• In short, all websites are code, and this
code is read by spiders.
• So basically, a spider is a software
program that crawls throughout the internet
and selects relevant data to be INDEXED.
5. • When we search in a search engine, we are actually
searching the index of that search engine and
not the whole internet.
• Spiders start by fetching a few web pages, then
they follow the links on those pages and fetch
the pages they point to, then follow all the links
on those pages and fetch the pages they link to,
and so on.
• This continues until they've indexed a pretty big chunk of the
web: many billions of pages stored across
thousands of machines.
10. • This is the form of the information that our spiders
acquire after they finish crawling.
• Random organization and NO STRUCTURE.
• The information available throughout the World Wide Web comes
in all kinds of shapes, sizes, and FORMATS.
11. • AND this is what indexing does.
• It puts the data into a structured
format that is easily accessible through search.
12. Indexer
• Search engine indexing is the process by which a
search engine collects, parses, and stores
data for use by the search engine.
• The search engine index itself is the place
where all the data the search engine has
collected is stored.
• It is the search engine index that provides
the results for search queries, and it is pages
that are stored within the search engine
index that appear on the search engine
results page.
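A common way indexers structure crawled data is an inverted index: a map from each word to the pages that contain it, so queries can be answered from the index alone without rescanning the pages. Here is a minimal sketch with hypothetical page contents:

```python
def build_index(pages):
    """Build an inverted index: word -> set of page names containing it."""
    index = {}
    for name, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index

def search(index, word):
    """Answer a query from the index alone, never touching the raw pages."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical crawled pages
pages = {
    "p1": "search engines index the web",
    "p2": "spiders crawl the web",
}
idx = build_index(pages)
print(search(idx, "web"))    # ['p1', 'p2']
print(search(idx, "crawl"))  # ['p2']
```

This is why searching is fast: the expensive work (crawling and parsing) happens ahead of time, and each query is just a lookup in the stored index.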
14. • Basically, a search engine algorithm is a set of
rules, or a unique formula, that the search engine
uses to determine the significance or rankings of
a web page, and each search engine has its own
set of rules.
• The algorithms, as they are different for each
search engine, are also closely guarded secrets.
• A search algorithm sorts results on the basis of many
things, like the location of keywords, synonyms,
adjacent words, etc.
• But there are certain things that all search engine
algorithms have in common.
17. Relevancy
• This is the first thing every search engine
checks.
• The algorithm will determine whether the web
page has any relevancy at all for the particular
keyword.
• The location of keywords on the page is also
important for the relevancy of that website.
• Web pages that have the keyword in the title,
as well as within the headline or the first few
lines of the text, will rank better for that keyword
than websites that do not have these features.
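The location rule above can be made concrete with a toy scoring function. The weights here (title counts most, the first lines next, body occurrences least) are illustrative assumptions only; real ranking formulas are far richer and secret, as the slides note.

```python
def relevancy(page, keyword):
    """Toy relevancy score: keyword in the title outweighs the first
    lines, which outweigh plain occurrences in the body.
    (Illustrative weights, not any real engine's formula.)"""
    kw = keyword.lower()
    score = 0
    if kw in page["title"].lower():
        score += 10                                   # title match
    first_lines = " ".join(page["body"].lower().split("\n")[:2])
    if kw in first_lines:
        score += 5                                    # early-text match
    score += page["body"].lower().count(kw)           # body occurrences
    return score

# Hypothetical pages
a = {"title": "Guide to web crawlers", "body": "A crawler visits pages.\nCrawler notes."}
b = {"title": "Cooking tips", "body": "Nothing here.\nA stray mention of crawler topics."}
print(relevancy(a, "crawler") > relevancy(b, "crawler"))  # True
```

Page `a` wins because the keyword appears in its title and opening lines, mirroring the claim that such pages rank better for that keyword.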
18. Individual Factors
• A second part of search engine algorithms are
the individual factors that make that particular
search engine different from every other search
engine out there.
• Each search engine has unique algorithms, and
the individual factors of these algorithms are why
a search query turns up different results on
Google than on MSN or Yahoo!.
• One of the most common individual factors is the
number of pages a search engine indexes.
• They may just have more pages indexed, or
index them more frequently, but this can give
different results for each search engine.
• Some search engines also penalize for
spamming, while others do not.
19. Off-Page Factors
• Another part of algorithms that is still individual
to each search engine are off-page factors.
• Off-page factors are such things as click-through
measurement and linking.
• The frequency of click-through rates and linking
can be an indicator of how relevant a web page
is to actual users and visitors, and this can
cause an algorithm to rank the web page higher.
• Off-page factors are harder for webmasters to
craft, but they can have an enormous effect on page
rank depending on the search engine
algorithm.
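One crude off-page signal of the kind described above is simply counting how many other pages link *to* a page. This sketch uses a hypothetical link graph; real engines use far more elaborate link analysis (e.g. PageRank), but the intuition is the same: links act as votes from other pages.

```python
from collections import Counter

def inbound_link_counts(link_graph):
    """Count links pointing to each page: a crude off-page popularity signal."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in targets:
            counts[target] += 1      # each inbound link is one "vote"
    return counts

# Hypothetical link graph: page -> pages it links to
graph = {"a": ["b", "c"], "b": ["c"], "d": ["c", "b"]}
print(inbound_link_counts(graph))  # c has 3 inbound links, b has 2
```

Unlike on-page keywords, these counts come from other people's pages, which is exactly why off-page factors are harder for a webmaster to manipulate directly.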
21. • Search engine algorithms are the mystery
behind search engines, sometimes even
amusingly called the search engine’s
“Secret Sauce”.
• Beyond the basic functions of a search
engine, the relevancy of a web page, the
off-page factors, and the unique factors of
each search engine make the
algorithms of each engine an important
part of search engine optimization.