How the Google Search Engine Algorithm Works 
Prepared by: Viral Shah (120570107014) 
Guided by: Prof. Sahista Machhar, MEFGI
A search engine is a program that searches for and identifies items in a database that correspond to keywords or characters specified by the user, used especially for finding particular sites on the World Wide Web.
 There are roughly 759 million websites on the Web, comprising about 60 trillion web pages. 
 AND IT'S CONSTANTLY GROWING!
 Google navigates the Web by crawling. 
 To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called SPIDERS, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling.
 The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
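The crawl described above can be sketched as a breadth-first traversal: start from popular seed pages and follow every link, visiting each page once. This is only an illustrative toy, not Google's actual crawler; the page names and the in-memory link graph are invented for the example.

```python
from collections import deque

def crawl(seed_pages, link_graph, max_pages=100):
    """Breadth-first crawl: start from popular seed pages and
    follow every link, visiting each page at most once."""
    visited = []
    queue = deque(seed_pages)
    seen = set(seed_pages)
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)           # "index" the page
        for link in link_graph.get(page, []):
            if link not in seen:       # avoid re-crawling
                seen.add(link)
                queue.append(link)
    return visited

# Toy web: one popular seed page linking out to others.
web = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "b.example": ["a.example"],
}
print(crawl(["popular.example"], web))
# ['popular.example', 'a.example', 'b.example', 'c.example']
```

Real crawlers add politeness delays, robots.txt checks, and distributed queues, but the spreading-out behavior is exactly this traversal.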
 When the Google spider looked at an HTML page, it took note of the following things: words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search. The Google spider was built to index every significant word on a page, leaving out the articles “a”, “an” and “the”. Other spiders take different approaches. 
 For example, some spiders keep track of the words in the title, sub-headings and links, along with the 100 most frequently used words on the page and each word in the first 20 lines of text. Lycos is said to use this approach to spidering the Web. 
 Google built its initial system to use multiple spiders, usually three at a time. Each spider could keep about 300 connections to Web pages open simultaneously.
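The word lists described above form what is usually called an inverted index: a map from each significant word to the pages it appears on. A minimal sketch, assuming the same stop-word rule of skipping the articles "a", "an", and "the" (the sample pages are invented):

```python
STOP_WORDS = {"a", "an", "the"}   # articles the spider skips

def build_index(pages):
    """Map each significant word to the set of pages containing it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            if word in STOP_WORDS:
                continue
            index.setdefault(word, set()).add(url)
    return index

pages = {
    "p1": "the quick brown fox",
    "p2": "a quick tour of the web",
}
index = build_index(pages)
print(sorted(index["quick"]))   # ['p1', 'p2']
print("the" in index)           # False
```

A query can then be answered by looking up each query word and intersecting the resulting page sets, rather than scanning every page.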
 Google’s spider is named Googlebot. 
 Googlebot is the search bot software used by Google; it collects documents from the web to build a searchable index for the Google Search engine.
 By following links between web pages, the INDEX is prepared. The index also includes text from millions of books from several libraries and other partners. 
 That means Google follows links from page to page, sorting pages by their content and other factors. 
 Everything the crawler finds is tracked in the index, which Google continuously updates and stores across large server farms. 
 Currently, Google’s index is over 100 million gigabytes in size.
 Site owners choose whether their sites are crawled. 
 To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page: 
<meta name="robots" content="noindex"> 
 To prevent only Google’s web crawlers from indexing a page: 
<meta name="googlebot" content="noindex">
1) AUTOCOMPLETE 
Predicts what you might be searching for. 
This includes understanding terms with more 
than one meaning. 
2) SYNONYMS 
Recognizes words with similar meanings.
3) QUERY UNDERSTANDING 
Gets to the deeper meaning of the words 
you type. 
4) GOOGLE INSTANT 
Displays immediate results as you type. 
5) SPELLING 
Identifies and corrects possible spelling 
errors and provides alternatives.
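Spelling correction like this can be approximated with edit distance: suggest the dictionary word requiring the fewest single-character changes to reach the typed query. A toy sketch, not Google's actual spelling system; the tiny dictionary is an invented example.

```python
def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,         # delete ca
                dp[j - 1] + 1,     # insert cb
                prev + (ca != cb)  # substitute ca -> cb
            )
    return dp[-1]

def suggest(query, dictionary):
    """Return the dictionary word closest to the typed query."""
    return min(dictionary, key=lambda w: edit_distance(query, w))

print(suggest("googel", ["google", "goggle", "apple"]))  # google
```

Production spell-checkers also weigh word frequency and query logs, so a rarer but closer word can lose to a common, slightly farther one.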
 Based on all the above factors, Google picks candidate web pages from the index. 
 Then Google ranks the results on various factors. 
 1) Site & Page Quality: 
Assessed in part by how keywords are written and placed in the content.
2) Freshness: 
How fresh the content is and how regularly it is updated. 
3) Safe Search: 
Google checks how safe the page is and whether it contains spam. 
Along with these, there are 200+ factors used by Google to rank any particular web page.
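Conceptually, ranking combines many such signals into one score per page and sorts by it. The sketch below is purely illustrative: the factor names, weights, and candidate pages are invented, and Google's real 200+ signals and their weights are not public.

```python
# Hypothetical signal weights -- invented for illustration only.
WEIGHTS = {"quality": 0.5, "freshness": 0.3, "safety": 0.2}

def score(signals):
    """Combine per-page signals (each in [0, 1]) into one score."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

def rank(pages):
    """Order candidate pages by descending combined score."""
    return sorted(pages, key=lambda url: score(pages[url]), reverse=True)

candidates = {
    "fresh-but-thin.example": {"quality": 0.2, "freshness": 0.9, "safety": 1.0},
    "high-quality.example":   {"quality": 0.9, "freshness": 0.5, "safety": 1.0},
}
print(rank(candidates))  # high-quality.example ranks first
```

Even this toy shows why a fresher page can lose to a higher-quality one: the weighted sum trades the signals off against each other.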
 After all these operations, you get the desired result, and all of this happens in a fraction of a second!
 Google fights spam every second to deliver true and relevant results. 
 The majority of spam removal is automatic. Google examines other questionable documents by hand; if spam is found, it takes manual action.
1) PURE SPAM 
Site appears to use aggressive spam 
techniques such as automatically generated 
gibberish, cloaking, scraping content from 
other websites, and/or repeated or egregious 
violations of Google's Webmaster Guidelines. 
2) HIDDEN TEXT AND/OR KEYWORD STUFFING 
Some of the pages may contain hidden 
text and/or keyword stuffing.
3) USER-GENERATED SPAM 
Site appears to contain spammy user-generated 
content. The problematic content 
may appear on forum pages, guestbook pages, 
or user profiles. 
4) PARKED DOMAINS 
Parked domains are placeholder sites with little 
unique content, so Google doesn't typically 
include them in search results.
5) THIN CONTENT WITH LITTLE OR 
NO ADDED VALUE 
Site appears to consist of low-quality or shallow pages 
which do not provide users with much added value 
(such as thin affiliate pages, doorway pages, cookie-cutter 
sites, automatically generated content, or copied 
content). 
6) UNNATURAL LINKS TO A SITE 
Google has detected a pattern of unnatural, artificial, deceptive, or manipulative links pointing to the site. These may be the result of buying links that pass PageRank or participating in link schemes.
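Keyword stuffing, for instance, can be flagged with a simple density heuristic: if a single term makes up an outsized share of a page's words, the page looks suspicious. A crude sketch; the 10% threshold and the sample texts are arbitrary assumptions, and real spam classifiers use far richer features.

```python
from collections import Counter

def looks_stuffed(text, threshold=0.10):
    """Flag pages where one word exceeds `threshold` of all words --
    a crude keyword-stuffing heuristic."""
    words = text.lower().split()
    if not words:
        return False
    _, top_count = Counter(words).most_common(1)[0]
    return top_count / len(words) > threshold

normal = "a short article about gardening tools and soil care tips"
stuffed = "cheap shoes buy cheap shoes best cheap shoes cheap shoes now"
print(looks_stuffed(normal))   # False
print(looks_stuffed(stuffed))  # True
```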
 Besides all these, there are thousands of other factors Google uses to detect spam, and it adjusts a page’s rank accordingly. The index is constantly updated so that, in the end, only trusted documents are kept in it.
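The PageRank mentioned above can be sketched as power iteration over the link graph: each page repeatedly shares its rank out over its outgoing links, damped by a factor (0.85 in the published algorithm). A minimal sketch; the three-page toy web is an invented example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration: repeatedly redistribute rank along links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page gets a base share, then link contributions.
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
            else:  # dangling page: spread its rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / len(pages)
        rank = new
    return rank

web = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # a -- the most linked-to page
```

Because "a" receives links from both "b" and "c", it accumulates the highest rank, which is exactly the intuition that a link acts as a vote for the page it points to.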
 And the point of interest is that to make this presentation on Google, I used Google!
 Behind your simple page of results is a complex system, carefully crafted and tested, to support more than 100 billion searches each month!