How the Google Search Engine Algorithm Works 
Prepared by: Viral Shah (120570107014) 
Guided by: Prof. Sahista Machhar, MEFGI
A search engine is a program that searches for and identifies items in a database that correspond to keywords or characters specified by the user, used especially for finding particular sites on the World Wide Web.
 There are roughly 759 million websites on the Web, comprising about 60 trillion web pages. 
 AND IT'S CONSTANTLY GROWING!
 Google navigates the Web by crawling. 
 To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called SPIDERS, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling.
 The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
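The crawl described above can be sketched as a breadth-first traversal: start from popular seed pages and follow every link, visiting each page once. This is only an illustrative toy, not Google's actual crawler; the page names and the in-memory link graph are invented for the example.

```python
from collections import deque

def crawl(seed_pages, link_graph, max_pages=100):
    """Breadth-first crawl: start from popular seed pages and
    follow every link, visiting each page at most once."""
    visited = []
    queue = deque(seed_pages)
    seen = set(seed_pages)
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)           # "index" the page
        for link in link_graph.get(page, []):
            if link not in seen:       # avoid re-crawling
                seen.add(link)
                queue.append(link)
    return visited

# Toy web: one popular seed page linking out to others.
web = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "b.example": ["a.example"],
}
print(crawl(["popular.example"], web))
# ['popular.example', 'a.example', 'b.example', 'c.example']
```

Real crawlers add politeness delays, robots.txt checks, and distributed queues, but the spreading-out behavior is exactly this traversal.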
 When the Google spider looked at an HTML page, it took note of the following things: words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search. The Google spider was built to index every significant word on a page, leaving out the articles “a”, “an” and “the”. Other spiders take different approaches. 
 For example, some spiders keep track of the words in the title, sub-headings and links, along with the 100 most frequently used words on the page and each word in the first 20 lines of text. Lycos is said to use this approach to spidering the Web. 
 Google built its initial system to use multiple spiders, usually three at a time. Each spider could keep about 300 connections to Web pages open simultaneously.
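The word lists described above form what is usually called an inverted index: a map from each significant word to the pages it appears on. A minimal sketch, assuming the same stop-word rule of skipping the articles "a", "an", and "the" (the sample pages are invented):

```python
STOP_WORDS = {"a", "an", "the"}   # articles the spider skips

def build_index(pages):
    """Map each significant word to the set of pages containing it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            if word in STOP_WORDS:
                continue
            index.setdefault(word, set()).add(url)
    return index

pages = {
    "p1": "the quick brown fox",
    "p2": "a quick tour of the web",
}
index = build_index(pages)
print(sorted(index["quick"]))   # ['p1', 'p2']
print("the" in index)           # False
```

A query can then be answered by looking up each query word and intersecting the resulting page sets, rather than scanning every page.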
 Google’s spider is named Googlebot. 
 Googlebot is the search bot software used by Google; it collects documents from the web to build a searchable index for the Google Search engine.
 By following links between web pages, the INDEX is prepared. The index also includes text from millions of books from several libraries and other partners. 
 That means Google follows links from page to page, sorting pages by their content and other factors. 
 Everything the crawler finds is tracked in the index, which Google continuously updates and stores across large server farms. 
 Currently, Google’s index is over 100 million gigabytes in size.
 Site owners choose whether their sites are crawled. 
 To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page: 
<meta name="robots" content="noindex"> 
 To prevent only Google’s web crawlers from indexing a page: 
<meta name="googlebot" content="noindex">
1) AUTOCOMPLETE 
Predicts what you might be searching for. 
This includes understanding terms with more 
than one meaning. 
2) SYNONYMS 
Recognizes words with similar meanings.
3) QUERY UNDERSTANDING 
Gets to the deeper meaning of the words 
you type. 
4) GOOGLE INSTANT 
Displays immediate results as you type. 
5) SPELLING 
Identifies and corrects possible spelling 
errors and provides alternatives.
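Spelling correction like this can be approximated with edit distance: suggest the dictionary word requiring the fewest single-character changes to reach the typed query. A toy sketch, not Google's actual spelling system; the tiny dictionary is an invented example.

```python
def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,         # delete ca
                dp[j - 1] + 1,     # insert cb
                prev + (ca != cb)  # substitute ca -> cb
            )
    return dp[-1]

def suggest(query, dictionary):
    """Return the dictionary word closest to the typed query."""
    return min(dictionary, key=lambda w: edit_distance(query, w))

print(suggest("googel", ["google", "goggle", "apple"]))  # google
```

Production spell-checkers also weigh word frequency and query logs, so a rarer but closer word can lose to a common, slightly farther one.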
 Based on all the above factors, Google picks candidate web pages from the index. 
 Then Google ranks the results on various factors. 
 1) Site & Page Quality: 
Assessed in part by how keywords are written and placed in the content.
2) Freshness: 
How fresh the content is and how regularly it is updated. 
3) Safe Search: 
Google checks how safe the page is and whether it contains spam. 
Along with these, there are 200+ factors used by Google to rank any particular web page.
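Conceptually, ranking combines many such signals into one score per page and sorts by it. The sketch below is purely illustrative: the factor names, weights, and candidate pages are invented, and Google's real 200+ signals and their weights are not public.

```python
# Hypothetical signal weights -- invented for illustration only.
WEIGHTS = {"quality": 0.5, "freshness": 0.3, "safety": 0.2}

def score(signals):
    """Combine per-page signals (each in [0, 1]) into one score."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

def rank(pages):
    """Order candidate pages by descending combined score."""
    return sorted(pages, key=lambda url: score(pages[url]), reverse=True)

candidates = {
    "fresh-but-thin.example": {"quality": 0.2, "freshness": 0.9, "safety": 1.0},
    "high-quality.example":   {"quality": 0.9, "freshness": 0.5, "safety": 1.0},
}
print(rank(candidates))  # high-quality.example ranks first
```

Even this toy shows why a fresher page can lose to a higher-quality one: the weighted sum trades the signals off against each other.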
 After all these operations, you get the desired result, and all of this happens in a fraction of a second!
 Google fights spam every second to deliver true and relevant results. 
 The majority of spam removal is automatic. Google examines other questionable documents by hand; if spam is found, it takes manual action.
1) PURE SPAM 
Site appears to use aggressive spam 
techniques such as automatically generated 
gibberish, cloaking, scraping content from 
other websites, and/or repeated or egregious 
violations of Google's Webmaster Guidelines. 
2) HIDDEN TEXT AND/OR KEYWORD STUFFING 
Some of the pages may contain hidden 
text and/or keyword stuffing.
3) USER-GENERATED SPAM 
Site appears to contain spammy user-generated 
content. The problematic content 
may appear on forum pages, guestbook pages, 
or user profiles. 
4) PARKED DOMAINS 
Parked domains are placeholder sites with little 
unique content, so Google doesn't typically 
include them in search results.
5) THIN CONTENT WITH LITTLE OR 
NO ADDED VALUE 
Site appears to consist of low-quality or shallow pages 
which do not provide users with much added value 
(such as thin affiliate pages, doorway pages, cookie-cutter 
sites, automatically generated content, or copied 
content). 
6) UNNATURAL LINKS TO A SITE 
Google has detected a pattern of unnatural, artificial, deceptive, or manipulative links pointing to the site. These may be the result of buying links that pass PageRank or participating in link schemes.
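Keyword stuffing, for instance, can be flagged with a simple density heuristic: if a single term makes up an outsized share of a page's words, the page looks suspicious. A crude sketch; the 10% threshold and the sample texts are arbitrary assumptions, and real spam classifiers use far richer features.

```python
from collections import Counter

def looks_stuffed(text, threshold=0.10):
    """Flag pages where one word exceeds `threshold` of all words --
    a crude keyword-stuffing heuristic."""
    words = text.lower().split()
    if not words:
        return False
    _, top_count = Counter(words).most_common(1)[0]
    return top_count / len(words) > threshold

normal = "a short article about gardening tools and soil care tips"
stuffed = "cheap shoes buy cheap shoes best cheap shoes cheap shoes now"
print(looks_stuffed(normal))   # False
print(looks_stuffed(stuffed))  # True
```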
 Besides all these, there are thousands of other factors Google uses to detect spam, and it adjusts a page’s rank accordingly. The index is constantly updated so that, in the end, only trusted documents are kept in it.
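The PageRank mentioned above can be sketched as power iteration over the link graph: each page repeatedly shares its rank out over its outgoing links, damped by a factor (0.85 in the published algorithm). A minimal sketch; the three-page toy web is an invented example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration: repeatedly redistribute rank along links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page gets a base share, then link contributions.
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
            else:  # dangling page: spread its rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / len(pages)
        rank = new
    return rank

web = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # a -- the most linked-to page
```

Because "a" receives links from both "b" and "c", it accumulates the highest rank, which is exactly the intuition that a link acts as a vote for the page it points to.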
 And the point of interest is that to make this presentation on Google, I used Google!
 Behind your simple page of results is a complex system, carefully crafted and tested, to support more than 100 billion searches each month!