Information Retrieval Techniques of Google

PRESENTATION ON GOOGLE
PRESENTED BY
Sehrish Akram

Google, the leading search engine worldwide
Founded in 1998 by Stanford University graduate
students Larry Page and Sergey Brin.

SEARCHING TECHNIQUES
The Google search engine uses these techniques:
It is a full-text search engine.
When we do a Google search, we are actually
searching Google’s index of the web, not the web itself.
Google builds this index with software programs
called “spiders”.

Spiders start by fetching a few web pages, then
follow the links on those pages and fetch the pages
they point to.
CASE FOLDING technique
Normalization technique, e.g.
U.S.A … USA.
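The spider behaviour described above can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions: `TOY_WEB`, `toy_fetch_links` and `crawl` are hypothetical names, and the "web" is a small in-memory dictionary rather than real HTTP fetching. It is not Google's actual crawler.

```python
from collections import deque

# Hypothetical in-memory "web": page URL -> URLs that page links to.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}


def toy_fetch_links(url):
    """Stand-in for fetching a page and extracting its links."""
    return TOY_WEB.get(url, [])


def crawl(seeds, fetch_links, max_pages=100):
    """Fetch the seed pages first, then the pages they point to, and so on."""
    seen = set(seeds)
    queue = deque(seeds)
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        fetched.append(url)
        for link in fetch_links(url):
            if link not in seen:  # never fetch the same page twice
                seen.add(link)
                queue.append(link)
    return fetched


print(crawl(["a.example/"], toy_fetch_links))
# ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other.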

Google search is not case sensitive: if the user
searches for seven, SEVEN, Seven or even 7, the
results are the same.
Singular is different from plural: searches for
apple and apples turn up different pages.
The order of words matters: Google considers
the first word most important, the second word
next, and so on.
Google ignores most little words, including “I”,
“an”, “how”, “the” and “of”.
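Case folding, normalization of forms such as U.S.A, and the dropping of little words can be sketched as one small query-normalization step. This is a minimal sketch, not Google's implementation: `STOP_WORDS` contains only the words listed on the slide, `normalize_query` is a hypothetical helper, and the mapping of "seven" to "7" is not covered.

```python
# Assumed stop-word list: only the examples given on the slide.
STOP_WORDS = {"i", "an", "how", "the", "of"}


def normalize_query(query):
    """Lower-case each term, drop periods (U.S.A -> usa), and skip stop words."""
    terms = []
    for token in query.split():
        token = token.lower().replace(".", "").strip(",;:!?\"'")
        if token and token not in STOP_WORDS:
            terms.append(token)  # word order is preserved
    return terms


print(normalize_query("The history of the U.S.A"))  # ['history', 'usa']
print(normalize_query("SEVEN") == normalize_query("seven"))  # True
print(normalize_query("apple") == normalize_query("apples"))  # False
```

Because no stemming is applied, "apple" and "apples" stay distinct terms, which matches the singular/plural behaviour described above.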

The Google search word limit is 32.
Wildcard searching generally places the symbol
"*" after a word stem.
It tells the database to look for variations of that
word.
For example: investigat* might pull sites
with words such as investigation, investigator,
and investigative.
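Trailing-wildcard (truncation) search of this general kind can be sketched with the standard-library `fnmatch` module. The vocabulary and the `expand_wildcard` helper are hypothetical, and the stem `investigat` is used so that all three example words share the same prefix.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, vocabulary):
    """Return the vocabulary words matched by a pattern such as 'investigat*'."""
    pattern = pattern.lower()
    return sorted(word for word in vocabulary if fnmatchcase(word.lower(), pattern))


# Hypothetical vocabulary of indexed words.
vocabulary = ["investigation", "investigator", "investigative", "invest", "inventory"]

print(expand_wildcard("investigat*", vocabulary))
# ['investigation', 'investigative', 'investigator']
```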

INFORMATION RETRIEVAL AND THE WEB
What We Do
Google wanted to organize the web into
something searchable. Its early prototype was
based upon a few basic principles, including:
The best pages tend to be the ones that people
link to the most.
The best description of a page is often derived
from the anchor text associated with the links to
that page.
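Both principles can be illustrated with a toy link table. The sketch below only counts inbound links and collects anchor text per target page; this is deliberately simpler than PageRank, which weights links rather than just counting them. The `LINKS` data and `summarize_links` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (source page, target page, anchor text).
LINKS = [
    ("a.example", "c.example", "search engine"),
    ("b.example", "c.example", "web search"),
    ("a.example", "b.example", "blog"),
]


def summarize_links(links):
    """Count inbound links per page and gather the anchor text pointing at it."""
    inbound = Counter()
    anchors = defaultdict(list)
    for _source, target, text in links:
        inbound[target] += 1
        anchors[target].append(text)
    return inbound, dict(anchors)


inbound, anchors = summarize_links(LINKS)
print(inbound.most_common(1))  # [('c.example', 2)]
print(anchors["c.example"])    # ['search engine', 'web search']
```

Note that the description of `c.example` comes entirely from other pages' anchor text, without looking at the page itself.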

DOCUMENT ACQUISITION AND STORAGE:
Google searches more than 3 billion Web documents,
which include Web pages, images and Usenet
postings.
Google uses a standalone Web crawler, distributed
across several machines, to create indexes and copies
of the documents.
Besides standard .html files, Google also indexes
other file types, including
________
_________
__________
__________

DOCUMENT ACQUISITION AND STORAGE:
A copy of each crawled page is stored in
Google’s repository.
Indexes are created from the words in the stored
pages, each pointing to an entry in an inverted index file.
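An inverted index maps each word to the documents that contain it. The following is a minimal in-memory sketch; the `repository` dictionary and `build_inverted_index` helper are hypothetical, and a production index would also store positions and live on disk.

```python
from collections import defaultdict


def build_inverted_index(repository):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in repository.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(doc_ids) for word, doc_ids in index.items()}


# Hypothetical repository: document id -> stored page text.
repository = {
    1: "google indexes web pages",
    2: "spiders fetch web pages",
    3: "google ranks pages",
}

index = build_inverted_index(repository)
print(index["google"])  # [1, 3]
print(index["web"])     # [1, 2]
```

Looking up a query word is then a single dictionary access instead of a scan over every stored page.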

QUERY INTRODUCTION AND USER
OPTIONS:
Since its founding, Google has been steadily
introducing new features.
Google uses Boolean search with some variations
and without support for nested expressions.
By default, it automatically applies the AND operator
between terms; the minus symbol can be used to
perform a NOT function, and the OR operation is
supported (using OR in upper case).
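The flat Boolean rules above (implicit AND, minus for NOT, upper-case OR, no nesting) can be sketched over a small term-to-documents index. The `TERM_INDEX` data and `boolean_search` function are hypothetical and only illustrate the operator semantics, not Google's query parser.

```python
# Hypothetical index: term -> set of document ids containing it.
TERM_INDEX = {
    "google": {1, 3},
    "web": {1, 2},
    "spiders": {2},
    "pages": {1, 2, 3},
}
ALL_DOCS = {1, 2, 3}


def boolean_search(query, index, all_docs):
    """Evaluate a flat query: implicit AND, 'a OR b', and '-term' for NOT."""
    tokens = query.split()
    result = set(all_docs)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-") and len(token) > 1:
            result -= index.get(token[1:].lower(), set())  # NOT
            i += 1
            continue
        group = set(index.get(token.lower(), set()))
        # Fold in every "OR term" pair that directly follows this term.
        while i + 2 < len(tokens) and tokens[i + 1] == "OR":
            group |= index.get(tokens[i + 2].lower(), set())
            i += 2
        result &= group  # implicit AND between groups
        i += 1
    return sorted(result)


print(boolean_search("google pages", TERM_INDEX, ALL_DOCS))       # [1, 3]
print(boolean_search("google OR spiders", TERM_INDEX, ALL_DOCS))  # [1, 2, 3]
print(boolean_search("pages -web", TERM_INDEX, ALL_DOCS))         # [3]
```

Only upper-case `OR` is treated as an operator here; a lower-case "or" is looked up like any other term.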

QUERY INTRODUCTION AND USER OPTIONS:
Google does not use stemming or truncation,
but allows the use of ‘*’ as a wildcard in the
middle of a phrase. For example, searching for
“Search Engine” yields quite different results
from “Search * Engine”.
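The difference between the two phrase queries can be sketched with a regular expression. The sketch assumes that '*' stands for exactly one whole word, which is an assumption made for illustration rather than a documented rule; `phrase_pattern` is a hypothetical helper.

```python
import re


def phrase_pattern(phrase):
    """Compile a phrase query in which '*' stands for exactly one whole word."""
    parts = [r"\w+" if word == "*" else re.escape(word) for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


text = "A search engine differs from a search marketing engine."

print(phrase_pattern("Search Engine").search(text).group())    # search engine
print(phrase_pattern("Search * Engine").search(text).group())  # search marketing engine
```

Under this assumption, the wildcard phrase does not match a plain "search engine" at all, because one word must sit between the two terms.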

RESULTS SELECTION AND PRESENTATION
To select which documents are presented, Google
combines a document’s PageRank value, anchor
text and the proximity of the query terms.
Results are clustered by server, with two visible
results and a link to “More results from server”.
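Clustering by server can be sketched by grouping an already ranked list of URLs by host name and showing two results per host. The URLs and the `cluster_by_server` helper are hypothetical, and the ranking itself (PageRank, anchor text, proximity) is assumed to have happened beforehand.

```python
from urllib.parse import urlsplit


def cluster_by_server(ranked_urls, visible=2):
    """Group ranked URLs by host, showing at most `visible` results per host."""
    clusters = {}
    for url in ranked_urls:
        host = urlsplit(url).netloc
        clusters.setdefault(host, []).append(url)
    # Dicts keep insertion order, so hosts stay ordered by their best result.
    return [
        {"server": host, "shown": urls[:visible], "more": max(0, len(urls) - visible)}
        for host, urls in clusters.items()
    ]


# Hypothetical ranked result list, best result first.
ranked = [
    "http://a.example/1",
    "http://b.example/1",
    "http://a.example/2",
    "http://a.example/3",
]

for cluster in cluster_by_server(ranked):
    print(cluster)
```

A non-zero `more` count is what would back a "More results from server" link.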

RESULTS SELECTION AND PRESENTATION
Google helps users by correcting misspelled words
in their search queries using not a predetermined
dictionary but its own index of the entire web.
Google’s visual interface is one of the simplest and,
according to many, one of the reasons for Google’s
success: “it’s simple and it works”.
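Spelling correction driven by an index rather than a fixed dictionary can be sketched as follows: candidate terms come from the index vocabulary, and the most frequent close match wins. The `TERM_COUNTS` data and `suggest` helper are hypothetical, and `difflib` string similarity is a stand-in for whatever method Google actually uses.

```python
from difflib import get_close_matches

# Hypothetical term frequencies taken from an index of crawled pages.
TERM_COUNTS = {"retrieval": 120, "retrieve": 80, "information": 300, "google": 500}


def suggest(word, term_counts, cutoff=0.8):
    """Return the indexed term to use for `word`, or None if nothing is close."""
    word = word.lower()
    if word in term_counts:
        return word  # already a known term, no correction needed
    candidates = get_close_matches(word, list(term_counts), n=5, cutoff=cutoff)
    if not candidates:
        return None
    # Among similar terms, prefer the one seen most often in the index.
    return max(candidates, key=lambda term: term_counts[term])


print(suggest("retreival", TERM_COUNTS))  # retrieval
print(suggest("google", TERM_COUNTS))     # google
```

Because the vocabulary is whatever appears in the index, new names and jargon become correctable as soon as they are crawled.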

LOGICAL DIAGRAM
Web Crawling, Extraction, and Indexing
