Google history and architecture

By:- Name: Divyangee Jain En no: 090410107015 Class: TY CE-A Batch: A 1

Google is a public company based in Mountain View, California. It provides services such as e-mail, online mapping, office productivity, social networking, video sharing, and an open source web browser.

PageRank algorithm

It was first incorporated as a privately held company on September 4, 1998, and its initial public offering followed on August 19, 2004.

Main aim of Google: “To organize the world's information and make it universally accessible and useful”

How Google Got the Name Google?

Sergey and Larry later decided to name the company after the number called a “Googol”, which is the number 1 followed by 100 zeros (10^100).

The name was picked to signify that the search engine wants to provide large quantities of information for people.

Anatomy (Architecture) of Google

High-level architecture of Google

A few terms to be acquainted with:

A Uniform Resource Locator or Universal Resource Locator (URL) is a character string that specifies where a known resource is available on the Internet and the mechanism for retrieving it.
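As a small illustration of the two halves of that definition (where the resource is, and the mechanism for retrieving it), Python's standard `urllib.parse.urlsplit` can take a URL apart. The URL below is made up for the example and does not come from the slides.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only for illustration.
url = "http://www.example.com/docs/index.html?lang=en"

parts = urlsplit(url)

mechanism = parts.scheme  # how to retrieve the resource (the protocol)
host = parts.netloc       # which machine holds the resource
path = parts.path         # where on that machine the resource lives
query = parts.query       # optional parameters sent with the request

print(mechanism, host, path, query)
```

The host part is what a DNS lookup would later translate into an IP address.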

DNS: Short for Domain Name System, an Internet service that translates domain names into IP addresses.

For example, the domain name www.example.com might translate to 198.105.232.4.

The document index keeps information about each document.

The information stored in each entry includes the current document status, a pointer into the repository, a document checksum, and various statistics. Each entry also points to a variable-width file that contains the crawled page's URL.
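A document index entry of this kind can be pictured as a small fixed-size record. The sketch below is only an assumption about the layout: the field names, the field widths, the status codes, and the use of CRC-32 as the checksum are illustrative and are not taken from Google's implementation.

```python
import struct
import zlib
from dataclasses import dataclass

# Assumed layout: status (1 byte), repository offset (8 bytes),
# checksum (4 bytes), URL-file offset (8 bytes); little-endian, no padding.
ENTRY_FORMAT = "<BQIQ"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)

STATUS_NOT_CRAWLED = 0  # hypothetical status codes
STATUS_CRAWLED = 1


@dataclass
class DocIndexEntry:
    status: int
    repo_offset: int   # pointer into the repository
    checksum: int      # checksum of the document contents
    url_offset: int    # pointer into the file that holds the URL

    def pack(self) -> bytes:
        """Serialize the entry into its fixed-size binary form."""
        return struct.pack(
            ENTRY_FORMAT, self.status, self.repo_offset, self.checksum, self.url_offset
        )

    @classmethod
    def unpack(cls, raw: bytes) -> "DocIndexEntry":
        """Rebuild an entry from its binary form."""
        return cls(*struct.unpack(ENTRY_FORMAT, raw))


page = b"<html><body>hello</body></html>"
entry = DocIndexEntry(STATUS_CRAWLED, repo_offset=4096,
                      checksum=zlib.crc32(page), url_offset=128)
raw = entry.pack()
restored = DocIndexEntry.unpack(raw)
```

With entries of equal size, the entry for a given docID could be located by simple multiplication (docID times `ENTRY_SIZE`), which is one reason to prefer a fixed-size record in a sketch like this.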

Parse: To divide large components into smaller components that can be analyzed.

Parser: A program that dissects source code so that it can be translated into object code.

1) Crawler

Crawling (downloading of web pages) is done by several distributed crawlers.

Each crawler keeps roughly 300 connections open at once. This is necessary to retrieve web pages at a fast enough pace.

Each of the hundreds of connections can be in a number of different states: looking up DNS, connecting to host, sending request, and receiving response.
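The four connection states listed above can be modeled as a simple state machine. This is a minimal sketch and not the crawler's actual design: the `DONE` state, the host names, and the strictly linear transitions are assumptions added for illustration, and no real network traffic is involved.

```python
from enum import Enum, auto


class ConnState(Enum):
    LOOKING_UP_DNS = auto()
    CONNECTING = auto()
    SENDING_REQUEST = auto()
    RECEIVING_RESPONSE = auto()
    DONE = auto()  # added for the sketch; not one of the four states in the text


# Each state advances to the next one when its step completes.
NEXT_STATE = {
    ConnState.LOOKING_UP_DNS: ConnState.CONNECTING,
    ConnState.CONNECTING: ConnState.SENDING_REQUEST,
    ConnState.SENDING_REQUEST: ConnState.RECEIVING_RESPONSE,
    ConnState.RECEIVING_RESPONSE: ConnState.DONE,
}


def advance(state: ConnState) -> ConnState:
    """Move a connection to its next state; DONE stays DONE."""
    return NEXT_STATE.get(state, ConnState.DONE)


# Simulate a handful of connections; a real crawler would track hundreds.
connections = {f"host{i}.example": ConnState.LOOKING_UP_DNS for i in range(5)}

# Advance only the even-numbered connections twice, to show mixed states.
for i, host in enumerate(list(connections)):
    if i % 2 == 0:
        connections[host] = advance(advance(connections[host]))

for host, state in connections.items():
    print(host, state.name)
```

Keeping the state per connection is what lets a single crawler make progress on many downloads at once instead of waiting for each one to finish.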

Due to the huge amount of data involved, a crawler can crash or behave unexpectedly.

Since large complex systems such as crawlers will invariably cause problems, there needs to be significant resources devoted to reading the email and solving these problems as they come up.

3) Storeserver

The storeserver then compresses and stores the web pages into a repository.

The choice of compression technique is a tradeoff between speed and compression ratio.

On the repository, zlib achieves a compression ratio of about 3 to 1.
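The speed versus ratio tradeoff can be tried out with Python's standard `zlib` module, which exposes compression levels from 1 (fastest) to 9 (smallest output). The page content below is made up and highly repetitive, so the ratio it produces says nothing about the ratio on real web pages.

```python
import zlib

# Made-up, highly repetitive page; real pages compress less dramatically.
page = b"<html><body>" + b"<p>Google architecture</p>" * 200 + b"</body></html>"

fast = zlib.compress(page, 1)        # favors speed
compressed = zlib.compress(page, 6)  # middle of the speed/ratio range
small = zlib.compress(page, 9)       # favors compression ratio

ratio = len(page) / len(compressed)
restored = zlib.decompress(compressed)

print(len(page), len(fast), len(compressed), len(small))
print(f"ratio at level 6: {ratio:.1f} to 1")
```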

In the repository, the documents are stored one after the other and are prefixed by docID, length, and URL, as can be seen in the figure.
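A repository laid out this way can be sketched as a flat byte sequence in which every compressed document carries a small header. The header field widths and the function names below are assumptions made for the example, not the actual on-disk format.

```python
import struct
import zlib

# Assumed header: docID (4 bytes), URL length (2 bytes), payload length (4 bytes).
HEADER = struct.Struct("<IHI")


def append_record(repo: bytearray, doc_id: int, url: str, html: bytes) -> None:
    """Append one compressed document, prefixed by docID, lengths, and URL."""
    url_bytes = url.encode("utf-8")
    payload = zlib.compress(html)
    repo.extend(HEADER.pack(doc_id, len(url_bytes), len(payload)))
    repo.extend(url_bytes)
    repo.extend(payload)


def read_records(repo: bytes):
    """Walk the repository from the start, yielding (docID, URL, html)."""
    offset = 0
    while offset < len(repo):
        doc_id, url_len, payload_len = HEADER.unpack_from(repo, offset)
        offset += HEADER.size
        url = repo[offset:offset + url_len].decode("utf-8")
        offset += url_len
        html = zlib.decompress(repo[offset:offset + payload_len])
        offset += payload_len
        yield doc_id, url, html


repo = bytearray()
append_record(repo, 1, "http://www.example.com/", b"<html>first</html>")
append_record(repo, 2, "http://www.example.com/a", b"<html>second</html>")
records = list(read_records(bytes(repo)))
```

Because each record states its own length, the documents can be read back sequentially without any other data structure.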

5) Indexing

The indexing function is performed by the indexer and the sorter.

Indexing Documents into Barrels

The hits record the word, position in document, an approximation of font size, and capitalization.
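One compact way to store such a hit is to pack its fields into a single 16-bit integer and keep the hits in a list keyed by word. The bit widths used here (1 bit for capitalization, 3 bits for font size, 12 bits for position) are an assumption for this sketch, as are the function names.

```python
def pack_hit(capitalized: bool, font_size: int, position: int) -> int:
    """Pack one hit into 16 bits: [capitalized:1][font_size:3][position:12]."""
    if not 0 <= font_size < 8:
        raise ValueError("font_size must fit in 3 bits")
    if position < 0:
        raise ValueError("position must be non-negative")
    position = min(position, 4095)  # clamp positions that do not fit in 12 bits
    return (int(capitalized) << 15) | (font_size << 12) | position


def unpack_hit(hit: int):
    """Return (capitalized, font_size, position) from a packed hit."""
    return bool(hit >> 15), (hit >> 12) & 0b111, hit & 0xFFF


# Hits are grouped by word; the word itself is the dictionary key.
hits_by_word = {"google": [pack_hit(True, 3, 42), pack_hit(False, 1, 97)]}

hit = hits_by_word["google"][0]
print(hit, unpack_hit(hit))
```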


The anchors file contains enough information to determine where each link points from and to, and the text of the link.
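To make the contents of such a file concrete, the sketch below pulls the same three pieces of information (where a link points from, where it points to, and its text) out of a page using Python's standard `html.parser`. The class name, the sample HTML, and the tuple layout are assumptions for illustration only.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (source URL, href, link text) for every <a href=...> element."""

    def __init__(self, source_url: str):
        super().__init__()
        self.source_url = source_url
        self.anchors = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.anchors.append((self.source_url, self._href, text))
            self._href = None


# Made-up page containing one relative and one absolute link.
html = ('<p>See <a href="/about.html">About us</a> and '
        '<a href="http://www.example.org/">Example</a>.</p>')
collector = AnchorCollector("http://www.example.com/index.html")
collector.feed(html)
collector.close()
anchors = collector.anchors
```

Note that the first link is still relative at this stage; it is stored as found on the page.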

The URLresolver reads the anchors file and converts relative URLs into absolute URLs and in turn into docIDs. It puts the anchor text into the forward index, associated with the docID that the anchor points to.

Part 3:

The links database is used to compute PageRanks for all the documents.
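As a rough illustration of how ranks could be computed from a links database, the sketch below runs a simple iterative PageRank over pairs of docIDs. It is a textbook-style version under stated assumptions: the damping factor of 0.85, the fixed iteration count, the normalization so that ranks sum to 1, and the handling of pages without out-links are choices made for the example, not details given in the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: iterable of (from_doc_id, to_doc_id) pairs. Returns {doc_id: rank}."""
    links = list(links)
    docs = sorted({d for pair in links for d in pair})
    if not docs:
        return {}

    out_links = {d: [] for d in docs}
    for src, dst in links:
        out_links[src].append(dst)

    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[d] for d in docs if not out_links[d])
        new_rank = {d: (1.0 - damping) / n + damping * dangling / n for d in docs}
        # Every page passes an equal share of its rank along each out-link.
        for src in docs:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Made-up links database: pairs of (from docID, to docID).
links = [(1, 2), (1, 3), (2, 3), (3, 1)]
ranks = pagerank(links)
for doc_id, score in sorted(ranks.items()):
    print(doc_id, round(score, 4))
```

In this toy graph, document 3 is linked to by both other documents and ends up with the highest rank.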

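The sorter takes the barrels, which are sorted by docID, and resorts them by wordID to generate the inverted index. This is done in place so that little temporary space is needed for this operation.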
