THE GOOGLE FILE SYSTEM
By Sanjay Ghemawat, Howard Gobioff, and
Shun-Tak Leung
1
INTRODUCTION
• Google applications process very large amounts of data
• They need a file system built for that workload
• Solution: the Google File System, a large, distributed, highly fault-tolerant file system
2
DESIGN MOTIVATIONS
1. Fault-tolerance and auto-recovery need to be built
into the system.
2. Standard I/O assumptions (e.g. block size) have
to be re-examined.
3. Record appends are the prevalent form of
writing.
4. Google applications and GFS should be co-
designed.
3
INTERFACE
• Create
• Delete
• Open
• Close
• Read
• Write
• Snapshot
• Record Append (API sketch below)
4
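A minimal sketch of what a client-facing interface with these operations could look like. All class, method, and parameter names below are illustrative assumptions, not the actual GFS client library API.

    # Hypothetical client API mirroring the operations listed above (names assumed).
    class GFSClient:
        def create(self, path: str) -> None: ...
        def delete(self, path: str) -> None: ...
        def open(self, path: str) -> int: ...                 # returns a file handle
        def close(self, handle: int) -> None: ...
        def read(self, handle: int, offset: int, length: int) -> bytes: ...
        def write(self, handle: int, offset: int, data: bytes) -> None: ...
        def snapshot(self, src: str, dst: str) -> None: ...   # copy a file or directory tree
        def record_append(self, handle: int, data: bytes) -> int:
            """Append atomically; GFS chooses the offset and returns it."""
            ...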
GFS ARCHITECTURE
On a single-machine FS:
• An upper layer maintains the metadata.
• A lower layer (i.e. the disk) stores the data in units called “blocks”.
In GFS:
• A master process maintains the metadata.
• A lower layer (a set of chunk servers) stores the data in units called “chunks”.
5
GFS ARCHITECTURE (figure)
6
CHUNK
• Analogous to a block, except larger.
• Size: 64 MB (offset arithmetic sketch below).
• Stored on a chunk server as a file.
• A chunk handle (the chunk's file name) is used to reference the chunk.
• Replicated across multiple chunk servers (three replicas by default).
7
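A rough sketch of how a byte offset in a file maps onto these fixed-size chunks, using the 64 MB size from the slide; the function names are assumptions for illustration.

    # How a client turns a byte offset into a chunk index (illustrative names).
    CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB, as stated on the slide

    def chunk_index(file_offset: int) -> int:
        # The client sends this index to the master, which returns the chunk
        # handle and the locations of its replicas.
        return file_offset // CHUNK_SIZE

    def offset_within_chunk(file_offset: int) -> int:
        return file_offset % CHUNK_SIZE

    # Example: byte 200,000,000 of a file lies in chunk index 2,
    # at offset 65,782,272 within that chunk.
    assert chunk_index(200_000_000) == 2
    assert offset_within_chunk(200_000_000) == 65_782_272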
CHUNK SIZE
• Advantages
o Reduces client-master interaction
o Reduces the size of the metadata stored at the master (see the arithmetic below)
• Disadvantages
o Hot spots: a small file stored in a single chunk can be hit by many clients at once
Solution:
A higher replication factor for such chunks
8
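A back-of-the-envelope illustration of the metadata advantage (the 64 KB comparison size is an assumed figure for contrast, not from the slides): a 1 TB file split into 64 MB chunks needs only 1 TB / 64 MB = 16,384 chunk entries at the master, whereas splitting the same file into 64 KB blocks would need over 16 million entries. Fewer chunks also means fewer chunk-location requests from clients to the master.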
MASTER
• A single, centralized master
• Stores all metadata (sketch below):
o File namespace
o File-to-chunk mappings
o Chunk location information
9
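A minimal sketch of the three kinds of metadata listed above as simple in-memory maps; the paths, handles, and server names are made up for illustration.

    # Illustrative in-memory metadata held by the master (all values are made up).
    namespace = {"/logs/crawl_0", "/data/index_0"}          # file namespace
    file_to_chunks = {                                      # file -> list of chunk handles
        "/logs/crawl_0": [0x1A2B, 0x1A2C],
        "/data/index_0": [0x2F10],
    }
    chunk_locations = {                                     # chunk handle -> chunk servers
        0x1A2B: ["cs01", "cs07", "cs12"],
        0x1A2C: ["cs02", "cs07", "cs13"],
        0x2F10: ["cs03", "cs08", "cs14"],
    }
    # The first two maps are made persistent through the operation log;
    # chunk locations are not persisted: the master asks chunk servers for them
    # at startup and keeps them fresh through heartbeat messages.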
GFS ARCHITECTURE (figure)
10
System Interactions
Write control flow, reconstructed from the figure labels (a code sketch follows this slide):
1. The client asks the master which chunk server holds the current lease (the primary) and where the other replicas are.
2. The master replies with the identity of the primary and the locations of the replicas; the client caches this information.
3a-3c. The client pushes the data to all replicas.
4. The client sends the write request to the primary; the primary assigns serial numbers to the mutations and applies them to its own local state.
5. The primary forwards the write request to the secondary replicas.
6. The secondaries apply the mutations in the same order and reply "operation completed".
7. The primary replies "operation completed", or an error report, to the client.
11
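A toy in-memory simulation of steps 3-7 of the write flow above (the master lookup in steps 1-2 is omitted). Class and method names are assumptions; only the ordering of the steps follows the slide.

    # Minimal simulation of the write control flow (illustrative names only).
    class Replica:
        def __init__(self):
            self.buffer = None          # data pushed but not yet applied
            self.chunk = b""            # chunk contents on this replica
        def push_data(self, data: bytes):
            self.buffer = data          # steps 3a-3c: data flows to every replica
        def apply(self, serial: int) -> bool:
            self.chunk += self.buffer   # apply the buffered mutation in serial order
            self.buffer = None
            return True                 # "operation completed"

    def write(primary: Replica, secondaries: list, data: bytes) -> bool:
        for r in [primary] + secondaries:
            r.push_data(data)                           # 3. push data to all replicas
        serial = 1                                      # 4. primary assigns a serial number
        primary.apply(serial)                           #    and applies the mutation locally
        acks = [s.apply(serial) for s in secondaries]   # 5-6. forward to secondaries, collect acks
        return all(acks)                                # 7. success or error reported to client

    p, s1, s2 = Replica(), Replica(), Replica()
    assert write(p, [s1, s2], b"record-1") and p.chunk == s1.chunk == s2.chunk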
SYSTEM INTERACTIONS
• Record append: the client specifies only the data; GFS chooses the offset and appends it atomically (sketch below)
• Snapshot: makes a copy of a file or a directory tree at low cost
12
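A toy sketch of the record-append semantics: the client supplies only the data, GFS chooses the offset, and the current chunk is padded when a record would not fit. The padding rule follows the paper; the function shape is an assumption.

    CHUNK_SIZE = 64 * 1024 * 1024

    def record_append(chunk: bytearray, data: bytes):
        """Append at an offset GFS chooses; return None to make the client retry
        on a fresh chunk when the record does not fit (toy model)."""
        if len(chunk) + len(data) > CHUNK_SIZE:
            chunk.extend(b"\x00" * (CHUNK_SIZE - len(chunk)))   # pad the current chunk
            return None
        offset = len(chunk)            # GFS, not the client, picks this offset
        chunk.extend(data)
        return offset                  # the offset is returned to the client

    c = bytearray()
    print(record_append(c, b"record-A"))   # 0
    print(record_append(c, b"record-B"))   # 8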
OPERATION LOG
• Historical record of critical metadata changes
• Serves as a logical timeline that defines the order of concurrent operations
• Critical to GFS: it is the only persistent record of the metadata
• Replicated on multiple remote machines
• The master responds to a client only after the corresponding log record has been flushed both locally and remotely
• Fast recovery by replaying the log from the latest checkpoint (sketch below)
• Checkpoints use a compact B-tree-like form that can be mapped directly into memory
• The master switches to a new log file and creates the new checkpoint in a separate thread
13
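A minimal sketch of the checkpoint-plus-log recovery the bullets describe: start from the latest checkpoint, then replay only the log records that came after it. The record format is an assumption made up for illustration.

    # Toy recovery of master metadata from a checkpoint plus the log tail.
    def recover(checkpoint: dict, checkpoint_seq: int, log: list) -> dict:
        state = dict(checkpoint)                 # state as of the checkpoint
        for seq, op, path in log:
            if seq <= checkpoint_seq:            # already captured by the checkpoint
                continue
            if op == "create":
                state[path] = []                 # new file with no chunks yet
            elif op == "delete":
                state.pop(path, None)
        return state

    log = [(1, "create", "/a"), (2, "create", "/b"), (3, "delete", "/a")]
    print(recover({"/a": []}, 1, log))           # {'/b': []}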
MASTER OPERATIONS
• Namespace Management and Locking (locking sketch below)
• Chunk Creation
• Chunk Re-replication
• Chunk Rebalancing
• Garbage Collection
14
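A sketch of the namespace-locking rule behind the first item above: an operation takes read locks on every ancestor path and a read or write lock on the full path itself. The helper below only computes which locks would be taken; the names are illustrative.

    # Which locks a namespace operation acquires (illustrative helper).
    def locks_needed(path: str, write_leaf: bool):
        parts = path.strip("/").split("/")
        ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
        read_locks = ancestors                       # read locks on every ancestor
        leaf_lock = ("write" if write_leaf else "read", path)
        return read_locks, leaf_lock

    # Creating /home/user/foo takes read locks on /home and /home/user and a
    # write lock on /home/user/foo; a concurrent snapshot that needs a write
    # lock on /home/user is therefore serialized against it.
    print(locks_needed("/home/user/foo", write_leaf=True))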
FAULT TOLERANCE AND DIAGNOSIS
1. High Availability
The overall system is kept highly available with two simple yet effective strategies: fast recovery and replication.
15
1.1 Fast Recovery: the master and chunk servers are designed to restart and restore their state in a few seconds.
1.2 Chunk Replication: chunks are replicated across multiple machines on multiple racks (placement sketch below).
16
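A small sketch of rack-aware replica placement in the spirit of "across multiple machines, across multiple racks"; the greedy policy and the server names are assumptions, not the actual placement algorithm.

    # Pick replica locations so copies land on different machines and, where
    # possible, different racks (illustrative greedy policy, made-up data).
    def place_replicas(servers_by_rack: dict, n: int) -> list:
        chosen = []
        for servers in servers_by_rack.values():      # one server per rack first
            if len(chosen) < n and servers:
                chosen.append(servers[0])
        for servers in servers_by_rack.values():      # then fill any remaining slots
            for s in servers:
                if len(chosen) < n and s not in chosen:
                    chosen.append(s)
        return chosen

    racks = {"rack1": ["cs01", "cs02"], "rack2": ["cs11"], "rack3": ["cs21"]}
    print(place_replicas(racks, 3))    # ['cs01', 'cs11', 'cs21']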
1.3 Master Replication:
• A log of all changes made to the metadata is kept.
• The log is replicated on multiple machines.
• “Shadow” masters provide read-only access when the “real” master is down.
17
18
2. Data Integrity
Each chunk is broken into 64 KB blocks, and each block has an associated 32-bit checksum (sketch below).
3. Diagnostic Logging
Diagnostic logs record the details of interactions between machines: the exact requests and responses sent on the wire, except the file data being transferred.
19
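A sketch of the per-block checksumming described above (64 KB blocks, a 32-bit checksum each). CRC-32 is used here as a stand-in; the exact checksum function is an assumption.

    import zlib

    BLOCK = 64 * 1024        # chunks are checksummed in 64 KB blocks

    def checksums(chunk: bytes) -> list:
        return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

    def verify(chunk: bytes, stored: list) -> bool:
        # A chunk server verifies blocks before returning them to a reader; on a
        # mismatch it reports corruption and the chunk is re-replicated elsewhere.
        return checksums(chunk) == stored

    data = b"x" * (3 * BLOCK)
    sums = checksums(data)
    assert verify(data, sums)
    assert not verify(data[:-1] + b"y", sums)    # a corrupted byte is detected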
MEASUREMENTS
They measured performance on a GFS cluster consisting of one master, two master replicas, 16 chunk servers, and 16 clients.
20
All machines are configured with:
1. Dual 1.4 GHz PIII processors
2. 2 GB of memory
3. Two 80 GB 5400 rpm disks
4. A 100 Mbps full-duplex Ethernet connection to an HP 2524 switch
21
22
23
Here too, the rate drops as the number of clients increases to 16; the append rate falls because of network congestion and because of variance in the network transfer rates seen by different clients.
24
REAL WORLD CLUSTERS
Table 1 – Characteristics of two GFS clusters
25
Table 2 – Performance Metrics for Clusters A and B
26
RESULTS
1. Read and Write Rates
• The average write rate was 30 MB/s.
• When the measurements were taken, B was in the middle of a burst of write activity.
• Read rates were high: both clusters were in the middle of heavy read activity.
• A was using its resources more efficiently than B.
27
2. Master Loads
Master can easily keep up with 200 to 500
operations per second.
28
3. Recovery Time
• A single chunk server (15,000 chunks containing 600 GB of data) was killed in cluster B.
• All of its chunks were restored in 23.2 minutes, at an effective replication rate of 440 MB/s.
29
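As a quick consistency check of those numbers: 600 GB restored in 23.2 minutes is roughly 600 × 1024 MB / (23.2 × 60 s) ≈ 441 MB/s, which matches the 440 MB/s effective replication rate quoted above.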
Two chunk servers were killed, each with roughly 16,000 chunks and 660 GB of data.
This failure reduced 266 chunks to having a single replica.
30
These 266 chunks were cloned at a higher priority and all were restored within 2 minutes, putting the cluster back in a state where it could tolerate another chunk server failure without data loss.
31
WORKLOAD BREAKDOWN
Clusters X and Y are used to break down the workloads on two GFS clusters. Cluster X is used for research and development, while Y is used for production data processing.
32
Operations Breakdown by Size
Table 3 – Operation Breakdown by Size (%)
33
Bytes Transferred Breakdown by Operation Size
Table 4 – Bytes Transferred Breakdown by Operation Size (%)
34
Master Requests Breakdown by Type (%)
Table 5 – Master Request Breakdown by Type (%)
35
CONCLUSIONS
• GFS demonstrates the qualities essential for supporting large-scale data processing workloads on commodity hardware.
• It provides fault tolerance by constant monitoring, replication of crucial data, and fast, automatic recovery.
• It delivers high aggregate throughput to many concurrent readers and writers by separating file system control from data transfer.
36
Thank You.
37
Q and A
38

Editor's Notes

  • #4 Component failures are accepted as the norm in a system of this scale. Files ranging from KB-sized to multi-TB-sized must all be supported. Most files are mutated by appending new data rather than overwriting existing data. Co-designing GFS with its applications keeps the file system simple without imposing an undue burden on the applications.
  • #5 The system supports the usual operations, and GFS also has snapshot and record append operations. Snapshot creates a copy of a file or directory tree at low cost. Record append allows multiple clients to append data to the same file concurrently.
  • #8 Files are divided into fixed-size chunks. The chunk handle is an immutable and globally unique 64-bit identifier assigned at the time of chunk creation. By default each chunk is stored on three chunk servers.
  • #9 A larger chunk size has advantages: read/write interaction between master and client is reduced, and a client is likely to perform many operations on a given chunk.
  • #18 To keep itself informed, a shadow master reads a replica of the growing operation log and applies the same sequence of changes to its data structures exactly as the primary does. It also exchanges handshake messages with chunk servers to monitor their status. It depends on the primary master only for replica location updates resulting from the primary's decisions to create and delete replicas.
  • #20 Diagnostic logs can be used to reconstruct the entire interaction history when diagnosing a problem, and they also serve as traces for load testing and performance analysis.
  • #23 For one client the read rate is 10 MB/s, 80% of the estimated limit. For 16 clients the aggregate read rate is 94 MB/s, i.e. about 6 MB/s per client, 75% of the estimated limit.
  • #24 The write rate for one client is 6.3 MB/s, about half of the estimated limit (12.5 MB/s). The aggregate write rate for 16 clients is 35 MB/s, about 2.2 MB/s per client, again roughly half of the estimated limit.
  • #25 The record append rate is 6.0 MB/s for one client and 4.8 MB/s for 16 clients.
  • #26 Cluster A is used for research and development: tasks read through a few MBs to TBs of data, analyze or process them, and write the results back. Cluster B is used for production data processing: its tasks last much longer. Metadata kept at the chunk servers consists of checksums and chunk version numbers. Metadata kept at the master is so small that it does not limit the system's capacity: file names in compressed form, ownership and permissions, the mapping from files to chunks, each chunk's current version, and replica locations. Master recovery is therefore fast.
  • #28 Cluster A can support read rates of up to 750 MB/s and was using 580 MB/s of that; B can support 1300 MB/s but was using only 380 MB/s.