For a long time, relational database management systems were the only solution for persistent data storage. With the phenomenal growth of data, however, this conventional way of storing has become problematic.
To manage the exponentially growing data traffic, the largest information technology companies, such as Google, Amazon, and Yahoo, have developed alternative solutions that store data in what have come to be known as NoSQL databases.
Typical NoSQL features are a flexible schema, horizontal scaling, and weaker ACID guarantees. NoSQL databases store and replicate data in distributed systems, often across datacenters, to achieve scalability and reliability.
The CAP theorem states that any networked shared-data system (e.g. NoSQL) can have at most two of three desirable properties:
• Consistency (C) - equivalent to having a single up-to-date copy of the data
• Availability (A) of that data, for both reads and writes
• Partition tolerance (P) - tolerance to network partitions
Because of this inherent tradeoff, it is necessary to sacrifice one of these properties. The general belief is that designers cannot sacrifice P and therefore have a difficult choice between C and A.
In this seminar two NoSQL databases are presented: Amazon's Dynamo, which sacrifices consistency to achieve very high availability, and Google's BigTable, which guarantees strong consistency while providing only best-effort availability.
The Google File System (GFS) presented in 2003 is the inspiration for the Hadoop Distributed File System (HDFS). Let's take a deep dive into GFS to better understand Hadoop.
A Distributed File System (DFS) is simply the classical model of a file system distributed across multiple machines. Its purpose is to promote sharing of dispersed files.
GFS was designed by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung at Google in 2002-03.
It provides fault tolerance while serving a large number of clients with high aggregate performance.
Google's business extends far beyond search.
Google stores its data on more than 15,000 commodity machines.
GFS handles Google's failure patterns and other Google-specific challenges in its distributed file system.
The TCP/IP protocol suite does not define any protocol in the data-link layer or
physical layer. These two layers are territories of networks that when connected
make up the Internet. These networks, wired or wireless, provide services to the upper
three layers of the TCP/IP suite. This hints that there are several standard
protocols in use today. For this reason, we discuss the data-link layer in several
chapters. This chapter is an introduction that gives the general idea and common issues
in the data-link layer that relate to all networks.
❑ The first section introduces the data-link layer. It starts with defining the concept
of links and nodes. The section then lists and briefly describes the services provided
by the data-link layer. It next defines two categories of links: point-to-point
and broadcast links. The section finally defines two sublayers at the data-link layer
that will be elaborated on in the next few chapters.
❑ The second section discusses link-layer addressing. It first explains the rationale
behind the existence of an addressing mechanism at the data-link layer. It then
describes three types of link-layer addresses to be found in some link-layer protocols.
The section discusses the Address Resolution Protocol (ARP), which maps
the addresses at the network layer to addresses at the data-link layer. This protocol
helps a packet at the network layer find the link-layer address of the next node for
delivery of the frame that encapsulates the packet. To show how the network layer
helps us to find the data-link-layer addresses, a long example is included in this
section that shows what happens at each node when a packet is travelling through
the Internet.
In this presentation we describe the design and implementation of Kafka Connect, Kafka’s new tool for scalable, fault-tolerant data import and export. First we’ll discuss some existing tools in the space and why they fall short when applied to data integration at large scale. Next, we will explore Kafka Connect’s design and how it compares to systems with similar goals, discussing key design decisions that trade off between ease of use for connector developers, operational complexity, and reuse of existing connectors. Finally, we’ll discuss how standardizing on Kafka Connect can ultimately lead to simplifying your entire data pipeline, making ETL into your data warehouse and enabling stream processing applications as simple as adding another Kafka connector.
Google has designed and implemented a scalable distributed file system for its large, distributed, data-intensive applications. They named it the Google File System (GFS).
PowerPoint presentation on distributed operating systems: reasons for opting for distributed systems over centralized systems, types of distributed systems, and process migration and its advantages.
Distributed systems involve complex interactions among many components. This increases the possibility of failures that could bring a whole system down. Software architects, designers, and developers need to architect, design, and program functional requirements with failures in mind, so that a system keeps running despite them. This presentation tackles part of the problem, focusing on redundancy, different types of groups, replication, and eventual consistency, and finishes with a presentation of the CAP theorem.
Presentation delivered at IV Cloud Computing and Big Data Ent at Universidad Nacional de La Plata http://www.jcc.info.unlp.edu.ar/jcc2016/wordpress/index.php/cronograma/
Google is a multi-billion dollar company. It's one of the big power players on the World Wide Web and beyond. The company relies on a distributed computing system to provide users with the infrastructure they need to access, create and alter data.
Surely Google buys state-of-the-art computers and servers to keep things running smoothly, right?
Wrong. The machines that power Google's operations aren't cutting-edge power computers with lots of bells and whistles. In fact, they're relatively inexpensive machines running on Linux operating systems. How can one of the most influential companies on the Web rely on cheap hardware? It's due to the Google File System (GFS), which capitalizes on the strengths of off-the-shelf servers while compensating for any hardware weaknesses. It's all in the design.
Google uses the GFS to organize and manipulate huge files and to allow application developers the research and development resources they require. The GFS is unique to Google and isn't for sale. But it could serve as a model for file systems for organizations with similar needs.
Transferring files from one computer to another is one of the most common tasks expected from a networking or internetworking environment. In reality, the greatest volume of data exchange in the Internet today is due to file transfer.
This module has been created to answer all the questions on how IPFS can be used for dynamic real-time applications. In this module, you will learn:
- how to reason about dynamic data on IPFS,
- IPNS, the simplest construction for naming in IPFS,
- how PubSub can offer subsecond speeds for interactive applications,
- how CRDTs are a fundamental building block for distributed applications,
- what is available in the ecosystem.
3. WASHINGTON STATE UNIVERSITY
EXISTING PROBLEMS
❖ Frequent failures of nodes
❖ Files are huge – multi-GB
❖ Access patterns
➢ Mostly sequential reads, writes, and appends
❖ Scalability
➢ Concurrent writers
Fig: Andrew File System Architecture
5. ASSUMPTIONS
❖ Built from inexpensive commodity hardware
➢ Failure is the norm, so the system must constantly monitor itself and detect, tolerate, and recover from failures
❖ Files are huge (100 MB to multi-GB)
❖ Workloads:
➢ Large streaming reads (1 MB or more)
➢ Small random reads (a few KB); files are rarely modified once written
➢ Many large sequential writes that append data to files
❖ Throughput is valued more than individual request latency
6. INTERFACE
❖ Does not support the Portable Operating System Interface (POSIX)
➢ Supports the typical file system operations: create, delete, open, close, read, and write
❖ Supports snapshot
➢ Creates a copy of a file or a directory tree at low cost
➢ Duplicates metadata rather than the data itself
❖ Supports record append
➢ Allows multiple clients to append data to the same file concurrently
8. CHUNKS
❖ Files are divided into fixed-size blocks called chunks
❖ A chunk is 64 MB – much larger than a typical file system block
❖ Each chunk is replicated three or more times
❖ Each chunk is identified by a 64-bit chunk handle
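The offset arithmetic implied by fixed-size chunks can be sketched in a few lines; `chunk_location` is a hypothetical helper name, not part of the GFS client API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described above

def chunk_location(byte_offset):
    """Translate a byte offset within a file into a (chunk index,
    offset-within-chunk) pair - the first step a client performs before
    asking the master for the chunk handle and replica locations."""
    return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE

# A read at byte 200,000,000 falls in the third chunk (index 2):
index, offset = chunk_location(200_000_000)
```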
9. METADATA
❖ The master keeps three major types of metadata
➢ The file and chunk namespaces
➢ The mapping from files to chunks
➢ The locations of each chunk's replicas
❖ All metadata is kept in the master's memory
❖ Each 64 MB chunk has about 64 bytes of metadata
❖ Chunk locations are refreshed at every chunkserver restart and through regular heartbeat messages
❖ An operation log contains a historical record of critical metadata changes
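A back-of-envelope calculation shows why keeping all metadata in one master's memory is feasible: at roughly 64 bytes per 64 MB chunk, the metadata is about a millionth of the data it describes. The helper below is illustrative only and ignores namespace entries.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB per chunk
METADATA_PER_CHUNK = 64         # ~64 bytes of in-memory metadata per chunk

def master_metadata_bytes(total_file_bytes):
    """Estimate the master's in-memory chunk metadata for a given
    volume of file data."""
    num_chunks = -(-total_file_bytes // CHUNK_SIZE)  # ceiling division
    return num_chunks * METADATA_PER_CHUNK

# A petabyte of file data needs under 1 GB of chunk metadata in RAM:
metadata_for_one_pb = master_metadata_bytes(10**15)
```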
12. GFS MASTER & CLIENT
❖ A single master maintains all file system metadata
➢ Namespaces, access control information, the mapping from files to chunks, and the current locations of chunks
❖ Clients never read or write file data through the master (which would make it a bottleneck)
➢ A client asks the master which chunkservers it should contact, then talks to them directly
❖ The master communicates with each chunkserver through heartbeat messages
➢ Is the chunkserver up or down?
➢ Are there any disk failures on the chunkserver?
➢ Are any replicas corrupted?
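The read path described above can be sketched as follows. The `Master`, `ChunkServer`, and `Client` classes and their methods are illustrative stand-ins, not the real GFS RPC interface; the point is that the master is consulted only for metadata, and only on a cache miss.

```python
CHUNK_SIZE = 64 * 1024 * 1024

class Master:
    """Stub master: maps (file, chunk index) -> (chunk handle, replica ids)."""
    def __init__(self, table):
        self.table = table
        self.lookups = 0
    def locate(self, filename, index):
        self.lookups += 1
        return self.table[(filename, index)]

class ChunkServer:
    """Stub chunkserver holding chunk data by handle."""
    def __init__(self, chunks):
        self.chunks = chunks
    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class Client:
    def __init__(self, master, servers):
        self.master, self.servers = master, servers
        self.cache = {}  # metadata cache: avoids repeated master round-trips
    def read(self, filename, offset, length):
        index, within = divmod(offset, CHUNK_SIZE)
        key = (filename, index)
        if key not in self.cache:        # consult the master only on a miss
            self.cache[key] = self.master.locate(filename, index)
        handle, replicas = self.cache[key]
        # file data flows directly from a chunkserver, never through the master
        return self.servers[replicas[0]].read(handle, within, length)
```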
21. LEASES AND MUTATION ORDER
❖ Mutation: an operation that changes the contents or metadata of a chunk
❖ Lease: a mechanism used to maintain a consistent mutation order across replicas
❖ When the master receives a modification operation for a particular chunk:
➢ The master finds the chunkservers that hold the chunk and grants a chunk lease to one of them (the primary)
➢ The primary determines the serialization order for all of the chunk's modifications, and the secondaries follow that order
➢ Leases time out after 60 seconds (it is also possible to extend the timeout)
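The core idea of the lease mechanism - one primary picks a single order, every replica applies it - can be sketched as below. The `Primary` and `Replica` classes are illustrative, not the GFS implementation.

```python
class Primary:
    """The lease holder: assigns one global serial number per mutation."""
    def __init__(self):
        self.next_serial = 0
    def assign_order(self, mutation):
        serial = self.next_serial
        self.next_serial += 1
        return serial, mutation

class Replica:
    """Applies mutations strictly in the serial order chosen by the primary."""
    def __init__(self):
        self.applied = []
    def apply(self, serial, mutation):
        self.applied.append((serial, mutation))
```

Because every secondary follows the primary's serial numbers, concurrent client mutations end up applied in the same order on every replica.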
24. Fig: Data Flow & Control Flow (Step 2)
● The master replies with the identity of the primary and the locations of the other (secondary) replicas, if available
● The client caches this data for future mutations and contacts the master again only if the primary becomes unreachable
26. Fig: Data Flow & Control Flow (Step 4)
● The client sends the write request to the primary
● The primary decides the serialization order for all incoming modifications and applies them to the chunk
29. Fig: Data Flow & Control Flow (Step 7)
● The primary replies to the client, either with success or with an error
● On failure, the client can retry steps (3) through (7)
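The client-side failure handling in step 7 amounts to a bounded retry loop over the data-push and write phases. In this sketch, `push_data` and `send_write` are hypothetical stand-ins for step (3) and steps (4)-(7) respectively.

```python
def write_with_retry(push_data, send_write, max_attempts=3):
    """Repeat the data push (step 3) and the write request through the
    primary (steps 4-7) until the primary reports success, up to
    max_attempts times. Returns True on success, False otherwise."""
    for attempt in range(max_attempts):
        push_data()          # step 3: push the data to all replicas
        if send_write():     # steps 4-7: write request via the primary
            return True
    return False
```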
31. NAMESPACE MANAGEMENT & LOCKING
❖ The master maintains a table that maps each full path name to its metadata
❖ Locks are taken over regions of the namespace to ensure proper serialization
❖ Each node in the namespace has an associated read-write lock
❖ Concurrent operations are properly serialized by this locking mechanism
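A common scheme for this kind of per-node read-write locking (and, to my understanding, the one GFS uses) is: an operation on `/d1/d2/leaf` takes read locks on each ancestor directory and a read or write lock on the full path. The sketch below models locks as sets of path names; it is illustrative, not the master's actual data structure.

```python
def locks_for(path, write=True):
    """Return (read-locked paths, write-locked paths) for an operation
    on `path`: read locks on every ancestor, and a read or write lock
    on the full path itself."""
    parts = path.strip("/").split("/")
    ancestors = {"/" + "/".join(parts[:i]) for i in range(1, len(parts))}
    full = "/" + "/".join(parts)
    if write:
        return ancestors, {full}
    return ancestors | {full}, set()

def conflicts(op_a, op_b):
    """Two operations conflict if either write-locks a path the other
    locks in any mode; non-conflicting operations can run concurrently."""
    ra, wa = op_a
    rb, wb = op_b
    return bool(wa & (rb | wb)) or bool(wb & (ra | wa))
```

For example, snapshotting `/home/user` conflicts with creating `/home/user/foo` (the create read-locks `/home/user`, which the snapshot write-locks), while two creates in the same directory do not conflict and can proceed in parallel.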
32. OTHER MASTER OPERATIONS
❖ Replica placement
➢ GFS places replicas on different racks for reliability and availability
➢ This also maximizes utilization of aggregate network bandwidth
❖ Creation, re-replication, rebalancing
➢ New replicas are created on chunkservers with below-average disk utilization
➢ Re-replication is triggered when the number of available replicas falls below a user-specified goal
➢ Rebalancing is done periodically for better disk space usage and load balancing
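The placement policy above can be sketched as a simple greedy choice: take the least-utilized chunkservers, one per rack. The `(name, rack, utilization)` tuples are made-up inputs, and real placement also weighs recent creation activity, which this sketch omits.

```python
def place_replicas(servers, n=3):
    """servers: list of (name, rack, disk_utilization in [0, 1]).
    Return up to n server names on distinct racks, preferring the
    least-utilized chunkservers."""
    chosen, used_racks = [], set()
    for name, rack, util in sorted(servers, key=lambda s: s[2]):
        if rack not in used_racks:       # spread replicas across racks
            chosen.append(name)
            used_racks.add(rack)
        if len(chosen) == n:
            break
    return chosen
```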
33. OTHER MASTER OPERATIONS
❖ Garbage collection
➢ A deletion operation is logged
➢ The file is renamed to a hidden name, and can later be permanently deleted or recovered
➢ Orphaned chunks are removed during a regular scan of the chunk namespace
❖ Stale replica detection
➢ A replica becomes stale if its chunkserver fails and misses mutations to the chunk while it is down
➢ The chunk version number stored by the master is used to identify stale replicas
➢ The master removes stale replicas in its regular garbage collection
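Version-based staleness detection reduces to a comparison: the master bumps the chunk's version number whenever it grants a new lease, so any replica reporting an older version missed a mutation. The data shapes below are illustrative.

```python
def find_stale(master_version, replica_versions):
    """replica_versions: dict mapping chunkserver -> the chunk version
    it reports. Returns the set of chunkservers holding stale replicas
    (version older than the master's record)."""
    return {s for s, v in replica_versions.items() if v < master_version}
```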
35. HIGH AVAILABILITY
❖ Fast recovery
➢ The master and chunkservers restore their state and restart in seconds, no matter how they terminated
➢ Achieved through operation logs and checkpoints
❖ Chunk replication
➢ Each chunk is replicated on multiple chunkservers (three or more) on different racks
❖ Master replication
➢ The operation log and checkpoints are replicated on multiple machines
➢ "Shadow" masters provide read-only access when the primary master is down (they depend on the primary for create, delete, and update operations)
37. DATA INTEGRITY
❖ Checksums
➢ Used to detect corrupted data
➢ Each chunkserver independently verifies the integrity of its own copies
❖ A chunk is broken into 64 KB blocks, each with a 32-bit checksum
❖ Checksums are kept in memory and stored persistently with logging, separate from user data
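Per-block checksumming can be sketched as below. CRC-32 is used here purely for illustration; the GFS paper does not specify the checksum algorithm, only the 64 KB block granularity and 32-bit size.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # checksums cover 64 KB blocks

def block_checksums(chunk_data):
    """Compute one 32-bit checksum per 64 KB block of a chunk."""
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify(chunk_data, stored):
    """A chunkserver recomputes block checksums before returning data;
    returns the indices of corrupted blocks (empty list means clean)."""
    return [i for i, c in enumerate(block_checksums(chunk_data))
            if c != stored[i]]
```

Because checksums are per block, a read of a few KB only needs the covering blocks verified, and corruption in one block does not invalidate the rest of the chunk.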
38. Example: Checksum
[Figure: a worked checksum example - the data words are summed, the carry-out is added back into the sum, the checksum is the one's complement of the result, and the data is sent together with the checksum]
Note: Assuming the receiver gets exactly the data the sender sent, the data is considered correct when the sum of the checksum and the received data is all 1s.
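The slide's worked example follows the classic 16-bit one's-complement checksum (the scheme used by Internet protocols); note this is the slide's illustration, distinct from the 32-bit per-block checksums GFS itself keeps. A minimal sketch:

```python
def ones_complement_checksum(words):
    """Sum 16-bit words, fold any carry-out back into the sum, then
    take the one's complement of the result."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # add carry-out back in
    return ~total & 0xFFFF

def receiver_ok(words, checksum):
    """Receiver check: data plus checksum must sum to all 1s (0xFFFF)."""
    total = checksum
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF
```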
45. ATOMIC RECORD APPEND
❖ The primary chunkserver checks whether the append would exceed the maximum chunk size
❖ If so, it pads the chunk to the maximum chunk size, and the secondary chunkservers do the same; the client is then told to retry on the next chunk
❖ On failure, the client retries the operation
❖ GFS appends the data at least once atomically
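The primary's record-append decision can be sketched as below; the return conventions (an offset on success, `None` to signal "pad and retry on a new chunk") are illustrative, not the real protocol.

```python
CHUNK_MAX = 64 * 1024 * 1024  # maximum chunk size

def record_append(chunk, record):
    """chunk: bytearray holding the current last chunk of the file.
    If the record fits, the primary picks the offset (the current end)
    and returns it; all replicas write at that same offset. If it does
    not fit, the chunk is padded and the caller must retry on a new chunk."""
    if len(chunk) + len(record) > CHUNK_MAX:
        chunk.extend(b"\x00" * (CHUNK_MAX - len(chunk)))  # pad to full size
        return None                                       # retry on next chunk
    offset = len(chunk)
    chunk.extend(record)
    return offset
```

Combined with the client retry on failure, this is why GFS guarantees the record is appended at least once (possibly with duplicates or padding in between), rather than exactly once.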