1. GOOGLE FILE SYSTEM
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Presented By – Ankit Thiranh
2. OVERVIEW
• Introduction
• Architecture
• Characteristics
• System Interaction
• Master Operation, Fault Tolerance and Diagnosis
• Measurements
• Some real-world clusters and their performance
3. INTRODUCTION
• Google – handles huge amounts of data
• Needs a good distributed file system to process that data
• Solution: the Google File System
• GFS is :
• Large
• Distributed
• Highly fault tolerant system
4. ASSUMPTIONS
• The system is built from many inexpensive commodity components that often fail.
• The system stores a modest number of large files.
• Primarily two kinds of reads: large streaming reads and small random reads.
• Many large sequential writes append data to files.
• The system must efficiently implement well-defined semantics for multiple clients that
concurrently append to the same file.
• High sustained bandwidth is more important than low latency.
6. CHARACTERISTICS
• Single master
• Chunk size
• Metadata
• In-Memory Data structures
• Chunk Locations
• Operational Log
• Consistency Model (figure)
• Guarantees by GFS
• Implications for Applications
                      Write                      Record Append
Serial success        defined                    defined interspersed
                                                 with inconsistent
Concurrent successes  consistent but undefined   defined interspersed
                                                 with inconsistent
Failure               inconsistent
File Region State After Mutation
7. SYSTEM INTERACTION
• Leases and Mutation Order
• Data flow
• Atomic Record appends
• Snapshot
Figure 2: Write Control and Data Flow
9. FAULT TOLERANCE AND DIAGNOSIS
• High Availability
• Fast Recovery
• Chunk Replication
• Master Replication
• Data Integrity
• Diagnostics tools
10. MEASUREMENTS
Aggregate Throughputs. Top curves show theoretical limits imposed by the network topology. Bottom curves
show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in
some cases because of low variance in measurements.
11. REAL WORLD CLUSTERS
• Two clusters were examined:
• Cluster A is used for research and development by over a hundred users.
• Cluster B is used for production data processing with occasional human
intervention
• Storage
• Metadata
Cluster                     A        B
Chunkservers                342      227
Available disk space        72 TB    180 TB
Used disk space             55 TB    155 TB
Number of files             735 k    737 k
Number of dead files        22 k     232 k
Number of chunks            992 k    1550 k
Metadata at chunkservers    13 GB    21 GB
Metadata at master          48 MB    60 MB
Characteristics of two GFS clusters
14. WORKLOAD BREAKDOWN
• Master Workload
Cluster X Y
Open 26.1 16.3
Delete 0.7 1.5
FindLocation 64.3 65.8
FindLeaseHolder 7.8 13.4
FindMatchingFiles 0.6 2.2
All other combined 0.5 0.8
Master Requests Breakdown by Type (%)
Editor's Notes
GFS – single master, multiple chunkservers, multiple clients. Files are divided into chunks; each chunk is identified by an immutable and globally unique 64-bit chunk handle and stored on multiple chunkservers. The master holds the metadata: the namespace, access control information, the mapping of files to chunks, and the current locations of chunks.
Single master – can make sophisticated chunk placement and replication decisions using global knowledge. Read example.
Chunk size – 64 MB; advantages – reduces client–master interaction, a client is more likely to perform many operations on a given chunk, and it reduces metadata size.
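As a small illustration of the fixed chunk size, a client translates a file byte offset into a chunk index before asking the master for the chunk handle and replica locations (the function name here is hypothetical, a sketch of the arithmetic only):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in the paper

def chunk_index(byte_offset: int) -> int:
    """Map a file byte offset to the index of the chunk containing it,
    as a GFS client does before contacting the master."""
    return byte_offset // CHUNK_SIZE

# The first 64 MB fall in chunk 0; the next byte starts chunk 1.
assert chunk_index(0) == 0
assert chunk_index(CHUNK_SIZE - 1) == 0
assert chunk_index(CHUNK_SIZE) == 1
```

Because a chunk is large, many sequential operations hit the same chunk index, so one master lookup can be cached and reused.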
Metadata – stores the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas; metadata is kept in memory for fast operations. Chunk locations – the master does not keep a persistent record; it polls chunkservers at startup and monitors them with heartbeat messages. Operation log – contains a history of critical metadata changes.
Guarantees – mutations are applied in the same order on all replicas, and chunk version numbers detect any replica that has become stale.
Consistent – all replicas have the same data. Defined – consistent, and clients see what the mutation has written in its entirety.
Mutation – an operation that changes the contents or metadata of a chunk, such as a write or a record append.
Data flow – to use bandwidth fully, data is pushed linearly along a chain of chunkservers; to avoid bottlenecks and high-latency links, each machine forwards the data to the closest machine that has not yet received it; latency is minimized by pipelining the data transfer over TCP connections.
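The paper's back-of-the-envelope estimate for this pipelined push is B/T + RL, where B is the number of bytes, T the link throughput, L the per-hop latency, and R the number of replicas. A minimal sketch of that arithmetic (function name hypothetical):

```python
def ideal_transfer_ms(num_bytes: int, link_mbps: float,
                      hop_latency_ms: float, replicas: int) -> float:
    """Ideal elapsed time for a pipelined replica chain: B/T + R*L.
    The throughput term is paid once because forwarding overlaps
    with receiving; each hop adds only its link latency."""
    bytes_per_ms = link_mbps * 1e6 / 8 / 1000.0
    return num_bytes / bytes_per_ms + replicas * hop_latency_ms

# 1 MB over 100 Mbps links, 1 ms per hop, 3 replicas: 80 ms + 3 ms = 83 ms
t = ideal_transfer_ms(1_000_000, 100, 1, 3)
```

Without pipelining, a linear push would pay the full B/T term once per replica instead of once overall, which is why the chain overlaps receiving and forwarding.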
Record append – the client specifies only the data; GFS appends it to the file atomically at an offset of GFS's choosing; control flow is the same as for a regular write.
Snapshots – make a copy of a file or directory tree while minimizing any interruption to ongoing mutations.
Master – executes all namespace operations and manages chunk replicas.
Namespace – GFS logically represents its namespace as a lookup table mapping full path names to metadata.
Replica placement – 1) maximize data reliability and availability, and 2) maximize network bandwidth utilization.
Creation, re-replication – place new replicas on chunkservers with below-average disk utilization, limit recent creations on each chunkserver, and spread replicas of a chunk across racks.
Garbage collection – after deletion, the file is renamed to a hidden file and reclaimed after 3 days; orphaned chunks are erased in a later scan.
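The lazy-deletion idea in the note above can be sketched as follows (class and method names are hypothetical; only the rename-then-sweep pattern is from the paper):

```python
HIDDEN_TTL_SECONDS = 3 * 24 * 3600  # hidden files are reclaimed after 3 days

class LazyNamespace:
    """Sketch of lazy deletion: a delete just renames the file to a
    hidden name; a later namespace scan reclaims expired hidden files."""

    def __init__(self):
        self.files = {}    # path -> list of chunk handles
        self.hidden = {}   # hidden name -> (deletion time, chunk handles)

    def delete(self, path: str, now: float) -> None:
        chunks = self.files.pop(path)
        self.hidden["." + path + ".deleted"] = (now, chunks)

    def sweep(self, now: float) -> list:
        """Regular namespace scan: drop hidden files older than the TTL.
        Returns the chunk handles that become orphaned (erased later)."""
        reclaimed = []
        for name, (t, chunks) in list(self.hidden.items()):
            if now - t >= HIDDEN_TTL_SECONDS:
                del self.hidden[name]
                reclaimed.extend(chunks)
        return reclaimed
```

Until the sweep runs, the hidden file can still be read or undeleted by renaming it back, which is one reason the paper prefers this over eager reclamation.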
Stale replica detection – a chunkserver that fails misses mutations while it is down; the master assigns chunk version numbers to distinguish up-to-date replicas from stale ones.
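The version-number check is simple enough to sketch directly (function name hypothetical):

```python
def stale_replicas(master_version: int, replica_versions: dict) -> list:
    """Replicas whose chunk version number lags the master's are stale:
    they missed mutations while their chunkserver was down."""
    return sorted(s for s, v in replica_versions.items() if v < master_version)

# "cs2" was down during a mutation that bumped the version to 7
assert stale_replicas(7, {"cs1": 7, "cs2": 6, "cs3": 7}) == ["cs2"]
```

The master removes stale replicas from its chunk location map, so clients are never directed to them, and garbage collection removes them later.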
Fast recovery – the master and chunkservers are designed to restore their state and start in seconds.
Chunk replication – discussed earlier
Master replication – the operation log and checkpoints are replicated on multiple machines; shadow masters provide read-only access.
Data integrity – uses checksumming to detect corruption of stored data; corruption could in principle be recovered by comparing replicas, but that is impractical across chunkservers, so each chunkserver independently verifies checksums.
Diagnostic tools – generate diagnostic logs that record many significant events. The RPC logs include the exact requests and responses sent on the wire, except for the file data being read or written.
The two clusters have similar numbers of files, though B has a larger proportion of dead files, namely files which were deleted or replaced by a new version but whose storage has not yet been reclaimed. It also has more chunks because its files tend to be larger.
Reads return no data in Y because applications in the production system use files as producer-consumer queues.
Cluster Y sees a much higher percentage of large record appends than cluster X does because our production systems, which use cluster Y, are more aggressively tuned for GFS.