Google File System
Lalit Kumar
M.Tech Final Year
Computer Science & Engineering Dept.
KEC Dwarahat, Almora
Overview
 Introduction To GFS
 Architecture
 Data Flow
 System Interactions
 Master Operations
 Metadata Management
 Garbage Collection
 Fault tolerance
 Latest Advancement
 Drawbacks
 Conclusion
 References
Introduction
 More than 15,000 commodity-class PCs.
 Multiple clusters distributed worldwide.
 Thousands of queries served per second.
 One query reads hundreds of MB of data.
 One query consumes tens of billions of CPU cycles.
 Google stores dozens of copies of the entire Web!
Conclusion: a large, distributed, highly fault-tolerant file system is needed.
Architecture
A GFS cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients.
Figure 1: GFS Architecture
Source: Howard Gobioff, “The GFS” Presented at SOSP 2003
Master
 Manages the namespace and metadata.
 Manages chunk creation, replication, and placement.
 Performs the snapshot operation to create a duplicate of a file or directory tree.
 Performs checkpointing and logging of changes to metadata.
Chunkservers
 On startup or failure recovery, report their chunks to the master.
 Periodically report a subset of their chunks to the master (to detect chunks that are no longer needed).
Metadata
 Types of metadata: file and chunk namespaces, the mapping from files to chunks, and the location of each chunk's replicas (see the sketch below).
 Kept in a form that is easy and efficient for the master to scan periodically.
 Periodic scanning is used to implement chunk garbage collection, re-replication, and chunk migration.
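A minimal sketch of these three metadata tables follows. The names (MasterMetadata, FileMeta, ChunkMeta) are illustrative assumptions rather than the paper's structures; replica locations are kept only in memory, since the master rebuilds them from chunkserver reports.

```python
# Illustrative sketch of the master's in-memory metadata tables (names assumed).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChunkMeta:
    handle: int                                          # globally unique chunk handle
    version: int = 0                                     # chunk version number
    replicas: List[str] = field(default_factory=list)    # chunkserver addresses (not persisted)

@dataclass
class FileMeta:
    chunk_handles: List[int] = field(default_factory=list)  # ordered chunks of the file

class MasterMetadata:
    def __init__(self):
        self.namespace: Dict[str, FileMeta] = {}   # full pathname -> file metadata
        self.chunks: Dict[int, ChunkMeta] = {}     # chunk handle -> chunk metadata

    def chunk_for(self, path: str, chunk_index: int) -> ChunkMeta:
        """Resolve (filename, chunk index) to the chunk a client should access."""
        handle = self.namespace[path].chunk_handles[chunk_index]
        return self.chunks[handle]
```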
Data Flow
 Data is pushed linearly along a carefully picked chain of chunkservers in a pipelined fashion over TCP.
 Once a chunkserver receives some data, it starts forwarding immediately to the next chunkserver.
 Each machine forwards the data to the closest machine in the network topology that has not yet received it (see the sketch after Figure 2).
Figure 2: Data Flow in chunkservers
Source: http://research.google.com/archive/gfs‐sosp2003.pdf
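A minimal sketch of this distance-aware, pipelined push. plan_forwarding_chain, distance, and send_chunk_data are hypothetical helpers, not part of the GFS RPC interface:

```python
# Sketch of pipelined data push: each hop forwards to the closest remaining
# chunkserver while still receiving from upstream.
def plan_forwarding_chain(self_addr, remaining, distance):
    """Order the remaining replicas so each hop forwards to its closest peer."""
    chain, current, remaining = [], self_addr, set(remaining)
    while remaining:
        nxt = min(remaining, key=lambda peer: distance(current, peer))
        chain.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return chain

def push_pipelined(data, chain, send_chunk_data, piece_size=64 * 1024):
    """Send data to the first hop in fixed-size pieces; each receiver forwards
    pieces onward as they arrive instead of waiting for the whole transfer."""
    if not chain:
        return
    first, rest = chain[0], chain[1:]
    for offset in range(0, len(data), piece_size):
        send_chunk_data(first, data[offset:offset + piece_size], forward_to=rest)
```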
System Interactions
Read Algorithm
1. The application originates the read request.
2. The GFS client translates the request from (filename, byte range) to (filename, chunk index) and sends it to the master.
3. The master responds with the chunk handle and replica locations (i.e. the chunkservers where the replicas are stored).
4. The client picks a location and sends a (chunk handle, byte range) request to that chunkserver.
5. The chunkserver sends the requested data to the client.
6. The client forwards the data to the application (see the sketch after Figure 3).
Figure 3: Block diagram for Read operation
Source: Howard Gobioff, “The GFS” Presented at SOSP 2003
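A minimal sketch of this client-side read path, assuming hypothetical master.lookup() and chunkserver.read() RPC stubs rather than the real GFS API:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

def gfs_read(master, chunkservers, filename, offset, length):
    """Translate a byte range into a chunk index, ask the master where the
    replicas live, then read directly from one chunkserver."""
    chunk_index = offset // CHUNK_SIZE
    handle, replica_locations = master.lookup(filename, chunk_index)   # steps 2-3
    location = replica_locations[0]                                    # step 4: pick a replica
    chunk_offset = offset % CHUNK_SIZE
    return chunkservers[location].read(handle, chunk_offset, length)  # steps 5-6
```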
Write Algorithm
1. The application originates the write request.
2. The GFS client translates the request from (filename, data) to (filename, chunk index) and sends it to the master.
3. The master responds with the chunk handle and the (primary + secondary) replica locations.
4. The client pushes the write data to all locations; the data is stored in the chunkservers' internal buffers.
5. The client sends the write command to the primary.
6. The primary determines a serial order for the data instances stored in its buffer and writes the instances to the chunk in that order.
7. The primary sends the serial order to the secondaries and tells them to perform the write.
8. The secondaries respond to the primary, and the primary responds back to the client (see the sketch after Figure 4).
Figure 4: Block Diagram for Write operation
Source: Howard Gobioff, “The GFS” Presented at SOSP 2003
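A minimal sketch of the write path from the client's point of view, again using hypothetical master and chunkserver RPC stubs rather than the real GFS API:

```python
def gfs_write(master, chunkservers, filename, offset, data):
    chunk_index = offset // CHUNK_SIZE
    handle, primary, secondaries = master.get_lease_holders(filename, chunk_index)  # steps 2-3

    # Step 4: push data to all replicas; each chunkserver keeps it in an
    # internal buffer until the write is committed.
    for server in [primary] + secondaries:
        chunkservers[server].push_data(handle, data)

    # Steps 5-8: ask the primary to commit; it assigns the serial order,
    # applies the mutation locally, forwards the order to the secondaries,
    # and replies to the client once all of them have acknowledged.
    return chunkservers[primary].write(handle, offset % CHUNK_SIZE, secondaries)
```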
Master Operation
1. Namespace Management and Locking
 GFS maps each full pathname to its metadata in a table.
 Each master operation acquires a set of locks.
 The locking scheme allows concurrent mutations in the same directory.
 Locks are acquired in a consistent total order to prevent deadlock.
2. Replica Placement
3. Chunk Creation
4. Re-Replication
5. Balancing
1. Namespace Management & Locking
 Each master operation acquires a set of locks before it runs.
 To operate on /dir1/dir2/dir3/leaf, it first needs the following locks:
– Read-lock on /dir1
– Read-lock on /dir1/dir2
– Read-lock on /dir1/dir2/dir3
– Read-lock or write-lock on /dir1/dir2/dir3/leaf
 File creation does not require a write-lock on the parent directory; a read-lock on the parent's name is sufficient to protect it from deletion, rename, or snapshot (see the sketch below).
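A minimal sketch of computing the lock set for one operation. The function name and the sorted acquisition order are illustrative; the read-locks-on-ancestors pattern follows the scheme above:

```python
def locks_needed(path: str, write_leaf: bool):
    """List the (pathname, mode) locks an operation on `path` must acquire:
    read-locks on every ancestor, read- or write-lock on the leaf itself.
    Acquiring in a consistent total order (sorted here) prevents deadlock."""
    parts = path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:i + 1]) for i in range(len(parts))]
    ancestors, leaf = prefixes[:-1], prefixes[-1]
    requests = [(p, "read") for p in ancestors]
    requests.append((leaf, "write" if write_leaf else "read"))
    return sorted(requests)

# Example: creating /dir1/dir2/dir3/leaf needs read-locks on the ancestor
# directories and a write-lock on the new file name itself.
print(locks_needed("/dir1/dir2/dir3/leaf", write_leaf=True))
```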
2. Chunk Creation
 The master considers several factors:
 Place new replicas on chunkservers with below-average disk space utilization.
 Limit the number of “recent” creations on each chunkserver.
 Spread replicas of a chunk across racks (see the sketch below).
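A rough sketch of a greedy placement policy guided by these factors. The Chunkserver fields, the three-replica default, and the max_recent threshold are illustrative assumptions, not the exact GFS heuristics:

```python
from dataclasses import dataclass

@dataclass
class Chunkserver:
    address: str
    rack: str
    disk_utilization: float   # fraction of disk space used
    recent_creations: int     # creations since the last stats window

def pick_replica_targets(servers, num_replicas=3, max_recent=10):
    """Greedy selection: prefer low disk utilization and few recent creations,
    and avoid re-using a rack so replicas are spread across racks."""
    candidates = sorted(servers, key=lambda s: (s.disk_utilization, s.recent_creations))
    targets, used_racks = [], set()
    for s in candidates:
        if len(targets) == num_replicas:
            break
        if s.recent_creations >= max_recent:
            continue                       # throttle "recent" creations per server
        if s.rack in used_racks:
            continue                       # spread replicas across racks
        targets.append(s)
        used_racks.add(s.rack)
    return targets
```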
3. Re-replication
 The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal:
 when a chunkserver becomes unavailable,
 when a chunkserver reports a corrupted chunk, or
 when the replication goal is increased.
 Re-replication placement follows the same policy as chunk creation (see the sketch below).
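A minimal sketch of the re-replication check, reusing the illustrative MasterMetadata structure from the earlier metadata sketch; ordering by remaining replica count is one simple way to handle the most at-risk chunks first:

```python
def chunks_needing_rereplication(metadata, live_servers, goal=3):
    """Return chunk handles whose live replica count has fallen below the goal,
    fewest remaining replicas first."""
    needy = []
    for handle, chunk in metadata.chunks.items():
        live = [r for r in chunk.replicas if r in live_servers]
        if len(live) < goal:
            needy.append((len(live), handle))
    return [handle for _, handle in sorted(needy)]
```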
4. Balancing
 The master rebalances replicas periodically for better disk space usage and load balancing.
 The master gradually fills up a new chunkserver rather than instantly swamping it with new chunks (and the heavy write traffic that comes with them!).
Metadata Management
 The master stores three major types of metadata:
 File and chunk namespaces
 Mapping from files to chunks
 Locations of each chunk’s replicas
 All metadata is kept in the master’s memory (a rough sizing example follows Figure 5).
Figure 5: Logical structure of metadata
Source: Naushad UzZaman,“Survey on Google File System”,CSC 456,2007
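To see why an all-in-memory design is practical, a back-of-envelope sizing: the GFS paper reports that the master keeps less than 64 bytes of metadata per 64 MB chunk; the 1 PB of file data below is only an assumed cluster size for illustration.

```python
# Rough sizing of master memory for chunk metadata (cluster size assumed).
CHUNK_SIZE = 64 * 1024 ** 2            # 64 MB chunks
BYTES_PER_CHUNK_META = 64              # upper bound per chunk (from the GFS paper)

file_data = 1024 ** 5                  # assume 1 PB of stored file data
num_chunks = file_data // CHUNK_SIZE   # ~16 million chunks
meta_bytes = num_chunks * BYTES_PER_CHUNK_META

print(num_chunks, meta_bytes / 1024 ** 3)  # ~16M chunks, roughly 1 GB of metadata
```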
Garbage Collection
 Storage is reclaimed lazily by garbage collection.
 A deleted file is first renamed to a hidden name.
 Hidden files are removed once they are more than three days old.
 When a hidden file is removed, its in-memory metadata is erased.
 The master regularly scans the chunk namespace, identifying orphaned chunks; these are removed.
 Chunkservers periodically report the chunks they hold, and the master replies with the identity of any chunks no longer present in its metadata. The chunkserver is then free to delete its replicas of those chunks (see the sketch below).
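A minimal sketch of this lazy scheme on the master side, again reusing the illustrative MasterMetadata structure; the hidden-name format and the separate `hidden` dictionary are assumptions for illustration, not the exact GFS implementation:

```python
import time

HIDDEN_TTL = 3 * 24 * 3600            # hidden files older than three days are reclaimed

def delete_file(metadata, hidden, path):
    """Deletion just renames the file to a hidden name that records the time."""
    hidden[f".deleted.{path}"] = (time.time(), metadata.namespace.pop(path))

def collect_garbage(metadata, hidden, now=None):
    now = now or time.time()
    # Drop hidden files past the TTL, erasing their in-memory metadata.
    for name in [n for n, (t, _) in hidden.items() if now - t > HIDDEN_TTL]:
        del hidden[name]
    # Any chunk no longer reachable from a live or hidden file is orphaned.
    reachable = {h for f in metadata.namespace.values() for h in f.chunk_handles}
    reachable |= {h for _, f in hidden.values() for h in f.chunk_handles}
    orphaned = set(metadata.chunks) - reachable
    for handle in orphaned:
        del metadata.chunks[handle]
    return orphaned   # reported back to chunkservers, which then delete their replicas
```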
Fault Tolerance
 High availability:
 Fast recovery.
 Chunk replication.
 Master replication.
 Data integrity:
 Each chunkserver uses checksumming.
 Chunks are broken into 64 KB blocks, each with its own checksum (see the sketch below).
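A minimal sketch of per-block checksumming on a chunkserver; CRC32 and the helper names are illustrative stand-ins for the chunkserver's actual checksum code:

```python
import zlib

BLOCK_SIZE = 64 * 1024   # chunks are checksummed in 64 KB blocks

def checksum_blocks(chunk_data):
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verified_read(chunk_data, checksums, offset, length):
    """Verify every block overlapping the requested range; report corruption
    instead of silently returning bad data."""
    first, last = offset // BLOCK_SIZE, (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk_data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            raise IOError(f"checksum mismatch in block {b}; replica is corrupted")
    return chunk_data[offset:offset + length]
```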
Latest Advancement
1. Gmail: an easily configurable email service with 15 GB of web storage.
2. Blogger: a free web-based service that helps consumers publish on the web without writing code or installing software.
3. Google “next-generation corporate software”: a smaller version of the Google software, modified for private use.
Drawbacks
 Small files have only a small number of chunks, perhaps just one; the chunkservers storing these files can become hot spots when many clients request them.
 Internal fragmentation.
 With many such small files, master involvement increases and can become a potential bottleneck; having a single master node can become an issue.
 Master memory is a limitation.
 Performance might degrade as the number of writers and random writes grows.
 No reasoning is provided for the choice of the standard chunk size (64 MB).
Conclusion
GFS meets Google’s storage requirements:
 Incremental growth.
 Regular checks for component failure.
 Data optimization from special operations.
 Simple architecture.
 Fault tolerance.
References
[1] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. ACM SIGOPS Operating Systems Review, Volume 37, Issue 5, 2003.
[2] Sean Quinlan and Kirk McKusick. GFS: Evolution on Fast-forward. Communications of the ACM, Vol. 53, 2010.
[3] Thomas Anderson, Michael Dahlin, Jeanna Neefe, David Patterson, Drew Roselli, and Randolph Wang. Serverless Network File Systems. In Proceedings of the 15th ACM Symposium on Operating System Principles, pages 109–126, Copper Mountain Resort, Colorado, December 1995.
[4] Luis-Felipe Cabrera and Darrell D. E. Long. Swift: Using Distributed Disk Striping to Provide High I/O Data Rates. Computer Systems, 4(4):405–436, 1991.
[5] InterMezzo. http://www.inter-mezzo.org, 2003.
Thank You….
