Google File System


Published in: Technology
  • Component failures are accepted as normal, even in a system this large. In regular operation the system works with very large (up to TB-sized) files, and small KB-sized files are also supported. Most files are mutated by appending new data rather than by overwriting existing data. Co-designing the applications and the file system keeps the file system simple without imposing a burden on the applications.
  • 1. The system supports the usual file operations; GFS also has snapshot and record append operations. 2. Snapshot creates a copy of a file or directory tree at low cost. 3. Record append allows multiple clients to append data to the same file concurrently.
  • 1. Files are divided into fixed-size chunks. 2. Each chunk is identified by an immutable, globally unique 64-bit chunk handle assigned at the time of chunk creation. 3. By default, each chunk is stored on three chunk servers.
  • Advantages of a larger chunk size: fewer read/write interactions between the client and the master, and a client is more likely to perform many operations on a given chunk.
  • To keep itself informed, a shadow master reads a replica of the growing operation log and applies the same changes to its data structures exactly as the primary does. It exchanges handshake messages with chunkservers to monitor their status. It depends on the primary master only for replica location updates that result from the primary's decisions to create and delete replicas.
  • Logs can be used to reconstruct the entire interaction history to diagnose a problem. They also serve as traces for load testing and performance analysis.
  • Read rate: 10 MB/s for one client, 80% of the estimated limit (12.5 MB/s). For 16 clients the aggregate is 94 MB/s, i.e. about 6 MB/s per client, 75% of the estimated limit.
  • Write rate: 6.3 MB/s for one client, about half of the estimated limit (12.5 MB/s). The aggregate write rate for 16 clients is 35 MB/s, i.e. about 2.2 MB/s per client, again about half of the estimated limit.
  • Record append rate: 6.0 MB/s for one client, dropping to 4.8 MB/s for 16 clients.
  • Cluster A is used for research and development. A typical task reads through a few MBs to TBs of data, analyzes or processes it, and writes the results. Cluster B is used for production data processing; its tasks last much longer. Metadata at the chunkservers: checksums and chunk version numbers. Metadata at the master is small and does not limit the system's capacity: file names in compressed form, ownership and permissions, the mapping from files to chunks, each chunk's current version, replica locations, etc. Recovery is fast.
  • A can support read rates of up to 750 MB/s and is using 580 MB/s, so it is using its resources efficiently. B can support 1300 MB/s but is using only 380 MB/s.

    1. THE GOOGLE FILE SYSTEM. By Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
    2. INTRODUCTION: Google applications process lots of data and need a good file system. Solution: the Google File System, a large, distributed, highly fault-tolerant file system.
    3. DESIGN MOTIVATIONS: (1) Fault tolerance and auto-recovery need to be built into the system. (2) Standard I/O assumptions (e.g. block size) have to be re-examined. (3) Record appends are the prevalent form of writing. (4) Google applications and GFS should be co-designed.
    4. INTERFACE: Create, Delete, Open, Close, Read, Write, Snapshot, Record Append.
    5. GFS ARCHITECTURE: On a single-machine FS, an upper layer maintains the metadata and a lower layer (i.e. disk) stores the data in units called “blocks”. In GFS, a master process maintains the metadata and a lower layer (i.e. a set of chunk servers) stores the data in units called “chunks”.
    7. CHUNK: Analogous to a block, except larger. Size: 64 MB. Stored on a chunk server as a file. The chunk handle (the chunk file name) is used to reference the chunk. Replicated across multiple chunk servers.
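
The fixed 64 MB chunk size means a byte offset in a file maps to a chunk by simple integer division. The following is a minimal sketch of that arithmetic; the function names `locate` and `chunks_needed` are hypothetical and do not come from the slides.

```python
from typing import Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated on the slide


def locate(offset: int) -> Tuple[int, int]:
    """Return (chunk index, offset inside that chunk) for a file byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_needed(file_size: int) -> int:
    """Number of fixed-size chunks needed to hold file_size bytes."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division
```

With this mapping, a client would only need to send a file name and a chunk index to look up where the data lives.
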
    8. CHUNK SIZE: Advantages: reduces client-master interaction; reduces the size of the metadata. Disadvantages: hot spots. Solution: a higher replication factor.
    9. MASTER: A single, centralized master stores all metadata: the file namespace, the file-to-chunk mappings, and the chunk location information.
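
The three kinds of metadata held by the master can be pictured as a pair of in-memory maps. This is only an illustrative sketch: the class `MasterMetadata`, its method names, and the counter used to hand out chunk handles are assumptions made for the example (the real system uses globally unique 64-bit handles).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MasterMetadata:
    # File namespace plus file-to-chunk mapping: path -> ordered chunk handles.
    file_chunks: Dict[str, List[int]] = field(default_factory=dict)
    # Chunk location information: chunk handle -> servers holding a replica.
    chunk_locations: Dict[int, List[str]] = field(default_factory=dict)
    next_handle: int = 0  # simple counter standing in for a unique 64-bit handle

    def create_file(self, path: str) -> None:
        if path in self.file_chunks:
            raise FileExistsError(path)
        self.file_chunks[path] = []

    def add_chunk(self, path: str, servers: List[str]) -> int:
        """Allocate a new chunk for path and record where its replicas live."""
        handle = self.next_handle
        self.next_handle += 1
        self.file_chunks[path].append(handle)
        self.chunk_locations[handle] = list(servers)
        return handle

    def lookup(self, path: str, chunk_index: int) -> Tuple[int, List[str]]:
        """What a client asks for: the handle and replica locations of one chunk."""
        handle = self.file_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]
```
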
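    10. GFS ARCHITECTURE [figure]
    11. SYSTEM INTERACTIONS (write control and data flow): (1) The client asks the master for the current lease holder. (2) The master replies with the identity of the primary and the locations of the replicas (cached by the client). (3) The client pushes the data to all replicas (3a, 3b, 3c). (4) The client sends the write request to the primary; the primary assigns an order to the mutations and applies them. (5) The primary forwards the write request to the secondary replicas. (6) The secondaries reply: operation completed. (7) The primary replies to the client: operation completed, or an error report.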
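
The write flow on this slide separates the data push from the control message, with the primary choosing one order for all replicas. The sketch below simulates that in memory under simplifying assumptions: the `Replica` and `Primary` classes and `client_write` are hypothetical names, the lease lookup at the master is skipped, and replicas never fail, so the error path is never taken.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.buffer: Dict[str, bytes] = {}          # data pushed but not yet applied
        self.applied: List[Tuple[int, bytes]] = []  # (serial number, data) in apply order

    def push(self, data_id: str, data: bytes) -> None:
        self.buffer[data_id] = data

    def apply(self, serial: int, data_id: str) -> bool:
        self.applied.append((serial, self.buffer.pop(data_id)))
        return True


class Primary(Replica):
    def __init__(self, name: str, secondaries: List[Replica]) -> None:
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id: str) -> str:
        serial = self.next_serial  # the primary picks the mutation order
        self.next_serial += 1
        self.apply(serial, data_id)
        # Forward the request; every secondary applies in the same serial order.
        acks = [s.apply(serial, data_id) for s in self.secondaries]
        return "completed" if all(acks) else "error"


def client_write(primary: Primary, data_id: str, data: bytes) -> str:
    for replica in [primary, *primary.secondaries]:  # step 3: push data everywhere
        replica.push(data_id, data)
    return primary.write(data_id)                    # step 4: control message
```
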
    12. SYSTEM INTERACTIONS: Record append: the client specifies only the data. Snapshot: makes a copy of a file or a directory tree.
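
Because the client supplies only the data in a record append, the file system picks the offset and reports it back. The sketch below shows one way that could look, assuming (as the GFS paper describes) that a record which does not fit in the current chunk causes the chunk to be padded and the record to go into the next chunk. The class `AppendOnlyFile` is hypothetical and ignores replication and concurrency.

```python
from typing import List


class AppendOnlyFile:
    def __init__(self, chunk_size: int = 64 * 1024 * 1024) -> None:
        self.chunk_size = chunk_size
        self.chunks: List[bytearray] = [bytearray()]

    def record_append(self, data: bytes) -> int:
        """Append data as one record and return the offset chosen for it."""
        if len(data) > self.chunk_size:
            raise ValueError("record larger than a chunk")
        last = self.chunks[-1]
        if len(last) + len(data) > self.chunk_size:
            # Record would straddle a chunk boundary: pad and start a new chunk.
            last.extend(b"\0" * (self.chunk_size - len(last)))
            last = bytearray()
            self.chunks.append(last)
        offset = (len(self.chunks) - 1) * self.chunk_size + len(last)
        last.extend(data)
        return offset
```

A tiny chunk size makes the padding behaviour easy to see, for example `AppendOnlyFile(chunk_size=8)`.
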
    13. OPERATION LOG: A historical record of critical metadata changes. Defines the order of concurrent operations. Because it is critical, it is replicated on multiple remote machines, and the master responds to a client only after the log record has been written both locally and remotely. Recovery is fast because of checkpoints, which use a compact B-tree-like form that can be mapped directly into memory. The master switches to a new log file and creates the new checkpoint in a separate thread.
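
Recovery from a checkpoint plus the log written since that checkpoint can be sketched as follows. This is a simplified illustration: the operation tuples, the `Master` class, and `recover` are hypothetical, the state is a plain dictionary rather than a B-tree-like structure, and remote log replication is left out.

```python
import copy
from typing import Dict, List, Tuple

Op = Tuple  # e.g. ("create", path), ("add_chunk", path, handle), ("delete", path)


class Master:
    def __init__(self) -> None:
        self.state: Dict[str, List[int]] = {}  # path -> chunk handles
        self.log: List[Op] = []                # operations since the last checkpoint

    def apply(self, op: Op) -> None:
        kind, path, *rest = op
        if kind == "create":
            self.state[path] = []
        elif kind == "add_chunk":
            self.state[path].append(rest[0])
        elif kind == "delete":
            self.state.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {kind}")

    def mutate(self, op: Op) -> None:
        self.log.append(op)  # record the change before applying it
        self.apply(op)

    def checkpoint(self) -> Dict[str, List[int]]:
        """Snapshot the state and switch to a new, empty log."""
        snapshot = copy.deepcopy(self.state)
        self.log = []
        return snapshot


def recover(checkpoint: Dict[str, List[int]], log_tail: List[Op]) -> Master:
    """Rebuild a master from the latest checkpoint plus the log written after it."""
    master = Master()
    master.state = copy.deepcopy(checkpoint)
    for op in log_tail:
        master.apply(op)
    return master
```

Only the short log tail has to be replayed, which is why checkpoints keep recovery time low.
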
    14. MASTER OPERATIONS: Namespace management and locking; chunk creation; chunk re-replication; chunk rebalancing; garbage collection.
    15. FAULT TOLERANCE AND DIAGNOSIS: 1. High Availability. The overall system is kept highly available with two simple yet effective strategies: fast recovery and replication.
    16. 1.1 Fast Recovery: the master and chunk servers are designed to restart and restore their state in a few seconds. 1.2 Chunk Replication: across multiple machines and across multiple racks.
    17. 1.3 Master Replication: A log of all changes made to metadata is kept. The log is replicated on multiple machines. “Shadow” masters serve reads if the “real” master is down.
    18. [figure]
    19. 2. Data Integrity: each chunk has an associated checksum. 3. Diagnostic Logging: logs record the details of interactions between machines (the exact requests and responses sent on the wire, except for the data being transferred).
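
Checksum-based integrity checking can be sketched as below. The GFS paper describes a 32-bit checksum per 64 KB block of a chunk but does not name the algorithm, so the use of CRC-32 via `zlib.crc32` here is an assumption, and the function names are hypothetical.

```python
import zlib
from typing import List, Optional

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as described in the GFS paper


def block_checksums(chunk: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Compute one 32-bit checksum per block of the chunk."""
    return [zlib.crc32(chunk[i:i + block_size])
            for i in range(0, len(chunk), block_size)]


def first_corrupt_block(chunk: bytes, stored: List[int],
                        block_size: int = BLOCK_SIZE) -> Optional[int]:
    """Return the index of the first block whose checksum mismatches, or None."""
    current = block_checksums(chunk, block_size)
    for index, (now, then) in enumerate(zip(current, stored)):
        if now != then:
            return index
    if len(current) != len(stored):
        return min(len(current), len(stored))  # data was truncated or extended
    return None
```

A chunk server could run such a check before returning data, so that corruption is not propagated to readers.
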
    20. MEASUREMENTS: Performance was measured on a GFS cluster consisting of one master, two master replicas, 16 chunk servers, and 16 clients.
    21. All machines are configured with: (1) dual 1.4 GHz PIII processors; (2) 2 GB of memory; (3) two 80 GB 5400 rpm disks; (4) a 100 Mbps full-duplex Ethernet connection to an HP 2524 switch.
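
The 12.5 MB/s per-client limit quoted in the speaker notes follows directly from the 100 Mbps network link. A small sketch of the arithmetic, with hypothetical helper names and decimal megabytes assumed:

```python
def link_limit_mb_per_s(link_mbps: float) -> float:
    """Convert a link speed in megabits per second to megabytes per second."""
    return link_mbps / 8


def fraction_of_limit(measured_mb_per_s: float, limit_mb_per_s: float) -> float:
    """How much of the theoretical limit a measured rate achieves."""
    return measured_mb_per_s / limit_mb_per_s


limit = link_limit_mb_per_s(100)           # 12.5 MB/s per client
read_share = fraction_of_limit(10, limit)  # the notes quote 10 MB/s for one reader
```
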
    22. [figure]
    23. [figure]
    24. Here too the rate drops as the number of clients increases to 16; the append rate drops due to congestion and variance in the network transfer rates seen by different clients.
    25. REAL WORLD CLUSTERS: Table 1: Characteristics of two GFS clusters.
    26. Table 2: Performance metrics for clusters A and B.
    27. RESULTS: 1. Read and Write Rates. The average write rate was less than 30 MB/s. When the measurements were taken, B was in the middle of a burst of write activity. Read rates were high; both clusters were in the middle of heavy read activity. A was using its resources more efficiently than B.
    28. 2. Master Load: the master can easily keep up with 200 to 500 operations per second.
    29. 3. Recovery Time: A single chunk server (15,000 chunks containing 600 GB of data) was killed in cluster B. All chunks were restored in 23.2 minutes, at an effective replication rate of 440 MB/s.
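
The effective replication rate can be checked from the two figures on the slide. The sketch assumes 1 GB = 1024 MB, which is the interpretation that reproduces roughly 440 MB/s; the function name is hypothetical.

```python
def replication_rate_mb_per_s(data_gb: float, minutes: float) -> float:
    """Average rate needed to copy data_gb gigabytes in the given time."""
    megabytes = data_gb * 1024  # assumption: binary gigabytes
    seconds = minutes * 60
    return megabytes / seconds


rate = replication_rate_mb_per_s(600, 23.2)  # figures from the slide
```
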
    30. Two chunk servers were killed (each with roughly 16,000 chunks and 660 GB of data). The failure reduced 266 chunks to a single replica.
    31. These 266 chunks were cloned at a higher priority and were all restored within 2 minutes, putting the cluster in a state where it could tolerate another chunk server failure.
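
Cloning the most endangered chunks first amounts to ordering under-replicated chunks by how many live replicas remain. A minimal sketch, assuming a replication goal of three (the default mentioned in the speaker notes); the function name and the tie-break on chunk handle are choices made for this example.

```python
from typing import Dict, List


def rereplication_order(replica_counts: Dict[str, int], goal: int = 3) -> List[str]:
    """Return under-replicated chunk handles, fewest live replicas first."""
    candidates = [
        (count, handle)
        for handle, count in replica_counts.items()
        if 0 < count < goal  # a chunk with no live replica cannot be cloned
    ]
    return [handle for _, handle in sorted(candidates)]
```

For example, `rereplication_order({"c1": 2, "c2": 1, "c3": 3})` puts `"c2"` ahead of `"c1"` and skips the fully replicated `"c3"`.
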
    32. WORKLOAD BREAKDOWN: Clusters X and Y are used to show the breakdown of the workloads on two GFS clusters. Cluster X is for research and development, while Y is for production data processing.
    33. Table 3: Operations Breakdown by Size (%).
    34. Table 4: Bytes Transferred Breakdown by Operation Size (%).
    35. Table 5: Master Requests Breakdown by Type (%).
    36. CONCLUSIONS: GFS demonstrates the qualities essential for supporting large-scale data processing workloads on commodity hardware. It provides fault tolerance by constant monitoring, replicating crucial data, and fast, automatic recovery. It delivers high aggregate throughput to many concurrent readers and writers by separating file system control from data transfer.
    37. Thank You.
    38. Q and A