1. Google File System
   - Suman Karumuri
   - Andy Bartholomew
   - Justin Palmer
2. GFS
   - A high-performance, scalable, distributed file system.
   - Built for batch-oriented, data-intensive applications.
   - Fault-tolerant.
   - Runs on inexpensive commodity hardware.
3. Design Assumptions
   - Inexpensive commodity hardware.
   - A modest number of large files.
   - Large streaming reads and small random reads (the map-reduce workload).
   - Mostly appends.
   - Well-defined behavior under concurrent access is important.
   - High sustained throughput matters more than low latency.
4. API
   - Open and close
   - Create and delete
   - Read and write
   - Record append
   - Snapshot
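To make that surface area concrete, here is a minimal sketch of what a client-side interface for these operations could look like. GFS's real client library is internal to Google and not POSIX; every name, signature, and the FileHandle type below are assumptions, not the actual API.

```python
from abc import ABC, abstractmethod

class GFSClient(ABC):
    """Hypothetical client interface mirroring the operations above.
    Names, signatures, and the FileHandle type are assumptions."""

    @abstractmethod
    def create(self, path: str) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def open(self, path: str) -> "FileHandle": ...

    @abstractmethod
    def close(self, handle: "FileHandle") -> None: ...

    @abstractmethod
    def read(self, handle: "FileHandle", offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, handle: "FileHandle", offset: int, data: bytes) -> None: ...

    @abstractmethod
    def record_append(self, handle: "FileHandle", data: bytes) -> int:
        """Atomically append data at least once; returns the offset GFS chose."""

    @abstractmethod
    def snapshot(self, src_path: str, dst_path: str) -> None:
        """Cheap copy of a file or directory tree (see the Snapshot slide)."""
```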
5. Architecture
6. Architecture
   [Diagram: an Application linked with the GFS Client talks to a single GFS Master and to many GFS Chunk Servers.]
7. Files and Chunks
   - Files are divided into 64 MB chunks.
   - Each chunk has a globally unique 64-bit handle.
   Design trade-off:
   - Optimized for large files and high throughput.
   - Works because there are very few small files.
   - Highly contended small files get a larger replication factor.
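The fixed 64 MB chunk size makes the client's offset arithmetic trivial. A small sketch (the helper names are mine; only the chunk size comes from the slide):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, per the slide above

def chunk_index(byte_offset: int) -> int:
    """Which chunk of the file a byte offset falls in."""
    return byte_offset // CHUNK_SIZE

def chunk_offset(byte_offset: int) -> int:
    """Offset of that byte within its chunk."""
    return byte_offset % CHUNK_SIZE

# Example: byte 200,000,000 of a file is in chunk 2, at offset 65,782,272.
assert chunk_index(200_000_000) == 2
assert chunk_offset(200_000_000) == 65_782_272
```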
8. GFS Chunk Servers
   - Manage chunks, storing each chunk as an ordinary file.
   - Tell the master which chunks they hold.
   - Run on commodity Linux machines.
   - Maintain the data integrity of their chunks.
   Design trade-off:
   - Each chunk server knows which of its chunks are good.
   - No need to keep the master and chunk servers in sync.
9. GFS Master
   - Manages file namespace operations.
   - Manages file metadata.
   - Manages the chunks on chunk servers:
     - creation/deletion
     - placement
     - load balancing
     - maintaining replication
   - Uses a checkpointed operation log for recovery and replication.
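A rough sketch of the in-memory state this implies, assuming the usual three tables (namespace, file-to-chunk mapping, chunk locations). Class and field names are mine; notably, chunk locations are learned from chunk servers via heartbeats rather than persisted.

```python
from dataclasses import dataclass, field

@dataclass
class ChunkInfo:
    handle: int              # globally unique 64-bit chunk handle
    version: int = 0         # bumped when a new lease is granted
    # Chunk-server addresses, reported by the servers, not persisted.
    locations: list[str] = field(default_factory=list)

@dataclass
class FileMeta:
    chunk_handles: list[int] = field(default_factory=list)  # ordered chunks

class MasterState:
    """Illustrative master metadata; the operation log would record changes
    to namespace and chunk_handles, but never to chunk locations."""
    def __init__(self) -> None:
        self.namespace: dict[str, FileMeta] = {}  # full pathname -> metadata
        self.chunks: dict[int, ChunkInfo] = {}    # handle -> chunk info
```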
10. Create Operation
11. Create
    [Diagram: the application, through the GFS Client, sends "Create /home/user/filename" to the GFS Master.]
12. Create
    - GFS Master:
      - updates the operation log
      - updates its metadata
    [Diagram: chunk servers shown spread across rack 1 and rack 2.]
13. Create
    - GFS Master:
      - updates the operation log
      - updates its metadata
      - chooses locations for the chunks (a toy placement policy is sketched after slide 14):
        - across multiple racks
        - across multiple networks
        - on machines with low contention
        - on machines with low disk use
14. Create
    [Diagram: the master returns the chunk handle and chunk locations to the client.]
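A toy version of the placement criteria on slide 13: spread replicas across racks first, and prefer servers with little recent activity and low disk use. The slides state the criteria, not a formula, so the ranking below is my own.

```python
def choose_replicas(servers: list[dict], k: int = 3) -> list[str]:
    """servers: e.g. {"addr": "cs1:7000", "rack": "r1",
                      "disk_used": 0.42, "recent_creates": 3}
    Greedy toy policy: rank by load, take one server per rack first."""
    ranked = sorted(servers, key=lambda s: (s["recent_creates"], s["disk_used"]))
    chosen, racks = [], set()
    for s in ranked:                      # first pass: new racks only
        if len(chosen) < k and s["rack"] not in racks:
            chosen.append(s)
            racks.add(s["rack"])
    for s in ranked:                      # second pass: fill remaining slots
        if len(chosen) < k and s not in chosen:
            chosen.append(s)
    return [s["addr"] for s in chosen]
```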
15. Namespaces
    - The syntax for file access is the same as in a regular file system:
      - /home/user/foo
    - The semantics are different:
      - no directory structures
      - paths exist to enable fine-grained locking
      - paths are stored using prefix compression
    - No symbolic or hard links.
16. Locking example
    - Write /home/user/foo
      - acquires read locks on /home and /home/user
      - acquires a write lock on /home/user/foo
    - Delete /home/user/foo
      - acquires read locks on /home and /home/user
      - acquires a write lock on /home/user/foo
        - so it must wait for the write to finish
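A compact sketch of the scheme: read locks on every ancestor path, a write lock on the leaf, and acquisition in a canonical (sorted) order so two operations can never deadlock. Python's standard library has no reader-writer lock, so plain locks stand in for both kinds; class and method names are mine.

```python
import threading
from collections import defaultdict

class NamespaceLocks:
    """Per-path locks acquired in sorted (canonical) order."""
    def __init__(self) -> None:
        self._locks: dict[str, threading.Lock] = defaultdict(threading.Lock)

    @staticmethod
    def ancestors(path: str) -> list[str]:
        # "/home/user/foo" -> ["/home", "/home/user"]
        parts = path.strip("/").split("/")
        return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

    def acquire_for_write(self, path: str) -> list[threading.Lock]:
        # Read locks on ancestors, write lock on the leaf; sorting the
        # paths gives every caller the same acquisition order.
        needed = sorted(self.ancestors(path) + [path])
        held = [self._locks[p] for p in needed]
        for lock in held:
            lock.acquire()
        return held

    @staticmethod
    def release(held: list[threading.Lock]) -> None:
        for lock in reversed(held):
            lock.release()
```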
17. Locking
    Design trade-off:
    - Simple design.
    - Supports concurrent mutations in the same directory.
    - A canonical lock order prevents deadlocks.
18. Read Operation
19. Read Operation
    [Diagram: the client sends the filename and chunk index to the GFS Master.]
20. Read Operation
    [Diagram: the master replies with the chunk handle and replica server locations.]
21. Read Operation
    [Diagram: the client sends the chunk handle and byte range to a chunk server.]
22. Read Operation
    [Diagram: the chunk server returns the data directly to the client.]
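Pulling slides 19-22 together into one client-side routine. The master RPC, the nearest-replica helper, and the chunk-server read call are all assumed names; for simplicity the read is confined to a single chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024

def gfs_read(client, path: str, offset: int, length: int) -> bytes:
    """Illustrative read flow; all RPC stubs are hypothetical."""
    index = offset // CHUNK_SIZE
    # 1. Ask the master (results are cached in practice) where the chunk lives.
    handle, locations = client.master.find_chunk(path, index)
    # 2. Fetch the byte range directly from the closest chunk server;
    #    the master never touches the data itself.
    replica = client.nearest(locations)
    return replica.read_chunk(handle, offset % CHUNK_SIZE, length)
```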
23. Write
    Outline for the next slides:
    - First, the write path without a primary, coordinated through the master.
    - Then the split between data flow and control flow.
    - Then the data-pushing (pipelining) optimization.
24. Write Operation
25. Write
    [Diagram: the client sends the chunk id and chunk offset to the GFS Master.]
26. Write
    [Diagram: the master replies with the chunk-server locations, which the client caches.]
27. Write
    [Diagram: the client pushes the data to the nearest replica, which passes it along to the next replica.]
28. Write
    [Diagram: the client sends the write operation to the master, which serializes all concurrent writes.]
29. Write
    [Diagram: the master sends the serialized order of writes to the chunk servers.]
30. Write
    [Diagram: each chunk server applies the writes in that order and acks.]
31. Write
    [Diagram: the master returns an ack and the chunk index to the client.]
32. Write under failure
    [Diagram: one replica fails to ack the write; the others ack.]
33. Write under failure
    [Diagram: the write is retried on the failed replica.]
34. Leases
    - In the flow above, the master is a bottleneck.
    - Instead, the master grants a lease designating one replica as the primary chunk server, which handles mutation serialization.
35. Write with primary
    [Diagram: the client sends the chunk id and chunk offset to the GFS Master.]
36. Write with primary
    [Diagram: the master replies with the chunk-server locations, including the primary; the client caches this.]
37. Write with primary
    [Diagram: the client pushes the data to the nearest replica, which passes it along to the next replica.]
38. Write with primary
    [Diagram: the client sends the write operation to the primary chunk server, which serializes all concurrent writes.]
39. Write with primary
    [Diagram: the primary forwards the serialized operations to the secondary replicas.]
40. Write with primary
    [Diagram: the secondaries ack the primary.]
41. Write with primary
    [Diagram: the primary returns an ack and the chunk index to the client.]
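Slides 35-41 as one routine, separating data flow (pushed along a chain of replicas by network distance) from control flow (sent to the primary, which picks the serial order). Every stub below is an assumed name, not the real RPC surface.

```python
def write_with_primary(client, handle: int, offset: int, data: bytes) -> None:
    """Illustrative write flow with a primary chunk server."""
    # 1. Find which replica holds the lease (cached between mutations).
    primary, secondaries = client.master.get_lease_holder(handle)
    # 2. Data flow: push to the nearest replica; each replica forwards to
    #    the next, so the chain order is by network distance, not by role.
    chain = client.order_by_distance([primary] + secondaries)
    data_id = chain[0].push_data(data, forward_to=chain[1:])
    # 3. Control flow: the primary assigns this mutation a serial number,
    #    applies it locally, and forwards the same order to the secondaries.
    serial = primary.apply_mutation(handle, offset, data_id)
    primary.forward_and_wait(serial, secondaries)
    # 4. Only after every secondary acks does the primary ack the client.
```

The point of the split is that the bulky data can take the best network path while the small control messages follow the fixed client-to-primary-to-secondaries path that defines the ordering.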
42. Failures during writes
    - Chunk boundary overflow: the write is split at the chunk boundary.
    - Replicas going down:
      - retry
43. Write with primary
    - Leases, as described above.
44. Record Append
45. Record append
    [Diagram: the client sends the chunk id to the GFS Master; data flows to the chunk servers as for a write.]
46. Record append
    [Diagram: the primary returns an ack and the chunk index, chosen from the end of the file.]
47. Record Append Operation
48. Record append
    - The most common mutation.
    - The write location is chosen by GFS, not by the client.
    - Data is appended atomically, at least once.
    - An append can't be more than 1/4 of the chunk size, to keep chunk occupancy high.
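A sketch of the primary-side logic those rules imply: if the record would cross the chunk boundary, the primary pads the chunk and the client retries on a new chunk, which is why records are capped at 1/4 of the chunk size. The `chunk` object and its fields are assumptions.

```python
CHUNK_SIZE = 64 * 1024 * 1024
MAX_APPEND = CHUNK_SIZE // 4   # the cap keeps padding waste bounded

def primary_record_append(chunk, data: bytes) -> int:
    """Returns the offset assigned to the record, or -1 meaning
    'chunk is full, retry on the next chunk' (illustrative)."""
    if len(data) > MAX_APPEND:
        raise ValueError("record exceeds 1/4 chunk; the client must split it")
    if chunk.used + len(data) > CHUNK_SIZE:
        # Pad to the end so all replicas agree the chunk is full.
        chunk.write(chunk.used, b"\0" * (CHUNK_SIZE - chunk.used))
        chunk.used = CHUNK_SIZE
        return -1
    offset = chunk.used          # GFS, not the client, picks the offset
    chunk.write(offset, data)
    chunk.used += len(data)
    return offset
```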
49. Consistency Model
50. Write, single process
    [Diagram: two replicas of a chunk, Chunk 1 and Chunk 1', both holding "Hello" at record 9.]
51. Write, single process
    [Diagram: Write("World", 10) reaches Chunk 1 but not Chunk 1'; the replicas disagree at record 10, an inconsistent state.]
52. Same with any failed mutation
    [Diagram: as above; any mutation that fails on some replicas leaves the region inconsistent.]
53. Multiple Writers
    [Diagram: concurrent Write("World", 10:0) and Write("12345", 10:3) interleave; both replicas end up with "Wor12345" at record 10, consistent but undefined.]
54. Append
    [Diagram: Append("World") reaches Chunk 1 (record 10) but not Chunk 1', so the client retries; inconsistent and undefined.]
55. Append
    [Diagram: after the retry, both replicas hold "World" at record 11 (defined), but record 10 exists only on Chunk 1; defined regions interspersed with inconsistent ones.]
56. Same for append with multiple writers
    [Diagram: the same outcome with multiple appenders; defined records interspersed with inconsistent regions.]
57. Consistency model
    - Replicas of a chunk are not bitwise identical.
    - Consistent: all replicas agree on the data.
    - Defined: consistent, and the data is exactly what one mutation wrote.
    - This is fine for map-reduce.
    - Applications can differentiate defined regions from undefined ones.
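One way applications do that differentiation: wrap each record with a checksum and a unique id, so readers can skip padding and drop duplicates left by append retries. This record format is my own illustration, not a GFS format.

```python
import struct
import zlib

MAGIC = 0x47465352  # arbitrary marker ("GFSR") for this sketch

def encode_record(payload: bytes, record_id: int) -> bytes:
    """Self-identifying record: magic + id + length + crc + payload."""
    header = struct.pack(">IQI", MAGIC, record_id, len(payload))
    crc = zlib.crc32(header + payload)
    return header + struct.pack(">I", crc) + payload

def try_decode(buf: bytes, seen_ids: set) -> bytes | None:
    """Return the payload if buf starts with a valid, unseen record."""
    if len(buf) < 20:
        return None
    magic, record_id, length = struct.unpack(">IQI", buf[:16])
    if magic != MAGIC or len(buf) < 20 + length:
        return None   # padding or a torn record: skip it
    (crc,) = struct.unpack(">I", buf[16:20])
    payload = buf[20:20 + length]
    if crc != zlib.crc32(buf[:16] + payload) or record_id in seen_ids:
        return None   # corrupt, or a duplicate from an append retry
    seen_ids.add(record_id)
    return payload
```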
58. Snapshot
    - Snapshots a file or directory.
    - Must be fast, with minimal data overhead.
    - On a snapshot call, the master:
      - revokes outstanding leases
      - logs the operation
      - copies the metadata, with the new chunks pointing at the same data
      - relies on copy-on-write to create the actual chunk copies later
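Building on the MasterState sketch from slide 9, a snapshot can be just metadata plus a reference count. The refcount table and the two helper calls here are assumptions layered on top of that sketch.

```python
def snapshot(master, src: str, dst: str) -> None:
    """Metadata-only snapshot: dst shares src's chunks until one is written."""
    src_meta = master.namespace[src]
    master.namespace[dst] = FileMeta(chunk_handles=list(src_meta.chunk_handles))
    for h in src_meta.chunk_handles:
        master.refcount[h] = master.refcount.get(h, 1) + 1

def chunk_for_mutation(master, handle: int) -> int:
    """Copy-on-write: clone a shared chunk before its first mutation."""
    if master.refcount.get(handle, 1) > 1:
        new_handle = master.allocate_handle()    # hypothetical helper
        master.clone_chunk(handle, new_handle)   # each replica copies locally,
                                                 # so no data crosses the network
        master.refcount[handle] -= 1
        return new_handle
    return handle
```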
59. Delete Operation
    - A metadata operation:
      - renames the file to a special hidden name
      - after a certain time, deletes the actual chunks
    - Supports undelete for a limited time.
    - The actual deletion is lazy garbage collection:
      - the master deletes the metadata
      - the list of live chunks is piggybacked on HeartBeat messages
      - chunk servers delete the unreferenced chunk files
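A sketch of the garbage-collection half of that HeartBeat exchange: the server reports what it holds, and the master answers with the handles it no longer knows about. Both RPC names are assumptions.

```python
def heartbeat_gc(master, server) -> None:
    """Lazy garbage collection piggybacked on a heartbeat (illustrative)."""
    reported = set(server.report_chunks())       # handles the server holds
    orphans = {h for h in reported if h not in master.chunks}
    if orphans:
        # The master's metadata is the source of truth: anything it has
        # forgotten (deleted files, stale replicas) can be reclaimed.
        server.delete_chunks(sorted(orphans))
```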
60. Delete API
    Design trade-off:
    - Simple design.
    - The master can do the work when it is otherwise free.
    - Logical deletes are quick.
    - Works well when failure is common.
    - Difficult to tune when storage is tight, but there are workarounds.
61. Fault tolerance for chunks
    - Re-replication maintains the replication factor.
    - Rebalancing:
      - load balancing
      - disk-space usage
    - Data integrity:
      - each chunk is divided into 64 KB blocks, each with its own checksum
      - the checksum is verified every time an application reads the data
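The read-path check, sketched: one checksum per 64 KB block, verified before data is returned. The slides give the block size but not the algorithm, so CRC32 here is an assumption.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB, per the slide above

def verify_blocks(chunk_data: bytes, checksums: list[int]) -> None:
    """Raise if any 64 KB block fails its checksum (illustrative)."""
    for i in range(0, len(chunk_data), BLOCK_SIZE):
        block = chunk_data[i:i + BLOCK_SIZE]
        if zlib.crc32(block) != checksums[i // BLOCK_SIZE]:
            # Corruption is local: the master is told, and the chunk is
            # re-replicated from a healthy replica.
            raise IOError(f"checksum mismatch in block {i // BLOCK_SIZE}")
```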
62. Fault tolerance for master
    - Master:
      - replication and checkpointing of the operation log
    - Shadow masters:
      - replay the checkpointed operation log
      - make no metadata changes
      - reduce read load on the master
      - may serve slightly stale data
63. Fault tolerance for chunk servers
    - All chunks are versioned.
    - The version number is updated when a new lease is granted.
    - Chunks with old versions are not served and are deleted.
64. High Availability
    - Fast recovery of masters and chunk servers.
    - HeartBeat messages:
      - check chunk-server liveness
      - piggyback GC commands
      - renew leases
    - Diagnostic tools.
65. Performance metrics
66. Conclusions
67. Conclusions
    - Extremely cheap hardware, hence a high failure rate.
    - Highly concurrent reads and writes.
    - Highly scalable.
    - Supports undelete (for a configurable time).
68. Conclusions, continued
    - Built for map-reduce:
      - mostly appends and scanning reads
      - mostly large files
      - requires high throughput
    - Developers understand the limitations and tune their applications to suit GFS.
69. Thank you?
70. Design goals
    - Component failures are the norm.
    - Files are huge (2 GB files are common).
    - Files grow by appending.
    - The application (map-reduce) and the file system are designed together.
