GFS

  1. Google File System
     - Suman Karumuri
     - Andy Bartholomew
     - Justin Palmer
  2. GFS
     - A high-performance, scalable, distributed file system.
     - Built for batch-oriented, data-intensive applications.
     - Fault-tolerant.
     - Runs on inexpensive commodity hardware.
  3. Design Assumptions
     - Inexpensive commodity hardware.
     - A modest number of large files.
     - Large streaming reads and small random reads (map-reduce workloads).
     - Writes are mostly appends.
     - Well-defined behavior for concurrent writers is important.
     - High sustained throughput matters more than low latency.
  4. API
     - Open and close
     - Create and delete
     - Read and write
     - Record append
     - Snapshot
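The slides list the operations but not a client interface. Below is a minimal sketch of what a GFS-style client API could look like; every class and method name is an assumption made for illustration, not the real API.

```python
# Illustrative sketch only: the slides list operations, not a client library
# spec, so every name and signature here is an assumption.
from abc import ABC, abstractmethod


class FileHandle:
    """Opaque handle returned by open(); its contents are an implementation detail."""
    def __init__(self, path: str) -> None:
        self.path = path


class GFSClient(ABC):
    """Hypothetical client interface mirroring the operations listed above."""

    @abstractmethod
    def create(self, path: str) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def open(self, path: str) -> FileHandle: ...

    @abstractmethod
    def close(self, handle: FileHandle) -> None: ...

    @abstractmethod
    def read(self, handle: FileHandle, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, handle: FileHandle, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def record_append(self, handle: FileHandle, data: bytes) -> int:
        """Append atomically at least once; return the offset GFS chose."""

    @abstractmethod
    def snapshot(self, path: str) -> None: ...
```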
  5. Architecture
  6. Architecture
     [Diagram: an application uses the GFS client library, which talks to a single GFS master and to many GFS chunkservers.]
  7. Files and Chunks
     - Files are divided into fixed-size 64 MB chunks.
     - Each chunk has a globally unique 64-bit chunk handle.
     - Design trade-off:
       - Optimized for large files and high throughput.
       - Assumes very few small files.
       - Highly contended small files get a larger replication factor.
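A byte offset within a file maps to a chunk by simple integer division. A quick sketch of that client-side translation, assuming the 64 MB chunk size above (the function name is illustrative):

```python
# Illustrative sketch: translate a file byte offset into a chunk index and an
# offset within that chunk, as a GFS client would before asking the master.
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size

def locate_in_file(byte_offset: int) -> tuple[int, int]:
    """Return (chunk_index, offset_within_chunk) for a position in a file."""
    chunk_index = byte_offset // CHUNK_SIZE
    offset_in_chunk = byte_offset % CHUNK_SIZE
    return chunk_index, offset_in_chunk

# Example: byte 200,000,000 falls in chunk 2 (the third chunk).
print(locate_in_file(200_000_000))  # (2, 65782272)
```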
  8. GFS Chunk Servers
     - Manage chunks and store each chunk as a file on local disk.
     - Report which chunks they hold to the master.
     - Run on commodity Linux machines.
     - Maintain the integrity of their chunks.
     - Design trade-off:
       - The chunkserver is the authority on which chunks it holds and which are good.
       - No need to keep the master and the chunkservers strictly in sync.
  9. GFS Master
     - Manages file namespace operations.
     - Manages file metadata.
     - Manages chunks on the chunkservers:
       - Creation and deletion.
       - Placement.
       - Load balancing.
       - Maintaining the replication factor.
     - Uses a checkpointed operation log for durability and replication of its state.
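A rough sketch of the in-memory metadata the master might keep per file and per chunk. Every structure and field name here is an assumption made for illustration; the real layout is not described in the slides.

```python
# Illustrative sketch of master-side metadata, assuming in-memory maps keyed
# by full path and by chunk handle (field names are invented for this example).
from dataclasses import dataclass, field


@dataclass
class ChunkInfo:
    handle: int                 # globally unique 64-bit chunk handle
    version: int                # bumped whenever a new lease is granted
    replicas: list[str] = field(default_factory=list)  # chunkserver addresses


@dataclass
class FileInfo:
    chunks: list[int] = field(default_factory=list)     # chunk handles, in order


class MasterMetadata:
    def __init__(self) -> None:
        self.files: dict[str, FileInfo] = {}     # "/home/user/foo" -> FileInfo
        self.chunks: dict[int, ChunkInfo] = {}   # handle -> ChunkInfo
        self.next_handle = 1

    def create_file(self, path: str) -> None:
        # In the real system this mutation would first be appended to the
        # operation log (and checkpointed periodically) before taking effect.
        self.files[path] = FileInfo()

    def add_chunk(self, path: str) -> ChunkInfo:
        chunk = ChunkInfo(handle=self.next_handle, version=1)
        self.next_handle += 1
        self.files[path].chunks.append(chunk.handle)
        self.chunks[chunk.handle] = chunk
        return chunk
```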
  10. Create Operation
  11. Create
      [Diagram: the client sends Create /home/user/filename to the GFS master.]
  12. Create
      - GFS Master:
        - Updates the operation log.
        - Updates its metadata.
      [Diagram: chunkservers spread across rack 1 and rack 2.]
  13. Create
      - GFS Master:
        - Updates the operation log.
        - Updates its metadata.
        - Chooses locations for the chunk replicas:
          - across multiple racks
          - across multiple networks
          - on machines with low contention
          - on machines with low disk use
      [Diagram: chunkservers spread across rack 1 and rack 2.]
  14. Create
      - GFS Master:
        - Updates the operation log.
        - Updates its metadata.
        - Chooses locations for the chunk replicas.
      - Returns the chunk handle and chunk locations to the client.
      [Diagram: the master replies to the client; chunkservers spread across rack 1 and rack 2.]
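A toy sketch of the kind of placement heuristic slide 13 lists: spread replicas across racks and prefer lightly used, lightly loaded chunkservers. The scoring and data structures are invented for this example, not the published policy.

```python
# Illustrative placement sketch: pick replica locations that spread across
# racks and prefer lightly loaded, lightly used chunkservers.
from dataclasses import dataclass


@dataclass
class ChunkserverStats:
    address: str
    rack: str
    disk_used_fraction: float   # 0.0 .. 1.0
    active_mutations: int       # rough proxy for contention


def choose_replica_locations(servers: list[ChunkserverStats],
                             replication: int = 3) -> list[str]:
    chosen: list[str] = []
    used_racks: set[str] = set()
    # Prefer low disk use, then low contention.
    ranked = sorted(servers, key=lambda s: (s.disk_used_fraction, s.active_mutations))
    # First pass: at most one replica per rack.
    for s in ranked:
        if len(chosen) == replication:
            break
        if s.rack not in used_racks:
            chosen.append(s.address)
            used_racks.add(s.rack)
    # Second pass: fill remaining slots if there are fewer racks than replicas.
    for s in ranked:
        if len(chosen) == replication:
            break
        if s.address not in chosen:
            chosen.append(s.address)
    return chosen
```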
  15. Namespaces
      - The syntax for file access is the same as in a regular file system:
        - /home/user/foo
      - The semantics are different:
        - There are no directory data structures; the namespace is a flat table of full pathnames.
        - Paths exist to enable fine-grained locking.
        - Paths are stored with prefix compression.
      - No symbolic or hard links.
  16. Locking example
      - Write /home/user/foo:
        - Acquires read locks on /home and /home/user.
        - Acquires a write lock on /home/user/foo.
      - Delete /home/user/foo:
        - Acquires read locks on /home and /home/user.
        - Acquires a write lock on /home/user/foo.
          - Must wait for the write to finish.
  17. Locking
      - Design trade-off:
        - Simple design.
        - Supports concurrent mutations in the same directory.
        - A canonical lock order (shallower paths first, lexicographic within a level) prevents deadlocks.
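A minimal sketch of the per-path locking from slides 15–17: read locks on every ancestor path, a lock on the leaf, all acquired in a canonical order. The class and helper names are illustrative, and a plain lock stands in for a reader-writer lock to keep the sketch short.

```python
# Illustrative namespace locking sketch: lock ancestors first (shallow to
# deep), then the leaf, always in the same order, to avoid deadlock.
import threading
from collections import defaultdict


class NamespaceLocks:
    def __init__(self) -> None:
        # One lock per full pathname. Python's threading module has no built-in
        # reader-writer lock, so an RLock stands in for both modes here.
        self._locks: dict[str, threading.RLock] = defaultdict(threading.RLock)

    @staticmethod
    def _ancestors(path: str) -> list[str]:
        # "/home/user/foo" -> ["/home", "/home/user"]
        parts = path.strip("/").split("/")
        return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

    def lock_for_write(self, path: str) -> list[threading.RLock]:
        # Canonical order: ancestors (read-locked) first, then the leaf (write-locked).
        ordered = self._ancestors(path) + [path]
        acquired = []
        for p in ordered:
            lock = self._locks[p]
            lock.acquire()
            acquired.append(lock)
        return acquired

    def unlock(self, acquired: list[threading.RLock]) -> None:
        for lock in reversed(acquired):
            lock.release()
```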
  18. Read Operation
  19. Read Operation
      [Diagram: the client sends the filename and chunk index to the GFS master.]
  20. Read Operation
      [Diagram: the master replies with the chunk handle and the locations of the replica chunkservers.]
  21. Read Operation
      [Diagram: the client sends the chunk handle and byte range to one of the chunkservers.]
  22. Read Operation
      [Diagram: the chunkserver returns the requested data to the client.]
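A sketch of the read path in slides 19–22, assuming hypothetical RPC stubs for the master and the chunkservers; the method names are placeholders, not a real protocol.

```python
# Illustrative read path: ask the master where the chunk lives, then fetch the
# bytes directly from a chunkserver. The rpc objects and method names below
# are placeholders invented for this sketch.
CHUNK_SIZE = 64 * 1024 * 1024


def gfs_read(master_rpc, chunkserver_rpc_for, path: str,
             offset: int, length: int) -> bytes:
    # 1. Translate the byte offset into a chunk index (client side).
    chunk_index = offset // CHUNK_SIZE
    offset_in_chunk = offset % CHUNK_SIZE

    # 2. Ask the master for the chunk handle and replica locations
    #    (clients cache this reply to avoid re-contacting the master).
    handle, replica_addresses = master_rpc.lookup(path, chunk_index)

    # 3. Read the byte range directly from one replica, usually the closest.
    replica = replica_addresses[0]
    return chunkserver_rpc_for(replica).read_chunk(handle, offset_in_chunk, length)
```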
  23. Write
      - First, a simplified write flow without a primary, where the master serializes mutations.
      - Shows the split between data flow and control flow.
      - Data is pushed along a chain of replicas (the data-flow optimization).
  24. Write Operation
  25. Write
      [Diagram: the client sends the chunk id and chunk offset to the GFS master.]
  26. Write
      [Diagram: the master replies with the chunkserver locations, which the client caches.]
  27. Write
      [Diagram: the client pushes the data to the nearest replica, which passes it along to the next replica in the chain.]
  28. Write
      [Diagram: the client sends the write operation to the master, which serializes all concurrent writes.]
  29. Write
      [Diagram: the master sends the serialized order of writes to the chunkservers.]
  30. Write
      [Diagram: each chunkserver acknowledges the write back to the master.]
  31. Write
      [Diagram: the master returns an ack and the chunk index to the client.]
  32. Write under failure
      [Diagram: one replica fails to acknowledge the write.]
  33. Write under failure
      [Diagram: the write is retried.]
  34. Leases
      - With this scheme the master is a bottleneck.
      - Instead, the master grants a lease that designates a primary chunkserver to handle mutations and their serialization.
  35. Write with primary
      [Diagram: the client sends the chunk id and chunk offset to the GFS master.]
  36. Write with primary
      [Diagram: the master replies with the chunkserver locations, which the client caches.]
  37. Write with primary
      [Diagram: the client pushes the data to the nearest replica, which passes it along the replica chain.]
  38. Write with primary
      [Diagram: the client sends the write operation to the primary chunkserver, which serializes all concurrent writes.]
  39. Write with primary
      [Diagram: the primary forwards the serialized operations to the secondary replicas.]
  40. Write with primary
      [Diagram: the secondary replicas acknowledge the write to the primary.]
  41. Write with primary
      [Diagram: the primary returns an ack and the chunk index to the client.]
  42. Failures during writes
      - Writes that overflow a chunk boundary.
      - Replicas going down:
        - The client retries the mutation.
  43. Write with primary
      - The master grants a chunk lease to one replica, the primary, which picks a serial order for all mutations to that chunk.
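A sketch of the write path with a primary from slides 35–41: data is pushed along the replica chain first, then the client asks the lease-holding primary to commit, and the primary assigns the serial order and forwards it to the secondaries. The RPC stubs and method names are placeholders invented for this sketch.

```python
# Illustrative write path with a lease-holding primary. The rpc stubs and
# method names are invented for this sketch, not a real GFS protocol.
def gfs_write(master_rpc, chunkserver_rpc_for, chunk_handle: int,
              offset_in_chunk: int, data: bytes) -> None:
    # 1. Control: find the replicas and which one currently holds the lease.
    primary, secondaries = master_rpc.get_lease_holder(chunk_handle)

    # 2. Data flow: push the data along the replica chain, nearest first.
    #    Each replica buffers the data and forwards it to the next one.
    chain = [primary] + secondaries
    data_id = chunkserver_rpc_for(chain[0]).push_data(data, forward_to=chain[1:])

    # 3. Control flow: ask the primary to apply the buffered data. The primary
    #    assigns a serial number, applies the mutation locally, and forwards
    #    the same serial order to every secondary.
    result = chunkserver_rpc_for(primary).commit_write(
        chunk_handle, offset_in_chunk, data_id)

    # 4. If any secondary failed, the client retries; the affected region may
    #    be left inconsistent on some replicas until a retry succeeds.
    if not result.ok:
        raise IOError("write failed on some replicas; client should retry")
```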
  44. Record Append
  45. Record append
      [Diagram: the client sends the chunk id (but no offset) to the GFS master.]
  46. Record append
      [Diagram: the client receives an ack and the chunk index from the end of the file where the record landed.]
  47. Record Append Operation
  48. Record append
      - The most common mutation.
      - The write location is determined by GFS, not the client.
      - Data is atomically appended at least once.
      - An append can't be larger than ¼ of the chunk size, to keep chunk occupancy reasonable.
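A sketch of the primary-side logic for record append, including the chunk-boundary case. The names, return values, and padding behavior shown here are assumptions for illustration.

```python
# Illustrative primary-side record append. If the record would overflow the
# current chunk, the chunk is padded and the client is told to retry on the
# next chunk. Names and return codes are invented for this sketch.
CHUNK_SIZE = 64 * 1024 * 1024
MAX_APPEND = CHUNK_SIZE // 4   # records are limited to 1/4 of a chunk


def record_append(chunk: bytearray, record: bytes):
    """Return ('ok', offset) or ('retry_next_chunk', None)."""
    if len(record) > MAX_APPEND:
        raise ValueError("record exceeds 1/4 of the chunk size")

    if len(chunk) + len(record) > CHUNK_SIZE:
        # Pad the rest of the chunk so every replica ends at the same length,
        # then have the client retry the append on the next chunk.
        chunk.extend(b"\x00" * (CHUNK_SIZE - len(chunk)))
        return "retry_next_chunk", None

    offset = len(chunk)          # GFS picks the offset, not the client
    chunk.extend(record)         # appended at least once; retries may duplicate
    return "ok", offset
```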
  49. Consistency Model
  50. Write – Single process
      [Diagram: two replicas of a chunk, Chunk 1 and Chunk 1', both containing 9:Hello.]
  51. Write – Single process
      [Diagram: Write("World", 10) reaches Chunk 1 (9:Hello, 10:World) but not Chunk 1' (9:Hello only) – an inconsistent state.]
  52. Same with any failed mutation
      [Diagram: the same inconsistent state results whenever a mutation fails on some replicas.]
  53. Multiple Writers
      [Diagram: concurrent Write("World", 10:0) and Write("12345", 10:3) leave both replicas with 10:Wor12345 – consistent but undefined.]
  54. Append
      [Diagram: Append("World") reaches Chunk 1 (10:World) but not Chunk 1' – inconsistent and undefined, so the append is retried.]
  55. Append
      [Diagram: after the retry, both replicas hold the record at offset 11; Chunk 1 also keeps the earlier copy at 10 while Chunk 1' does not – defined regions interspersed with inconsistent ones.]
  56. Same for append with multiple writers
      [Diagram: the same outcome applies when multiple writers append concurrently.]
  57. Consistency model
      - Replicas of a chunk are not bitwise identical.
      - Consistent: all replicas agree on the data.
      - Defined: consistent, and the data is exactly what one mutation wrote.
      - This is fine for Map-Reduce style workloads.
      - Applications can distinguish defined regions from undefined ones (e.g. with checksums and unique record IDs).
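One way an application might make records self-validating, so readers can skip padding and drop duplicates left behind by retried appends. The record framing below is an assumption for illustration, not a format defined by GFS.

```python
# Illustrative application-level record framing: a length, a checksum, and a
# unique record id let readers skip padding, detect fragments, and drop
# duplicates from retried appends. This format is invented for the example.
import struct
import uuid
import zlib

HEADER = struct.Struct("<I I 16s")   # payload length, crc32 of payload, 16-byte id


def encode_record(payload: bytes) -> bytes:
    record_id = uuid.uuid4().bytes
    return HEADER.pack(len(payload), zlib.crc32(payload), record_id) + payload


def decode_records(region: bytes):
    """Yield valid payloads, skipping padding/garbage and duplicate ids."""
    seen: set[bytes] = set()
    pos = 0
    while pos + HEADER.size <= len(region):
        length, crc, record_id = HEADER.unpack_from(region, pos)
        payload = region[pos + HEADER.size: pos + HEADER.size + length]
        if len(payload) == length and zlib.crc32(payload) == crc:
            if record_id not in seen:         # drop duplicates from retries
                seen.add(record_id)
                yield payload
            pos += HEADER.size + length
        else:
            pos += 1                          # padding or garbage: resync byte by byte
```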
  58. Snapshot
      - Snapshots a file or directory.
      - Should be fast, with minimal data overhead.
      - On a snapshot call, the master:
        - Revokes outstanding leases.
        - Logs the operation.
        - Copies the metadata and makes new chunk references pointing to the same data.
        - Uses copy-on-write to create the actual chunk copies later.
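A sketch of the copy-on-write bookkeeping behind slide 58, assuming a per-chunk reference count that triggers a local copy on the first write after a snapshot. All structures and names are invented for this example.

```python
# Illustrative copy-on-write: a snapshot bumps each chunk's reference count;
# the first write to a shared chunk makes a private copy before mutating it.
class CowChunkStore:
    def __init__(self) -> None:
        self.data: dict[int, bytearray] = {}   # handle -> chunk contents
        self.refcount: dict[int, int] = {}     # handle -> number of files sharing it
        self.next_handle = 1

    def new_chunk(self) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.data[handle] = bytearray()
        self.refcount[handle] = 1
        return handle

    def snapshot(self, handles: list[int]) -> list[int]:
        # Metadata-only copy: the snapshot shares the same chunk handles.
        for h in handles:
            self.refcount[h] += 1
        return list(handles)

    def write(self, handle: int, offset: int, payload: bytes) -> int:
        if self.refcount[handle] > 1:
            # Chunk is shared with a snapshot: copy it before mutating.
            self.refcount[handle] -= 1
            new_handle = self.new_chunk()
            self.data[new_handle] = bytearray(self.data[handle])
            handle = new_handle
        chunk = self.data[handle]
        chunk[offset:offset + len(payload)] = payload
        return handle   # caller updates its file -> chunk mapping
```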
  59. Delete Operation
      - Delete is a metadata operation:
        - The file is renamed to a special hidden name.
        - After a certain time, the actual chunks are deleted.
      - Supports undelete for a limited time.
      - Actual deletion is lazy garbage collection:
        - The master deletes the metadata.
        - The set of live chunks is piggybacked on HeartBeat messages.
        - Chunkservers delete the chunk files that are no longer referenced.
  60. Delete API
      - Design trade-off:
        - Simple design.
        - Garbage collection can run when the master is otherwise free.
        - Logical deletes are quick.
        - Works well when failure is common.
        - Difficult to tune when storage is tight, but there are workarounds.
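A sketch of the garbage-collection handshake from slide 59: the chunkserver reports the chunks it holds with a heartbeat, the master answers with the handles no file references any more, and the chunkserver deletes them. The message shapes and names are assumptions.

```python
# Illustrative lazy garbage collection: chunkservers report what they hold,
# the master answers with handles nothing references any more, and the
# chunkserver deletes those at its leisure.
def master_handle_heartbeat(live_handles: set[int],
                            reported_handles: set[int]) -> set[int]:
    """Return the subset of reported chunks the master no longer references."""
    return reported_handles - live_handles


def chunkserver_apply_gc(stored_chunks: dict[int, bytes],
                         orphaned: set[int]) -> None:
    """Delete orphaned chunk files locally (here: drop them from a dict)."""
    for handle in orphaned:
        stored_chunks.pop(handle, None)


# Example: the master only references chunks {1, 2}; the chunkserver still
# holds chunk 7 from a deleted file, so it is reclaimed on the next heartbeat.
store = {1: b"...", 2: b"...", 7: b"stale"}
orphans = master_handle_heartbeat({1, 2}, set(store))
chunkserver_apply_gc(store, orphans)
assert set(store) == {1, 2}
```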
  61. Fault Tolerance for chunks
      - Re-replication maintains the replication factor when replicas are lost.
      - Rebalancing:
        - Spreads load.
        - Evens out disk space usage.
      - Data integrity:
        - Each chunk is divided into 64 KB blocks, each with its own checksum.
        - Checksums are verified every time an application reads the data.
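A sketch of per-block checksumming as described on slide 61, assuming 64 KB blocks; CRC32 is used here only as a stand-in, since the slides do not name the checksum function.

```python
# Illustrative per-block checksums: a chunk is split into 64 KB blocks, each
# with its own checksum, verified on every read. CRC32 is an assumption.
import zlib

BLOCK_SIZE = 64 * 1024


def compute_block_checksums(chunk: bytes) -> list[int]:
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk: bytes, checksums: list[int],
                  offset: int, length: int) -> bytes:
    """Verify every block overlapping [offset, offset+length) before returning data."""
    first_block = offset // BLOCK_SIZE
    last_block = (offset + length - 1) // BLOCK_SIZE
    for b in range(first_block, last_block + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            # A real chunkserver would report the corruption to the master so
            # the chunk could be re-replicated from a good copy.
            raise IOError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]
```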
  62. Fault tolerance for master
      - Master:
        - The operation log is replicated and checkpointed.
      - Shadow masters:
        - Read the checkpointed operation log.
        - Do not make metadata changes.
        - Reduce read load on the master.
        - Might serve slightly stale data.
  63. Fault tolerance for Chunk Server
      - All chunks are versioned.
      - The version number is updated whenever the master grants a new lease.
      - Replicas with old versions are not served and are deleted.
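A sketch of stale-replica detection via chunk version numbers: the master bumps the version when it grants a lease and treats any replica reporting an older version as stale. The structures here are invented for the example.

```python
# Illustrative stale-replica detection: the master records the current version
# of each chunk when it grants a lease; any replica reporting an older version
# is considered stale and will not be served.
def grant_lease(current_versions: dict[int, int], handle: int) -> int:
    """Bump and return the chunk's version number when a new lease is granted."""
    current_versions[handle] = current_versions.get(handle, 0) + 1
    return current_versions[handle]


def is_stale(current_versions: dict[int, int],
             handle: int, reported_version: int) -> bool:
    """A replica is stale if it missed a mutation while its server was down."""
    return reported_version < current_versions.get(handle, 0)


# Example: a chunkserver that was down during a lease grant reports version 3
# while the master's record says 4, so its replica is stale and will be deleted.
versions = {42: 3}
grant_lease(versions, 42)            # versions[42] becomes 4
assert is_stale(versions, 42, 3)
```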
  64. High Availability
      - Fast recovery:
        - Of the master and the chunkservers.
      - HeartBeat messages:
        - Check liveness of chunkservers.
        - Piggyback garbage-collection commands.
        - Carry lease renewals.
      - Diagnostic tools.
  65. Performance metrics
  66. Conclusions
  67. Conclusions
      - Built on extremely cheap hardware:
        - with a correspondingly high failure rate.
      - Supports highly concurrent reads and writes.
      - Highly scalable.
      - Supports undelete (for a configurable time).
  68. Conclusions …
      - Built for map-reduce:
        - Mostly appends and scanning reads.
        - Mostly large files.
        - Requires high throughput.
      - Developers understand the limitations and tune their applications to suit GFS.
  69. Thank you?
  70. Design goals
      - Component failures are the norm.
      - Files are huge (2 GB files are common).
      - Files grow mostly by appending.
      - The applications (map-reduce) and the file system are designed together.
