GFS
Transcript

  • 1. Google File System
    • Suman Karumuri
    • Andy Bartholomew
    • Justin Palmer
  • 2. GFS
    • High-performance, scalable, distributed file system.
    • Built for batch-oriented, data-intensive apps.
    • Fault-tolerant.
    • Inexpensive commodity hardware.
  • 3. Design Assumptions
    • Inexpensive commodity hardware.
    • Modest number of large files.
    • Large streaming reads, small random reads. (map-reduce)
    • Mostly appends.
    • Well-defined semantics for concurrent mutations (especially appends).
    • High sustained throughput matters more than low latency.
  • 4. API
    • Open and close
    • Create and delete
    • Read and write
    • Record append
    • Snapshot
  • 5. Architecture
  • 6. Architecture (diagram): an Application uses the GFS Client library to talk to a single GFS Master and many GFS Chunk Servers.
  • 7. Files and Chunks
    • Files are divided into 64MB chunks (the offset-to-chunk mapping is sketched below).
    • Each chunk has a globally unique 64-bit handle.
    • Design trade-off:
      • Optimized for large files and high throughput.
      • Very few small files are expected.
      • Highly contended small files get a larger replication factor.
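As a rough illustration of the fixed 64 MB chunking, here is a minimal Python sketch of how a file byte offset maps to a chunk index and an offset within that chunk; the function names are hypothetical, not part of any GFS API.

```python
# Minimal sketch: mapping a file byte offset to a (chunk index, offset within
# chunk) pair, assuming GFS's fixed 64 MB chunk size. Hypothetical names.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def chunk_index(byte_offset: int) -> int:
    """Index of the chunk that holds this byte of the file."""
    return byte_offset // CHUNK_SIZE

def chunk_offset(byte_offset: int) -> int:
    """Offset of that byte within its chunk."""
    return byte_offset % CHUNK_SIZE

# Example: byte 200,000,000 of a file falls in the third chunk (index 2).
assert chunk_index(200_000_000) == 2
assert chunk_offset(0) == 0
```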
  • 8. GFS Chunk Servers
    • Manage chunks and store them as plain files.
    • Tell the master which chunks they hold.
    • Commodity Linux machines.
    • Maintain data consistency of their chunks.
    • Design trade-off:
      • Each chunk server knows which of its chunks are good.
      • No need to keep the master and the chunk servers in sync.
  • 9. GFS Master
    • Manages file namespace operations.
    • Manages file meta-data.
    • Manages chunks in chunk servers.
      • Creation/deletion.
      • Placement.
      • Load balancing.
      • Maintains replication.
    • Persists its state in a checkpointed operation log, which is replicated for recovery.
  • 10. Create Operation
  • 11. Create (diagram): the application, through the GFS Client, sends Create /home/user/filename to the GFS Master.
  • 12. Create
    • GFS Master:
    • Updates the operation log.
    • Updates the metadata.
    • (Diagram: Create /home/user/filename flows from the application through the GFS Client to the GFS Master; chunk servers sit on rack 1 and rack 2.)
  • 13. Create
    • GFS Master:
    • Updates the operation log.
    • Updates the metadata.
    • Chooses locations for the chunks:
      • across multiple racks
      • across multiple networks
      • machines with low contention
      • machines with low disk use
    • (Diagram: candidate chunk servers on rack 1 and rack 2.)
  • 14. Create
    • GFS Master:
    • Updates the operation log.
    • Updates the metadata.
    • Chooses locations for the chunks.
    • (Diagram: the master returns the chunk handle and chunk locations to the client; a placement sketch follows this slide.)
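The placement criteria on slide 13 can be read as a small scoring problem. The sketch below is an illustrative guess at such a policy, not the real master code; ChunkServer, choose_replicas, and the scoring fields are invented for the example.

```python
# Illustrative sketch (not the real master code) of the placement policy the
# slide lists: spread replicas across racks and prefer chunk servers with low
# disk use and little recent activity. All names here are invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class ChunkServer:
    name: str
    rack: str
    disk_used: float       # fraction of disk in use, 0.0 - 1.0
    recent_creations: int  # rough proxy for current contention

def choose_replicas(servers, replication_factor=3):
    """Greedy choice: best-scored servers first, one per rack while possible."""
    ranked = sorted(servers, key=lambda s: (s.disk_used, s.recent_creations))
    chosen, racks_used = [], set()
    for s in ranked:                          # first pass: new racks only
        if len(chosen) == replication_factor:
            break
        if s.rack not in racks_used:
            chosen.append(s)
            racks_used.add(s.rack)
    for s in ranked:                          # second pass: fill remaining slots
        if len(chosen) == replication_factor:
            break
        if s not in chosen:
            chosen.append(s)
    return chosen
```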
  • 15. Namespaces
    • Syntax for file access is the same as in a regular file system:
      • /home/user/foo
    • Semantics are different:
      • No per-directory data structures; the namespace is a flat lookup table.
      • Paths exist mainly for fine-grained locking.
      • Paths are stored with prefix compression.
    • No symbolic or hard links.
  • 16. Locking example
    • Write /home/user/foo
      • Acquires read locks on /home, /home/user
      • Acquires write lock on /home/user/foo
    • Delete /home/user/foo
      • Acquires read locks on /home, /home/user
      • Acquires write lock on /home/user/foo
        • Must wait for write to finish.
  • 17. Locking
    • Design trade-off:
      • Simple design.
      • Supports concurrent mutations in the same directory.
      • A canonical lock order prevents deadlocks (the locking scheme is sketched below).
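A minimal sketch of the locking scheme described on slides 15 through 17, assuming a flat table mapping full paths to reader-writer locks; locks_for_mutation and the (path, mode) representation are hypothetical.

```python
# Sketch of the namespace locking on slides 15-17, assuming a flat table from
# full path to a reader-writer lock. A mutation on /home/user/foo takes read
# locks on /home and /home/user and a write lock on /home/user/foo, always in
# a canonical (sorted) order so two operations can never deadlock.
def locks_for_mutation(path: str):
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    leaf = "/" + "/".join(parts)
    plan = [(p, "read") for p in ancestors] + [(leaf, "write")]
    return sorted(plan)   # canonical order: sorted by path name

# Write and delete of /home/user/foo both need the write lock on the leaf, so
# one waits for the other; two creates in /home/user only share read locks on
# the ancestors and can run concurrently.
print(locks_for_mutation("/home/user/foo"))
# [('/home', 'read'), ('/home/user', 'read'), ('/home/user/foo', 'write')]
```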
  • 18. Read Operation
  • 19. Read Operation (diagram): the client sends the filename and chunk index to the GFS Master.
  • 20. Read Operation (diagram): the master replies with the chunk handle and replica locations.
  • 21. Read Operation (diagram): the client sends the chunk handle and a byte range to one of the chunk servers.
  • 22. Read Operation (diagram): the chunk server returns the data (the client-side flow is sketched below).
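The four diagram steps above can be summarized in a short client-side sketch. master.lookup and replica.read are hypothetical stand-ins for the real RPCs; only the shape of the exchange follows the slides.

```python
# Client-side sketch of the read path in slides 19-22 (single-chunk reads only,
# for brevity). master.lookup and replica.read are invented RPC stubs.
CHUNK_SIZE = 64 * 1024 * 1024
_location_cache = {}   # (filename, chunk index) -> (chunk handle, replicas)

def gfs_read(master, filename, offset, length):
    index = offset // CHUNK_SIZE
    key = (filename, index)
    if key not in _location_cache:                      # one master round trip,
        _location_cache[key] = master.lookup(filename, index)
    handle, replicas = _location_cache[key]             # then served from cache
    # Any replica can serve the read; real clients prefer a nearby one.
    return replicas[0].read(handle, offset % CHUNK_SIZE, length)
```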
  • 23. Write (without a primary)
    • Same lookup as a read, but the master serializes the mutations itself.
    • Control flow (who orders the write) is separate from data flow (how the bytes reach the replicas).
    • Data is pushed linearly along a chain of replicas, nearest first, to make full use of each machine's outbound bandwidth.
  • 24. Write Operation
  • 25. Write (diagram): the client sends the chunk id and chunk offset to the GFS Master.
  • 26. Write (diagram): the master returns the chunkserver locations, which the client caches.
  • 27. Write (diagram): the client pushes the data to the nearest replica, which passes it along to the next.
  • 28. Write (diagram): the client issues the write operation and the GFS Master serializes all concurrent writes.
  • 29. Write (diagram): the master sends the serialized order of writes to the chunk servers.
  • 30. Write (diagram): the chunk servers apply the writes and ack.
  • 31. Write (diagram): the client receives an ack with the chunk index.
  • 32. Write under failure (diagram): one replica fails to ack.
  • 33. Write under failure (diagram): the failed write is retried.
  • 34. Leases
    • With the master ordering every write, it becomes a bottleneck.
    • A lease designates one replica as the primary chunk server, which handles mutations and their serialization for that chunk.
  • 35. Write with primary (diagram): the client sends the chunk id and chunk offset to the GFS Master.
  • 36. Write with primary (diagram): the master returns the chunkserver locations, which the client caches.
  • 37. Write with primary (diagram): the client pushes the data to the nearest replica, which passes it along the chain.
  • 38. Write with primary (diagram): the client sends the write operation to the primary chunk server, which serializes all concurrent writes.
  • 39. Write with primary (diagram): the primary forwards the serialized operations to the secondary replicas.
  • 40. Write with primary (diagram): the secondaries ack back to the primary.
  • 41. Write with primary (diagram): the primary acks to the client with the chunk index.
  • 42. Failures during writes
    • Chunk boundary overflow: a write that straddles a chunk boundary is split into multiple writes.
    • Replicas going down
      • The client retries the mutation.
  • 43. Write with primary
    • The primary holds a revocable, renewable lease from the master, so the master stays off the data path for ordinary writes (the flow is sketched below).
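A sketch of the write-with-primary flow from slides 35 through 41, under the assumption of simple RPC stubs (get_lease_holder, push_data, commit, apply); none of these names come from GFS itself.

```python
# Sketch of the write-with-primary flow in slides 35-41. get_lease_holder,
# push_data, commit and apply are invented RPC stubs, not the real GFS API.
def gfs_write(master, filename, chunk_index, data):
    handle, primary, secondaries = master.get_lease_holder(filename, chunk_index)

    # 1. Data flow: push the bytes to the nearest replica, which forwards them
    #    along a chain to the rest (pipelined in the real system).
    chain = [primary] + list(secondaries)
    chain[0].push_data(handle, data, forward_to=chain[1:])

    # 2. Control flow: the primary assigns this mutation a serial number and
    #    applies it, then every secondary applies it in the same order.
    serial = primary.commit(handle, data)
    acks = [s.apply(handle, serial) for s in secondaries]

    # 3. If any secondary fails, the modified region is left inconsistent and
    #    the client simply retries the whole mutation.
    if not all(acks):
        raise RuntimeError("a replica failed; the client retries the write")
    return serial
```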
  • 44. Record Append
  • 45. Record append (diagram): the client sends the chunk id to the GFS Master.
  • 46. Record append (diagram): the client receives an ack with the chunk index at the end of the file, i.e. the offset GFS chose.
  • 47. Record Append Operation
  • 48. Record append
    • Most common mutation.
    • The write location (offset) is chosen by GFS, not by the client.
    • Data is atomically appended at least once; failed attempts can leave duplicates or padding behind.
    • An append may be at most ¼ of the chunk size, which bounds the padding wasted at chunk boundaries (sketched below).
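A sketch of what the primary might do for a record append, assuming a simple chunk object with used, pad_to_end, and write_at; the names are illustrative, but the padding-and-retry behavior and the quarter-chunk cap follow the slide.

```python
# Sketch of the primary's record-append decision (slide 48). `chunk` is a
# hypothetical object with used, pad_to_end and write_at.
CHUNK_SIZE = 64 * 1024 * 1024
MAX_RECORD = CHUNK_SIZE // 4           # appends larger than this are rejected

def record_append(chunk, record: bytes):
    assert len(record) <= MAX_RECORD, "record larger than 1/4 of a chunk"
    if chunk.used + len(record) > CHUNK_SIZE:
        chunk.pad_to_end()             # waste at most 1/4 chunk as padding
        return "RETRY_ON_NEW_CHUNK"    # client retries; a fresh chunk is used
    offset = chunk.used                # GFS, not the client, picks the offset
    chunk.write_at(offset, record)
    chunk.used += len(record)
    return offset                      # at-least-once: retries may duplicate
```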
  • 49. Consistency Model
  • 50. Write, single process (diagram): record 9 "Hello" is present on both replicas, Chunk 1 and Chunk 1'.
  • 51. Write, single process (diagram): Write("World", 10) reaches Chunk 1 but not Chunk 1', leaving the region in an inconsistent state until a retry succeeds.
  • 52. The same happens with any failed mutation (diagram): the replicas disagree, so the region is inconsistent.
  • 53. Multiple writers (diagram): Write("World", 10:0) and Write("12345", 10:3) interleave; both replicas end up with "Wor12345", so the region is consistent but undefined.
  • 54. Append (diagram): Append("World") reaches Chunk 1 but not Chunk 1'; the region is inconsistent and undefined, and the client retries.
  • 55. Append (diagram): the retry lands record 11 "World" on both replicas; the file is defined at record 11, interspersed with the earlier inconsistent region.
  • 56. The same holds for appends by multiple writers (diagram): defined records interspersed with inconsistent regions.
  • 57. Consistency model
    • Replicas of a chunk are not necessarily bitwise identical.
    • Consistent: all replicas hold the same data.
    • Defined: consistent, and the region contains exactly what one mutation wrote.
    • This is fine for Map-Reduce workloads.
    • Applications can tell defined regions from undefined ones (a record-framing sketch follows).
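One way an application can cope with the duplicates, padding, and inconsistent regions left by retried appends is to frame its own records. The format below (id, length, payload, CRC) is invented for illustration; the point is only that readers can skip garbage and drop duplicate ids.

```python
# Illustrative record framing (invented format: 8-byte id, 4-byte length,
# payload, 4-byte CRC). Readers skip padding/fragments by resynchronizing and
# drop duplicates left by retried appends using the record id.
import struct, zlib

def frame(record_id: int, payload: bytes) -> bytes:
    header = struct.pack("!QI", record_id, len(payload))
    return header + payload + struct.pack("!I", zlib.crc32(header + payload))

def scan(blob: bytes):
    """Yield each valid payload exactly once, in file order."""
    seen, pos = set(), 0
    while pos + 16 <= len(blob):
        rid, length = struct.unpack_from("!QI", blob, pos)
        end = pos + 12 + length + 4
        if end <= len(blob):
            body = blob[pos:pos + 12 + length]
            (crc,) = struct.unpack_from("!I", blob, pos + 12 + length)
            if crc == zlib.crc32(body):
                if rid not in seen:          # duplicate from a retried append
                    seen.add(rid)
                    yield body[12:]
                pos = end
                continue
        pos += 1                             # garbage here; resync byte by byte
```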
  • 58. Snapshot
    • Snapshot of a file or dir.
    • Should be fast, minimal data overhead.
    • On a snapshot call:
      • Revokes leases.
      • Logs the operation.
      • Duplicates the metadata; the new file's chunk handles initially point at the same chunks.
      • Actual chunk copies are made lazily, copy-on-write, the next time a chunk is written (sketched below).
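A toy sketch of the copy-on-write bookkeeping a snapshot implies, using invented in-memory dictionaries in place of the master's real metadata structures.

```python
# Toy sketch of snapshot bookkeeping: the master copies only metadata and bumps
# per-chunk reference counts; a shared chunk is cloned the next time it is
# written. The dictionaries and the clone_chunk callback are invented stand-ins.
chunk_refcount = {}   # chunk handle -> number of files referencing it
file_chunks = {}      # filename -> list of chunk handles

def snapshot(src: str, dst: str):
    file_chunks[dst] = list(file_chunks[src])        # metadata copy only
    for handle in file_chunks[dst]:
        chunk_refcount[handle] = chunk_refcount.get(handle, 1) + 1

def chunk_for_write(filename: str, i: int, clone_chunk):
    """Return a handle safe to mutate, cloning a shared chunk first."""
    handle = file_chunks[filename][i]
    if chunk_refcount.get(handle, 1) > 1:             # shared: copy on write
        new_handle = clone_chunk(handle)              # chunkservers copy locally
        chunk_refcount[handle] -= 1
        chunk_refcount[new_handle] = 1
        file_chunks[filename][i] = new_handle
    return file_chunks[filename][i]
```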
  • 59. Delete Operation
    • Delete is a metadata-only operation.
      • Renames file to special name.
      • After certain time, deletes the actual chunks.
    • Supports undelete for limited time.
    • Actual chunk reclamation is lazy garbage collection (sketched below):
      • The master eventually deletes the metadata.
      • Each chunk server's list of chunks is piggybacked on HeartBeat messages, and the master identifies the orphans.
      • The chunk servers delete the orphaned chunk files.
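A sketch of the lazy delete path: rename to a hidden name, reclaim metadata after a grace period, and let chunk servers learn about orphaned chunks through the HeartBeat exchange. The hidden-name convention, the three-day window, and the helper names are assumptions for illustration.

```python
# Sketch of the lazy delete path (slide 59). The naming convention, grace
# period, and function names are illustrative assumptions.
import time

GRACE_PERIOD = 3 * 24 * 3600           # assumed three-day undelete window
namespace = {}                         # path -> list of chunk handles

def delete(path: str):
    hidden = f"{path}.deleted.{int(time.time())}"
    namespace[hidden] = namespace.pop(path)          # just a metadata rename

def gc_scan(now: float = None):
    """Master background pass: drop metadata for expired hidden files."""
    now = now if now is not None else time.time()
    for path in [p for p in namespace if ".deleted." in p]:
        if now - int(path.rsplit(".", 1)[1]) > GRACE_PERIOD:
            del namespace[path]                      # its chunks become orphans

def heartbeat_reply(chunks_reported_by_server):
    """Tell a chunk server which of its chunks no file references anymore."""
    live = {h for handles in namespace.values() for h in handles}
    return [h for h in chunks_reported_by_server if h not in live]
```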
  • 60. Delete API
    • Design trade-off:
      • Simple design.
      • Reclamation can run whenever the master is free.
      • Quick logical deletes.
      • Works well when failure is common.
      • Harder to tune when storage is tight, but there are workarounds.
  • 61. Fault Tolerance for chunks
    • Re-replication – maintains replication factor.
    • Rebalancing
      • Load balancing
      • Disk space usage.
    • Data integrity
      • Each chunk is divided into 64KB blocks, and each block has its own checksum.
      • Checksums are verified every time an application reads the data (sketched below).
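A sketch of per-block checksumming on a chunk server; crc32 stands in for whatever checksum GFS actually uses, and the function names are invented.

```python
# Sketch of per-block checksumming on a chunk server: each 64 KB block of a
# chunk carries its own checksum, verified before any data is returned.
import zlib

BLOCK = 64 * 1024

def block_checksums(chunk_data: bytes):
    return [zlib.crc32(chunk_data[i:i + BLOCK])
            for i in range(0, len(chunk_data), BLOCK)]

def verified_read(chunk_data: bytes, stored, offset: int, length: int) -> bytes:
    first, last = offset // BLOCK, (offset + length - 1) // BLOCK
    for b in range(first, last + 1):          # verify every block touched
        if zlib.crc32(chunk_data[b * BLOCK:(b + 1) * BLOCK]) != stored[b]:
            # A real chunkserver reports the corruption to the master, which
            # re-replicates the chunk from a good replica.
            raise IOError(f"checksum mismatch in block {b}")
    return chunk_data[offset:offset + length]
```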
  • 62. Fault tolerance for master
    • Master
      • Replication and checkpointing of Operation Log
    • Shadow masters
      • Replay the checkpointed operation log.
      • Make no metadata changes; they serve reads only.
      • Reduce load on the master.
      • Might serve slightly stale data.
  • 63. Fault tolerance for Chunk Server
    • All chunks are versioned.
    • Version number updated when a new lease is granted.
    • Replicas with old versions are stale: they are not served and are garbage-collected (sketched below).
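A sketch of stale-replica detection with chunk version numbers; the dictionary and function names are illustrative only.

```python
# Sketch of stale-replica detection with chunk version numbers (slide 63). The
# master bumps the version each time it grants a new lease; a replica that
# reports an older number missed a mutation and must not be served.
master_version = {}   # chunk handle -> latest version known to the master

def grant_lease(handle) -> int:
    master_version[handle] = master_version.get(handle, 0) + 1
    return master_version[handle]          # replicas record the new version

def classify_replica(handle, reported_version: int) -> str:
    latest = master_version.get(handle, 0)
    if reported_version < latest:
        return "stale: do not serve; garbage-collect"
    if reported_version > latest:
        return "newer than master: adopt the higher version"
    return "current"
```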
  • 64. High Availability
    • Fast recovery
      • Of both the master and chunk servers.
    • HeartBeat messages
      • Checking liveness of chunkservers
      • Piggybacking GC commands
      • Lease renewal
    • Diagnostic tools.
  • 65. Performance metrics.
  • 66. Conclusions
  • 67. Conclusions
    • Extremely cheap hardware
      • High failure rate
    • Highly concurrent reads and writes
    • Highly scalable
    • Supports undelete (for configurable time)
  • 68. Conclusions …
    • Built for map-reduce
      • Mostly appends and scanning reads
      • Mostly large files
      • Requires high throughput
    • Developers understand the limitations and tune apps to suit GFS.
  • 69. Thank you!
  • 70. Design goals
    • Component failures are the norm, not the exception.
    • Files are huge (2GB is common).
    • Files are mutated mostly by appending.
    • Application (map-reduce) and file-system are designed together.
