Automating Gluster @ Facebook - Shreyas Siravara
Gluster Summit 2017

Lifecycle of a Gluster Volume
Shreyas Siravara, Production Engineer
Automating GlusterFS @ Facebook

Stages of a Gluster Volume
1. Creation
2. Maintenance
   • Software Upgrades
   • Hardware Repairs
3. Decommission

Creation
Validate Hardware
• Homogeneous hardware
   • Bricks are the same size
   • Exact same CPU and memory configuration
• Easy to debug problems

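A homogeneity check like the one described above can be sketched in a few lines of Python. The `validate_homogeneous` helper and the `fleet` records are hypothetical illustrations, not Facebook's actual validation tooling:

```python
def validate_homogeneous(hosts):
    """Compare every host against the first one; an empty result means
    the hardware is homogeneous."""
    ref = hosts[0]
    problems = []
    for host in hosts[1:]:
        for key in ("brick_size_gb", "cpu_model", "memory_gb"):
            if host[key] != ref[key]:
                problems.append(f"{host['name']}: {key} differs from {ref['name']}")
    return problems

# Example fleet with one misconfigured host (made-up data).
fleet = [
    {"name": "host1", "brick_size_gb": 4000, "cpu_model": "E5-2660", "memory_gb": 64},
    {"name": "host2", "brick_size_gb": 4000, "cpu_model": "E5-2660", "memory_gb": 64},
    {"name": "host3", "brick_size_gb": 2000, "cpu_model": "E5-2660", "memory_gb": 64},
]
```

Flagging mismatches before volume creation keeps later debugging simple: any performance or capacity anomaly can be attributed to software, not to a stray host with different hardware.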
Creation
Layout Management
• Rack-failure-resilient layout
   • Spread replicas across racks
• Automate the entire process to avoid human error
• Layout of replicas supports large-scale maintenance
   • Avoid data unavailability

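The rack-aware placement can be sketched as follows, assuming a replica-3 volume and equal brick counts per rack. `rack_aware_replica_sets` is an illustrative helper, not Gluster's own layout code:

```python
def rack_aware_replica_sets(bricks_by_rack, replica=3):
    """Place each replica set on `replica` distinct racks, rotating the
    starting rack so load spreads evenly across racks."""
    racks = sorted(bricks_by_rack)
    assert len(racks) >= replica, "need at least as many racks as replicas"
    per_rack = len(bricks_by_rack[racks[0]])
    sets_ = []
    for i in range(per_rack):
        # Picking `replica` consecutive racks (mod rack count) guarantees
        # the racks in one set are all distinct.
        chosen_racks = [racks[(i + j) % len(racks)] for j in range(replica)]
        sets_.append([bricks_by_rack[r][i] for r in chosen_racks])
    return sets_

# Three racks with two bricks each (made-up names).
bricks_by_rack = {
    "rack1": ["r1b1", "r1b2"],
    "rack2": ["r2b1", "r2b2"],
    "rack3": ["r3b1", "r3b2"],
}
sets_ = rack_aware_replica_sets(bricks_by_rack)
```

Because no replica set ever has two members in the same rack, an entire rack can be taken down for maintenance while every set keeps a majority online.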
Maintenance
Hardware Repair
• What happens if a brick needs repair?
   • Some manual effort for physical repairs
   • This is done with the local Gluster daemons not running
• What happens if a brick comes back empty?
   • Multiple replaced drives in a RAID
   • The self-heal daemon (SHD) automatically “discovers” that the brick is empty and heals it

Maintenance
Hardware Repair
• What happens if the root drive is replaced?
   • Fresh OS install
• Automated “restore” flow:
   • Facebook automation installs the OS
   • Install Gluster
   • Restore the node's prior UUID & restore the peer list
   • SHD cleans up the pending heals

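The UUID-restore step might look like the sketch below. Stock glusterd keeps its identity in `/var/lib/glusterd/glusterd.info` as plain `KEY=VALUE` lines, but the exact keys written here, the default `op_version`, and the demo UUID are assumptions; this is not Facebook's actual restore flow:

```python
import os
import tempfile

def restore_glusterd_identity(state_dir, uuid, op_version="31000"):
    """Rewrite glusterd.info so a freshly installed node comes back with
    its prior identity instead of generating a new UUID on first start."""
    os.makedirs(state_dir, exist_ok=True)
    info_path = os.path.join(state_dir, "glusterd.info")
    with open(info_path, "w") as f:
        f.write(f"UUID={uuid}\n")
        f.write(f"operating-version={op_version}\n")
    return info_path

# Demonstrate against a scratch directory rather than /var/lib/glusterd.
demo_uuid = "8083dbc9-0a09-4636-9eb5-57b276a405ee"  # made-up example UUID
path = restore_glusterd_identity(tempfile.mkdtemp(), demo_uuid)
with open(path) as f:
    content = f.read()
```

Restoring the prior UUID matters because the rest of the cluster still lists the node under that identity in its peer state; with the identity back, the peers accept the node and SHD can heal its bricks.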
Maintenance
Software Upgrades: Goals
• Push quickly and safely
• Avoid quorum loss & split-brains
• The customer should not know we’re doing a push
• Halt the push if we find something critical
• Code changes should not result in incompatibility between servers & clients

Maintenance
Software Upgrades: Batching
• Create batches based on layout
   • Every rack becomes a “batch”
• Batches are scheduled serially
• Concurrency within the batch

Example: Batch 1 = Rack 1 (Brick 1, Brick 4, Brick 7); Batch 2 = Rack 2 (Brick 2, Brick 5, Brick 8); Batch 3 = Rack 3 (Brick 3, Brick 6, Brick 9)

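Deriving batches from the layout is a simple group-by. The brick records and the `batches_by_rack` helper below are hypothetical, but they reproduce the example batching:

```python
from collections import defaultdict

def batches_by_rack(bricks):
    """One batch per rack; batches run serially, members concurrently.
    Safe because the layout never puts two replicas in one rack."""
    groups = defaultdict(list)
    for brick in bricks:
        groups[brick["rack"]].append(brick["name"])
    return [groups[rack] for rack in sorted(groups)]

# Nine bricks striped across three racks, as in the slide's example.
bricks = [{"name": f"Brick {i}", "rack": f"Rack {(i - 1) % 3 + 1}"}
          for i in range(1, 10)]
batches = batches_by_rack(bricks)
```

Since each replica set spans three racks, upgrading one rack at a time touches at most one brick of any set, which is what preserves quorum throughout the push.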
Maintenance
Software Upgrades: Host Procedure
Single host procedure:
1. Check for quorum margin
2. Wait for pending heals to drop
3. Stop Gluster & install the new version
4. Start Gluster

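Step 1 can be sketched as: verify that taking one more brick of every replica set offline would still leave a majority. `quorum_margin_ok` is an illustrative helper, not Gluster's actual quorum logic:

```python
def quorum_margin_ok(online_per_set, replica=3):
    """True if every replica set could lose one more brick (the host we
    are about to take down) and still keep a majority online."""
    majority = replica // 2 + 1
    return all(online - 1 >= majority for online in online_per_set)
```

For replica 3 the majority is 2, so the check only passes when all three bricks of every affected set are online; if a set is already degraded, the upgrade waits rather than risk quorum loss.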
Maintenance
Software Upgrades: Volume Procedure
• Upgrade every host in the batch
• Health-check
• Run the next batch

Batches (Rack 1, Rack 2, Rack 3) move from Pending to Upgraded one at a time.

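The volume procedure reduces to a serial loop with a health gate between batches. `upgrade_volume` below is a sketch with injected callbacks, not the production push system:

```python
def upgrade_volume(batches, upgrade_host, health_check):
    """Upgrade batches serially; halt the push if the health check fails."""
    upgraded = []
    for batch in batches:
        for host in batch:  # hosts within one batch could run concurrently
            upgrade_host(host)
            upgraded.append(host)
        if not health_check():
            raise RuntimeError(f"halting push after batch {batch}")
    return upgraded
```

The per-batch health gate is what lets the push "halt if we find something critical": a bad build stops after damaging at most one rack's worth of hosts, which quorum can tolerate.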
Maintenance
Software Upgrades: Advantages & Potential Improvements
• Advantages:
   • Maintain quorum
   • Clients don’t need to know that a volume is being upgraded
• We should:
   • Correctly drain traffic when we stop Gluster daemons
      • Stop listening for new requests
      • Complete outstanding I/O

Decommission
Requirements & Challenges
• Requirement:
   • Replace 100% of the hardware in a Gluster volume
• Challenges:
   • Volume size
   • Data integrity
   • No customer impact
      • SLA: no errors, low latency

Decommission
Simple Strategy: Replace-brick
• Replace bricks one replica at a time, wait for rebuilds
   • Use gluster volume replace-brick
• Good for smaller volumes with low numbers of files
• Scales poorly with 10s of millions of files per brick
   • The self-heal daemon is not yet fast enough, even with multi-threaded SHD

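For reference, the swap itself is a single CLI call; the helper below only assembles the argument list (hostnames and brick paths are made up). Current Gluster releases typically expect the `commit force` form of `replace-brick`, after which self-heal rebuilds the new brick:

```python
def replace_brick_cmd(volume, old_brick, new_brick):
    """Argument list for swapping one brick of a replicated volume;
    bricks are written as host:/path."""
    return ["gluster", "volume", "replace-brick",
            volume, old_brick, new_brick, "commit", "force"]

cmd = replace_brick_cmd("vol0", "hostA:/bricks/b0", "hostB:/bricks/b0")
```

The scaling problem is not the command but the rebuild it triggers: the new brick starts empty, so SHD must crawl and copy every file in the replica.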
Decommission
Improved Strategy: “Block” copy + Replace-brick
1. xfsdump: copy the source brick to the destination brick
2. gluster volume replace-brick: swap the source brick for the destination brick

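The two steps can be expressed as a pair of shell commands, sketched here as a helper that only assembles the command strings (the volume name, hosts, and brick paths are made up). It uses a level-0 `xfsdump` piped over `ssh` into `xfsrestore` for the bulk copy:

```python
def block_copy_plan(volume, src_host, src_brick, dst_host, dst_brick):
    """Two-step decommission plan: bulk-copy the brick filesystem with
    xfsdump/xfsrestore, then swap the brick; SHD tops up the rest."""
    copy = (f"xfsdump -l 0 - {src_brick} | "
            f"ssh {dst_host} 'xfsrestore - {dst_brick}'")
    swap = (f"gluster volume replace-brick {volume} "
            f"{src_host}:{src_brick} {dst_host}:{dst_brick} commit force")
    return [copy, swap]

plan = block_copy_plan("vol0", "hostA", "/bricks/b0", "hostB", "/bricks/b0")
```

The dump runs at sequential-read speed while the source brick still serves the volume, so when `replace-brick` finally swaps it, the new brick is already almost complete.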
Decommission
Improved Strategy: “Block” copy + Replace-brick
• Advantages:
   • 100s of MB/s to run the first copy
   • The self-heal daemon just has to “top up” the node
      • Heals only the data that changed while the node was offline
   • Easy to automate
      • Predictable, fixed procedure

Final Thoughts
• Layout is important
   • Data unavailability can be avoided
• Decompose into host-level & volume-level procedures
• Keep the procedures simple & predictable
   • Avoid overly complex automation with many edge cases
