Performance and Availability Tradeoffs in Replicated File Systems
Peter Honeyman
Center for Information Technology Integration
University of Michigan, Ann Arbor
Presented at Resilience 2008 : Workshop on Resiliency in High Performance Computing (Lyon, May 22, 2008)

  1. Performance and Availability Tradeoffs in Replicated File Systems
     Peter Honeyman
     Center for Information Technology Integration
     University of Michigan, Ann Arbor
  2. Acknowledgements
     • Joint work with Dr. Jiaying Zhang
       • Now at Google
       • This was a chapter of her dissertation
     • Partially supported by
       • NSF/NMI GridNFS
       • DOE/SciDAC Petascale Data Storage Institute
       • NetApp
       • IBM ARC
  3. Storage replication
     • Advantages ☺
       • Scalability
       • Reliability
       • Read performance
  4. Storage replication
     • Disadvantages ☹
       • Complex synchronization protocols
         • Concurrency
         • Durability
       • Write performance
  5. Durability
     • If we weaken the durability guarantee, we may lose data ...
     • And be forced to restart the computation
     • But it might be worth it
  6. Utilization tradeoffs
     • Adding replication servers enhances durability
       • Reduces the risk that computation must be restarted
       • Increases utilization ☺
     • Replication increases run time
       • Reduces utilization ☹
  7. Placement tradeoffs
     • Nearby replication servers reduce the replication penalty
       • Increases utilization ☺
     • Nearby replication servers are vulnerable to correlated failure
       • Reduces utilization ☹
  8. Run-time model
     [State diagram with labels: start, run, end, fail (twice), recover, ok]
  9. Parameters
     • Failure-free, single-server run time
       • Can be estimated or measured
       • Our focus is on 1 to 10 days
  10. Parameters
      • Replication overhead
        • Penalty associated with replication to backup servers
        • Proportional to RTT
        • Ratio can be measured by running with a backup server a few msec away
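
The proportionality on slide 10 suggests a simple way to extrapolate from one measurement. The sketch below is only an illustration: the function name, the argument names, and the numbers in the usage note are assumptions, not values from the slides.

```python
def replication_overhead(measured_overhead: float,
                         measured_rtt_ms: float,
                         rtt_ms: float) -> float:
    """Extrapolate replication overhead to another RTT.

    Assumes overhead is strictly proportional to RTT, as stated on the
    slide. measured_overhead is the fractional penalty observed with a
    backup server measured_rtt_ms away (hypothetical inputs).
    """
    if measured_rtt_ms <= 0:
        raise ValueError("measured_rtt_ms must be positive")
    return measured_overhead * (rtt_ms / measured_rtt_ms)
```

For example, a hypothetical 2% penalty measured at 2 ms would extrapolate to `replication_overhead(0.02, 2.0, 50.0)`, about 50%, for a backup server 50 ms away.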
  11. Parameters
      • Recovery time
        • Time to detect failure of the primary server and switch to a backup server
        • Not a sensitive parameter
  12. Parameters
      • Probability distribution functions
        • Server failure
        • Successful recovery
  13. Server failure
      • Estimated by analyzing PlanetLab ping data
        • 716 nodes, 349 sites, 25 countries
        • All-pairs, 15-minute interval, 1/04 to 6/05
        • 692 nodes were alive throughout
      • We ascribe missing pings to node failure and network partition
  14. PlanetLab failure
      [Figure: cumulative failure, log-linear scale]
  15. Correlated failures
      P(n nodes down | 1 node down)

                         nodes per site
      failed nodes       2       3       4       5
      2                  0.526   0.593   0.552   0.561
      3                          0.546   0.440   0.538
      4                                  0.378   0.488
      5                                          0.488
      number of sites    259     65      21      11
  16. Correlated failures
      [Figure: average failure correlation vs. RTT (ms), with linear fits]

      nodes   slope           y-intercept
      2       -2.4 × 10^-4    0.195
      3       -2.3 × 10^-4    0.155
      4       -2.3 × 10^-4    0.134
      5       -2.4 × 10^-4    0.119
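
The linear fits tabulated on slide 16 can be evaluated directly. This is a minimal sketch, assuming the slopes are per millisecond of RTT (the unit on the plot axis). The function name is illustrative, and clamping at zero is an added assumption, since a straight-line fit turns negative at large RTT.

```python
# Linear fits from slide 16: number of nodes -> (slope, y-intercept).
FITS = {
    2: (-2.4e-4, 0.195),
    3: (-2.3e-4, 0.155),
    4: (-2.3e-4, 0.134),
    5: (-2.4e-4, 0.119),
}

def failure_correlation(nodes: int, rtt_ms: float) -> float:
    """Evaluate the fitted line for the given node count and RTT.

    Assumption: the result is clamped at zero, because a correlation
    estimate below zero is not meaningful for this use.
    """
    slope, intercept = FITS[nodes]
    return max(0.0, intercept + slope * rtt_ms)
```

Under these fits, moving a two-node group from 0 ms to 100 ms apart lowers the estimated correlation from 0.195 to about 0.171.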
  17. Run-time model
      • Discrete event simulation for expected run time and utilization
      [State diagram with labels: start, run, end, fail (twice), recover, ok]
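
One way to picture the run-time model is a small Monte Carlo sketch over the states on the slide. It is not the authors' simulator, and it rests on several assumptions that the slides do not state: failures arrive with a constant rate (exponential inter-arrival times, rather than the PlanetLab-derived distribution), a successful recovery resumes the job where it left off, a failed recovery restarts it from the beginning, and utilization is the failure-free single-server run time divided by the mean elapsed time. All names and parameters are hypothetical.

```python
import random

def simulate_run(run_time, overhead, recovery_time, fail_rate, p_recover, rng):
    """Elapsed time of one job under the assumed model."""
    work = run_time * (1.0 + overhead)  # run time inflated by replication
    done = 0.0                          # progress preserved so far
    elapsed = 0.0
    while True:
        remaining = work - done
        # Assumed exponential time to the next failure of the primary.
        ttf = rng.expovariate(fail_rate) if fail_rate > 0 else float("inf")
        if ttf >= remaining:
            return elapsed + remaining  # run -> end
        elapsed += ttf + recovery_time  # run -> fail -> recover
        if rng.random() < p_recover:
            done += ttf                 # recover -> ok: resume the job
        else:
            done = 0.0                  # recover -> fail: restart the job

def utilization(run_time, overhead, recovery_time, fail_rate, p_recover,
                trials=10_000, seed=0):
    """Failure-free run time divided by the mean simulated elapsed time."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += simulate_run(run_time, overhead, recovery_time,
                              fail_rate, p_recover, rng)
    return run_time / (total / trials)
```

With no failures the sketch reduces to `1 / (1 + overhead)`, which is the cost side of the tradeoff. With failures, a higher `p_recover` raises utilization, which is the benefit side.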
  18. Simulation results
      • One hour; no replication: utilization = .995
      [Figure: utilization vs. RTT, one curve per write intensity (0.0001, 0.001, 0.01, 0.1); panels: one backup, four backups]
  19. Simulation results
      • One day; no replication: utilization = .934
      [Figure: utilization vs. RTT, one curve per write intensity (0.0001, 0.001, 0.01, 0.1); panels: one backup, four backups]
  20. Simulation results
      • Ten days; no replication: utilization = .668
      [Figure: utilization vs. RTT; panels: one backup, four backups]
  21. Simulation discussion
      • Replication improves utilization for long-running jobs
      • Multiple backup servers do not improve utilization (due to low PlanetLab failure rates)
  22. Simulation discussion
      • Distant backup servers improve utilization for light writers
      • Distant backup servers do not improve utilization for heavy writers
      • Implications for checkpoint interval …
  23. Checkpoint interval
      • Calculated on the back of a napkin
      [Figure: panels labeled "one day, 20% checkpoint overhead" and "10 day, 2% checkpoint overhead" (twice); captions: one backup server, four backup servers]
  24. Work in progress
      • Realistic failure data
        • Storage and processor failure
        • PDSI failure data repository
      • Realistic checkpoint costs: help!
      • Realistic replication overhead
        • Depends on amount of computation
        • Less than 10% for NAS Grid Benchmarks
  25. Conclusions
      • Conventional wisdom holds that consistent mutable replication in large-scale distributed systems is too expensive to consider
      • Our study suggests otherwise
  26. Conclusions
      • Consistent replication in large-scale distributed storage systems is feasible and practical
      • Superior performance
      • Rigorous adherence to conventional file system semantics
      • Improved utilization
  27. Thank you for your attention!
      www.citi.umich.edu
      Questions?
