Performance and Availability Tradeoffs in Replicated File Systems

Peter Honeyman
Center for Information Technology Integration
University of Michigan, Ann Arbor
Acknowledgements

• Joint work with Dr. Jiaying Zhang
 • Now at Google
 • This was a chapter of her dissertation
• Partially supported by
 • NSF/NMI GridNFS
 • DOE/SciDAC Petascale Data Storage Institute
 • NetApp
 • IBM ARC
Storage replication

• Advantages ☺
 • Scalability
 • Reliability
 • Read performance
Storage replication

• Disadvantages ☹
 • Complex synchronization protocols
   • Concurrency
   • Durability
 • Write performance
Durability

• If we weaken the durability guarantee, we may lose data ...
• And be forced to restart the computation
• But it might be worth it
Utilization tradeoffs

• Adding replication servers enhances durability
 • Reduces the risk that computation must be
   restarted
 • Increases utilization ☺
• Replication increases run time
 • Reduces utilization ☹
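
As a working definition (an assumption consistent with these slides, not a formula they state), utilization can be read as the failure-free, single-server run time divided by the expected actual run time:

    U = T_failure-free / E[T_actual]

Replication overhead, recovery time, and restarted work all inflate the denominator, which is how both effects above enter the tradeoff.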
Placement tradeoffs

• Nearby replication servers reduce the
  replication penalty
 • Increases utilization ☺
• Nearby replication servers are vulnerable
  to correlated failure
 • Reduces utilization ☹
Run-time model

[State diagram: start → run → end; on failure, the run attempts recovery; if recovery succeeds (ok) the run resumes, and if recovery fails the computation restarts.]
Parameters


• Failure-free, single-server run time
 • Can be estimated or measured
 • Our focus is on 1 to 10 days
Parameters

• Replication overhead
 • Penalty associated with replication to
   backup servers
 • Proportional to RTT
 • Ratio can be measured by running with a
    backup server a few msec away
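
One plausible reading of this parameter (hedged; the slides only say the penalty is proportional to RTT): the replicated run time grows roughly linearly in the round-trip time to the backup,

    T_replicated ≈ T_failure-free × (1 + α · RTT)

where the coefficient α depends on write intensity and is calibrated by measuring the slowdown against a backup server a few milliseconds away.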
Parameters

• Recovery time
 • Time to detect failure of the primary
   server and switch to a backup server
 • Not a sensitive parameter
Parameters


• Probability distribution functions
 • Server failure
 • Successful recovery
Server failure

• Estimated by analyzing PlanetLab ping data
 • 716 nodes, 349 sites, 25 countries
 • All-pairs, 15-minute interval, 1/04 to 6/05
 • 692 nodes were alive throughout
• We ascribe missing pings to node failure
  and network partition
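
A minimal sketch of this inference, assuming the ping trace has been reduced to a per-node list of timestamps at which the node answered at least one probe (the actual PlanetLab analysis pipeline is not shown in the slides):

    from datetime import timedelta

    PING_INTERVAL = timedelta(minutes=15)

    def downtime_intervals(ping_times, trace_start, trace_end):
        """Ascribe every gap longer than one probe interval to node failure
        or network partition, returning the (down_since, up_again) pairs."""
        intervals = []
        prev = trace_start
        for t in sorted(ping_times):
            if t - prev > PING_INTERVAL:
                intervals.append((prev, t))       # node presumed down during the gap
            prev = t
        if trace_end - prev > PING_INTERVAL:      # still down when the trace ends
            intervals.append((prev, trace_end))
        return intervals

The failure and recovery distributions used below can then be fit to the lengths of, and gaps between, these intervals.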
PlanetLab failure

[Figure: cumulative node failures over the trace period, plotted on a log-linear scale]
Correlated failures

P(n nodes down | 1 node down), by nodes per site:

                         nodes per site
  failed nodes (n)     2       3       4       5
          2          0.526   0.593   0.552   0.561
          3                  0.546   0.440   0.538
          4                          0.378   0.488
          5                                  0.488
  number of sites     259     65      21      11
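
A hedged illustration of how the conditional probabilities above could be tallied; here site_snapshots is assumed to be one site's sequence of observations, each a set of that site's nodes that were down at that instant (illustrative only, not the actual analysis code):

    def p_n_down_given_1_down(site_snapshots, n):
        """Estimate P(at least n nodes down | at least 1 node down) for one site."""
        with_failure = [down for down in site_snapshots if len(down) >= 1]
        if not with_failure:
            return 0.0
        return sum(len(down) >= n for down in with_failure) / len(with_failure)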
Correlated failures

[Figure: average failure correlation vs. RTT (ms), 25–175 ms on the x-axis and 0–0.25 on the y-axis, with a linear fit per group size]

  nodes     slope          y-intercept
    2     -2.4 × 10^-4       0.195
    3     -2.3 × 10^-4       0.155
    4     -2.3 × 10^-4       0.134
    5     -2.4 × 10^-4       0.119
Run-time model

• Discrete event simulation for expected run time and utilization

[State diagram: start → run → end; on failure, the run attempts recovery; if recovery succeeds (ok) the run resumes, and if recovery fails the computation restarts.]
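
A minimal sketch of such a simulation, with exponential failure times and a fixed recovery-success probability standing in for the empirical PlanetLab distributions actually used; all names and parameter values are illustrative:

    import random

    def simulate_run(T_ff, overhead, mtbf, t_recover, p_recover_ok, trials=10000):
        """Estimate expected run time and utilization under the model above.
        T_ff          failure-free, single-server run time
        overhead      fractional replication penalty (e.g. 0.02)
        mtbf          mean time between primary-server failures
        t_recover     time to detect a failure and switch to a backup
        p_recover_ok  probability that recovery succeeds (otherwise: restart)
        """
        T_work = T_ff * (1 + overhead)          # work to complete, with replication penalty
        total = 0.0
        for _ in range(trials):
            elapsed, done = 0.0, 0.0
            while done < T_work:
                fail_at = random.expovariate(1.0 / mtbf)
                if fail_at >= T_work - done:    # job finishes before the next failure
                    elapsed += T_work - done
                    done = T_work
                else:
                    elapsed += fail_at + t_recover
                    if random.random() < p_recover_ok:
                        done += fail_at         # recovery ok: resume where we left off
                    else:
                        done = 0.0              # recovery failed: restart from scratch
            total += elapsed
        expected = total / trials
        return expected, T_ff / expected        # (expected run time, utilization)

For example, simulate_run(T_ff=86400, overhead=0.02, mtbf=30*86400, t_recover=300, p_recover_ok=0.95) would estimate a one-day job with a 2% replication penalty; the numbers are placeholders, not measurements from the study.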
Simulation results: one hour (no replication: utilization = .995)

[Figure: utilization vs. RTT to the backup server(s), for write intensities of 0.0001, 0.001, 0.01, and 0.1; y-axis from 0.6 to 1.0; left panel: one backup, right panel: four backups]
Simulation results: one day (no replication: utilization = .934)

[Figure: utilization vs. RTT to the backup server(s), for write intensities of 0.0001, 0.001, 0.01, and 0.1; y-axis from 0.6 to 1.0; left panel: one backup, right panel: four backups]
Simulation results: ten days (no replication: utilization = .668)

[Figure: utilization vs. RTT to the backup server(s); y-axis from 0.50 to 1.00; left panel: one backup, right panel: four backups]
Simulation discussion

• Replication improves utilization for long-running jobs
• Multiple backup servers do not improve utilization (due to low PlanetLab failure rates)
Simulation discussion

• Distant backup servers improve utilization
  for light writers
• Distant backup servers do not improve
  utilization for heavy writers
• Implications for checkpoint interval …
Checkpoint interval

(calculated on the back of a napkin)

[Figure: back-of-the-napkin checkpoint intervals for a one-day job with 20% checkpoint overhead and a ten-day job with 2% checkpoint overhead; left panel: one backup server, right panel: four backup servers]
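
For reference, the usual back-of-the-napkin rule is Young's approximation (an assumption about what the napkin held; the slide does not show the arithmetic):

    τ_opt ≈ sqrt(2 · δ · MTBF)

where δ is the cost of writing one checkpoint and MTBF is the mean time between failures. Illustratively, δ = 10 minutes and MTBF = 7 days give τ_opt ≈ 7.5 hours.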
Work in progress
• Realistic failure data
 • Storage and processor failure
 • PDSI failure data repository
• Realistic checkpoint costs — help!
• Realistic replication overhead
 • Depends on amount of computation
 • Less than 10% for NAS Grid Benchmarks
Conclusions

• Conventional wisdom holds that
          consistent mutable replication
        in large-scale distributed systems
           is too expensive to consider
• Our study suggests otherwise
Conclusions
• Consistent replication in large-scale
  distributed storage systems is
               feasible and practical
• Superior performance
• Rigorous adherence to conventional file
  system semantics
• Improved utilization
Thank you for your attention!

www.citi.umich.edu

Questions?