Availability Analysis for Deployment
of In-Cloud Applications
Xiwei Xu, Qinghua Lu, Liming Zhu, Jim (Zhanwen) Li
Sherif Sakr, Hiroshi Wada, Ingo Weber
Software Systems Research Group, NICTA
ISARCS13, Vancouver
Slides at: http://www.slideshare.net/LimingZhu/
NICTA Copyright 2010 From imagination to impact 2
Motivation
• Uncertainties in Cloud are challenging for architecting
critical applications and understanding availability
– Shared resources, weak SLA guarantees and limited visibility
– Rare but high consequence events
– Sporadic activities: upgrade, backup, recovery…
– Subjective uncertainties: impact of configuration choices
• We want to explicitly model the above uncertainties in
application availability analysis of cloud deployment.
– from a cloud consumer perspective
– focusing on mechanisms most relevant to critical
applications: auto-scaling, over-provisioning, backup,
recovery and maintenance.
Contributions
• Stochastic Reward Net (SRN)-based availability models
• which allow you to specify:
– Deployment architecture (application placements in VM)
– Node/Aggregation level SLAs from infrastructure providers
– Auto-scaling policies and recovery strategies
– Rare events: availability zone or region down
• which give you application availability levels of different options
under different scenarios
• Model evaluation by analysing existing industry best
practices in cloud application deployment
– Quantifying the rule-of-thumb best practices
– Comparing different (best) practices
Deployment Architecture Assumption
– Stateless VMs: auto-scaling groups
– Stateful VMs: hot standbys
– Backup at separate region for recovery
Availability Analysis Overview
• SRN-based Models
• Architecture model and recovery model in this paper
• One SRN architecture model per availability zone
Availability Analysis Overview
• Deployment decisions and patterns
– stateless/stateful application placement within VMs
– auto-scaling policies
– multi-zone configurations
Availability Analysis Overview
• SLA from the cloud providers
• Node level (Rackspace) or zone level (Amazon)
Availability Analysis Overview
• Recovery strategy
• Auto-regeneration of stateless VMs and different
recovery mechanisms for stateful VMs
• Different Recovery-Time/Point-Objective (RTO/RPO)
Availability Analysis Overview
• Application-specific data
– Stateless VM start-up time…
– Stateful VM replication…
Stochastic Reward Net
• Stochastic Reward Net (SRN)
– Stochastic Petri Net variant
– Firing delays
– Reward function
• Constructs
• Places: VM states
(Full, Running, Stopped, Failed)
• Token: VMs
• Transition
• Guard function
• Transition rate: 1) frequency of
events, 2) delay before the
transition fires
• Reward Function:
if(#Running1>0) 1 else 0
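
The reward function above can be read as "the application counts as available whenever at least one VM is in the Running place". A minimal Python sketch of that idea follows. It is not the authors' SRN: it assumes a much simpler model in which each of `n_vms` VMs fails and is repaired independently at constant rates, and the names `steady_state`, `reward` and `availability` as well as the example rates are hypothetical.

```python
def steady_state(n_vms, fail_rate, repair_rate):
    """Steady-state probabilities for k = 0..n_vms running VMs.

    Assumption (not from the slides): VMs fail and are repaired
    independently, so the chain is a birth-death process and the
    balance equations can be solved with a product formula.
    """
    weights = [1.0]  # unnormalised probability of k = 0 running VMs
    for k in range(n_vms):
        up = (n_vms - k) * repair_rate   # rate of k -> k + 1 (a repair)
        down = (k + 1) * fail_rate       # rate of k + 1 -> k (a failure)
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return [w / total for w in weights]


def reward(running):
    """Reward from the slide: 1 if at least one VM is running, else 0."""
    return 1 if running > 0 else 0


def availability(n_vms, fail_rate, repair_rate):
    """Expected steady-state reward, i.e. the long-run availability."""
    pi = steady_state(n_vms, fail_rate, repair_rate)
    return sum(p * reward(k) for k, p in enumerate(pi))


if __name__ == "__main__":
    # Illustrative rates only: one failure per time unit, three repairs.
    print(availability(1, fail_rate=1.0, repair_rate=3.0))
    print(availability(2, fail_rate=1.0, repair_rate=3.0))
```

Availability here is the expected value of the reward over the steady-state distribution, which is the same way a reward function is evaluated on an SRN; only the underlying state space is simplified.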
SRN-based Availability Models
Availability Models: Auto-scaling
Availability Models: Auto-scaling
gScaleSelf1:
if(#Running1<=#Running2 && #Stopped1>0) 1 else 0
gScaleOther1:
if(#Running1>#Running2 && #Stopped2>0) 1 else 0
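
The two guards can be sketched as Python predicates over a marking (the token count per place). The dictionary representation and the function names are assumptions made for illustration; the place names are taken from the guards. One plausible reading is that group 1 starts one of its own stopped VMs when it is not running more VMs than group 2, and otherwise a stopped VM in group 2 is started.

```python
def g_scale_self_1(marking):
    """gScaleSelf1: enabled when group 1 is not ahead and has a stopped VM."""
    if marking["Running1"] <= marking["Running2"] and marking["Stopped1"] > 0:
        return 1
    return 0


def g_scale_other_1(marking):
    """gScaleOther1: enabled when group 1 is ahead and group 2 has a stopped VM."""
    if marking["Running1"] > marking["Running2"] and marking["Stopped2"] > 0:
        return 1
    return 0


if __name__ == "__main__":
    # Hypothetical marking: token counts per place.
    marking = {"Running1": 2, "Running2": 3, "Stopped1": 1, "Stopped2": 0}
    print(g_scale_self_1(marking), g_scale_other_1(marking))
```

Because the first conditions (`<=` and `>`) are complementary, the two guards can never both evaluate to 1 for the same marking.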
Availability Models: Stateful VM
Availability Models: Disaster Recovery
• Availability zone life cycle
– Interact with the big
architecture model
• Stateless VM recovery
– Backup/AMI
• Stateful VM recovery
– Backup
– Replica
– Hot standby
Case 1: Multi-zone Deployment
• Parameters
– Amazon EC2 SLA of 99.95% availability
– Zone fail rate: 0.00011, MTTR: 4.38 hours per year
– Application specific measurement of transitions
0.01% = 52.56 mins downtime per year
0.4% diff = 35 hours
0.76% diff = 66 hours
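
The downtime figures above follow from multiplying an availability difference by the length of a year. The sketch below reproduces that arithmetic and also checks the parameters against the SLA. It assumes a 365-day year, and it assumes the zone fail rate is expressed per hour and the MTTR in hours; neither unit is stated on the slide, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(unavailability_percent):
    """Convert an availability difference in percent to hours per year."""
    return unavailability_percent / 100 * HOURS_PER_YEAR


def steady_state_availability(fail_rate_per_hour, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR), with MTTF = 1 / fail rate."""
    mttf_hours = 1 / fail_rate_per_hour
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    print(downtime_hours_per_year(0.01) * 60)  # minutes for a 0.01% difference
    print(downtime_hours_per_year(0.4))        # hours for a 0.4% difference
    print(downtime_hours_per_year(0.76))       # hours for a 0.76% difference
    print(steady_state_availability(0.00011, 4.38))
```

Under these assumptions a fail rate of 0.00011 per hour is roughly one zone failure per year, and with a 4.38-hour repair time the resulting availability is close to the 99.95% SLA figure.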
Case 2: Recovery across Availability Zone
• Industry rule of thumb: "Target auto-scale 30-60% until you have
50% headroom for load spikes. Lose an AZ leads to 90% utilisation."
• Impact on overall availability?
• 30-60% vs. traditional 70-90%?
• over-provisioning vs. auto-scaling?
0.29% diff = 25 hours
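
The 90% figure in the rule of thumb can be reproduced with a short calculation, assuming load is spread evenly over three availability zones and is redistributed evenly over the survivors when one zone is lost. The zone count and the function name are assumptions; the slide gives neither.

```python
def utilisation_after_zone_loss(utilisation, zones, zones_lost=1):
    """Utilisation of the surviving zones after evenly spread load is
    redistributed over them (1.0 means 100%)."""
    remaining = zones - zones_lost
    if remaining <= 0:
        raise ValueError("no zones left to carry the load")
    return utilisation * zones / remaining


if __name__ == "__main__":
    # Upper end of the 30-60% target, three zones, one zone lost.
    print(utilisation_after_zone_loss(0.60, zones=3))
    # A traditional 90% target under the same assumptions.
    print(utilisation_after_zone_loss(0.90, zones=3))
```

With three zones, a 60% target leaves the two surviving zones at about 90% utilisation, whereas a 90% target would require 135% of their capacity, which is why the lower target leaves room to absorb a zone failure.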
Case 3: Disaster Recovery across Regions
• Trade-off between RPO and RTO
• RPO: Recovery Point Objective
• RTO: Recovery Time Objective
Yuruware — http://www.yuruware.com/
0.2% diff = 17 hours
Conclusion and Future Work
• SRN-based availability models
– Application-level availability
– Highly configurable for different deployment architectures
– Model different uncertainties and scenarios for critical systems
– Quantify and compare choices and enable what-if analysis
– Evaluated using industry best practices
• Future work
– Better evaluation!
– Integrated models on impact of upgrade, live migration, backup and
subjective uncertainties (in IEEE Cloud 13)
Q. Lu, X. Xu, L. Zhu, L. Bass, et al., "Incorporating Uncertainty into in-Cloud Application
Deployment Decisions for Availability," in IEEE Cloud 2013
Liming.Zhu@nicta.com.au
Slides available at http://www.slideshare.net/LimingZhu/

Editor's Notes

  1. In this paper, we only show the architecture model and the recovery model due to space limitations.