STRICTLY CONFIDENTIAL
Application Resilience: Challenges & Good Practice
City Business Club - ISITC Europe | July 2019
Dr Aled Sage
aled@cloudsoft.io
© Cloudsoft Corporation 2019
About Aled Sage
VP Engineering @ Cloudsoft
PhD and 20 years' experience with distributed systems
Expertise in Cloud, DevOps and Automation
Across different verticals, including financial services
@aledsage
aled@cloudsoft.io
linkedin.com/in/aledsage
github.com/aledsage
Agenda
1. What is resiliency?
2. Changing landscape
3. Case study
4. Problems & solutions
What is Resiliency
• Business applications: CRM, payroll, analysis, trading systems, settlement systems, …
• Service level objectives (SLOs), service level agreements (SLAs)
• Aim is not 100% availability
  Google Site Reliability Engineers (SRE): “Striving for Imperfection: Using an error budget to move fast without compromising high reliability”
• “Availability” is not binary
• Design for failure
• Proactive and reactive
  Proactive: High availability (HA); avoid problems
  Reactive: Know when something is wrong; know how to respond; repair; disaster recovery (DR)
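
The error budget idea quoted on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the 30-day window are assumptions, not something the talk specifies. It converts an availability SLO into the minutes of downtime the SLO tolerates, and reports how much of that budget is left.

```python
def error_budget_minutes(slo: float, period_days: int = 30) -> float:
    """Minutes of unavailability that an availability SLO allows in a period.

    `slo` is a fraction, e.g. 0.999 for "three nines".
    The 30-day default window is an assumption for illustration.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    total_minutes = period_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, period_days: int = 30) -> float:
    """Error budget left after the downtime already incurred (negative if overspent)."""
    return error_budget_minutes(slo, period_days) - downtime_minutes


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After a 10-minute incident, roughly 33.2 minutes of budget remain.
    print(round(budget_remaining(0.999, 10.0), 1))
```

A positive remaining budget suggests there is room to ship changes; a negative one suggests prioritising reliability work.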
Business Impact
Source: Gartner, “Why Business Leaders Don’t Care About the Cost of Downtime”, 9 April 2019
Changing Landscape
• Private and public cloud
• Kubernetes and OpenShift (using containers)
• Bare-metal and mainframes
• Business wants agility
• Security and reliability still vital
• All while driving down costs
• Metro Bank, Monzo, Starling
• Facebook? Amazon?
Comparison with Disruptors

Incumbents | Disruptors
---------- | ----------
Hybrid environment | Everything is cloud-native
100s of applications; many lines of business | Small product set
Legacy high-value apps | No legacy apps
Chaos engineering? “No way!” | Kill servers in production
Wall Street bank: decrease VM provisioning to 9 hours | New container in seconds; new virtual machines in minute(s)
3-6 month releases | Small changes several times a day

Improve the basics:
● Improve resiliency and agility of existing apps
● Fit with old-world and the future
Case study: Tier-1 Investment Bank
APPLICATION-CENTRIC APPROACH
✓ Empower and involve application architects and developers
✓ Codify best practices and use of the bank’s strategic tools
✓ Consistent way to automate resiliency
Self-healing applications!
Your application can automatically detect and fix problems
ACHIEVEMENTS UNLOCKED
✓ Application-level Resiliency at scale
✓ Meets business & regulatory needs
✓ Across multiple evolving platforms
Challenges
• Manual: Manual procedures and tribal knowledge. Responses to degradation or failure not automated.
• Bespoke: Wide variation in techniques and quality; bespoke processes.
• Operations fragmented: Involves many systems. Fiefdoms & politics. Resiliency is a cross-cutting concern.
• Rarely tested: Sufficient to meet auditors’ requirements. But little practice or variation in scenarios; manual testing is expensive.
Problems and Solutions

Problems | Solutions
-------- | ---------
Tribal knowledge, rarely tested | Gamedays
Processes, environments and applications are complicated and ever-changing | Post mortems & continual improvement
High availability and DR needs understanding of application, but infrastructure-focused | Application-centric approach
Bespoke / inconsistent approaches | Empower the application architects; codify and share best practices.
Problems and Solutions

Problems | Solutions
-------- | ---------
Manual runbooks | Automated recovery
Involves many systems | Encourage APIs and automation; “glue code” to codify their use; focus on common application architectures.
Runbooks and CMDB out-of-date | “Living model” of the application; deployment and in-life management codified in this model.
Automated Recovery
• Restart the web-servers (e.g. on an out-of-memory)
• Repave servers or clusters (e.g. when a restart fails, or an update is required)
• Auto-scale based on latency, load or time-of-day
• Manage DR within a region and/or across regions
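
The recovery actions on this slide can be sketched as small, testable policy functions. This is a minimal illustration under stated assumptions, not how any particular product implements them: the `Health` record, the function names, the restart limit and the replica bounds are all hypothetical. The first function escalates from restart to repave when restarts keep failing; the second picks a replica count in proportion to observed latency.

```python
import math
from dataclasses import dataclass


@dataclass
class Health:
    """Hypothetical health record for one server."""
    healthy: bool
    failed_restarts: int  # consecutive restarts that did not restore health


def next_action(health: Health, max_restarts: int = 2) -> str:
    """Choose a recovery action: try restarts first, then repave."""
    if health.healthy:
        return "none"
    if health.failed_restarts < max_restarts:
        return "restart"
    return "repave"


def desired_replicas(current: int, latency_ms: float, target_ms: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to latency, within fixed bounds."""
    if current < 1 or target_ms <= 0:
        raise ValueError("current must be >= 1 and target_ms must be > 0")
    wanted = math.ceil(current * latency_ms / target_ms)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    print(next_action(Health(healthy=False, failed_restarts=0)))  # restart
    print(next_action(Health(healthy=False, failed_restarts=2)))  # repave
    print(desired_replicas(current=4, latency_ms=300, target_ms=200))  # 6
```

Keeping the decision logic separate from the code that actually restarts or provisions servers makes the policy easy to exercise in a gameday or unit test.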
Self Managing Systems: Autonomics
Self configuring
• Automated configuration of components and systems follows
high-level policies. Rest of system adjusts automatically and
seamlessly
Self healing
• System automatically detects, diagnoses, and repairs software and
hardware problems
Self optimizing
• Components and systems continually seek opportunities to
improve their own performance and efficiency
Self protecting
• System automatically defends against malicious attacks or
cascading failures. It uses early warning to anticipate and prevent
system wide failures
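
One common way to limit cascading failures is a circuit breaker. The slides do not name this technique; it is used here only as an illustration of "self protecting" behaviour. The sketch below is a minimal, single-threaded version, and the class name, thresholds and injectable clock are assumptions made for the example. After a number of consecutive failures it rejects calls for a cool-down period instead of piling more load onto a struggling dependency.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker sketch (not thread-safe)."""

    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock          # injectable so tests can control time
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # time the circuit opened, or None if closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise RuntimeError("circuit open: call rejected")
            # Cool-down elapsed: allow a trial call; one more failure reopens.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # a success resets the failure count
        return result
```

A caller would wrap each request to a downstream system, for example `breaker.call(fetch_prices)`, where `fetch_prices` is a hypothetical function, and treat the `RuntimeError` as a signal to fall back or fail fast.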
Cloudsoft AMP
AMP streamlines development, operations and governance for any application on any cloud
• Rapidly Scalable: Autonomic policy-based management
• Cloud Enabling: Liberate applications from infrastructure
• Powerful Tools: Consistent in-life management tooling and features
• Improve Visibility: Unified management view of applications
Suggested Next Steps
This week
❏ Get the Gartner report: “Why Business Leaders Don’t Care About the Cost of Downtime” free of charge
❏ Talk to person(s) in charge of application resiliency
❏ Ask for an example post mortem from the last major incident; how was this knowledge shared and used?
❏ How are application authors involved in planning for and implementing resiliency?
❏ Are there consistent monitoring and recovery strategies (focusing on the 80:20 rule for existing apps)?
❏ How often is it tested? Are there gamedays?
Next 90 days
❏ Identify example app(s)
❏ Get a review from an expert of an example workload (“well-architected review”)
❏ Organise gamedays
❏ Introduce post mortems
Next 6 months
❏ Consider automation software at the application level for ‘low-hanging fruit’
❏ Involve and empower your application authors
Get the report for free: cloudsoft.io/report
e: aled@cloudsoft.io
w: cloudsoft.io
Thank You!
Any Questions?

More Related Content

What's hot

Digital Transformation, Cloud Adoption and the Impact on SAM and Security
Digital Transformation, Cloud Adoption and the Impact on SAM and SecurityDigital Transformation, Cloud Adoption and the Impact on SAM and Security
Digital Transformation, Cloud Adoption and the Impact on SAM and Security
Flexera
 
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
HP Enterprise Italia
 
Democratizing App Development in Insurance Industry
Democratizing App Development in Insurance IndustryDemocratizing App Development in Insurance Industry
Democratizing App Development in Insurance Industry
WaveMaker, Inc.
 
How to Increase Fintech Contact Center Productivity
How to Increase Fintech Contact Center ProductivityHow to Increase Fintech Contact Center Productivity
How to Increase Fintech Contact Center Productivity
PT Datacomm Diangraha
 
When it comes to ADCs, Perception is not Reality - Radware
When it comes to ADCs, Perception is not Reality - RadwareWhen it comes to ADCs, Perception is not Reality - Radware
When it comes to ADCs, Perception is not Reality - Radware
aliciasyc
 
Optimizing Technology Refresh
Optimizing Technology RefreshOptimizing Technology Refresh
Optimizing Technology Refresh
xpmigration
 
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone ProgramKespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
Kespry, Inc.
 
New Model for IT: Cloud Service Provider
New Model for IT: Cloud Service ProviderNew Model for IT: Cloud Service Provider
New Model for IT: Cloud Service Provider
VMware
 
FY11 PC Refresh Plan
FY11 PC Refresh PlanFY11 PC Refresh Plan
FY11 PC Refresh Plan
Jerry Bishop
 
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
VMware
 
Progress Pacific: Contemporary App Development
Progress Pacific: Contemporary App DevelopmentProgress Pacific: Contemporary App Development
Progress Pacific: Contemporary App Development
Progress
 
Why Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational ChallengesWhy Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational Challenges
Enterprise Management Associates
 
Unified PPM & Agile
Unified PPM & AgileUnified PPM & Agile
Unified PPM & Agile
Michelle Manimtim
 
XebiaLabs Overview Slides
XebiaLabs Overview SlidesXebiaLabs Overview Slides
XebiaLabs Overview Slides
XebiaLabs
 
Cloud Accounting Process Infographic
Cloud Accounting Process InfographicCloud Accounting Process Infographic
Cloud Accounting Process Infographic
Bentleys (WA) Pty Ltd
 
SECC_Software Testing Services
SECC_Software Testing ServicesSECC_Software Testing Services
SECC_Software Testing Services
SECC Egypt
 
SG MVPA Workshop Booklet Fall 2015
SG MVPA Workshop Booklet Fall 2015SG MVPA Workshop Booklet Fall 2015
SG MVPA Workshop Booklet Fall 2015
Josh Russ
 
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystems
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystemsTechnical Webinar: By the (Play) Book: The Agile Practice at OutSystems
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystems
OutSystems
 
CMMI services presentation -SECC
CMMI services presentation -SECCCMMI services presentation -SECC
CMMI services presentation -SECC
SECC Egypt
 

What's hot (19)

Digital Transformation, Cloud Adoption and the Impact on SAM and Security
Digital Transformation, Cloud Adoption and the Impact on SAM and SecurityDigital Transformation, Cloud Adoption and the Impact on SAM and Security
Digital Transformation, Cloud Adoption and the Impact on SAM and Security
 
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
 
Democratizing App Development in Insurance Industry
Democratizing App Development in Insurance IndustryDemocratizing App Development in Insurance Industry
Democratizing App Development in Insurance Industry
 
How to Increase Fintech Contact Center Productivity
How to Increase Fintech Contact Center ProductivityHow to Increase Fintech Contact Center Productivity
How to Increase Fintech Contact Center Productivity
 
When it comes to ADCs, Perception is not Reality - Radware
When it comes to ADCs, Perception is not Reality - RadwareWhen it comes to ADCs, Perception is not Reality - Radware
When it comes to ADCs, Perception is not Reality - Radware
 
Optimizing Technology Refresh
Optimizing Technology RefreshOptimizing Technology Refresh
Optimizing Technology Refresh
 
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone ProgramKespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
 
New Model for IT: Cloud Service Provider
New Model for IT: Cloud Service ProviderNew Model for IT: Cloud Service Provider
New Model for IT: Cloud Service Provider
 
FY11 PC Refresh Plan
FY11 PC Refresh PlanFY11 PC Refresh Plan
FY11 PC Refresh Plan
 
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
 
Progress Pacific: Contemporary App Development
Progress Pacific: Contemporary App DevelopmentProgress Pacific: Contemporary App Development
Progress Pacific: Contemporary App Development
 
Why Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational ChallengesWhy Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational Challenges
 
Unified PPM & Agile
Unified PPM & AgileUnified PPM & Agile
Unified PPM & Agile
 
XebiaLabs Overview Slides
XebiaLabs Overview SlidesXebiaLabs Overview Slides
XebiaLabs Overview Slides
 
Cloud Accounting Process Infographic
Cloud Accounting Process InfographicCloud Accounting Process Infographic
Cloud Accounting Process Infographic
 
SECC_Software Testing Services
SECC_Software Testing ServicesSECC_Software Testing Services
SECC_Software Testing Services
 
SG MVPA Workshop Booklet Fall 2015
SG MVPA Workshop Booklet Fall 2015SG MVPA Workshop Booklet Fall 2015
SG MVPA Workshop Booklet Fall 2015
 
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystems
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystemsTechnical Webinar: By the (Play) Book: The Agile Practice at OutSystems
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystems
 
CMMI services presentation -SECC
CMMI services presentation -SECCCMMI services presentation -SECC
CMMI services presentation -SECC
 

Similar to Application resilience: challenges and good practice

Enterprise mobile strategy framework - 1st part
Enterprise mobile strategy framework  - 1st partEnterprise mobile strategy framework  - 1st part
Enterprise mobile strategy framework - 1st part
Algarytm
 
Security Across the Cloud Native Continuum with ESG and Palo Alto Networks
Security Across the Cloud Native Continuum with ESG and Palo Alto NetworksSecurity Across the Cloud Native Continuum with ESG and Palo Alto Networks
Security Across the Cloud Native Continuum with ESG and Palo Alto Networks
DevOps.com
 
CloudOps with OpsRamp: From Discovery to Resolution
CloudOps with OpsRamp: From Discovery to ResolutionCloudOps with OpsRamp: From Discovery to Resolution
CloudOps with OpsRamp: From Discovery to Resolution
OpsRamp
 
Five keys to successful cloud migration
Five keys to successful cloud migrationFive keys to successful cloud migration
Five keys to successful cloud migration
IBM
 
Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020
VMware Tanzu
 
AWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
AWS May Webinar Series - Industry Trends and Best Practices for Cloud AdoptionAWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
AWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
Amazon Web Services
 
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
SOASTA
 
Industry Perspective: DevOps - What it Means for the Average Business
Industry Perspective: DevOps - What it Means for the Average BusinessIndustry Perspective: DevOps - What it Means for the Average Business
Industry Perspective: DevOps - What it Means for the Average Business
Michael Elder
 
DevOps for Enterprise Systems : Innovate like a Startup
DevOps for Enterprise Systems : Innovate like a StartupDevOps for Enterprise Systems : Innovate like a Startup
DevOps for Enterprise Systems : Innovate like a Startup
DevOps for Enterprise Systems
 
Adopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous DeliveryAdopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous Delivery
SOASTA
 
Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
xKinAnx
 
Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
solarisyourep
 
La Digital Transformation ha un nuovo alleato: Value Stream Management
La Digital Transformation ha un nuovo alleato: Value Stream ManagementLa Digital Transformation ha un nuovo alleato: Value Stream Management
La Digital Transformation ha un nuovo alleato: Value Stream Management
Emerasoft, solutions to collaborate
 
Cloud computing in practice
Cloud computing in practiceCloud computing in practice
Cloud computing in practice
Andrzej Osmak
 
End to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOpsEnd to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOps
eG Innovations
 
Enterprise Mobile Strategy Framework - I
Enterprise Mobile Strategy Framework - IEnterprise Mobile Strategy Framework - I
Enterprise Mobile Strategy Framework - I
Propel Apps
 
Enterprise mobile strategy framework- I
Enterprise mobile strategy framework- IEnterprise mobile strategy framework- I
Enterprise mobile strategy framework- I
Algarytm
 
Under cloud cover: How leaders are accelerating competitive differentiation
Under cloud cover: How leaders are accelerating competitive differentiationUnder cloud cover: How leaders are accelerating competitive differentiation
Under cloud cover: How leaders are accelerating competitive differentiation
Susanne Hupfer, Ph.D.
 
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
Matt Ray
 
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
Dell World
 

Similar to Application resilience: challenges and good practice (20)

Enterprise mobile strategy framework - 1st part
Enterprise mobile strategy framework  - 1st partEnterprise mobile strategy framework  - 1st part
Enterprise mobile strategy framework - 1st part
 
Security Across the Cloud Native Continuum with ESG and Palo Alto Networks
Security Across the Cloud Native Continuum with ESG and Palo Alto NetworksSecurity Across the Cloud Native Continuum with ESG and Palo Alto Networks
Security Across the Cloud Native Continuum with ESG and Palo Alto Networks
 
CloudOps with OpsRamp: From Discovery to Resolution
CloudOps with OpsRamp: From Discovery to ResolutionCloudOps with OpsRamp: From Discovery to Resolution
CloudOps with OpsRamp: From Discovery to Resolution
 
Five keys to successful cloud migration
Five keys to successful cloud migrationFive keys to successful cloud migration
Five keys to successful cloud migration
 
Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020
 
AWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
AWS May Webinar Series - Industry Trends and Best Practices for Cloud AdoptionAWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
AWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
 
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
 
Industry Perspective: DevOps - What it Means for the Average Business
Industry Perspective: DevOps - What it Means for the Average BusinessIndustry Perspective: DevOps - What it Means for the Average Business
Industry Perspective: DevOps - What it Means for the Average Business
 
DevOps for Enterprise Systems : Innovate like a Startup
DevOps for Enterprise Systems : Innovate like a StartupDevOps for Enterprise Systems : Innovate like a Startup
DevOps for Enterprise Systems : Innovate like a Startup
 
Adopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous DeliveryAdopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous Delivery
 
Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
 
Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
 
La Digital Transformation ha un nuovo alleato: Value Stream Management
La Digital Transformation ha un nuovo alleato: Value Stream ManagementLa Digital Transformation ha un nuovo alleato: Value Stream Management
La Digital Transformation ha un nuovo alleato: Value Stream Management
 
Cloud computing in practice
Cloud computing in practiceCloud computing in practice
Cloud computing in practice
 
End to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOpsEnd to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOps
 
Enterprise Mobile Strategy Framework - I
Enterprise Mobile Strategy Framework - IEnterprise Mobile Strategy Framework - I
Enterprise Mobile Strategy Framework - I
 
Enterprise mobile strategy framework- I
Enterprise mobile strategy framework- IEnterprise mobile strategy framework- I
Enterprise mobile strategy framework- I
 
Under cloud cover: How leaders are accelerating competitive differentiation
Under cloud cover: How leaders are accelerating competitive differentiationUnder cloud cover: How leaders are accelerating competitive differentiation
Under cloud cover: How leaders are accelerating competitive differentiation
 
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
 
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
 

Recently uploaded

TMU毕业证书精仿办理
TMU毕业证书精仿办理TMU毕业证书精仿办理
TMU毕业证书精仿办理
aeeva
 
Kubernetes at Scale: Going Multi-Cluster with Istio
Kubernetes at Scale:  Going Multi-Cluster  with IstioKubernetes at Scale:  Going Multi-Cluster  with Istio
Kubernetes at Scale: Going Multi-Cluster with Istio
Severalnines
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...
Paul Brebner
 
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
campbellclarkson
 
Photoshop Tutorial for Beginners (2024 Edition)
Photoshop Tutorial for Beginners (2024 Edition)Photoshop Tutorial for Beginners (2024 Edition)
Photoshop Tutorial for Beginners (2024 Edition)
alowpalsadig
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
The Comprehensive Guide to Validating Audio-Visual Performances.pdf
The Comprehensive Guide to Validating Audio-Visual Performances.pdfThe Comprehensive Guide to Validating Audio-Visual Performances.pdf
The Comprehensive Guide to Validating Audio-Visual Performances.pdf
kalichargn70th171
 
42 Ways to Generate Real Estate Leads - Sellxpert
42 Ways to Generate Real Estate Leads - Sellxpert42 Ways to Generate Real Estate Leads - Sellxpert
42 Ways to Generate Real Estate Leads - Sellxpert
vaishalijagtap12
 
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid
 
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom KittEnhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Peter Caitens
 
Orca: Nocode Graphical Editor for Container Orchestration
Orca: Nocode Graphical Editor for Container OrchestrationOrca: Nocode Graphical Editor for Container Orchestration
Orca: Nocode Graphical Editor for Container Orchestration
Pedro J. Molina
 
美洲杯赔率投注网【​网址​🎉3977·EE​🎉】
美洲杯赔率投注网【​网址​🎉3977·EE​🎉】美洲杯赔率投注网【​网址​🎉3977·EE​🎉】
美洲杯赔率投注网【​网址​🎉3977·EE​🎉】
widenerjobeyrl638
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
gapen1
 
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptxOperational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
sandeepmenon62
 
Migration From CH 1.0 to CH 2.0 and Mule 4.6 & Java 17 Upgrade.pptx
Migration From CH 1.0 to CH 2.0 and  Mule 4.6 & Java 17 Upgrade.pptxMigration From CH 1.0 to CH 2.0 and  Mule 4.6 & Java 17 Upgrade.pptx
Migration From CH 1.0 to CH 2.0 and Mule 4.6 & Java 17 Upgrade.pptx
ervikas4
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
Luigi Fugaro
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
Bert Jan Schrijver
 
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
The Third Creative Media
 

Recently uploaded (20)

TMU毕业证书精仿办理
TMU毕业证书精仿办理TMU毕业证书精仿办理
TMU毕业证书精仿办理
 
Kubernetes at Scale: Going Multi-Cluster with Istio
Kubernetes at Scale:  Going Multi-Cluster  with IstioKubernetes at Scale:  Going Multi-Cluster  with Istio
Kubernetes at Scale: Going Multi-Cluster with Istio
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...
 
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
 

Application resilience: challenges and good practice

  • 1. STRICTLY CONFIDENTIAL
      Application Resilience: Challenges & Good Practice
      City Business Club - ISITC Europe | July 2019
      Dr Aled Sage, aled@cloudsoft.io
  • 2. About Aled Sage
      VP Engineering @ Cloudsoft
      • PhD and 20 years' experience with distributed systems
      • Expertise in cloud, DevOps and automation
      • Across different verticals, including financial services
      @aledsage | aled@cloudsoft.io | linkedin.com/in/aledsage | github.com/aledsage
  • 3. Agenda
      1. What is resiliency?
      2. Changing landscape
      3. Case study
      4. Problems & solutions
  • 4. What is Resiliency
      • Business applications: CRM, payroll, analysis, trading systems, settlement systems, …
      • Service level objectives (SLOs) and service level agreements (SLAs)
      • Aim is not 100% availability
      • “Availability” is not binary
      • Design for failure
      • Proactive and reactive
          Proactive: high availability (HA); avoid problems
          Reactive: know when something is wrong; know how to respond; repair; disaster recovery (DR)
      Reference: Google Site Reliability Engineers (SRE), “Striving for Imperfection: Using an error budget to move fast without compromising high reliability”
  • 5. Business Impact
      Gartner, “Why Business Leaders Don’t Care About the Cost of Downtime”, 9 April 2019
  • 6. Changing Landscape
      • Private and public cloud
      • Kubernetes and OpenShift (using containers)
      • Bare-metal and mainframes
      • Business wants agility
      • Security and reliability still vital
      • All while driving down costs
      • Metro Bank, Monzo, Starling
      • Facebook? Amazon?
  • 7. Comparison with Disruptors
      Incumbents | Disruptors
      Hybrid environment | Everything is cloud-native
      Chaos engineering? “No way!” | Kill servers in production
      Wall Street bank: decrease VM provisioning to 9 hours | New container in seconds; new virtual machines in minute(s)
      3-6 month releases | Small changes several times a day
      100s of applications; many lines of business | Small product set
      Legacy high-value apps | No legacy apps
      Improve the basics:
      ● Improve resiliency and agility of existing apps
      ● Fit with old-world and the future
  • 8. Case study: Tier-1 Investment Bank
      APPLICATION-CENTRIC APPROACH
      ✓ Empower and involve application architects and developers
      ✓ Codify best practices and use of the bank’s strategic tools
      ✓ Consistent way to automate resiliency
      Self-healing applications! Your application can automatically detect and fix problems
      ACHIEVEMENTS UNLOCKED
      ✓ Application-level resiliency at scale
      ✓ Meets business & regulatory needs
      ✓ Across multiple evolving platforms
  • 9. Challenges
      • Manual: manual procedures and tribal knowledge. Responses to degradation or failure not automated.
      • Bespoke: wide variation in techniques and quality; bespoke processes.
      • Operations fragmented: involves many systems. Fiefdoms & politics. Resiliency is a cross-cutting concern.
      • Rarely tested: sufficient to meet auditors’ requirements, but little practice or variation in scenarios; manual testing is expensive.
  • 10. Problems and Solutions
      Problems | Solutions
      Tribal knowledge, rarely tested | Gamedays
      Processes, environments and applications are complicated and ever-changing | Post mortems & continual improvement
      High availability and DR need understanding of the application, but are infrastructure-focused | Application-centric approach
      Bespoke / inconsistent approaches | Empower the application architects; codify and share best practices
  • 11. Problems and Solutions
  • 12. Problems and Solutions
      Problems | Solutions
      Manual runbooks | Automated recovery
      Involves many systems | Encourage APIs and automation; “glue code” to codify their use; focus on common application architectures
      Runbooks and CMDB out-of-date | “Living model” of the application; deployment and in-life management codified in this model
  • 13. Automated Recovery
      • Restart the web servers (e.g. on an out-of-memory error)
      • Repave servers or clusters (e.g. when a restart fails, or an update is required)
      • Auto-scale based on latency, load or time-of-day
      • Manage DR within a region and/or across regions
  • 14. Self Managing Systems: Autonomics
      • Self configuring: automated configuration of components and systems follows high-level policies. The rest of the system adjusts automatically and seamlessly.
      • Self healing: the system automatically detects, diagnoses, and repairs software and hardware problems.
      • Self optimizing: components and systems continually seek opportunities to improve their own performance and efficiency.
      • Self protecting: the system automatically defends against malicious attacks or cascading failures. It uses early warning to anticipate and prevent system-wide failures.
  • 15. Cloudsoft AMP
      AMP streamlines development, operations and governance for any application on any cloud
      • Rapidly Scalable: autonomic policy-based management
      • Cloud Enabling: liberate applications from infrastructure
      • Powerful Tools: consistent in-life management tooling and features
      • Improve Visibility: unified management view of applications
  • 16. Suggested Next Steps
      This week
      ❏ Get the Gartner report “Why Business Leaders Don’t Care About the Cost of Downtime” free of charge
      ❏ Talk to the person(s) in charge of application resiliency
      ❏ Example post mortem from the last major incident; how was this knowledge shared and used?
      ❏ How are application authors involved in planning for and implementing resiliency?
      ❏ Are there consistent monitoring and recovery strategies (focusing on the 80:20 rule for existing apps)?
      ❏ How often is it tested? Are there gamedays?
      Next 90 days
      ❏ Identify example app(s)
      ❏ Get a review from an expert of an example workload (“well-architected review”)
      ❏ Organise gamedays
      ❏ Introduce post mortems
      Next 6 months
      ❏ Consider automation software at the application level for ‘low-hanging fruit’
      ❏ Involve and empower your application authors
      Get the report for free: cloudsoft.io/report
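
Slide 4 says the aim is not 100% availability and points to Google's error-budget idea. As a rough illustration of that arithmetic, the Python sketch below converts an availability SLO into permitted downtime over a window. The function names and the 99.9% / 30-day figures in the note afterwards are assumptions chosen for the example, not numbers from the talk.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Downtime permitted by an availability SLO (a fraction, e.g. 0.999) over a window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Budget left after downtime already incurred; negative means the SLO is breached."""
    return error_budget(slo, window) - downtime
```

Under these assumptions, a 99.9% SLO over 30 days allows roughly 43 minutes of downtime (0.1% of 30 days). A single 50-minute incident would overspend that budget.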
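
The escalation described on slide 13 (restart first, repave when a restart fails) could be codified along the following lines. This is a minimal Python sketch, not how Cloudsoft AMP or any other product implements it. `RecoveryPolicy`, its callbacks and `max_restarts` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecoveryPolicy:
    """Escalating recovery: try restarts first, then repave, then hand over to a human."""
    is_healthy: Callable[[], bool]   # hypothetical health check supplied by the caller
    restart: Callable[[], None]      # hypothetical restart action
    repave: Callable[[], None]       # hypothetical rebuild-from-scratch action
    max_restarts: int = 2            # assumed limit before escalating to a repave
    log: List[str] = field(default_factory=list)

    def recover(self) -> bool:
        """Return True once the service is healthy, False if every step failed."""
        if self.is_healthy():
            return True
        for attempt in range(1, self.max_restarts + 1):
            self.log.append(f"restart attempt {attempt}")
            self.restart()
            if self.is_healthy():
                return True
        # Restarts did not help, so rebuild the server or cluster.
        self.log.append("repave")
        self.repave()
        healthy = self.is_healthy()
        if not healthy:
            self.log.append("escalate to operator")
        return healthy
```

Passing the health check and the actions in as callables keeps the policy independent of any particular platform. The same escalation can then wrap a VM, a container or a cluster, and the log gives a post mortem something concrete to review.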