§Release it! - Takeaways
Agenda
1. Stability
2. Capacity
3. 0 downtime deployments
§Stability
Major airline incident - introduction
• Started with a planned failover on the database cluster that served Core Facilities (CF)
• CF handled flight searches – critical, so designed for high availability
• CF was going to be used by self-service check-in kiosks, IVR, and “channel partner” applications
Major airline incident – outage facts
• Thursday evening, 11 pm: a team of engineers executed a manual database
failover from CF db1 to CF db2, then updated db1, then migrated the
database back to db1 and applied the same change to db2
• 12:30 am: the crew marked the change as “Completed, Success” and signed
off (no downtime)
• 2:30 am: all the check-in kiosks in the USA went red (stopped servicing requests)
• minutes later: the IVR servers went red too
• A Severity 1 case was opened immediately
• Priority – restore service: restart the CF and kiosk application servers
• Total elapsed time: approx. 3 hours
Major airline incident – consequences
• Cost the company hundreds of thousands of dollars
• When the kiosks go down, off-shift agents are called in
• It took until 3 pm to deal with the backlog
• Delayed flights, reallocated gates
• Bad publicity for the airline in the media
• Affected the FAA’s annual report card, which measures customer complaints
and on-time arrivals/departures (less money for the CEO)
Major airline incident – post-mortem
• Data to collect:
  • application servers: log files, thread dumps, and configuration files
  • database servers: configuration files for the db and the cluster server
  • compare current db configuration files to those from the nightly backup
• Thread dumps:
  • all threads blocked inside SocketInputStream.socketRead(), trying vainly to read a response that would never come
  • all threads had called FlightSearch.lookupByCity()
Major airline incident – the culprit
public class FlightSearch implements SessionBean {
    private MonitoredDataSource connectionPool;

    public List lookupByCity(. . .) throws SQLException, RemoteException {
        Connection conn = null;
        Statement stmt = null;
        try {
            conn = connectionPool.getConnection();
            stmt = conn.createStatement(); //…
        } finally {
            // Defect: stmt.close() can itself throw SQLException. If it does,
            // conn.close() is skipped and the connection never returns to the pool.
            if (stmt != null) { stmt.close(); }
            if (conn != null) { conn.close(); }
        }
    }
}
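In the code above, an exception thrown by stmt.close() skips conn.close(), so the connection leaks. One possible remedy, sketched here under assumptions, is try-with-resources, which closes every resource even when an earlier close() throws. FakeConnection and FakeStatement are hypothetical stand-ins for the JDBC objects so that the sketch runs without a database; they are not part of the original system.

```java
import java.sql.SQLException;

// Hypothetical stand-in for a pooled java.sql.Connection.
class FakeConnection implements AutoCloseable {
    boolean closed = false;

    FakeStatement createStatement() {
        return new FakeStatement();
    }

    @Override
    public void close() {
        closed = true; // with a real pool, this hands the connection back
    }
}

// Hypothetical stand-in for a java.sql.Statement whose close() fails,
// as a driver might after the database failover.
class FakeStatement implements AutoCloseable {
    @Override
    public void close() throws SQLException {
        throw new SQLException("connection reset");
    }
}

class SafeCleanup {
    // Returns true if the connection was released even though
    // closing the statement threw.
    static boolean lookupReleasesConnection() {
        FakeConnection conn = new FakeConnection();
        try (FakeConnection c = conn; FakeStatement stmt = c.createStatement()) {
            // ... run the query here ...
        } catch (SQLException e) {
            // The statement's close() failed; the connection was still closed.
        }
        return conn.closed;
    }

    public static void main(String[] args) {
        // prints "connection released: true"
        System.out.println("connection released: " + lookupReleasesConnection());
    }
}
```

Resources are closed in reverse order of declaration, so the statement is closed first and the connection is still closed after that call fails.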
What is stability
• Transaction = an abstract unit of work processed by the system
• System = the complete, interdependent set of hardware, applications,
and services required to process transactions for users
• Stability = system keeps processing transactions, even when there are
transient impulses, persistent stresses, or component failures disrupting
normal processing (users can still get work done)
• A component of the system which starts to fail before everything else
does = crack in the system
• Cracks propagate!
• Tight coupling accelerates cracks
Major airline incident – avoid propagation
• The pool could have been configured to create more connections if it
was exhausted or to block callers for a limited time, not forever
• The client could have set a timeout on the RMI sockets
• CF servers could have been partitioned into more than one service group
• Use a Circuit breaker
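A minimal circuit breaker sketch, to make the last bullet concrete. The class name, thresholds, and the explicit clock parameter are assumptions chosen for illustration; this is not code from the incident or from any particular library.

```java
import java.util.function.Supplier;

// CLOSED -> OPEN after too many consecutive failures,
// OPEN -> HALF_OPEN after a cool-down, HALF_OPEN -> CLOSED on success.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long coolDownMillis;  // how long to fail fast before a trial call
    private int consecutiveFailures = 0;
    private long openedAt = 0;
    private State state = State.CLOSED;

    CircuitBreaker(int failureThreshold, long coolDownMillis) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
    }

    synchronized State state() {
        return state;
    }

    // The clock value is passed in so the behaviour is easy to test.
    <T> T call(Supplier<T> protectedCall, T fallback, long nowMillis) {
        if (!allowRequest(nowMillis)) {
            return fallback; // fail fast without touching the integration point
        }
        try {
            T result = protectedCall.get(); // no lock is held during the call
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure(nowMillis);
            return fallback;
        }
    }

    private synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < coolDownMillis) {
                return false;
            }
            state = State.HALF_OPEN; // cool-down over: allow a trial call
        }
        return true;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(2, 1_000);
        Supplier<String> flightSearch = () -> {
            throw new RuntimeException("read timed out");
        };
        breaker.call(flightSearch, "unavailable", 0);
        breaker.call(flightSearch, "unavailable", 10);
        System.out.println(breaker.state()); // prints OPEN
    }
}
```

While the breaker is open, callers get the fallback immediately instead of blocking on a dead socket. In this simplified version more than one trial call may slip through in the half-open state under concurrency.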
§Capacity
What is capacity
• Performance measures how fast the system processes a single
transaction
• Throughput describes the number of transactions the system can process
in a given time span
• Capacity is the maximum throughput a system can sustain, for a given
workload, while maintaining an acceptable response time for each
individual transaction
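The three definitions can be related in a small sketch: given load-test samples, capacity is taken as the highest measured throughput whose response time is still acceptable. The class, the sample numbers, and the 500 ms limit are all invented for illustration, and reading capacity off discrete samples is a simplification.

```java
import java.util.Arrays;
import java.util.List;

class CapacityEstimate {
    // One load-test measurement: throughput and the response time observed at it.
    static final class Sample {
        final double transactionsPerSecond;
        final double responseTimeMillis;

        Sample(double transactionsPerSecond, double responseTimeMillis) {
            this.transactionsPerSecond = transactionsPerSecond;
            this.responseTimeMillis = responseTimeMillis;
        }
    }

    // Capacity: highest measured throughput with an acceptable response time.
    static double capacity(List<Sample> samples, double maxAcceptableMillis) {
        double best = 0.0;
        for (Sample s : samples) {
            if (s.responseTimeMillis <= maxAcceptableMillis) {
                best = Math.max(best, s.transactionsPerSecond);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical measurements, invented for illustration only.
        List<Sample> samples = Arrays.asList(
                new Sample(100, 120),
                new Sample(200, 180),
                new Sample(300, 450),
                new Sample(400, 2500));
        System.out.println(capacity(samples, 500)); // prints 300.0
    }
}
```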
Retailer incident
• 300 people had worked for about 3 years to build a complete
replacement for the online store, content management, customer
service, and order-processing systems
• 9 am: the program manager hit the big red button and the system went live
• 9:05 am: 10,000 sessions active on the servers
• 9:10 am: 50,000 sessions active on the servers
• 9:30 am: 250,000 sessions active on the servers – CRASH!
Retailer incident – reasons for failure
• The number of sessions killed the site
• Each session got serialized and transmitted to a session backup server
after each page request (session replication enabled)
• Sessions were consuming RAM, CPU, and network bandwidth
• All load test scripts used cookies to track sessions
• In production:
  • Search engines drove customers to old-style URLs
  • Search engine spiders expected the site to support session tracking via URL rewriting
  • Scrapers and shopbots did not handle cookies properly
Retailer incident – fixes
• Use server scripting to protect the site
• Added a gateway page that provided three critical capabilities:
  • if the requester did not handle cookies properly, the page redirected the browser to a separate page that explained how to enable cookies
  • a throttle determined what percentage of new sessions would be allowed through to the real home page
  • specific IP addresses (shopbots, request floods) could be blocked from hitting the site
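The gateway page's three checks could be sketched as a single decision function. Everything here is an assumption made for illustration: the class and enum names, the order of the checks, and the HOLD outcome for sessions the throttle turns away (the slides do not say what those visitors saw).

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.DoubleSupplier;

class GatewayPage {
    enum Decision { BLOCK, EXPLAIN_COOKIES, HOLD, ADMIT }

    private final Set<String> blockedIps;
    private final double admitFraction;  // share of new sessions let through, 0.0 to 1.0
    private final DoubleSupplier random; // supplies values in [0, 1); injected for testing

    GatewayPage(Set<String> blockedIps, double admitFraction, DoubleSupplier random) {
        this.blockedIps = blockedIps;
        this.admitFraction = admitFraction;
        this.random = random;
    }

    Decision decide(String clientIp, boolean cookieReturned) {
        if (blockedIps.contains(clientIp)) {
            return Decision.BLOCK;           // shopbots, request floods
        }
        if (!cookieReturned) {
            return Decision.EXPLAIN_COOKIES; // redirect to the "enable cookies" page
        }
        if (random.getAsDouble() >= admitFraction) {
            return Decision.HOLD;            // throttled: no session on the real site yet
        }
        return Decision.ADMIT;               // forward to the real home page
    }

    public static void main(String[] args) {
        // Documentation-range IP addresses and a fixed "random" value, for a repeatable demo.
        GatewayPage gate = new GatewayPage(
                Collections.singleton("203.0.113.7"), 0.25, () -> 0.10);
        System.out.println(gate.decide("203.0.113.7", true));    // prints BLOCK
        System.out.println(gate.decide("198.51.100.20", false)); // prints EXPLAIN_COOKIES
        System.out.println(gate.decide("198.51.100.20", true));  // prints ADMIT
    }
}
```

In production the supplier would be a real random source, and admitFraction would be the knob operators turn up as the site proves it can take the load.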
§0 downtime deployments
0 downtime deployments - Expansion
• Deploy new static files (images, stylesheets, JS)
• Create new service pools, if needed
• Add new tables
• Add new columns
• Run data migration scripts
• Add bridging triggers
• Apply zero-downtime deployment (ZDD) recursively to prepare secondary clusters
0 downtime deployments - Rollout
• For each server:
  • Unpack code on the server
  • Stop accepting new requests
  • Shut down the server
  • Point to the new code
  • Start up the server
  • Verify clean startup
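The rollout steps can be sketched as a loop that upgrades one server at a time. The Server interface and LoggingServer are hypothetical, and halting at the first failed startup check is an assumption added here so that the remaining servers keep serving the old code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical operations a deployment tool would expose per server.
interface Server {
    void unpack(String release);
    void stopAcceptingRequests();
    void shutdown();
    void pointTo(String release);
    void start();
    boolean startedCleanly();
}

// In-memory stand-in that only records which steps were performed.
class LoggingServer implements Server {
    final List<String> steps = new ArrayList<>();
    private final boolean healthy;

    LoggingServer(boolean healthy) {
        this.healthy = healthy;
    }

    public void unpack(String release) { steps.add("unpack " + release); }
    public void stopAcceptingRequests() { steps.add("stop accepting"); }
    public void shutdown() { steps.add("shutdown"); }
    public void pointTo(String release) { steps.add("point to " + release); }
    public void start() { steps.add("start"); }
    public boolean startedCleanly() { return healthy; }
}

class RollingRollout {
    // Upgrades servers one by one; returns how many came up cleanly.
    static int rollOut(List<? extends Server> servers, String release) {
        int upgraded = 0;
        for (Server server : servers) {
            server.unpack(release);
            server.stopAcceptingRequests();
            server.shutdown();
            server.pointTo(release);
            server.start();
            if (!server.startedCleanly()) {
                break; // assumption: halt so untouched servers keep serving
            }
            upgraded++;
        }
        return upgraded;
    }

    public static void main(String[] args) {
        List<LoggingServer> pool = Arrays.asList(
                new LoggingServer(true), new LoggingServer(false), new LoggingServer(true));
        System.out.println(rollOut(pool, "v2")); // prints 1
    }
}
```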
0 downtime deployments - Cleanup
• Remove bridging triggers
• Remove obsolete referential integrity relations
• Remove obsolete columns
• Remove obsolete tables
• Add new referential integrity relations
• Add NOT NULL constraints
• Remove obsolete static files
• Remove the old code
• Remove old service pools
§Thanks!
Capacity antipatterns/patterns
Antipatterns:
• Resource pool contention
• Excessive JSP fragments
• AJAX overkill
• Overstaying sessions
• Wasted space in HTML
• Reload button
• Handcrafted SQL
• Database eutrophication
• Integration point latency
• Cookie monsters
Patterns:
• Pool connections
• Use caching carefully
• Precompute content
• Tune the garbage collector

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 

Release it! - Takeaways

  • 1. Release it! - Takeaways
  • 2. Agenda
      1. Stability
      2. Capacity
      3. 0 downtime deployments
  • 4. Major airline incident – introduction
      • Started with a planned failover on the database cluster that served Core Facilities (CF)
      • CF handled flight searches – critical, so designed for high availability
      • CF was going to be used by self-service check-in kiosks, IVR, and “channel partner” applications
  • 5. Major airline incident – outage facts
      • Thursday evening, 11 pm: a team of engineers executed a manual database failover from CF db1 to CF db2, updated db1, migrated the database back to db1, and then applied the same change to db2
      • 12:30 am: the crew marked the change as “Completed, Success” and signed off (no downtime)
      • 2:30 am: all the check-in kiosks in the USA went red (stopped servicing requests)
      • Minutes later: the IVR servers went red too
      • A Severity 1 case was opened immediately
      • Priority: restore service by restarting the CF and kiosk application servers
      • Total elapsed time: approx. 3 hours
  • 6. Major airline incident – consequences
      • Cost the company hundreds of thousands of dollars
      • When the kiosks go down, off-shift agents are called in
      • It took until 3 pm to deal with the backlog
      • Delayed flights, reallocated gates
      • Bad publicity for the airline in the media
      • Affected the FAA’s annual report card, which measures customer complaints and on-time arrivals/departures (less money for the CEO)
  • 7. Major airline incident – post-mortem
      • Data to collect:
          • application servers: log files, thread dumps, and configuration files
          • database servers: configuration files for the db and the cluster server
          • compare current db configuration files to those from the nightly backup
      • Thread dumps:
          • all threads were blocked inside SocketInputStream.socketRead(), trying vainly to read a response that would never come
          • all threads had called FlightSearch.lookupByCity()
  • 8. Major airline incident – the culprit

      public class FlightSearch implements SessionBean {
          private MonitoredDataSource connectionPool;

          public List lookupByCity(. . .) throws SQLException, RemoteException {
              Connection conn = null;
              Statement stmt = null;
              try {
                  conn = connectionPool.getConnection();
                  stmt = conn.createStatement();
                  //…
              } finally {
                  if (stmt != null) {
                      // If close() throws SQLException here, conn.close() below
                      // is skipped and the connection never returns to the pool.
                      stmt.close();
                  }
                  if (conn != null) {
                      conn.close();
                  }
              }
          }
      }
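One way to avoid the leak in the `finally` block above is to make sure a failing `close()` on one resource cannot prevent the next one from being closed. The sketch below shows this with Java's try-with-resources, which closes resources in reverse order and still closes the remaining ones when an earlier `close()` throws. `FakeResource` and `SafeClose` are hypothetical stand-ins, not part of the original code; `java.sql.Connection` and `java.sql.Statement` both implement `AutoCloseable`, so the same shape should apply to real JDBC objects.

```java
import java.util.ArrayList;
import java.util.List;

final class SafeClose {
    // Hypothetical stand-in for a JDBC Connection or Statement.
    // It records every close() call and can be told to fail on close.
    static final class FakeResource implements AutoCloseable {
        private final String name;
        private final boolean failOnClose;
        private final List<String> log;

        FakeResource(String name, boolean failOnClose, List<String> log) {
            this.name = name;
            this.failOnClose = failOnClose;
            this.log = log;
        }

        @Override
        public void close() throws Exception {
            log.add("close " + name);
            if (failOnClose) {
                throw new Exception(name + " close failed");
            }
        }
    }

    // Runs a pretend query and returns the sequence of events.
    static List<String> lookup(boolean stmtCloseFails) {
        List<String> log = new ArrayList<>();
        // Resources are closed in reverse order: stmt first, then conn.
        try (FakeResource conn = new FakeResource("conn", false, log);
             FakeResource stmt = new FakeResource("stmt", stmtCloseFails, log)) {
            log.add("query");
        } catch (Exception e) {
            // Reached only after BOTH resources have been closed.
            log.add("caught " + e.getMessage());
        }
        return log;
    }
}
```

Calling `SafeClose.lookup(true)` simulates the failover case: the statement fails to close, yet the connection is still handed back before the exception surfaces.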
  • 9. What is stability
      • Transaction = an abstract unit of work processed by the system
      • System = the complete, interdependent set of hardware, applications, and services required to process transactions for users
      • Stability = the system keeps processing transactions, even when there are transient impulses, persistent stresses, or component failures disrupting normal processing (users can still get work done)
      • A component of the system which starts to fail before everything else does = crack in the system
      • Cracks propagate!
      • Tight coupling accelerates cracks
  • 10. Major airline incident – avoid propagation
      • The pool could have been configured to create more connections if it was exhausted, or to block callers for a limited time, not forever
      • The client could have set a timeout on the RMI sockets
      • CF servers could have been partitioned into more than one service group
      • Use a Circuit breaker
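The last item can be illustrated with a minimal circuit breaker sketch. The class name, the threshold and cool-down parameters, and the injected clock are all assumptions made for illustration; they are not taken from the deck or from any particular library. After a number of consecutive failures the breaker opens and callers get a fallback immediately instead of blocking on a dead integration point; after a cool-down it lets a trial call through.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long openMillis;        // cool-down before a trial call
    private final LongSupplier clock;     // injected so tests can control time
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0L;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // Runs the action unless the breaker is open; returns fallback on failure.
    // The lock is NOT held while the action runs, so a slow call cannot
    // serialize other callers.
    <T> T call(Supplier<T> action, T fallback) {
        if (!allowRequest()) {
            return fallback; // fail fast
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback;
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return false;
            }
            state = State.HALF_OPEN; // cool-down elapsed: allow a trial call
        }
        return true;
    }

    private synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

A simplification to be aware of: in the half-open state this sketch may let several concurrent trial calls through, where a production breaker would typically admit only one.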
  • 12. What is capacity
      • Performance measures how fast the system processes a single transaction
      • Throughput describes the number of transactions the system can process in a given time span
      • Capacity is the maximum throughput a system can sustain, for a given workload, while maintaining an acceptable response time for each individual transaction
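To make the capacity definition concrete, here is a small sketch that picks capacity out of load-test measurements: the highest observed throughput whose response time stays within the acceptable limit. The `Sample` type, the method name, and any numbers fed to it are hypothetical and only illustrate the definition.

```java
import java.util.List;

final class CapacityEstimate {
    // One load-test measurement (hypothetical shape).
    static final class Sample {
        final double throughputPerSec;
        final double responseTimeMs;

        Sample(double throughputPerSec, double responseTimeMs) {
            this.throughputPerSec = throughputPerSec;
            this.responseTimeMs = responseTimeMs;
        }
    }

    // Capacity = maximum throughput among samples whose response time
    // is still acceptable; 0.0 if no sample qualifies.
    static double capacity(List<Sample> samples, double maxResponseTimeMs) {
        double best = 0.0;
        for (Sample s : samples) {
            if (s.responseTimeMs <= maxResponseTimeMs && s.throughputPerSec > best) {
                best = s.throughputPerSec;
            }
        }
        return best;
    }
}
```

Note that a sample with higher raw throughput but an unacceptable response time does not count toward capacity, which is the distinction the slide draws between throughput and capacity.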
  • 13. Retailer incident
      • 300 people worked for about 3 years to build a complete replacement for the online store, content management, customer service, and order-processing systems
      • 9 am: the program manager hit the big red button and the system went live
      • 9:05 am: 10,000 sessions active on the servers
      • 9:10 am: 50,000 sessions active on the servers
      • 9:30 am: 250,000 sessions active on the servers
      • CRASH!!!!
  • 14. Retailer incident – reasons for failure
      • The number of sessions killed the site
      • Each session got serialized and transmitted to a session backup server after each page request (session replication enabled)
      • Sessions were consuming RAM, CPU, and network bandwidth
      • All load test scripts used cookies to track sessions
      • In production:
          • Search engines drove customers to old-style URLs
          • Search engine spiders expect the site to support session tracking via URL rewriting
          • Scrapers and shopbots did not handle cookies properly
  • 15. Retailer incident – fixes
      • Use server scripting to protect the site
      • Added a gateway page that served three critical capabilities:
          • if the requester did not handle cookies properly, the page redirected the browser to a separate page that explained how to enable cookies
          • a throttle was set to determine what percentage of new sessions would be allowed to the real home page
          • block specific IP addresses from hitting the site (shopbots, request floods)
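The three gateway capabilities can be sketched as a single decision function. Everything named here (`GatewayPage`, `Decision`, the `TRY_LATER` outcome for throttled visitors, the order of the checks) is an assumption for illustration; the deck does not say what throttled requests were shown or how the page was implemented.

```java
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

final class GatewayPage {
    enum Decision { ADMIT, EXPLAIN_COOKIES, BLOCKED, TRY_LATER }

    private final Set<String> blockedIps;
    private final int admitPercent; // share of new sessions let through, 0..100

    GatewayPage(Set<String> blockedIps, int admitPercent) {
        this.blockedIps = blockedIps;
        this.admitPercent = admitPercent;
    }

    // Production entry point: draws a random number in [0, 100).
    Decision decide(String ip, boolean handlesCookies) {
        return decide(ip, handlesCookies, ThreadLocalRandom.current().nextInt(100));
    }

    // Deterministic variant; roll is expected to be in [0, 100).
    Decision decide(String ip, boolean handlesCookies, int roll) {
        if (blockedIps.contains(ip)) {
            return Decision.BLOCKED;          // shopbots, request floods
        }
        if (!handlesCookies) {
            return Decision.EXPLAIN_COOKIES;  // redirect to the help page
        }
        // Throttle: only admitPercent of new sessions reach the real home page.
        return roll < admitPercent ? Decision.ADMIT : Decision.TRY_LATER;
    }
}
```

Because none of the first two outcomes creates a server-side session, requests from blocked or cookie-less clients stay cheap, which is the point of putting the gateway in front of the real home page.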
  • 17. 0 downtime deployments – Expansion
      • Deploy new static files (images, stylesheets, JS)
      • Create new service pools, if needed
      • Add new tables
      • Add new columns
      • Run data migration scripts
      • Add bridging triggers
      • Apply recursive ZDD to prepare secondary clusters
  • 18. 0 downtime deployments – Rollout
      • For each server:
          • Unpack code on the server
          • Stop accepting new requests
          • Shut down the server
          • Point to the new code
          • Start up the server
          • Verify clean startup
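The per-server rollout steps can be written as a loop that upgrades one server at a time while the rest keep serving traffic. The `Server` interface, the `LoggingServer` test double, and the choice to halt at the first unclean startup are assumptions made for this sketch; the deck lists the steps but does not say what happens when verification fails.

```java
import java.util.List;

final class RollingDeploy {
    // Hypothetical operations a deployment tool would expose per server.
    interface Server {
        void unpack(String release);
        void stopAcceptingRequests();
        void shutdown();
        void pointTo(String release);
        void start();
        boolean startedCleanly();
    }

    // Upgrades servers one by one; returns how many were upgraded.
    // Assumption: stop at the first server that fails to start cleanly,
    // leaving the remaining servers on the old code.
    static int rollout(List<? extends Server> servers, String release) {
        int upgraded = 0;
        for (Server s : servers) {
            s.unpack(release);
            s.stopAcceptingRequests();
            s.shutdown();
            s.pointTo(release);
            s.start();
            if (!s.startedCleanly()) {
                break;
            }
            upgraded++;
        }
        return upgraded;
    }

    // Test double that records each step in a shared log.
    static final class LoggingServer implements Server {
        private final String name;
        private final boolean healthy;
        private final List<String> log;

        LoggingServer(String name, boolean healthy, List<String> log) {
            this.name = name;
            this.healthy = healthy;
            this.log = log;
        }

        public void unpack(String release) { log.add(name + ":unpack"); }
        public void stopAcceptingRequests() { log.add(name + ":drain"); }
        public void shutdown() { log.add(name + ":shutdown"); }
        public void pointTo(String release) { log.add(name + ":point"); }
        public void start() { log.add(name + ":start"); }
        public boolean startedCleanly() { log.add(name + ":verify"); return healthy; }
    }
}
```

Upgrading sequentially keeps capacity loss to one server at a time; a real tool might upgrade in small batches instead.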
  • 19. 0 downtime deployments – Cleanup
      • Remove bridging triggers
      • Remove obsolete referential integrity relations
      • Remove obsolete columns
      • Remove obsolete tables
      • Add new referential integrity relations
      • Add NOT NULL constraints
      • Remove obsolete static files
      • Remove the old code
      • Remove old service pools
  • 22. Capacity antipatterns/patterns
      • Antipatterns:
          • Resource pool contention
          • Excessive JSP fragments
          • AJAX overkill
          • Overstaying sessions
          • Wasted space in HTML
          • Reload button
          • Handcrafted SQL
          • Database eutrophication
          • Integration point latency
          • Cookie monsters
      • Patterns:
          • Pool connections
          • Use caching carefully
          • Precompute content
          • Tune the garbage collector