4. Availability Metrics
Availability can be modeled quantitatively, as the fraction of
total time during which a system is up:
Availability = uptime ÷ (uptime + downtime)
5. A system's availability is commonly expressed as a number of
nines: the count of leading 9s in the percentage of time the system
is operating without downtime (for example, 99.9% is "three nines").
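To make the formula and the "nines" concrete, here is a minimal sketch in Python. The function names and the 365-day year are assumptions made for illustration; they are not part of the original slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year, leap years ignored


def availability(uptime: float, downtime: float) -> float:
    """Availability = uptime / (uptime + downtime), both in the same time unit."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime / total


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for N nines of availability (3 means 99.9%)."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        pct = 100 * (1 - 10 ** (-n))
        budget = allowed_downtime_minutes(n)
        print(f"{n} nines ({pct:.3f}%): {budget:.1f} minutes of downtime per year")
```

Each additional nine cuts the yearly downtime budget by a factor of ten, which is why each step up in availability tends to be harder to reach than the last.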
7. To design for resiliency, two terms come into play, as
follows:
● Recovery Time Objective (RTO)
● Recovery Point Objective (RPO)
Business Impact Analysis
8. ● Once a service interruption is detected, the RTO is the target
time from that point until the service is restored to its regular
service level.
● The RTO is set according to the Organizational (Operational) Level
Agreement (OLA).
Recovery Time Objective (RTO)
9. Recovery Point Objective (RPO)
The Recovery Point Objective is the acceptable amount of data that
can be lost in the event of downtime, typically expressed as a period
of time.
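As a rough illustration of how the two objectives can be checked against an incident, consider the sketch below. The function names and the idea of comparing timestamps are assumptions for illustration only. RTO is compared with the time from detection to restoration, and RPO with the time between the last good backup and the failure.

```python
from datetime import datetime, timedelta


def recovery_time(detected_at: datetime, restored_at: datetime) -> timedelta:
    """Time from detection of the interruption until service is restored."""
    return restored_at - detected_at


def data_loss_window(last_backup_at: datetime, failed_at: datetime) -> timedelta:
    """Span of data written after the last backup and therefore lost."""
    return failed_at - last_backup_at


def meets_objectives(detected_at: datetime, restored_at: datetime,
                     last_backup_at: datetime, failed_at: datetime,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True if the incident stayed within both the RTO and the RPO."""
    return (recovery_time(detected_at, restored_at) <= rto
            and data_loss_window(last_backup_at, failed_at) <= rpo)


if __name__ == "__main__":
    # Hypothetical incident: backup at 02:00, failure at 02:45,
    # detected at 02:50, restored at 04:10.
    day = datetime(2024, 1, 1)
    ok = meets_objectives(
        detected_at=day.replace(hour=2, minute=50),
        restored_at=day.replace(hour=4, minute=10),
        last_backup_at=day.replace(hour=2, minute=0),
        failed_at=day.replace(hour=2, minute=45),
        rto=timedelta(hours=2),
        rpo=timedelta(hours=1),
    )
    print("Objectives met:", ok)
```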
14. Backup and Restore (1)
Amazon S3 is the destination for
data backup. For long term data
storage, Amazon Glacier can be
used.
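One possible way to move older backups from S3 to Glacier is an S3 lifecycle rule. The sketch below builds such a rule and applies it through a boto3 S3 client passed in by the caller. The prefix, the 30-day threshold, and the helper names are assumptions for illustration, not values from the slides.

```python
def glacier_lifecycle(prefix: str, days: int) -> dict:
    """Build a lifecycle configuration that archives objects to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to the GLACIER storage class after `days` days.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(s3_client, bucket: str,
                    prefix: str = "backups/", days: int = 30) -> None:
    """Apply the rule; `s3_client` is expected to be boto3.client("s3")."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle(prefix, days),
    )
```

With boto3 installed and credentials configured, a call might look like `apply_lifecycle(boto3.client("s3"), "example-backup-bucket")`, where the bucket name is hypothetical.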
15. Backup and Restore (2)
If a disaster occurs, we need to
recover the data very quickly and
reliably. It can be executed by:
● Manual Intervention
● AWS Lambda performing the
health check of Route 53
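For the automated option, the deciding logic could be as simple as counting consecutive failed probes. This is a hypothetical sketch: the function name and the threshold of three failures are assumptions, and the slides do not say how the health check is actually wired to Lambda.

```python
from typing import Iterable


def should_start_recovery(probe_results: Iterable[bool],
                          failure_threshold: int = 3) -> bool:
    """Return True once `failure_threshold` consecutive probes have failed.

    Each item in `probe_results` is True for a healthy probe and False
    for a failed one, in chronological order.
    """
    consecutive_failures = 0
    for healthy in probe_results:
        # A healthy probe resets the streak; a failed one extends it.
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return True
    return False


if __name__ == "__main__":
    history = [True, True, False, False, False]
    print("Start recovery:", should_start_recovery(history))
```

Requiring several failures in a row avoids starting a restore because of one transient error.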
17. Pilot light(1)
In the Pilot Light method, the core
piece of the system, such as a
database, is already running and up
to date in AWS.
The database is always running
for data replication; for the
other layers, server images are
created and updated periodically.
18. Pilot light(2)
Route 53 will automatically fail
over to the warm standby, and,
with the use of Lambda,
AWS CloudFormation can be
used to automate the provisioning
of these services. We can configure
load balancing and auto-scaling so
that when traffic increases the
service scales up automatically.
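A Lambda function that provisions the remaining layers through CloudFormation might look roughly like the sketch below. The event keys (`stack_name`, `template_url`, `ami_id`) and the `AmiId` template parameter are hypothetical. The optional `cfn_client` argument exists only so the function can be exercised without AWS access.

```python
def handler(event, context, cfn_client=None):
    """Create a CloudFormation stack for the non-database layers.

    Assumed event shape (hypothetical):
    {"stack_name": ..., "template_url": ..., "ami_id": ...}
    """
    if cfn_client is None:
        import boto3  # provided by the AWS Lambda Python runtime
        cfn_client = boto3.client("cloudformation")

    response = cfn_client.create_stack(
        StackName=event["stack_name"],
        TemplateURL=event["template_url"],
        # Pass the periodically updated server image to the template.
        Parameters=[
            {"ParameterKey": "AmiId", "ParameterValue": event["ami_id"]},
        ],
    )
    return response["StackId"]
```

Load balancers and auto-scaling groups would be declared inside the template itself, so one stack creation brings up the whole environment around the already running database.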
20. Warm Standby(1)
Warm Standby is an extended
version of Pilot Light.
In the preparation phase, an on-site
solution and an AWS solution run
side-by-side. The warm standby is
always running and fully functional,
but with a minimal amount of
resources.
21. Warm Standby(2)
It uses Route 53 with the failover
routing policy, and requires a
fail-fast code strategy to switch
traffic to the new master
database.
Scale horizontally to accommodate
the current production traffic.
To build resiliency,
use multiple AZs.
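The failover routing policy can be expressed as a pair of Route 53 record sets, one PRIMARY and one SECONDARY. The sketch below only builds the change batch; the helper name, record type, and TTL are assumptions for illustration.

```python
def failover_change_batch(name: str, primary_ip: str, standby_ip: str,
                          health_check_id: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch with PRIMARY and SECONDARY A records."""

    def record(role: str, ip: str, extra: dict) -> dict:
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": role.lower(),
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        rrset.update(extra)
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Changes": [
            # Route 53 answers with the primary while its health check passes.
            record("PRIMARY", primary_ip, {"HealthCheckId": health_check_id}),
            # Otherwise it answers with the warm standby.
            record("SECONDARY", standby_ip, {}),
        ]
    }
```

The result could be passed to a boto3 Route 53 client as `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. A short TTL keeps clients from holding on to the failed address for long.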
23. Multi-site active-active (1)
In Multi-Site, the application runs
both in AWS and on the existing
infrastructure.
Here the DNS service supports
weighted routing, so traffic goes
to the AWS infrastructure as
well as the existing infrastructure.
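With weighted routing, each site receives a share of DNS answers proportional to its weight divided by the sum of all weights. A minimal sketch of that arithmetic follows. The site names and weights are hypothetical, and the all-zero-weights case is not modeled here.

```python
from typing import Dict


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Expected fraction of traffic per site under weighted routing."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {site: weight / total for site, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical split: 70% to the existing site, 30% to AWS.
    shares = traffic_share({"existing": 70, "aws": 30})
    for site, share in shares.items():
        print(f"{site}: {share:.0%}")
```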
24. Multi-site active-active (2)
If a disaster occurs on the existing
system, all traffic is routed
to the new AWS environment. By
using auto-scaling, the capacity of
services rapidly increases to handle
the full production load.
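As a rough sketch of what "handling the full production load" means in numbers, the helper below estimates how many instances are needed and asks an Auto Scaling group for that capacity. The per-instance throughput figure, the group name, and the helper names are assumptions for illustration. In practice a scaling policy would usually do this automatically.

```python
import math


def instances_needed(total_rps: float, rps_per_instance: float) -> int:
    """Smallest instance count able to serve `total_rps` requests per second."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    return max(1, math.ceil(total_rps / rps_per_instance))


def scale_for_full_load(autoscaling_client, group_name: str,
                        total_rps: float, rps_per_instance: float) -> int:
    """Set the group's desired capacity; expects boto3.client("autoscaling")."""
    desired = instances_needed(total_rps, rps_per_instance)
    autoscaling_client.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired,
        HonorCooldown=False,  # do not wait for a cooldown during a disaster
    )
    return desired
```

The desired capacity must fall within the group's configured minimum and maximum size, so the maximum should be set with the full production load in mind.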