Blazeclan

Agenda
• Introduction
  • High Availability
  • Scalability
  • Fault Tolerance
• AWS Global Infrastructure
• Key Design Concepts
  • Design for Failure
  • Scaling
  • Self Healing / Fault Tolerant
  • Multiple AZ Architecture
  • Loose Coupling
• Sample Architectures

Cloud IT Better
Introduction

How Often Do You See This?

Cost of Downtime
A report published in 2010 on the top 412 eCommerce sites says:
• The median length of downtime was 840 minutes
• On average, each of them saw 3,291 minutes of downtime

Lost Revenue
• On average, each of them lost $800,099 in revenue due to downtime
• The total revenue lost to downtime across all 412 companies was $329,640,928!
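As a rough cross-check of the downtime figures, the sketch below recomputes the per-site average from the reported total and derives an approximate revenue loss per minute of downtime. The per-minute figure is an illustrative assumption (average loss divided by average downtime); the report is not quoted as stating it.

```python
# Figures quoted on the slide (2010 report, top 412 eCommerce sites).
SITES = 412
TOTAL_LOST_REVENUE = 329_640_928   # USD, all sites combined
AVG_LOST_REVENUE = 800_099         # USD per site
AVG_DOWNTIME_MINUTES = 3291        # minutes per site


def average_loss_per_site(total: float, sites: int) -> float:
    """Average revenue lost per site, derived from the reported total."""
    return total / sites


def loss_per_downtime_minute(avg_loss: float, avg_minutes: float) -> float:
    """Illustrative estimate: average loss spread over average downtime."""
    return avg_loss / avg_minutes


if __name__ == "__main__":
    per_site = average_loss_per_site(TOTAL_LOST_REVENUE, SITES)
    per_minute = loss_per_downtime_minute(AVG_LOST_REVENUE, AVG_DOWNTIME_MINUTES)
    print(f"Average loss per site: ${per_site:,.0f}")
    print(f"Approximate loss per minute of downtime: ${per_minute:,.2f}")
```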

Online Business & Downtime Facts
The Average Hourly Loss because
of Data Center Down Time in 2012

Source: http://www.techrepublic.com/blog/data-center/infographic-the-outrageous-costs-of-data-center-downtime

How to Build a HIGHLY AVAILABLE, SCALABLE, DURABLE AND RESILIENT Web Application

High Availability
99.999% uptime

• Uptime of an Application
• Planned or Unplanned Outage or Downtime
  • Offline, Unreachable, or Partially Available
  • Slow to Use
• Goal
  • No Downtime
  • Always Available
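To make an uptime percentage such as 99.999% concrete, the sketch below converts it into the downtime it allows per year. The function name and the 365-day year are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.95, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, "five nines" leaves a little over five minutes of downtime per year, which is why it is usually treated as a goal rather than a starting point.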

Scalability
Ability of an Application to accommodate change in traffic without architectural changes

Availability may be impacted if application cannot Scale

[Graph: Resources and Demand over Time]

Scalability doesn’t Guarantee Availability

Fault Tolerance
• Built-in Redundancy so applications can Continue Functioning when Components fail
• Fault tolerance is crucial to High Availability

Image courtesy: Gigamone.com

AWS Global Infrastructure

AWS democratizes High Availability
• Multiple Servers
• Isolated Redundant Data Centers
• Regions across the Globe
• Availability Zones within Regions

Source: http://aws.amazon.com/about-aws/globalinfrastructure/#reglink-sa

AWS Capacity

Source: http://www.slideshare.net/AmazonWebServices/aws-webinar-scaling-on-aws-for-the-first-10-million-users

AWS Platform

Source: http://www.slideshare.net/AmazonWebServices/aws-webinar-scaling-on-aws-for-the-first-10-million-users

AWS Building Blocks
Inherently Highly Available and Fault Tolerant Services (Span Across AZs)
• Amazon S3
• Amazon DynamoDB
• Amazon SNS
• Amazon CloudFront
• Amazon SES
• Amazon Route53
• Amazon SQS
• Amazon SWF
• Elastic Load Balancer
• …

Highly Available with Right Architecture (Architect Across AZs)
• Amazon EC2
• Amazon EBS
• Amazon RDS
• Amazon VPC

Design for Failure

"Everything fails, all the time" – Werner Vogels, CTO, Amazon
• Avoid single points of failure
• Application Should Continue to Function
• Assume everything fails, and work backwards
• Avoid Impact on Business

Image caption: Obama’s Prized Limo after it broke down in his Israel visit!

Ask Questions for Right Architecture

• What kind of scenarios do I have to plan for?
• What are my single points of failure?
• If there are master and slaves in your architecture, what if the master node fails?
• If a load balancer is sitting in front of an array of application servers, what if that load balancer fails?
• What happens if a node in your system fails?

Lots of Questions
• How do you recognize that failure?
• How do I replace that node?
• What if the cache keys grow beyond the memory limit of an instance?
• How does the failover occur, and how is a new slave instantiated and brought into sync with the master?
• What if a downstream service times out or returns an exception?
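One common answer to the last question is to retry the call with a growing delay and fall back to a safe default if it keeps failing. The sketch below is a minimal illustration of that idea; `call_with_retry`, its parameters, and the fallback value are hypothetical names, not part of any AWS SDK.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    fallback: Optional[T] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call operation, retrying with exponential backoff on any exception.

    Returns the fallback value if every attempt fails. The sleep function
    is injectable so the backoff can be observed without real waiting.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback
```

A caller might wrap a request to a downstream service in `call_with_retry` and pass a cached response as `fallback`, so that a timeout degrades the page rather than failing it.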

Build Mechanisms to Handle Failure
• Build process threads that resume on reboot
• Allow the state of the system to re-sync by reloading messages from queues
• Keep pre-configured and pre-optimized virtual images to support the above point on launch/boot
• Avoid in-memory sessions or stateful user context; move that to data stores
• Have a coherent backup and restore strategy for your data and automate it

Image courtesy: http://www.outsmarthormones.com/wp-content/uploads/2011/06/Fix.jpg
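The advice to move session state out of server memory can be sketched as follows. `SessionStore` is a hypothetical in-process stand-in for an external data store, and `WebServer` is an illustrative class; neither is an AWS API. The point is only that any server can pick up a session because no server owns it.

```python
from typing import Dict, List


class SessionStore:
    """Stand-in for an external data store shared by all web servers."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def save(self, session_id: str, state: dict) -> None:
        self._data[session_id] = dict(state)

    def load(self, session_id: str) -> dict:
        return dict(self._data.get(session_id, {}))


class WebServer:
    """Stateless server: every request reads and writes the shared store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        state = self.store.load(session_id)
        cart = state.get("cart", []) + [item]
        state["cart"] = cart
        self.store.save(session_id, state)
        return cart


if __name__ == "__main__":
    store = SessionStore()
    server_a = WebServer("a", store)
    server_b = WebServer("b", store)
    server_a.add_to_cart("session-1", "book")
    # Server "a" could fail here; server "b" still sees the cart.
    print(server_b.add_to_cart("session-1", "pen"))
```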

Design for Failure

Source: http://media.amazonwebservices.com/architecturecenter/AWS_ac_ra_ftha_04.pdf

Scaling

Auto Scaling
• Lets you automatically scale Amazon EC2 capacity up or down
• Lets you terminate server instances at will
• Lets you add more instances in response to increasing load
• Launches a replacement instance immediately in case of a failure
• Lets the application transition seamlessly in case the primary server fails

Image Courtesy: http://www.knovelblogs.com/wp-content/uploads
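The scale-up and scale-down decision can be illustrated with a simple proportional rule: size the fleet so that the observed load per instance returns to a target value, within a minimum and maximum. This is a simplified sketch of the idea, not the algorithm Auto Scaling actually uses; the function and parameter names are hypothetical.

```python
import math


def desired_capacity(
    current_instances: int,
    observed_metric: float,
    target_metric: float,
    minimum: int,
    maximum: int,
) -> int:
    """Number of instances needed to bring the metric back to its target.

    observed_metric and target_metric are per-instance averages, for
    example CPU utilization in percent.
    """
    if current_instances < 1 or target_metric <= 0:
        raise ValueError("need at least one instance and a positive target")
    proposed = math.ceil(current_instances * observed_metric / target_metric)
    # Never go below the floor needed for redundancy or above the cost cap.
    return max(minimum, min(maximum, proposed))


if __name__ == "__main__":
    # Four instances at 90% CPU with a 60% target, between 2 and 10 instances.
    print(desired_capacity(4, 90.0, 60.0, minimum=2, maximum=10))
```

Keeping the minimum above one is what lets the group replace a failed instance without the application going dark in the meantime.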

Elastic Load Balancing (ELB)
• Distributes incoming traffic to an application across several Amazon EC2 instances
• ELB is given a DNS host name; requests sent to this host name are delegated to a pool of Amazon EC2 instances
• ELB detects unhealthy instances within its pool of Amazon EC2 instances and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored
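The routing behaviour described here, spreading requests over a pool and skipping instances that fail health checks, can be sketched with a toy round-robin balancer. The class below is purely illustrative and does not reflect how ELB is implemented or configured; the instance IDs are made up.

```python
from typing import Iterable, List, Set


class LoadBalancer:
    """Toy round-robin balancer that only routes to healthy instances."""

    def __init__(self, instances: Iterable[str]) -> None:
        self._instances: List[str] = list(instances)
        self._healthy: Set[str] = set(self._instances)
        self._next = 0

    def report_health(self, instance: str, healthy: bool) -> None:
        """Record the result of a health check for one instance."""
        if healthy:
            self._healthy.add(instance)
        else:
            self._healthy.discard(instance)

    def route(self) -> str:
        """Return the next healthy instance in round-robin order."""
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


if __name__ == "__main__":
    balancer = LoadBalancer(["i-a", "i-b", "i-c"])
    balancer.report_health("i-b", False)
    print([balancer.route() for _ in range(4)])
```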

ELB & Auto Scaling
• Auto Scaling & ELB are an ideal combination
• ELB gives a single DNS name for addressing
• Auto Scaling ensures there is always the right number of healthy Amazon EC2 instances to accept requests

Fault Tolerant

Fault Tolerance
• To build fault-tolerant applications on Amazon EC2, it’s important to follow best practices such as:
  • Being able to commission replacement instances quickly
  • Using Amazon EBS for persistent storage
  • Using multiple Availability Zones and Elastic IP addresses

Multi-AZ Architecture

Multi-AZ Design Considerations
• Achieve greater fault tolerance by distributing your application geographically
• The Amazon EC2 Service Level Agreement commits to 99.95% availability for each Amazon EC2 Region
• Deploy an application that spans multiple Availability Zones
• Redundant instances for each tier of an application can be placed in distinct Availability Zones
• ELB can automatically balance traffic across multiple instances and multiple Availability Zones

Image Courtesy: http://chriscampcommunications.blogspot.in
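Why spreading a tier across Availability Zones helps can be shown with a short calculation: if each zone is up with some probability and zones fail independently, the tier is down only when every zone is down at once. Both the independence assumption and the per-zone figure used below are illustrative; they are not taken from the EC2 SLA.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Availability of a tier deployed redundantly in several zones.

    Assumes zone failures are independent and that any single zone can
    carry the full load on its own.
    """
    if not 0 <= per_zone <= 1 or zones < 1:
        raise ValueError("per_zone must be in [0, 1] and zones at least 1")
    return 1 - (1 - per_zone) ** zones


if __name__ == "__main__":
    # Hypothetical per-zone availability of 99%.
    for zones in (1, 2, 3):
        print(zones, f"{combined_availability(0.99, zones):.6f}")
```

Real failures are not fully independent, so figures like these are an upper bound on the benefit rather than a guarantee.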

Multi-AZ Architecture

Loose Coupling

Loosely Coupled Systems

• Loosely coupled systems are more fault tolerant and can achieve a bigger scale
• Loosely coupled systems on AWS
  • De-coupling systems allows for hybrid models (in-cloud + in-physical data center)
  • Balancing between clusters enables easier scaling
  • Using queues (Amazon SQS) buffers against failures
• Design for a jumble of black boxes

Decoupling using SQS

Loose Coupling - Best Practices on AWS
• Use Amazon SQS to isolate components
• Use Amazon SQS as buffers between components
• Design every component so that it exposes a service interface, is responsible for its own scalability, and interacts with other components asynchronously
• Bundle the logical construct of a component into an Amazon Machine Image so that it can be deployed more often
• Make your applications as stateless as possible. Store session state outside of the component (in Amazon SimpleDB, if appropriate)
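The buffering role of a queue between components can be sketched with Python's standard `queue.Queue` standing in for Amazon SQS. The producer only enqueues; the consumer drains at its own pace, puts a message back when processing fails, and sets it aside after repeated failures. The function names and the retry limit are assumptions; real SQS achieves a similar effect through visibility timeouts and dead-letter queues, which are not modelled here.

```python
import queue
from typing import Callable, List, Tuple


def enqueue_all(buffer: queue.Queue, messages: List[str]) -> None:
    """Producer side: hand messages to the buffer and move on."""
    for message in messages:
        buffer.put((message, 0))  # (payload, failed attempts so far)


def drain(
    buffer: queue.Queue,
    handler: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Consumer side: process until the buffer is empty.

    Returns (results, dead), where dead holds messages that failed
    max_attempts times.
    """
    results: List[str] = []
    dead: List[str] = []
    while True:
        try:
            message, attempts = buffer.get_nowait()
        except queue.Empty:
            break
        try:
            results.append(handler(message))
        except Exception:
            if attempts + 1 < max_attempts:
                buffer.put((message, attempts + 1))  # retry later
            else:
                dead.append(message)
    return results, dead
```

Because the producer never calls the consumer directly, a slow or failing consumer delays processing but does not cause the producer to fail.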

Sample Architectures

High Availability Architecture in RDS

Web Hosting on AWS

Scalable Reader Farm

Design for High Availability & Scale
Don’t let this happen to your Business

Our AWS Expert Solution Architects can help
you review your Architecture.

Avail our 2-hour Free Consultancy!
For any assistance please contact us at
info@blazeclan.com
Upcoming Webinars
Check out Our Upcoming Webinars
www.blazeclan.com/webinars

Thank you
info@blazeclan.com
Follow Us On:
Our Blog: http://blog.blazeclan.com/

How to Design for High Availability & Scale with AWS
