AWS fault tolerant architecture

+
Dynamic Fault Tolerant
Applications using AWS
Sumit Kadyan
University of Victoria
+
Agenda
• Motivation
• How do we design fault-tolerant (FT) web services on AWS?
• Research in load balancing algorithms
• Future study
• Questions
+
Motivation
• Not everything on the cloud is fault tolerant.
• You have to design your application to be fault tolerant.
• AWS offers dynamic fault tolerance.
• Around 40% of AWS users do not deploy any redundancy in their setup.
• The price of using resources on the cloud has fallen roughly 25-fold in 7 years.
• The AWS service warranty claims 99.95% availability. That is about 4.4 hours of downtime in a year.
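As a quick check of the availability figure, the sketch below converts an availability percentage into the downtime it allows per year. The function name and the 365-day year are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the downtime per year implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    # 99.95% availability leaves 0.05% of the year as allowed downtime.
    print(f"{annual_downtime_hours(99.95):.2f} hours per year")
```

For 99.95% this works out to 0.0005 × 8760 = 4.38 hours.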
+
Inherent Fault-Tolerant Components
• Amazon Simple Storage Service (S3)
• Amazon Elastic Load Balancing (ELB)
• Amazon Elastic Compute Cloud (EC2)
• Amazon Elastic Block Store (EBS)
"The above inherently fault-tolerant components provide features such as AZs, Elastic IPs, and snapshots that a fault-tolerant HA system must take advantage of and use correctly."
Simply put, AWS gives you the resources to build high-availability (HA) / FT applications.
+
AWS Components
• Amazon EC2 (Amazon Elastic Compute Cloud): web service that provides computing resources, i.e. server instances to host your software.
• AMI (Amazon Machine Image): template containing the software configuration (operating system, application server, applications). It is launched on an instance type, which determines the hardware.
• EBS (Elastic Block Store): block-level storage volumes for EC2 instances. A volume persists independently of the life of an instance. The annual failure rate (AFR) is around 0.1% to 0.5%.
+
Availability Zones
• Availability Zones (AZs) are distinct locations within the same region.
• Each AZ is engineered to be insulated from failures in other AZs.
• Each AZ has independent power, cooling, network, and security.
+
Elastic IP Addresses
• Public IP addresses that can be mapped to any EC2 instance within a particular EC2 region.
• Addresses are associated with the AWS account, not with a specific instance.
• If an EC2 instance fails, detach the Elastic IP from the failed instance and map it to a standby EC2 instance.
• Remapping downtime is around 1-2 minutes.
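The failover step described above can be sketched with a small in-memory model. This is not the AWS API: `ElasticIpTable`, `Instance`, and `fail_over` are hypothetical names, and the address is a documentation-range placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class ElasticIpTable:
    """Hypothetical stand-in for the account-level Elastic IP mappings."""
    mappings: dict = field(default_factory=dict)  # address -> instance id

    def associate(self, address: str, instance: Instance) -> None:
        # The address belongs to the account, so re-pointing it simply
        # overwrites the previous mapping.
        self.mappings[address] = instance.instance_id


def fail_over(table: ElasticIpTable, address: str,
              primary: Instance, standby: Instance) -> str:
    """Move the address to the standby if the primary is unhealthy.

    Returns the id of the instance that currently holds the address.
    """
    if not primary.healthy and standby.healthy:
        table.associate(address, standby)
    return table.mappings[address]
```

Clients keep using the same public address throughout; only the mapping behind it changes.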
+
Auto Scaling
• Auto Scaling lets you automatically scale EC2 capacity up or down.
• You define your own rules to achieve this, e.g. when the number of running EC2 instances < X, launch Y EC2 instances.
• Use metrics from Amazon CloudWatch to launch or terminate EC2 instances, e.g. resource utilization above a certain threshold.
• An example combining Auto Scaling (AS) and ELB is given under N+1 redundancy.
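The two kinds of rule mentioned above can be written as plain functions. This is a minimal sketch of the decision logic only, not the Auto Scaling API; the function names and threshold parameters are assumptions for illustration.

```python
def instances_to_launch(running: int, minimum: int, batch: int) -> int:
    """Count-based rule: when running < X (minimum), launch Y (batch)."""
    return batch if running < minimum else 0


def scaling_action(cpu_percent: float, scale_out_above: float,
                   scale_in_below: float) -> str:
    """Metric-based rule: compare an average CPU reading to two thresholds."""
    if cpu_percent > scale_out_above:
        return "launch"
    if cpu_percent < scale_in_below:
        return "terminate"
    return "hold"


if __name__ == "__main__":
    print(instances_to_launch(running=1, minimum=2, batch=1))
    print(scaling_action(85.0, scale_out_above=70.0, scale_in_below=20.0))
```

Keeping a gap between the two thresholds avoids launching and terminating instances in rapid alternation when the metric hovers near a single cut-off.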
+
Elastic Load Balancing
• Elastic Load Balancer distributes incoming traffic across the available EC2 instances.
• Monitors EC2 instances and removes failed instances from rotation.
• Works together with Auto Scaling to provide FT.
+
Implementing N+1 Redundancy with Auto Scaling & ELB
• Let's say N = 1.
• Define rule X: 2 instances of the defined AMI are always available.
• ELB distributes load between the 2 servers. Each server has enough capacity to handle the entire load on its own, i.e. N = 1.
• Server 1 goes down.
• Server 2 can process the entire traffic.
• Auto Scaling identifies the failure and launches a healthy EC2 instance from the AMI to fulfill rule X.
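The N+1 walk-through above can be replayed with a toy model. The `Group` class and its method names are hypothetical; it only mimics the bookkeeping and does not call AWS.

```python
import itertools


class Group:
    """Toy model of an Auto Scaling group behind a load balancer."""

    def __init__(self, desired: int):
        self.desired = desired              # rule X: instances always available
        self._ids = itertools.count(1)
        self.healthy = [self._launch() for _ in range(desired)]

    def _launch(self) -> str:
        return f"server-{next(self._ids)}"

    def fail(self, server: str) -> None:
        # The load balancer stops routing to a failed server.
        self.healthy.remove(server)

    def reconcile(self) -> list:
        # Launch replacements until the desired count holds again.
        launched = []
        while len(self.healthy) < self.desired:
            new_server = self._launch()
            self.healthy.append(new_server)
            launched.append(new_server)
        return launched

    def can_serve(self, total_load: float, capacity_per_server: float) -> bool:
        return len(self.healthy) * capacity_per_server >= total_load
```

With `desired=2` and each server sized for the full load, the group still serves all traffic after one failure, and `reconcile()` restores the spare.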
+
Fault Tolerance Web Design
• Architecting high availability in AWS
• High availability in the Web/App layer
• High availability in the load balancing layer
• High availability in the database layer
+
Web/App Layer
• It is common practice to run the Web/App layer on more than one EC2 instance to avoid a single point of failure (SPOF).
• How would user session information be shared between the EC2 servers?
• It is therefore necessary to synchronize session data among the EC2 servers.
• Not every user can adopt a stateless server configuration.
+
Web/App Layer
+
Web/App Layer
• Option 1: JGroups
  • Toolkit for reliable messaging.
  • Can be used by Java-based servers.
  • Suited to a maximum of around 5-10 EC2 instances.
  • Not suited to larger architectures.
+
Web/App Layer
• Option 3: RDBMS
  • Many use it, but it is considered poor design.
  • The master will be overwhelmed by session requests.
  • An m1 RDS MySQL master has a maximum of 600 connections. If 400 online users generate session requests, only 200 connections are left to serve transaction and user-authentication requests.
  • This can cause intermittent web service downtime.
+
Web/App Layer
• Option 2: Memcached
  • Widely used; supports multiple platforms.
  • Save user session data on multiple nodes to avoid a SPOF (the trade-off is the latency of writing to multiple nodes).
  • Depending on requirements, create high-memory EC2 instances for Memcached/ElastiCache.
  • Can scale up to tens of thousands of requests.
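Writing each session to more than one cache node can be sketched as follows. Plain dictionaries stand in for Memcached servers, and the class name, the replica count, and the hash-based node choice are all assumptions for illustration; a real deployment would use a Memcached client library.

```python
import hashlib


class ReplicatedSessionStore:
    """Toy session store that writes each session to several cache nodes."""

    def __init__(self, node_count: int, replicas: int = 2):
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        self.nodes = [dict() for _ in range(node_count)]
        self.down = set()  # indexes of nodes currently unreachable
        self.replicas = replicas

    def _node_indexes(self, session_id: str) -> list:
        # Stable hash so every web server picks the same nodes for a session.
        digest = hashlib.sha256(session_id.encode()).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, session_id: str, data: dict) -> None:
        # Writing to every replica is the extra latency noted above.
        for index in self._node_indexes(session_id):
            if index not in self.down:
                self.nodes[index][session_id] = dict(data)

    def get(self, session_id: str):
        # Read from the first reachable replica that holds the session.
        for index in self._node_indexes(session_id):
            if index not in self.down and session_id in self.nodes[index]:
                return self.nodes[index][session_id]
        return None
```

Losing one node leaves the session readable from its other replica; losing every replica loses the session, which is why the replica count is a design choice.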
+
Load Balancing Layer
• Balances the load among the available EC2 instances.
• A SPOF in the LB can bring down the entire site during an outage.
• Making the LB redundant is as important as replicating servers, databases, etc.
• There are many ways to build a highly available load balancing tier.
+
Load Balancing Tier
• Option 1: Elastic Load Balancer
  • Inherently fault tolerant.
  • Automatically distributes incoming traffic among EC2 instances.
  • Automatically adds more ELB EC2 instances as load increases, to avoid a SPOF.
  • Checks the health of EC2 instances and routes only to healthy instances.
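Routing only to healthy instances can be illustrated with a small round-robin balancer that skips backends marked unhealthy. This is a sketch of the idea, not ELB's actual implementation; the class and method names are hypothetical.

```python
class HealthCheckedBalancer:
    """Round-robin over backends, skipping those reported unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def report(self, backend: str, is_healthy: bool) -> None:
        # Called with the outcome of each health check.
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def route(self) -> str:
        # Walk the ring at most once, skipping unhealthy backends.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

A backend that recovers is put back into rotation as soon as a later health check reports it healthy.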
+
ELB Implementation Architecture
Single Server Setup
• Not recommended, yet the most commonly followed.
• What is there to balance?
• No fault tolerance benefit.
• SPOF in terms of both the LB and the EC2 instance.
+
ELB Implementation Architecture
Multi-Server Setup (in AZ)
• HTTP/S requests are directed to EC2 instances by the ELB.
• Multiple EC2 instances in the same AZ sit under the ELB tier.
• ELB load-balances the requests between the Web/App EC2 instances.
+
ELB Implementation Architecture
ELB with Auto Scaling (inside AZ)
• Web/App EC2 instances are configured with Auto Scaling to scale out and in.
• Amazon ELB can direct the load seamlessly to the EC2 instances configured with Auto Scaling.
+
ELB Implementation Architecture
Multiple AZs inside a Region
• Multiple Web/App EC2 instances can reside across multiple AZs inside an AWS region.
• ELB performs multi-AZ load balancing.
+
ELB Implementation Architecture
ELB with Amazon Auto Scaling across AZs
• EC2 instances can be configured with Amazon Auto Scaling to scale out and in across AZs.
• Highly recommended: the highest availability offered among all ELB implementations.
+
Issues with ELB
• Supports only round-robin and sticky-session algorithms. Weighted as of 2013.
• Designed to handle incrementally growing traffic. Sudden flash traffic can lead to unavailability until scaling up occurs.
• The ELB needs to be "pre-warmed" to handle sudden traffic. This is currently not configurable from the AWS console.
• Known to be "non-round-robin" when requests come from a single IP or a specific range of IPs.
  • For example, multiple requests from within a company operating on a specific IP range.
+
3rd Party Load Balancer
• 3rd party load balancers
  • Nginx and HAProxy can work as load balancers.
  • Use your own scripts to scale up EC2 instances and LBs.
  • Auto Scaling works best with ELB.
+
Load Balancing Algorithms
• Random: send connection requests to a server chosen at random (simple but inefficient).
• Round Robin: pass each new connection request to the next server in line, eventually distributing connections evenly.
• Weighted Round Robin: assign weights to machines based on their capacity; the number of connections each machine receives depends on its weight.
• More algorithms exist, such as Least Connections, Fastest, etc.
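The first three algorithms above can be sketched as Python iterators that yield the server for each new connection. The function names are illustrative, and the weighted variant uses the simplest possible scheme (repeating each server by its integer weight) rather than any particular product's implementation.

```python
import itertools
import random


def random_picker(servers, rng=None):
    """Random: pick any server for each connection."""
    rng = rng or random.Random()
    while True:
        yield rng.choice(servers)


def round_robin(servers):
    """Round Robin: hand each new connection to the next server in line."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Weighted Round Robin: weights maps server -> positive integer weight.

    Each server appears in the cycle as many times as its weight.
    """
    expanded = [server for server, weight in weights.items()
                for _ in range(weight)]
    return itertools.cycle(expanded)


if __name__ == "__main__":
    wrr = weighted_round_robin({"big": 2, "small": 1})
    print([next(wrr) for _ in range(6)])
```

The expansion scheme sends a higher-weight server its connections in a burst; smoother interleavings exist but are left out to keep the sketch short.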
+
Proposed Research
• A load balancing algorithm that adapts its strategy for allocating web requests dynamically.
• Prober: gathers status information from the web servers every 50 ms.
  • CPU load on the server
  • Server's response rate
  • Number of requests served
• Allocator: based on each prober update, the allocator updates the weights allocated to the servers.
• The proposed algorithm differs by considering local and global information at each web server to choose the best server for each request.
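The prober/allocator split can be sketched as a data record for one probe result plus a function that turns the latest results into weights. The slides do not give the weighting formula, so the one used here (response rate scaled by idle CPU) is purely an assumption, as are the names `ServerStats` and `allocate_weights`.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    """One probe result for one web server."""
    cpu_load: float        # fraction of CPU in use, 0.0 to 1.0
    response_rate: float   # responses per second
    requests_served: int   # requests handled since the last probe


def allocate_weights(stats_by_server: dict) -> dict:
    """Turn the latest probe results into normalized weights.

    Assumed formula for illustration: raw score = response rate x idle CPU.
    requests_served is carried in the record but not used by this toy formula.
    """
    raw = {
        name: max(stats.response_rate * (1.0 - stats.cpu_load), 0.0)
        for name, stats in stats_by_server.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No usable signal: fall back to equal weights.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: score / total for name, score in raw.items()}
```

In a running system the prober would refresh `stats_by_server` on its 50 ms cycle and the allocator would recompute the weights after each refresh.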
+
Real Time Server Stats Load Balancing (RTSLB)
Deciding factors used in the algorithm
• Weighted metric of cache hits on different servers
• CPU load of the web server
• Server response rate
• Number of client requests being handled
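One way to combine the four deciding factors into a single score is a weighted sum over normalized values. This is not the RTSLB formula, which the slides do not state; the weights, the normalization to the range 0 to 1, and the names `Factors`, `score`, and `pick_server` are hypothetical.

```python
from typing import NamedTuple


class Factors(NamedTuple):
    cache_hit_ratio: float   # 0..1, higher is better
    cpu_load: float          # 0..1, lower is better
    response_rate: float     # normalized to 0..1, higher is better
    active_requests: float   # normalized to 0..1, lower is better


# Hypothetical weights chosen only so the example is concrete.
WEIGHTS = Factors(0.4, 0.2, 0.2, 0.2)


def score(f: Factors, w: Factors = WEIGHTS) -> float:
    """Weighted sum in which 'lower is better' factors are inverted."""
    return (w.cache_hit_ratio * f.cache_hit_ratio
            + w.cpu_load * (1.0 - f.cpu_load)
            + w.response_rate * f.response_rate
            + w.active_requests * (1.0 - f.active_requests))


def pick_server(factors_by_server: dict) -> str:
    """Return the name of the server with the highest score."""
    return max(factors_by_server,
               key=lambda name: score(factors_by_server[name]))
```

With these weights a server with a warm cache can win a request even when its CPU is busier than a peer's.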
+
Architecture
+
Algorithm
+
Results
RTSLB outperforms the other load-based algorithms. The difference is expected to be much larger as the number of connections increases.
+
Future Study
• Neural-network-based LB algorithms have a promising future.
• Increasing availability by further improving existing LB algorithms.
• Studying the results in a cloud environment.
+
Questions