13h00 aws 2012-fault_tolerant_applications


Published in: Technology, Travel

Speaker notes
  • We are going to talk today about building fault-tolerant systems, and more specifically look at how AWS enables the cost-effective and scalable design of these systems in ways that simply cannot be done otherwise.
  • So what types of faults are we trying to survive? If we stop and think about most applications, there are really a wide array of different ways they can fail. The facilities themselves could have failures ranging from something extremely catastrophic, like the building catching fire, to something as simple as a power outage. Inside the facilities we are relying on a number of systems to be operating: we have a network stack with routers, switches, and firewalls, as well as servers and storage devices, all of which very much have the ability to fail, either through hardware failures or configuration errors. And all of that is before we even get to the code for your applications and the people that manage it, both of which are also potential areas where failures can occur.
  • So what does fault-tolerant mean? First, it's important to point out this isn't an absolute: there isn't a magic easy button for this, nor is there a one-size-fits-all approach to building applications that can survive every possible failure. Generally speaking, there are costs associated with mitigating the risk of different types of failures, as well as likelihoods of those failures occurring, so the design of these applications becomes an exercise in risk mitigation. Take hard drives, for example: the risk of a hard drive failing is pretty high compared to an entire datacenter being destroyed; luckily, the cost of mitigating a failed hard drive is also far lower than building duplicate datacenters. The second bullet there is very important: given that people, or human error, are probably the most common cause of failures, for applications to truly be fault-tolerant they must leverage automation in the case of failure. This not only makes recovery much faster but also assures it happens in a known and controlled manner. And lastly, if you don't test your design, you won't know if it works.
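The automation point above can be made concrete. A minimal sketch of automated recovery is a retry loop with exponential backoff and jitter; the function and parameter names here are illustrative, not from any AWS SDK:

```python
import time
import random

def call_with_retries(operation, max_attempts=5, base_delay=0.1):
    """Retry a failing operation with exponential backoff plus jitter.

    `operation` is any zero-argument callable; transient faults are
    assumed to surface as exceptions. Recovery is automated, so it
    happens in a known and controlled manner, with no human in the loop.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # recovery failed; escalate to alarming/paging
            # Wait base * 2^(attempt-1), plus jitter to avoid
            # synchronized retry storms across many clients.
            time.sleep(base_delay * 2 ** (attempt - 1)
                       + random.uniform(0, base_delay))
```

The same loop structure applies whether the operation is an API call, a health probe, or a node replacement.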
  • So here's how we used to implement fault tolerance, and it was really simple: build two of everything. Now, there are some significant problems with this approach. The obvious one is cost, since your application just got 100% more expensive, and here in Brazil, which already has much higher server hardware costs, that can make the cost of mitigating many types of failures impractical for many applications. So what ends up happening is, going back to the risk-mitigation idea, someone has to look at the cost of purchasing, maintaining, and operating a second instance of the application and decide if it's worth the cost.
  • I'm sure people are familiar with a lot of the commonly talked about benefits of cloud computing, from removing upfront capital costs to time-to-market and agility, and in the area of fault tolerance these benefits all translate very well.
  • The upfront capital cost of adding a second server or mirroring storage for HA is gone; backups are far simpler to use and extremely cost effective, and today with our release of Glacier, which will revolutionize the way businesses back up and archive data, that has never been more true. From a DR perspective you can stage infrastructure and only launch and pay for it when it's actually needed, versus paying 24/7 for infrastructure you hope to never use. Services that are part of AWS greatly simplify making your applications highly available and fault tolerant, often at a much lower cost.
  • With DR this becomes very evident. Think of how often you actually use your DR site; hopefully you're thinking of a really small number. Now think of how you're paying for that. With AWS we have massive amounts of infrastructure in 8 different regions around the world, and the ability to stage and programmatically deploy infrastructure to any of those regions. So you can stage your DR site and have it ready to spring into action if needed, but only pay for what you're actually using.
  • The next evolution beyond DR is HA, and HA has traditionally had a number of barriers that limited the applications that could be deployed in an HA manner, the first being cost, but also complexity. Now, this is different from DR, where something is broken and you need some method of getting it back online, such as a second location. With HA you want components to be able to fail but have the system still operate normally, because there are multiple servers that can perform that function, or multiple online replicas of the data. In the traditional datacenter this can be very difficult, complex, and costly, but with AWS we have built HA services that you can leverage, which not only bend the cost curve but also make it extremely simple to do.
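The HA idea described here, where the system keeps operating because multiple servers can perform the same function, can be sketched in a few lines. This is a client-side failover loop for illustration only (a production system would put an ELB or DNS in front of the replicas instead); the names are hypothetical:

```python
def first_healthy_response(replicas, request):
    """Send `request` to each replica in turn, returning the first
    successful response. `replicas` is a list of callables standing in
    for redundant servers; any one of them failing does not fail the
    system, only the loss of all of them does.
    """
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:
            errors.append(exc)  # this replica failed; try the next one
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")
```

The contrast with DR is visible in the code: no repair step is attempted, the request simply flows to a surviving component.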
  • As you can see here, many of the services we provide are inherently fault-tolerant; we've done all the work to create them in a fashion that is resilient to failure and highly durable, so you don't have to. So now if you need a fault-tolerant NoSQL DB you don't have to worry about how to architect that: you can simply use DynamoDB. With the right design, and by leveraging the services we provide that are inherently fault-tolerant, you can focus on building your application rather than the infrastructure. Some of the services you see on the right are fault tolerant with the right architecture, and what we mean by that is we give you options on how you're going to architect and deploy those services. RDS with MySQL, for example, is fault-tolerant when Multi-AZ deployments are used, since it will be replicating the data to multiple datacenters.
  • So we know there are opportunities for failure at every layer of the stack, from disasters that affect entire geographies or individual buildings, all the way up to the server your application is running on. Now let's see how this translates in AWS and look at the services we have that provide fault tolerance.
  • At AWS we’ve built fault tolerant systems at every level of that stack.
  • Fault Separation Amazon EC2 provides customers the flexibility to place instances within multiple geographic regions as well as across multiple Availability Zones. Each Availability Zone is designed with fault separation. This means that Availability Zones are physically separated within a typical metropolitan region, on different flood plains, in seismically stable areas. In addition to discrete uninterruptible power supply (UPS) and onsite backup generation facilities, they are each fed via different grids from independent utilities to further reduce single points of failure. They are all redundantly connected to multiple tier-1 transit providers. It should be noted that although traffic flowing across the private networks between Availability Zones in a single region is on AWS-controlled infrastructure, all communications between regions is across public Internet infrastructure, so appropriate encryption methods should be used to protect sensitive data. Data are not replicated between regions unless proactively done so by the customer.
  • Distinct physical locations. Low-latency network connections between AZs. Independent power, cooling, network, security. Always partition app stacks across 2 or more AZs. Elastic Load Balance across instances in multiple AZs. Don't confuse AZs with Regions!
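"Always partition app stacks across 2 or more AZs" has a simple mechanical core: spread your instances evenly so one zone failure takes out only a fraction of capacity. A minimal sketch, with hypothetical instance and zone names:

```python
def spread_across_azs(instance_names, azs):
    """Assign instances round-robin across Availability Zones, so losing
    any single AZ loses at most ceil(n / len(azs)) instances.

    Returns a dict mapping instance name -> AZ name.
    """
    if not azs:
        raise ValueError("need at least one Availability Zone")
    return {name: azs[i % len(azs)] for i, name in enumerate(instance_names)}
```

In practice an Auto Scaling group configured with multiple zones does this balancing for you; the point of the sketch is only the invariant it maintains.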
  • Note, the question is not "do you need to automate your deployment?" or "should I use automation when I'm using the cloud?" The answer to that is YES! The question is: if you're using fully standard PHP or Java stacks, why manage it? Beanstalk does that great, with zero lock-in. If what you need is more complex, perhaps CloudFormation (note, you can do BOTH!).
  • Three-Tier Web App has been "fork-lifted" to the cloud. Everything in a single Availability Zone. Load balanced at the Web tier and App tier using software load balancers. Master and Standby database. Elastic IP on front-end load balancer only. S3 used as DB backup instead of tape. How can you use AWS features to make this app more highly available?
  • Avoid single points of failure. Assume everything fails, and design backwards. Goal: applications should continue to function even if the underlying physical hardware fails or is removed or replaced. Design your recovery process. Trade off business needs vs. cost of high availability.
  • Multiple DNS targets. Load balanced across Availability Zones. Auto-scaled web-cache servers with health checks. Auto-scaled web servers with health checks. Comprehensive config, data, and AMI backup. Monitoring, alarming, and logging.
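The health checks mentioned above typically require several consecutive failed probes before marking an instance unhealthy, so a single dropped packet doesn't trigger replacement. A small sketch of that thresholding logic (the class and threshold value are illustrative, not an AWS API):

```python
class HealthCheck:
    """Mark a target unhealthy only after `threshold` consecutive probe
    failures, the way load-balancer health checks avoid flapping on a
    single failed probe. Any success resets the counter.
    """
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, probe_succeeded):
        """Feed in the result of one probe (True = success)."""
        if probe_succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def healthy(self):
        return self.consecutive_failures < self.threshold
```

An auto-scaling loop would terminate and replace any instance whose check reports unhealthy, which is the automated recovery the earlier notes call for.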
  • DB-tier load balancing or queueing. Auto-scaled database cache servers with health checks. Redundant relational database systems: mirrored, log-shipped, async or sync replicated; designed to scale horizontally (sharding). Durable NoSQL or KV-store data systems: no-SPOF design; supports automatic re-balancing, replication, and fault recovery. Monitoring, alarming, and logging.
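Horizontal scaling by sharding, mentioned in the data-tier notes, comes down to a deterministic mapping from record key to partition. A minimal hash-based sketch (names illustrative):

```python
import hashlib

def shard_for_key(key, num_shards):
    """Route a record key to one of `num_shards` horizontal partitions.

    Hashing (rather than key ranges) spreads load evenly across shards,
    but note that changing `num_shards` remaps most keys; consistent
    hashing, or a managed store like DynamoDB that handles partitioning
    and re-balancing for you, avoids that operational burden.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Every writer and reader must agree on this function, which is one reason the notes describe self-managed sharding as difficult compared with the managed services on the next slides.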
  • Multi-AZ Deployments: synchronous replication across AZs; automatic fail-over to standby replica. Automated Backups: enables point-in-time recovery of the DB instance; retention period configurable. Snapshots: user-initiated full backup of DB; new DB can be created from snapshots.
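The point-in-time recovery constraint above is easy to state precisely: a restore target must fall inside the configurable retention window and at or before the latest restorable time. A sketch of just that check, with hypothetical names (actual restores go through the RDS API, not this function):

```python
from datetime import datetime, timedelta

def valid_restore_target(target, now, retention_days, latest_restorable):
    """Return True if `target` is a usable point-in-time recovery target:
    no older than the retention window and no newer than the latest
    restorable time (which trails `now` slightly in practice, because
    transaction logs are shipped on an interval).
    """
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= latest_restorable
```

Checking targets against this window before attempting a restore is a cheap way to fail fast in recovery tooling.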
  • Transcript

    • 1. Building Fault-Tolerant Applications in the Cloud. Ryan Holland, Ecosystem Solution Architect
    • 2. Faults? Facilities, Hardware, Networking, Code, People
    • 3. What is “Fault-Tolerant”? Degrees of risk mitigation, not binary. Automated. Tested!
    • 4. Agenda: The AWS Approach, Building Blocks, Design Patterns
    • 5. Old School Fault-Tolerance: Build Two
    • 6. Cloud Computing Benefits: No Up-Front Capital Expense. Low Cost, Pay Only for What You Use. Deploy Self-Service Infrastructure. Easily Scale Up and Down. Improve Agility & Time-to-Market.
    • 7. Cloud Computing Fault-Tolerance Benefits: No Up-Front HA Capital Expense. Low Cost Backups. Pay for DR Only When You Use It. Deploy Self-Service DR Infrastructure. Easily Deliver Fault-Tolerant Applications. Improve Agility & Time-to-Recovery.
    • 8. AWS Cloud allows Overcast Redundancy: have the shadow duplicate of your infrastructure ready to go when you need it… but only pay for what you actually use.
    • 9. Old Barriers to HA are now Surmountable: Cost, Complexity, Expertise
    • 10. AWS Building Blocks: Two Strategies. Inherently fault-tolerant services: S3, SimpleDB, DynamoDB, CloudFront, SWF, SQS, SNS, SES, Route53, Elastic Load Balancer, ElastiCache, Elastic MapReduce, IAM. Services that are fault-tolerant with the right architecture: Amazon EC2, VPC, EBS, RDS, Elastic Beanstalk.
    • 11. The Stack: Resources, Deployment Management, Configuration, Networking, Facilities, Geographies
    • 12. The Stack: EC2 Instances, Amazon Machine Images, CW Alarms - AutoScaling, CloudFormation - Beanstalk, Route53 - Elastic IP - ELB, Availability Zones, Regions
    • 13. Regional Diversity. Use Regions for: Latency (Customers, Data Vendors, Staff), Compliance, Disaster Recovery… and Fault Tolerance!
    • 14. Proper Use of Multiple Availability Zones
    • 15. Network Fault-Tolerance Tools: 107.22.18.45 isn’t fault-tolerant, but an EIP is. Elastic Load Balancing. Automated DNS: Route53. Latency-Based Routing.
    • 16. New EC2 VPC feature: Elastic Network Interface. Up to 8 interfaces with 30 addresses each. Span subnets. Attach/Detach. Public or Private.
    • 17. CloudFormation – Elastic Beanstalk. Q: Is your stack unique?
    • 18. CloudWatch – Alarms – AutoScaling
    • 19. AMIs: Maintenance is critical. Alternatives: Chef, Puppet, cfn-init, etc. When in doubt: 64-bit. Replicate for DR.
    • 20. EC2 Instances: Consistent, reliable building block. 100% API controlled. Reserved Instances. EBS. Immense Fleet Scale.
    • 21. Example: a “fork-lifted” app
    • 22. Example: Fault-Tolerant
    • 23. Why mess with all of that?
    • 24. Design For Failure: SPOF
    • 25. Copyright © 2011 Amazon Web Services. Build Loosely Coupled Systems: Tight Coupling vs. Loose Coupling using Queues.
    • 26. Fault-Tolerant Front-end Systems. Addressing: Route53, EIP. Distribution: Multi-AZ, ELB, CloudFront. Redundancy: Auto-Scaling. Monitoring: CloudWatch. Platform: Elastic Beanstalk. (Diagram labels: Auto Scaling, Amazon CloudFront, Amazon CloudWatch, Amazon Route 53, Elastic Load Balancer, Elastic IP, AWS Elastic Beanstalk.)
    • 27. Fault-Tolerant Data-Tier Systems: Tuned, Patched, Cached, Sharded, Replicated, Backed Up, Archived, Monitored
    • 28. Fault-Tolerant Data-Tier Systems: Tuned, Patched, Cached, Sharded, Replicated, Backed Up, Archived, Monitored = LOTS OF WORK
    • 29. AWS Fault-Tolerant Data-Tier Services: Amazon Simple Storage Service (S3), Amazon SimpleDB, Amazon Relational Database Service (RDS), Amazon Elastic MapReduce (EMR), Amazon DynamoDB, Amazon ElastiCache
    • 30. RDS Fault-Tolerant Features: Multi-AZ Deployments, Read Replicas, Automated Backups, Snapshots. (Diagram: RDS DB Instance replicating to an RDS DB Instance Multi-AZ Standby.)
    • 31. Storage Gateway. (Diagram: application servers in your datacenter, on direct-attached or Storage Area Network disks, connect to an on-premises AWS Storage Gateway VM host, which transfers data over SSL via the Internet or Direct Connect to the AWS Storage Gateway service and Amazon Simple Storage Service (S3); volumes are also available to Amazon Elastic Compute Cloud (EC2) clients via Amazon Elastic Block Storage (EBS).)
    • 32. Test! Use a Chaos Monkey! Prudent. Conservative. Professional. Open source. …and all the cool kids are doing it.
    • 33. Thank You!