Architecting for the Cloud: Cloud Providers



This is a lecture on cloud providers from the course "Architecting for the Cloud"

Published in: Software

  1. Architecting for the Cloud: Cloud Providers
     Len and Matt Bass
     © Matthew Bass 2013
  2. IaaS Providers
     • There are several primary providers
       – Amazon: Amazon Web Services (AWS)
       – Microsoft: Azure
       – Google: Google Compute Engine
       – …
     • Each of these is set up a bit differently, with slightly different internal decisions and associated services
  3. Goals
     • The goal of this talk is not to give you a definitive how-to for each provider
     • It’s meant to give you just an introduction
     • The idea is that you’ll see how the concepts that we talked about in the course map to specific providers
     • We’ll look primarily at Amazon (with some details from others thrown in)
     • We’ll go through the overall structure and also look at specific services
  4. Amazon Elastic Compute Cloud
     • Amazon EC2 provides compute capacity in the cloud
     • You can select the machine image with a given OS and specified capability
     • You can resize the capacity as needed
     • It takes minutes to spin up a new VM
     • You can specify multiple instances and select where they will run
       – Region & availability zones
     • You pay per usage/hour depending on the capability of the instance and whether it’s a reserved instance (dedicated)
  5. Regions
     • Amazon has divided their cloud offerings into multiple regions
     • Each region should be thought of as a separate cloud
       – I.e. there is no automatic copying of data from one region to another
  6. Current AWS Regions
     • North America
       – US East (5 availability zones)
       – US West Oregon (3 availability zones)
       – US West Northern California (3 availability zones)
       – US GovCloud (2 availability zones)
     • South America
       – Sao Paulo (2 availability zones)
     • Europe
       – Ireland (2 availability zones)
     • Asia Pacific
       – Sydney (2 availability zones)
       – Singapore (2 availability zones)
       – China (1 availability zone)
       – Tokyo (3 availability zones)
  7. AWS and Services
     • Amazon Web Services offers a number of services
     • These services are things like:
       – Storage
       – Database
       – Network capabilities
       – Monitoring
       – …
     • Not all services are available in all regions
       – product-services/
  8. Amazon Availability Zones
     • Amazon has a notion of availability zones
     • Each is engineered to be insulated from failures in other availability zones
     • Availability zones are locations within a region
     • Amazon has not announced the details of an availability zone, but presumably they
       – Are physically separate data centers
       – Have independent networks
       – Have independent power delivery
       – …
  9. Amazon Service Level Agreement
     • Amazon guarantees 99.95% availability for each region
     • IaaS consumers are free to deploy their applications:
       – Within an availability zone
       – Across availability zones but within a region
       – Across regions
     • Amazon does not make any claim about the availability of their availability zones (that I could find)
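To make the 99.95% figure concrete, the short sketch below converts an availability percentage into the downtime it permits. The function name is made up for illustration, and the 365-day year and 30-day month are assumptions; the actual SLA defines its own measurement period.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime_hours(availability: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in a period at the given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    per_year = allowed_downtime_hours(0.9995)
    per_month = allowed_downtime_hours(0.9995, period_hours=30 * 24)
    print(f"per year: {per_year:.2f} hours")
    print(f"per 30-day month: {per_month * 60:.1f} minutes")
```

At 99.95% this works out to a little over four hours per year, which is why the choice of deployment scope in the list above matters.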
  10. All-in-one Single Server
  11. Basic 4-server Setup
  12. Multiple Availability Zones
  13. Multiple Regions
  14. Elastic Compute Cloud (EC2) & Redundancy
      • EC2 supports different levels of redundancy
        – It is up to the customer to determine how much redundancy they wish to have and how much they wish to pay for it
      • Redundant elements can be:
        – Within an availability zone
        – Across availability zones
        – Across regions
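The trade-off between redundancy and cost can be sketched with the standard parallel-availability formula. This is only an illustration under a strong assumption: replicas fail independently. The 0.99 per-replica figure in the demo is hypothetical, since Amazon does not publish a zone-level number, and correlated failures will make the real result lower.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, assuming independent failures."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    # 0.99 is a made-up per-replica availability, used only for illustration.
    for n in (1, 2, 3):
        print(n, "replica(s):", f"{combined_availability(0.99, n):.6f}")
```

Each added replica multiplies the remaining unavailability by the same factor, while the bill grows roughly linearly, so the benefit per dollar shrinks quickly.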
  15. Microsoft Azure Regions
      • North America
        – US Central (Iowa)
        – US East (Virginia)
        – US East 2 (Virginia)
        – US North Central (Illinois)
        – US South Central (Texas)
        – US West (California)
      • Europe
        – Europe North (Ireland)
        – Europe West (Netherlands)
      • Asia Pacific
        – East (Hong Kong)
        – Southeast (Singapore)
      • Japan
        – Japan East (Saitama)
        – Japan West (Osaka)
      • Brazil
        – Sao Paulo
  16. Fault Domains in Azure
      • In Azure there is the concept of Fault Domains
      • A Fault Domain is essentially a rack in a given datacenter
      • A consumer is not able to define which fault domains the application is distributed across
        – Unlike an availability zone
      • As a result the fault domain is really an internal structure
  17. Upgrade Domains in Azure
      • An upgrade domain is similar to a fault domain
      • Essentially an upgrade domain will be upgraded at one time
        – When Microsoft upgrades their internal infrastructure they do so a domain at a time
      • In order to guard against both failures within a fault domain and upgrades, you need to replicate across both fault and upgrade domains
      • This is called an availability set
  18. Azure Availability Sets
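The idea behind an availability set can be sketched as spreading instances over two independent groupings at once. The round-robin placement below is an assumption made for illustration, not Azure's actual algorithm (the platform does the placement, not the consumer), and the domain counts in the demo are hypothetical.

```python
def place(instances: int, fault_domains: int, upgrade_domains: int) -> dict:
    """Assign each instance index to a (fault_domain, upgrade_domain) pair, round-robin."""
    return {i: (i % fault_domains, i % upgrade_domains) for i in range(instances)}


def survivors(placement: dict, lost_fault_domain=None, lost_upgrade_domain=None) -> list:
    """Instances still running after losing one fault domain and/or one upgrade domain."""
    return [i for i, (fd, ud) in placement.items()
            if fd != lost_fault_domain and ud != lost_upgrade_domain]


if __name__ == "__main__":
    # Hypothetical sizes: 4 instances, 2 fault domains, 5 upgrade domains.
    placement = place(4, fault_domains=2, upgrade_domains=5)
    print("placement:", placement)
    print("rack 0 fails:", survivors(placement, lost_fault_domain=0))
    print("upgrade domain 1 is being upgraded:", survivors(placement, lost_upgrade_domain=1))
```

With at least two instances spread this way, neither a single rack failure nor a single rolling upgrade step takes every instance down at once.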
  19. Amazon Auto Scaling
      • Auto Scaling works in conjunction with CloudWatch (Amazon’s monitoring service)
      • The idea is that the monitoring service monitors the metrics
        – CPU utilization
        – Latency
        – Memory consumption
      • The Auto Scaling solution establishes the rules
        – Add instances when utilization exceeds 70%
        – Remove instances when utilization falls below 10%
      • You can specify things like a “cooling off” period
        – Where no action is taken until the system has a chance to stabilize
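The rules above can be sketched as a small simulation. This is not the AWS API: the class names, the 300-second cooling-off value, the one-instance step, and the instance limits are all assumptions chosen for illustration. Only the 70% and 10% thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    scale_out_above: float = 70.0  # percent utilization, from the slide
    scale_in_below: float = 10.0   # percent utilization, from the slide
    cooldown_s: int = 300          # assumed cooling-off period
    min_instances: int = 1         # assumed limits
    max_instances: int = 10


class AutoScaler:
    def __init__(self, policy: ScalingPolicy, instances: int = 1):
        self.policy = policy
        self.instances = instances
        self.last_action_at = None  # time of the most recent scaling action

    def observe(self, now_s: int, utilization: float) -> int:
        """Apply the rules to one metric sample and return the instance count."""
        p = self.policy
        in_cooldown = (self.last_action_at is not None
                       and now_s - self.last_action_at < p.cooldown_s)
        if in_cooldown:
            return self.instances  # let the system stabilize first
        if utilization > p.scale_out_above and self.instances < p.max_instances:
            self.instances += 1
            self.last_action_at = now_s
        elif utilization < p.scale_in_below and self.instances > p.min_instances:
            self.instances -= 1
            self.last_action_at = now_s
        return self.instances


if __name__ == "__main__":
    scaler = AutoScaler(ScalingPolicy())
    for t, cpu in [(0, 85), (60, 90), (300, 90), (700, 5), (1100, 5)]:
        print(f"t={t:>4}s cpu={cpu:>2}% -> {scaler.observe(t, cpu)} instance(s)")
```

The sample at t=60 is ignored even though utilization is still high, which is the cooling-off period preventing the scaler from reacting before the first new instance has taken load.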
  20. Amazon Elastic Load Balancer
      • This is Amazon’s load balancing solution
        – Recall the push/pull architecture discussion
      • It tracks the status and location of instances
      • Routes requests to healthy instances based on criteria that you establish
      • Can be used in conjunction with Auto Scaling
        – When instances are added or removed, they are registered with or deregistered from the ELB
      • Can be used in conjunction with Amazon’s DNS (Route 53)
        – You can use DNS failover to move from one region to another
        – The DNS will route traffic to the ELB in the target region
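A minimal sketch of the behaviour described above: instances register and deregister, health is tracked, and requests go only to healthy instances. The class and method names are hypothetical, and round-robin is an assumed routing criterion; this is a toy model, not the ELB API.

```python
class LoadBalancer:
    """Toy load balancer that routes round-robin over healthy instances."""

    def __init__(self):
        self._instances = []  # instance ids in registration order
        self._healthy = {}    # instance id -> last health-check result
        self._cursor = 0      # position of the next candidate

    def register(self, instance_id: str) -> None:
        if instance_id not in self._healthy:
            self._instances.append(instance_id)
        self._healthy[instance_id] = True

    def deregister(self, instance_id: str) -> None:
        if instance_id in self._healthy:
            self._instances.remove(instance_id)
            del self._healthy[instance_id]

    def report_health(self, instance_id: str, healthy: bool) -> None:
        if instance_id in self._healthy:
            self._healthy[instance_id] = healthy

    def route(self) -> str:
        """Return the next healthy instance, skipping unhealthy ones."""
        n = len(self._instances)
        for step in range(n):
            candidate = self._instances[(self._cursor + step) % n]
            if self._healthy[candidate]:
                self._cursor = (self._cursor + step + 1) % n
                return candidate
        raise RuntimeError("no healthy instances")


if __name__ == "__main__":
    lb = LoadBalancer()
    for name in ("a", "b", "c"):
        lb.register(name)
    print([lb.route() for _ in range(3)])
    lb.report_health("b", False)
    print([lb.route() for _ in range(4)])
```

An auto scaler would call `register` and `deregister` as it adds and removes instances, which is the integration the slide describes.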
  21. Amazon Simple Queue Service
      • SQS is Amazon’s queuing service
        – Again recall the push/pull architecture discussion
      • It’s a service that supports message queues
      • Recall it can be used in conjunction with Auto Scaling to manage the elasticity of your application
      • Pricing is per million requests handled
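The sketch below illustrates the pull model and how queue backlog can drive elasticity. Python's in-process `queue.Queue` stands in for SQS here; it is not the SQS API. The function names, the per-worker rate, and the drain target are assumptions made for illustration.

```python
import math
import queue


def enqueue_all(messages) -> queue.Queue:
    """Build a queue holding the given messages (stand-in for producers)."""
    q = queue.Queue()
    for m in messages:
        q.put(m)
    return q


def desired_workers(backlog: int, per_worker_rate: float,
                    target_drain_s: float, max_workers: int) -> int:
    """Workers needed to drain `backlog` messages within `target_drain_s` seconds."""
    if backlog <= 0:
        return 0
    needed = math.ceil(backlog / (per_worker_rate * target_drain_s))
    return min(needed, max_workers)


def drain(q: queue.Queue, handle) -> int:
    """Pull messages until the queue is empty; return how many were handled."""
    handled = 0
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return handled
        handle(message)
        handled += 1


if __name__ == "__main__":
    q = enqueue_all(range(1000))
    # Assumed figures: each worker handles 5 messages/s, backlog should clear in 60 s.
    print("workers wanted:", desired_workers(q.qsize(), 5, 60, max_workers=10))
    print("handled:", drain(q, handle=lambda m: None))
```

Because workers pull at their own pace, a burst of requests lengthens the queue instead of overloading the workers, and the backlog becomes the signal for adding capacity.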
  22. Amazon Storage Solutions
      • Amazon has several storage solutions
        – Elastic Block Store (EBS)
        – Simple Storage Service (S3)
        – Glacier
      • These provide raw unmanaged storage
      • This is useful for:
        – Disaster recovery
        – Backup
        – Archiving
        – Persistence for your own database solution
  23. Amazon Elastic Block Store
      Amazon Elastic Block Store (EBS) is Amazon’s block storage service for EC2 instances. Some of its features are:
      • Data is persisted independently from instances
      • EBS data is placed in a specific availability zone and can be attached to instances in the same availability zone
      • EBS data is automatically replicated within its availability zone
      • There are two networks that connect EBS instances
        – A high speed network to provide coordination among instances and move data between instances
        – A lower speed network used as backup for coordination
      • $0.05 per million I/O requests
  24. Amazon Simple Storage Service (S3)
      • S3 is a scalable storage solution
      • Good for content storage and distribution
      • Good for backup, archiving, and disaster recovery
      • Costs $0.03 per GB of data
      • More expensive but faster than Glacier
      • Not as fast for I/O as EBS
  25. Amazon Glacier
      • Low cost storage solution
      • Good for off site archival of Enterprise data
      • Good for backup and data archiving
      • Good for large volumes of data
      • Costs $0.01 per GB of data
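Using only the per-GB prices quoted in the slides, the sketch below compares what a given volume of data costs in S3 and in Glacier. It deliberately ignores request and retrieval charges, which the slides do not quote, and it assumes the quoted price applies once per billing period. The 5,000 GB volume is a made-up example.

```python
from decimal import Decimal

# Per-GB storage prices as quoted in the slides (2013 figures).
PRICE_PER_GB = {
    "S3": Decimal("0.03"),
    "Glacier": Decimal("0.01"),
}


def storage_cost(service: str, gigabytes: int) -> Decimal:
    """Storage-only cost in dollars for one billing period."""
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return PRICE_PER_GB[service] * gigabytes


if __name__ == "__main__":
    volume_gb = 5000  # hypothetical archive size
    for service in PRICE_PER_GB:
        print(f"{service}: ${storage_cost(service, volume_gb)}")
```

The threefold price difference is the reason to archive rarely read data in Glacier and keep S3 for content that needs faster access.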
  26. Amazon Database Solutions
      • Amazon has a number of fully managed database solutions
      • These are built on top of one of Amazon’s storage solutions
      • They include:
        – DynamoDB
        – Relational Database Service (RDS)
        – Redshift
        – ElastiCache
  27. DynamoDB
      • Key-value data store
      • Uses a throughput-oriented pricing model (rather than a storage-oriented model)
      • Uses solid state drives
      • Guarantees single-digit millisecond read latencies
      • You pay a flat hourly rate based on the capacity that you reserve
        – Costs $0.0065 per hour for every 10 units of write capacity
        – Costs $0.0065 per hour for every 10 units of read capacity
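The throughput-oriented pricing can be worked through with the rates on the slide. The sketch assumes that capacity is billed in whole blocks of 10 units (partial blocks rounded up) and uses a 30-day month; both are assumptions, and the capacity figures in the demo are hypothetical.

```python
import math
from decimal import Decimal

RATE_PER_BLOCK_HOUR = Decimal("0.0065")  # per 10 units, reads and writes alike (slide figure)
UNITS_PER_BLOCK = 10


def hourly_cost(read_units: int, write_units: int) -> Decimal:
    """Hourly charge for reserved capacity, rounding each kind up to whole blocks."""
    blocks = (math.ceil(read_units / UNITS_PER_BLOCK)
              + math.ceil(write_units / UNITS_PER_BLOCK))
    return RATE_PER_BLOCK_HOUR * blocks


def monthly_cost(read_units: int, write_units: int, hours: int = 30 * 24) -> Decimal:
    """Charge for a month of `hours` hours (30 days assumed by default)."""
    return hourly_cost(read_units, write_units) * hours


if __name__ == "__main__":
    # Hypothetical table: 100 read units and 50 write units reserved.
    print("per hour: $", hourly_cost(100, 50))
    print("per 30-day month: $", monthly_cost(100, 50))
```

Note that the cost depends on reserved throughput rather than on how much data is stored or how many requests are actually served.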
  28. Relational Database Service (RDS)
      • A distributed web service that provides a relational database for use in applications
      • It provides access to MySQL, Oracle, SQL Server, or PostgreSQL
      • It simplifies installation, patching, and backup related issues
      • Priced per hour according to db type, size, and number
  29. Redshift
      • Redshift is Amazon’s data warehousing solution
      • Integrates with other storage solutions
      • Priced either at $0.25 per hour on the low end, or at $1000 per terabyte per year
  30. ElastiCache
      • A Web Service that enables an in-memory data cache
      • Supports:
        – Memcached
        – Redis
      • Improves latency and throughput for read-heavy applications
      • Prices are per cache node/hour
  31. Amazon CloudFront
      • Amazon’s content delivery network
      • Provides edge services
        – Competes with companies such as Akamai
      • This service will allow you to locate content closer to users
        – Reduces latency
      • You specify the edge location and point it to the origin
      • You can route DNS to the edge location if you want
  32. Amazon Elastic IP Addressing
      • Amazon provides elastic IP addressing
      • The IP address is associated with your account, not with an instance
      • You can programmatically map the elastic IP to any instance in your account
      • In this way you make the deployment configuration transparent to the user/application
        – Remember the virtual network discussion?
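The indirection described above can be modelled as a small lookup table owned by the account. This is a toy model with hypothetical names, not the EC2 API; the address comes from a range reserved for documentation and the instance ids are made up.

```python
class ElasticIpTable:
    """Toy model: addresses belong to the account and can be re-pointed at will."""

    def __init__(self):
        self._target = {}  # elastic IP -> instance id, or None while unattached

    def allocate(self, ip: str) -> None:
        """Add an address to the account without attaching it to anything."""
        self._target.setdefault(ip, None)

    def associate(self, ip: str, instance_id: str) -> None:
        """Point an allocated address at an instance, replacing any earlier mapping."""
        if ip not in self._target:
            raise KeyError(f"{ip} is not allocated to this account")
        self._target[ip] = instance_id

    def resolve(self, ip: str):
        """Return the instance currently behind the address."""
        return self._target[ip]


if __name__ == "__main__":
    table = ElasticIpTable()
    table.allocate("203.0.113.10")
    table.associate("203.0.113.10", "i-old")
    table.associate("203.0.113.10", "i-new")  # e.g. after replacing a failed instance
    print(table.resolve("203.0.113.10"))
```

Clients keep using the same address throughout, so swapping the instance behind it requires no change on their side.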
  33. Many Other Services Available
      • Authentication services
      • Analytics
      • Elastic MapReduce
      • Real time data streaming and processing
      • Business process automation services
      • Email services
      • Notification services
      • …
  34. Comparison to Other Providers
      • Other major providers (Google, Microsoft, Rackspace) offer similar services
      • Google doesn’t have as many services but has a different pricing model
        – Charges in 10-minute increments rather than one-hour increments
      • Microsoft has similar services
      • Rackspace also provides comparable options
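The effect of the billing increment is easiest to see with a short-lived instance. The sketch below rounds a runtime up to whole increments, taking the slide's description (10-minute versus one-hour increments) at face value; the function name and the 65-minute runtime are illustrative, and real price lists may add minimum charges.

```python
import math


def billed_minutes(runtime_minutes: float, increment_minutes: int) -> int:
    """Round a runtime up to the next whole billing increment."""
    if increment_minutes <= 0:
        raise ValueError("increment must be positive")
    if runtime_minutes <= 0:
        return 0
    return math.ceil(runtime_minutes / increment_minutes) * increment_minutes


if __name__ == "__main__":
    runtime = 65  # hypothetical job that runs just over an hour
    print("hourly increments:", billed_minutes(runtime, 60), "minutes billed")
    print("10-minute increments:", billed_minutes(runtime, 10), "minutes billed")
```

For workloads that scale out and in frequently, the finer increment means much less paid-for idle time.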
  35. Outages
      • In Amazon (and others) there are some kinds of outages that are specific to the structure of the provider
      • We will now look at some of these outages
  36. Zone Failure
      • All of the IaaS providers have some notion of an “availability zone”
      • An availability zone (or fault domain in Azure) has its own switch, router, and rack
      • These availability zones are isolated from each other in a way that nodes within an availability zone are not
  37. Zone Failure Modes
      • A zone can fail in different ways
      [Figure: a Region containing Zone 1, Zone 2, and Zone 3]
  38. Complete Failure
      • If for example you have a power outage you’ll have a complete failure
      • If you try to route traffic to any of these machines you’ll get a “no route to host”
        – This happens quickly (fast fail)
      • You’ll know the zone is out
      • You can then spin up a new zone elsewhere
  39. Zone Failure Modes
      • You could have a network failure
      [Figure: a Region containing Zone 1, Zone 2, and Zone 3]
  40. Network Failure
      • If you have a network failure it’s typically not a complete failure
      • The machines are still working but the network is having trouble
      • There is often still a route to host but your data isn’t reaching the host
      • As a result you don’t get a fast fail
        – You’ll get long timeouts
  41. Network Failure
      • With the long timeouts your system will start to back up
      • It’s difficult to tell the difference between this issue and other issues that result in latency lags
      • This problem can be intermittent as some of the routers might be down but not all
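The difference between a fast fail and a long timeout can be seen from the client side. The sketch below uses Python's standard `socket` module to probe a host and classify the outcome; the function names, the labels, and the 2-second default are assumptions made for illustration. Setting an explicit, short timeout is one way to keep callers from backing up, though a successful connect alone does not prove that application data is getting through.

```python
import socket


def classify(exc: BaseException) -> str:
    """Map a connection error to the failure mode it suggests."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"    # slow failure: typical of a partial network outage
    if isinstance(exc, OSError):
        return "fast-fail"  # e.g. connection refused or no route to host
    return "unknown"


def probe(host: str, port: int, timeout_s: float = 2.0) -> str:
    """Try to open a TCP connection, bounding how long the caller can block."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "reachable"
    except OSError as exc:
        return classify(exc)


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address; what this prints depends on
    # the local network, so no particular result is claimed here.
    print(probe("192.0.2.1", 80, timeout_s=1.0))
```

A health checker built on this would treat repeated "timeout" results as a sign of the network failure mode rather than a complete zone failure.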
  42. Zone Failure Modes
      • You could have a failure of some zone service
      [Figure: a Region containing Zone 1, Zone 2, and Zone 3]
  43. Zone Service Failure
      • This is when a service that the zone depends on fails
        – It could be something that is part of the platform as a service (e.g. EBS)
        – It could also be a central service in your application
      • This causes cascading failures
      • It is difficult to figure out what is going on
  44. Region Failure
      • It’s rare but a Region can fail as well
      • Both complete and partial failures have happened
      • Typically this starts with isolated issues that cascade
      • There might be an issue with a few nodes or with a single availability zone
      • Other zones become impacted (often due to additional traffic) and fail
        – It can be difficult to determine the scope of the issue while it’s occurring
  45. Regional Failure Modes
      • You could lose network access to a region
      [Figure: a Region containing Zone 1, Zone 2, and Zone 3]
  46. Regional Outage
      • This is often caused by
        – A DNS issue
        – Router issues
        – Network capacity overload
      • Causes you to lose access to a region
  47. Regional Failure Modes
      • Local failures can cause a control plane overload
      [Figure: a Region containing Zone 1, Zone 2, and Zone 3]
  48. Data Store Failure
      • As with the other portions of the system the data store can become unresponsive
      • The remedy for this is typically to mark this node as bad and attempt to bring a new node online
      • If the issue is more pervasive it can result in:
        – Disrupted availability
        – Loss of persistent data
  49. Backup Failure
      • Systems will often have a backup data mechanism
      • This is often a key component in disaster recovery
      • This can also fail
        – It can become temporarily or permanently unavailable
  50. Upgrades
      • Cloud providers need to upgrade their software as well
      • When they do this the nodes that are being upgraded experience an outage
      • If your software is running on these nodes you might experience an outage as well
  51. Utilizing AWS
      • You can utilize AWS in many ways
        – You can host your entire application in the cloud
        – You can host a specific portion of your application in the cloud
        – You can use the cloud for a specialized need
  52. Hosting Your Application
      • You can have a system that is fully deployed in the cloud
      • You’ll need to figure out how to structure the application to achieve both functional and quality attribute needs
      • You’ll want to first consider quality attribute concerns such as:
        – Scalability
        – Availability
        – Security
        – …
      • Utilize the techniques we talked about to determine the needs
        – Fault modeling (considering the cloud specific faults)
        – Threat modeling
        – Understanding the anticipated load and desired throughput and latency
      • Come up with a gross structure that achieves your objectives
        – Think about partitioning of the system to support testing, degraded modes of operation and independent deployment
  53. Partial Hosting
      • You might want to leverage the cloud for a specific portion of your system, e.g.
        – Supporting mobile applications
        – Databases
        – Analytics
        – Delivery of particular content
        – Hosting your front end
        – …
      • This is typically going to be driven by cost and quality attribute needs (e.g. scalability)
  54. Backup and Recovery
      • Many organizations utilize the cloud for bulk storage, archiving, or backup and recovery
      • In the past external services were used for such needs
        – They often stored data on tape in separate physical locations
      • It can be cheaper and more convenient to utilize cloud services
      • As a result many organizations use the cloud for such storage needs
  55. Summary
      • Many services are available in the cloud
        – Storage
        – Network
        – Compute related services
        – …
      • These services provide different levels of service at different pricing levels
      • Utilizing the cloud appropriately and efficiently takes an explicit understanding of both your needs and the services available