Architecting for the Cloud: Cloud Providers

This is a lecture on cloud providers from the course "Architecting for the Cloud"


  • 1. © Matthew Bass 2013 Architecting for the Cloud Len and Matt Bass Cloud Providers
  • 2. © Matthew Bass 2013 IaaS Providers • There are several primary providers – Amazon: Amazon Web Services (AWS) – Microsoft: Azure – Google: Google Compute Engine – … • Each of these is set up a bit differently, with slightly different internal decisions and associated services
  • 3. © Matthew Bass 2013 Goals • The goal of this talk is not to give you a definitive how-to for each provider • It’s meant to give you just an introduction • The idea is that you’ll see how the concepts that we talked about in the course map to specific providers • We’ll look primarily at Amazon (with some details from others thrown in) • We’ll go through the overall structure and look at specific services
  • 4. © Matthew Bass 2013 Amazon Elastic Compute Cloud • Amazon EC2 provides compute capacity in the cloud • You can select the machine image with a given OS and specified capability • You can resize the capacity as needed • Takes minutes to spin up a new VM • You can specify multiple instances and select where they will run – Region & availability zones • You pay per usage/hour depending on the capability of the instance and if it’s a reserved instance (dedicated)
  • 5. © Matthew Bass 2013 Regions • Amazon has divided their cloud offerings into multiple regions. Each region should be thought of as a separate cloud – I.e. there is no automatic copying of data from one region to another.
  • 6. © Matthew Bass 2013 Current AWS Regions • North America: – US East (5 availability zones) – US West Oregon (3 availability zones) – US West Northern California (3 availability zones) – USGov Cloud (2 availability zones) • South America – Sao Paulo (2 availability zones) • Europe – Ireland (2 availability zones) • Asia Pacific – Sydney (2 availability zones) – Singapore (2 availability zones) – China (1 availability zone) – Tokyo (3 availability zones)
  • 7. © Matthew Bass 2013 AWS and Services • Amazon Web Services offers a number of services • These services are things like: – Storage – Database – Network capabilities – Monitoring – … • Not all services are available at all regions – product-services/
  • 8. © Matthew Bass 2013 Amazon Availability Zones • Amazon has a notion of availability zones • Engineered to be insulated from failures in other availability zones • Availability zones are locations within a region • Amazon has not announced the details of an availability zone but presumably they are – Physically separate data centers – Have independent networks – Have independent power delivery – …
  • 9. © Matthew Bass 2013 Amazon Service Level Agreement • Amazon guarantees 99.95% availability for each region • IaaS consumers are free to deploy their applications: – Within an availability zone – Across availability zones but within a region – Across regions • Amazon does not make any claim about the availability of their availability zones (that I could find)
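To put the 99.95% figure in perspective, the arithmetic can be sketched as follows. The function names are illustrative, and the combined figure assumes that failures in the redundant deployments are independent, which real outages (see the cascading failures later in this lecture) often violate.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability: float) -> float:
    """Hours per year a service may be down and still meet the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(availability: float, replicas: int) -> float:
    """Availability of N redundant deployments, ASSUMING independent failures."""
    return 1.0 - (1.0 - availability) ** replicas


if __name__ == "__main__":
    # A 99.95% guarantee still allows a little over four hours of downtime a year.
    print(round(allowed_downtime_hours(0.9995), 2))
    # Two independent deployments at 99.95% each (an optimistic upper bound).
    print(combined_availability(0.9995, 2))
```

The second number is an upper bound on what redundancy buys you: correlated failures make the real figure lower.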
  • 10. © Matthew Bass 2013 All-in-one Single Server
  • 11. © Matthew Bass 2013 Basic 4-server Setup
  • 12. © Matthew Bass 2013 Multiple Availability Zones
  • 13. © Matthew Bass 2013 Multiple Regions
  • 14. © Matthew Bass 2013 Elastic Compute Cloud (EC2) & Redundancy • EC2 supports different levels of redundancy – It is up to the customer to determine how much redundancy they wish to have and how much they wish to pay for it • Redundant elements can be: – Within an availability zone – Across availability zones – Across regions
  • 15. © Matthew Bass 2013 Microsoft Azure Regions • North America – US Central (Iowa) – US East (Virginia) – US East 2 (Virginia) – US North Central (Illinois) – US South Central (Texas) – US West (California) • Europe – Europe North (Ireland) – Europe West (Netherlands) • Asia Pacific – East (Hong Kong) – Southeast (Singapore) • Japan – Japan East (Saitama) – Japan West (Osaka) • Brazil – Sao Paulo
  • 16. © Matthew Bass 2013 Fault Domains in Azure • In Azure there is the concept of Fault Domains • A Fault Domain is essentially a rack in a given datacenter • A consumer is not able to define which fault domains the application is distributed to – Unlike an availability zone • As a result the fault domain is really an internal structure
  • 17. © Matthew Bass 2013 Upgrade Domains in Azure • An upgrade domain is similar to a fault domain • Essentially an upgrade domain will be upgraded at one time – When Microsoft upgrades their internal infrastructure they do so a domain at a time • In order to guard against both failures within a fault domain and upgrades, you need to replicate across both fault and upgrade domains • This is called an availability set
  • 18. © Matthew Bass 2013 Azure Availability Sets
  • 19. © Matthew Bass 2013 Amazon Auto Scaling • Auto Scaling works in conjunction with CloudWatch (Amazon’s monitoring service) • The idea is that the monitoring service monitors the metrics – CPU utilization – Latency – Memory consumption • The Auto Scaling solution establishes the rules – Add instances when utilization exceeds 70% – Remove instances when utilization falls below 10% • You can specify things like a “cooling off” period – Where no action is taken until the system has a chance to stabilize
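The threshold rules and the cooling-off period can be sketched as a small in-memory model. This is not the Auto Scaling API: the class names, the 300-second cooldown, and the instance limits are assumptions for illustration. Only the 70% and 10% thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    scale_out_above: float = 70.0  # add an instance above this CPU % (from the slide)
    scale_in_below: float = 10.0   # remove an instance below this CPU % (from the slide)
    cooldown_s: float = 300.0      # hypothetical cooling-off period
    min_instances: int = 1         # hypothetical lower bound
    max_instances: int = 10        # hypothetical upper bound


class AutoScaler:
    def __init__(self, policy: ScalingPolicy, instances: int = 1):
        self.policy = policy
        self.instances = instances
        self.last_action_at = None  # timestamp of the most recent scaling action

    def observe(self, cpu_percent: float, now: float) -> int:
        """Apply the rules to one metric sample and return the instance count."""
        p = self.policy
        in_cooldown = (
            self.last_action_at is not None
            and now - self.last_action_at < p.cooldown_s
        )
        if in_cooldown:
            return self.instances  # let the system stabilize before acting again
        if cpu_percent > p.scale_out_above and self.instances < p.max_instances:
            self.instances += 1
            self.last_action_at = now
        elif cpu_percent < p.scale_in_below and self.instances > p.min_instances:
            self.instances -= 1
            self.last_action_at = now
        return self.instances
```

Passing the timestamp in explicitly keeps the sketch deterministic. In a real deployment the monitoring service supplies the samples and the provider applies the policy.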
  • 20. © Matthew Bass 2013 Amazon Elastic Load Balancer • This is Amazon’s load balancing solution – Recall the push/pull architecture discussion • It tracks the status and location of instances • Routes requests to healthy instances based on criteria that you establish • Can be used in conjunction with Auto Scaling – When instances are added they are registered with the ELB, and when they are removed they are deregistered • Can be used in conjunction with Amazon’s DNS (Route 53) – You can use DNS failover to move from one region to another – The DNS will route traffic to the ELB in the target region
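The idea of tracking instance health and routing only to healthy instances can be sketched as below. This is a toy model, not ELB's actual algorithm: round-robin selection is an assumption, and the class, method names, and instance ids are hypothetical.

```python
from itertools import count


class LoadBalancer:
    def __init__(self):
        self.healthy = {}        # instance id -> last reported health
        self._counter = count()  # drives the round-robin choice

    def register(self, instance_id: str) -> None:
        """Called when Auto Scaling adds an instance."""
        self.healthy[instance_id] = True

    def deregister(self, instance_id: str) -> None:
        """Called when Auto Scaling removes an instance."""
        self.healthy.pop(instance_id, None)

    def report_health(self, instance_id: str, is_healthy: bool) -> None:
        if instance_id in self.healthy:
            self.healthy[instance_id] = is_healthy

    def route(self) -> str:
        """Pick the next healthy instance in round-robin order."""
        candidates = [i for i, ok in self.healthy.items() if ok]
        if not candidates:
            raise RuntimeError("no healthy instances")
        return candidates[next(self._counter) % len(candidates)]
```

An instance that fails its health check simply drops out of the candidate list until it reports healthy again.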
  • 21. © Matthew Bass 2013 Amazon Simple Queue Service • SQS is Amazon’s queuing service – Again recall the push/pull architecture discussion • It’s a service that supports message queues • Recall it can be used in conjunction with Auto Scaling to manage the elasticity of your application • Pricing is per million requests handled
  • 22. © Matthew Bass 2013 Amazon Storage Solutions • Amazon has several storage solutions – Elastic Block Store (EBS) – Simple Storage Service (S3) – Glacier • These provide raw unmanaged storage • This is useful for: – Disaster recovery – Backup – Archiving – Persistence for your own database solution
  • 23. © Matthew Bass 2013 Amazon Elastic Block Store Amazon Elastic Block Store (EBS) is Amazon’s data file system. Some of its features are • Data is persisted independently from instances • EBS data is placed in a specific availability zone and can be attached to instances in the same availability zone • EBS data is automatically replicated within its availability zone • There are two networks that connect EBS instances – A high speed network to provide coordination among instances and move data between instances. – A lower speed network used as backup for coordination. • $0.05 per million I/O requests
  • 24. © Matthew Bass 2013 Amazon Simple Storage Service (S3) • S3 is a scalable storage solution • Good for content storage and distribution • Good for backup, archiving, and disaster recovery • Costs $0.03 per GB of data • More expensive but faster than Glacier • Not as fast for I/O as EBS
  • 25. © Matthew Bass 2013 Amazon Glacier • Low cost storage solution • Good for off site archival of Enterprise data • Good for backup and data archiving • Good for large volumes of data • Costs $0.01 per GB of data
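The price gap between S3 and Glacier is easiest to see with a quick calculation. The sketch below uses only the per-GB figures quoted on the slides (2013 prices). The slides do not state the billing period, so the result is per whatever period those prices apply to, and the 10,000 GB volume is an arbitrary example.

```python
# USD per GB, as quoted on the slides (2013 figures; billing period not stated).
PRICE_PER_GB = {"s3": 0.03, "glacier": 0.01}


def storage_cost(service: str, gigabytes: float) -> float:
    """Cost of holding the given volume in the named service for one billing period."""
    return PRICE_PER_GB[service] * gigabytes


if __name__ == "__main__":
    volume_gb = 10_000  # hypothetical archive of roughly 10 TB
    for service in PRICE_PER_GB:
        print(service, round(storage_cost(service, volume_gb), 2))
```

At these rates Glacier costs a third of S3 for the same volume, which is the trade-off against its slower retrieval.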
  • 26. © Matthew Bass 2013 Amazon Database Solutions • Amazon has a number of fully managed database solutions • These are built on top of one of Amazon’s storage solutions • They include: – DynamoDB – Relational Database Service (RDS) – Redshift – ElastiCache
  • 27. © Matthew Bass 2013 DynamoDB • Key-value data store • Uses a throughput-oriented pricing model (rather than a storage-oriented model) • Uses solid state drives • Guarantees single-digit millisecond read latencies • You pay a flat hourly rate based on capacity that you reserve – Costs $0.0065 per hour for every 10 units of write capacity – Costs $0.0065 per hour for every 10 units of read capacity
  • 28. © Matthew Bass 2013 Relational Database Service • A web service that provides a relational database for use in applications • It provides access to MySQL, Oracle, SQL Server, or PostgreSQL • It simplifies installation, patching, and backup related issues • Priced per hour according to db type, size, and number
  • 29. © Matthew Bass 2013 Redshift • Redshift is Amazon’s data warehousing solution • Integrates with other storage solutions • Priced at either $0.25 per hour on the low end • or $1000 per terabyte per year
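A throughput-oriented price means the bill follows reserved capacity rather than stored bytes. The sketch below applies the slide's rate of $0.0065 per hour per 10 capacity units. Rounding partial blocks of 10 up to a full block is an assumption, and the function name is illustrative.

```python
import math

RATE_PER_10_UNITS_PER_HOUR = 0.0065  # USD, from the slide (2013 pricing)


def hourly_cost(read_units: int, write_units: int) -> float:
    """Hourly charge for reserved capacity, ASSUMING partial blocks of 10 round up."""
    blocks = math.ceil(read_units / 10) + math.ceil(write_units / 10)
    return blocks * RATE_PER_10_UNITS_PER_HOUR


if __name__ == "__main__":
    # Hypothetical reservation: 100 read units and 50 write units.
    print(round(hourly_cost(100, 50), 4))
```

Note that the charge is the same whether the reserved capacity is used or idle, which is what distinguishes this model from per-request pricing.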
  • 30. © Matthew Bass 2013 ElastiCache • A Web Service that enables an in memory data cache • Supports: – Memcached – Redis • Improves latency and throughput for read heavy applications • Prices are per Cache node/hour
  • 31. © Matthew Bass 2013 Amazon CloudFront • Amazon’s content delivery network • Provides edge services – Competes with companies such as Akamai • This service will allow you to locate content closer to users – Reduces latency • You specify the edge location and point it to the origin • You can route DNS to the edge location if you want
  • 32. © Matthew Bass 2013 Amazon Elastic IP Addressing • Amazon provides elastic IP addressing • The IP address is associated with your account – not with an instance • You can programmatically map the elastic IP to any instance in your account • In this way you make the deployment configuration transparent to the user/application – Remember the virtual network discussion?
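The indirection that an elastic IP provides can be sketched as a lookup table that is remapped while clients keep using the same address. This is a toy model rather than the EC2 API: the class, the instance ids, and the address (taken from a documentation-only range) are hypothetical.

```python
class ElasticIpTable:
    """Maps account-owned addresses to whichever instance currently serves them."""

    def __init__(self):
        self._target = {}  # elastic IP -> instance id

    def associate(self, elastic_ip: str, instance_id: str):
        """Point the address at an instance; return the previous target, if any."""
        previous = self._target.get(elastic_ip)
        self._target[elastic_ip] = instance_id
        return previous

    def resolve(self, elastic_ip: str) -> str:
        """Return the instance that currently receives traffic for the address."""
        return self._target[elastic_ip]


if __name__ == "__main__":
    table = ElasticIpTable()
    table.associate("203.0.113.10", "i-old")
    table.associate("203.0.113.10", "i-new")  # e.g. after replacing a failed instance
    print(table.resolve("203.0.113.10"))
```

Because clients only ever see the address, swapping the instance behind it requires no change on their side.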
  • 33. © Matthew Bass 2013 Many Other Services Available • Authentication services • Analytics • Elastic Map Reduce • Real time data streaming and processing • Business process automation services • Email services • Notification services • …
  • 34. © Matthew Bass 2013 Comparison to Other Providers • Other major providers (Google, Microsoft, Rackspace) offer similar services • Google doesn’t have as many services but has a different pricing model – Charges in 10-minute increments rather than one-hour increments • Microsoft has similar services • Rackspace also provides comparable options
  • 35. © Matthew Bass 2013 Outages • In Amazon (and others) there are some kinds of outages that are specific to the structure of the provider • We will now look at some of these outages
  • 36. © Matthew Bass 2013 Zone Failure • All of the IaaS providers have some notion of an “availability zone” • An availability zone (or fault domain in Azure) has its own switch, router, and rack • These availability zones are isolated from each other in a way that nodes within an availability zone are not
  • 37. © Matthew Bass 2013 Zone Failure Modes • A zone can fail in different ways Zone 1 Zone 2 Zone 3 Region
  • 38. © Matthew Bass 2013 Complete Failure • If for example you have a power outage you’ll have a complete failure • If you try to route traffic to any of these machines you’ll get a “no route to host” – This happens quickly – fast fail • You’ll know the zone is out • You can then spin up a new zone elsewhere
  • 39. © Matthew Bass 2013 Zone Failure Modes • You could have a network failure Zone 1 Zone 2 Zone 3 Region
  • 40. © Matthew Bass 2013 Network Failure • If you have a network failure it’s typically not a complete failure • The machines are still working but the network is having trouble • There is often still a route to host but your data isn’t reaching the host • As a result you don’t get a fast fail – You’ll get long timeouts
  • 41. © Matthew Bass 2013 Network Failure • With the long timeouts your system will start to back up • It’s difficult to tell the difference between this issue and other issues that result in latency lags • This problem can be intermittent as some of the routers might be down but not all
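One common way to keep long timeouts from backing a system up is to bound every remote call with an explicit deadline, so that a slow zone fails almost as quickly as a dead one. The slides do not prescribe a mechanism. The sketch below is one possibility using only the Python standard library, with `call_with_deadline` and `slow_call` as hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_deadline(fn, deadline_s: float):
    """Run fn in a worker thread; give up waiting once deadline_s has elapsed."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return ("ok", future.result(timeout=deadline_s))
    except FutureTimeout:
        # The caller stops waiting; the worker thread itself is not interrupted.
        return ("timeout", None)
    finally:
        pool.shutdown(wait=False)


def slow_call():
    """Stand-in for a request to a host whose network is degraded."""
    time.sleep(0.5)
    return "late"


if __name__ == "__main__":
    print(call_with_deadline(lambda: "fast", 1.0))
    print(call_with_deadline(slow_call, 0.05))
```

A deadline turns an open-ended wait into a quick, explicit failure that the caller can count and act on, though it does not by itself tell you whether the cause is the network or an overloaded host.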
  • 42. © Matthew Bass 2013 Zone Failure Modes • You could have a failure of some zone service Zone 1 Zone 2 Zone 3 Region
  • 43. © Matthew Bass 2013 Zone Service Failure • This is when a service that the zone depends on fails – It could be something that is part of the platform as a service (e.g. EBS) – It could also be a central service in your application • This causes cascading failures • It is difficult to figure out what is going on
  • 44. © Matthew Bass 2013 Region Failure • It’s rare but a Region can fail as well • Both complete and partial failures have happened • Typically this starts with isolated issues that cascade • There might be an issue with a few nodes or with a single availability zone • Other zones become impacted (often due to additional traffic) and fail – It can be difficult to determine the scope of the issue while it’s occurring
  • 45. © Matthew Bass 2013 Regional Failure Modes • You could lose network access to a region Zone 1 Zone 2 Zone 3 Region
  • 46. © Matthew Bass 2013 Regional Outage • This is often caused by – DNS issues – Router issues – Network capacity overload • Causes you to lose access to a region
  • 47. © Matthew Bass 2013 Regional Failure Modes • Local failures can cause a control plane overload Zone 1 Zone 2 Zone 3 Region
  • 48. © Matthew Bass 2013 Data Store Failure • As with the other portions of the system the data store can become unresponsive • The remedy for this is typically to mark this node as bad and attempt to bring a new node online • If the issue is more pervasive it can result in: – Disrupted availability – Loss of persistent data
  • 49. © Matthew Bass 2013 Backup Failure • Systems will often have a backup data mechanism • This is often a key component in disaster recovery • This can also fail – It can become temporarily or permanently unavailable
  • 50. © Matthew Bass 2013 Upgrades • Cloud providers need to upgrade their software as well • When they do this the nodes that are being upgraded experience an outage • If your software is running on these nodes you might experience an outage as well
  • 51. © Matthew Bass 2013 Utilizing AWS • You can utilize AWS in many ways – You can host your entire application in the cloud – You can host a specific portion of your application in the cloud – You can use the cloud for a specialized need
  • 52. © Matthew Bass 2013 Hosting Your Application • You can have a system that is fully deployed in the cloud • You’ll need to figure out how to structure the application to achieve both functional and quality attribute needs • You’ll want to first consider quality attribute concerns such as: – Scalability – Availability – Security – … • Utilize the techniques we talked about to determine the needs – Fault modeling (considering the cloud specific faults) – Threat modeling – Understanding the anticipated load and desired throughput and latency • Come up with a gross structure that achieves your objectives – Think about partitioning of the system to support testing, degraded modes of operation and independent deployment
  • 53. © Matthew Bass 2013 Partial Hosting • You might want to leverage the cloud for a specific portion of your system e.g. – Supporting mobile applications – Databases – Analytics – Delivery of particular content – Hosting your front end – … • This is typically going to be driven by cost and quality attribute needs (e.g. scalability)
  • 54. © Matthew Bass 2013 Backup and Recovery • Many organizations utilize the cloud for bulk storage, archiving, or backup and recovery • In the past external services were used for such needs – They often stored data on tape in separate physical locations • It can be cheaper and more convenient to utilize cloud services • As a result many organizations use the cloud for such storage needs
  • 55. © Matthew Bass 2013 Summary • Many services are available in the cloud – Storage – Network – Compute related services – … • These services provide different levels of service at different pricing levels • Utilizing the cloud appropriately and efficiently takes an explicit understanding of both your needs and the services available