10 Things you didn't know about Cloud Platforms: AWS, GAE, Azure


Everyone knows about the eventual consistency properties of the cloud, but do you know how long it will take for a piece of data to become consistent/fresh? Despite the aim of providing infinite scalability, are there any hard limits on some of the leading cloud platform services? We know cloud platforms aim to provide auto-scaling, but is it really all magic?
We at the University of NSW and National ICT Australia (NICTA) have been evaluating cloud platforms over the last 18 months. In this session, we will share with the audience some of these (often surprising) evaluation findings, which should be of interest to application architects and developers looking at designing and building solutions using the cloud.
By Anna Liu, Hiroshi Wada, Kevin Lee, National ICT Australia, UNSW

  • Reduce cost, reduce complexity
  • Quotas are resource constraints configured by the vendors. You can probably contact the vendor for resources beyond a quota, but that communication takes time and carries an opportunity cost. Limitations are mostly functional restrictions; you probably cannot get past them by making a phone call.
    • Amazon Web Services
      • Manually set up all applications: large maintenance and operation costs, including upgrading systems, installing applications and configuration.
      • Maximum 5 GB per file in S3: files of TB magnitude cannot be put into S3 directly. Extra effort is needed: the file has to be divided into small chunks (up to 5 GB each) before storing. The same effort is required during retrieval, as all retrieved chunks have to be merged manually.
      • Maximum 5 seconds query execution time in SimpleDB: no long-running queries in SimpleDB. If thousands of items are queried, the query can fail due to a timeout. Developers need to estimate the query time beforehand, separate a large query into small queries, and combine/merge the query results on the client side.
      • 20 On-Demand or Reserved Instances and 100 Spot Instances by default: you can have more instances by contacting Amazon, but that will increase your opportunity cost if you need to scale out immediately.
      • 1 GB free outgoing bandwidth per month in SimpleDB, S3 and EC2: you need to pay for extra usage.
    • Microsoft Windows Azure
      • 2 deployments per service (production and staging): the two deployments hold the production version and the staging version separately, targeting end users and test users respectively. This is not enough to run multiple test versions at the same time.
      • .NET, PHP or Java programming language: limited to .NET, PHP and Java developers.
      • Up to 50 GB for SQL Azure: the maximum size of a single SQL Azure database is 50 GB. If your data is larger than 50 GB, you probably have to consider data partitioning to scale out your database to multiple databases.
      • 20 concurrent small compute instances or equivalent per month: 1 clock hour of an extra large instance equates to 8 small instance hours. Therefore, you can only have ...
      • 10 TB of total data transfers per month: you can probably get more if you send a request to Microsoft.
      • Up to 750 GB of SQL Azure databases per month: the offer originally states 150 Web Edition databases (not sure whether it is or/and, see http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-us&offer=MS-AZR-0013P) 15 Business Edition databases. Since the maximum size of each Web Edition database is 5 GB and of each Business Edition database is 50 GB, simple math (150*5 or 15*50) gives 750 GB.
    • Google App Engine
      • Java or Python programming language: PHP developers cannot use Google App Engine.
      • Maximum 30 seconds for each request: each request has to be answered within 30 seconds, otherwise an exception is returned instead of a result. Highly computational tasks are therefore not suitable for GAE. The alternative is again to split the task. GAE has made an early experimental release of MapReduce for this purpose, but only the Mapper is implemented at this stage.
      • 1 MB for each Datastore entity: only 1 MB per data item. You will probably find it hard to store a photo in GAE. Because of the 30-second limit, your query also has to be processed within 30 seconds.
      • Maximum 2 GB per file in Blobstore: the same reasoning as for AWS. In addition, the maximum size of Blobstore data that the app can read with one API call is only 1 MB. So even if you store 2 GB in Blobstore, it is still difficult to manipulate the data in GAE.
      • 10 web applications per user: since the case of the bush fire in 2009. I think all the following parameters can be adjusted by Google.
      • 43,200,000 requests per day
      • 1 GB (1,046 GB maximum if billing enabled) incoming/outgoing bandwidth per day
      • 6.5 CPU-hours (1,729 CPU-hours maximum if billing enabled) per day
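The split-before-store and merge-after-retrieve workaround described for the 5 GB S3 limit can be sketched locally as follows. This is a minimal illustration, not the authors' tooling: `split_file`, `merge_files` and the `.partNNNNN` naming are hypothetical names chosen here, and the actual upload to and download from S3 is deliberately left out.

```python
import os

CHUNK_SIZE = 5 * 1024**3  # the 5 GB per-file limit quoted in the notes above


def split_file(path, chunk_size=CHUNK_SIZE, buffer_size=1024 * 1024):
    """Split `path` into numbered part files of at most `chunk_size` bytes."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            part_path = f"{path}.part{index:05d}"  # hypothetical naming scheme
            written = 0
            with open(part_path, "wb") as dst:
                while written < chunk_size:
                    block = src.read(min(buffer_size, chunk_size - written))
                    if not block:
                        break
                    dst.write(block)
                    written += len(block)
            if written == 0:
                # Source exhausted: drop the empty part and stop.
                os.remove(part_path)
                break
            parts.append(part_path)
            index += 1
    return parts


def merge_files(parts, out_path, buffer_size=1024 * 1024):
    """Concatenate the part files, in the given order, into `out_path`."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                while True:
                    block = src.read(buffer_size)
                    if not block:
                        break
                    dst.write(block)
```

The parts must be merged in the same order they were produced, which is why the sketch zero-pads the part index so that a plain sort of the names preserves it.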
  • Reference – Saarland paper at VLDB
  • IaaS provides
    • Basic infrastructure monitoring: such as CPU/RAM/disk/network usage. The data needs to be converted and fed into a dashboard system in AMP
    • Usage report and basic billing: usually one account = one bill
  • Customers’ responsibility
    • Access control to IaaS: password and secret key management. Change passwords and keys regularly to tighten security. Access log of the IaaS console (not available in EC2)
    • Infrastructure configuration: establish a VPN. Choose appropriate machine images, or upload machine images. Add disks to virtual machines
    • OS/middleware installation/configuration: depends on the machine images. Pre-configured machine images reduce the workload
    • OS patching: needs to be performed by customers
    • Antivirus: needs to be installed by customers if not included in a machine image
    • OS backup: IaaS usually allows for taking snapshots of virtual machines
    • OS monitoring: if the monitoring facility provided by IaaS is not enough, you need to run your own. Feed the data into a dashboard system in AMP
    • Application installation/configuration: as usual
    • Application patching
    • App data backup: take snapshots using the IaaS’s functionality, or do it yourself, e.g. by running rsync
    • Application monitoring: feed the data into a dashboard system in AMP
    • OS/application security: such as access control by Active Directory
    • Billing: need to translate the IaaS’s bill into a cost center-based bill
  • Figure 1 shows a typical setup of the Amazon VPC. This VPC setup allows a company’s infrastructure to be connected with the Amazon EC2 infrastructure via a VPN connection. It requires setting up two VPN gateways (one on each of the local and remote sides). A secure VPN connection is established between the two gateways via the IPsec protocol. EC2 instances on the remote side (Amazon side) are operated within subnets behind the remote VPN gateway. That is, these EC2 instances are isolated from the rest of the EC2 network and only these instances can access the hosts on the local side. Similarly, hosts can be added on the local side behind the customer gateway (local VPN gateway) and only these hosts have access to the remote EC2 instances. A typical VPC connection meets the following security requirements: it uses the AES 128-bit encryption function and the SHA-1 hashing function.
  • An example business report query took 16 min 30 sec; it takes less than 1 min in the existing on-premise dev environment.
    • Data transfer over SSIS takes 14 min (only 42 KB/sec of throughput)
    • No bottleneck observed on CPU (3-10%), memory (6 GB free), disk (low activity) or network (0.03% usage of 1 Gbps)
    • SSIS protocol?
  • Moving a file from S3 to a VPC instance via EBS: done, it works. I did the following:
    • 1. Start an EC2 micro instance outside the VPC and attach an EBS volume to it
    • 2. Copy the file from S3 to the EBS volume attached to the micro instance
    • 3. Detach the EBS volume from the micro instance
    • 4. Attach the EBS volume to an instance inside the VPC
    • Note that we did NOT route through NICTA here at all. The file used for this experiment is ~700 MB in size. Step 2 took 130 s (i.e. 5.39 MB/s).
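The throughput figures in these notes can be checked with a few lines of arithmetic. The sketch below only restates the numbers quoted above; the helper names are illustrative, and the file size is the approximate "~700 MB" from the notes, so the result differs slightly from the quoted 5.39 MB/s.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average throughput for a transfer of `size_mb` megabytes."""
    return size_mb / seconds


def transferred_kb(rate_kb_per_s, seconds):
    """Total data moved at a constant rate over `seconds`."""
    return rate_kb_per_s * seconds


# S3 -> EBS copy outside the VPC: ~700 MB in 130 s
s3_to_ebs = throughput_mb_per_s(700, 130)

# SSIS transfer: 42 KB/sec sustained for 14 minutes
ssis_total_kb = transferred_kb(42, 14 * 60)

print(f"S3 to EBS: {s3_to_ebs:.2f} MB/s")
print(f"SSIS moved about {ssis_total_kb / 1024:.1f} MB in 14 minutes")
```

Put side by side, the S3-to-EBS copy is roughly two orders of magnitude faster than the SSIS transfer rate, which is consistent with the notes' observation that CPU, memory, disk and network were not the bottleneck.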
  • References: http://aws.amazon.com/ec2/ http://code.google.com/appengine/whyappengine.html#scale http://www.microsoft.com/windowsazure/appliance/
  • An article (with link to his paper) by Huan Liu discussing limitations of load balancers and autoscaling: http://huanliu.wordpress.com/tag/auto-scaling/ http://codecrafter.wordpress.com/2008/10/03/google-app-engine-scalability-that-doesnt-just-work/ An example on scaling in Azure: http://code.msdn.microsoft.com/azurescale/Release/ProjectReleases.aspx?ReleaseId=4167

Presentation Transcript

  • From imagination to impact
  • 10 Things You Didn’t Know About Cloud Platforms: Azure, GAE and AWS Dr. Anna Liu, Dr. Hiroshi Wada, Kevin Lee National ICT Australia
  • The 10 Things are...
    • How long does it take for data in the cloud to become consistent?
    • Limitations and quotas
    • How unpredictable/variable is the cloud?
    • Distributed transaction support in the cloud
    • Pricing variations over time and space
    • Sticky session support
    • The new matrix of roles and responsibilities for cloud providers, consumers and system integrators
    • Secure connections to the cloud
    • Time to get a new instance
    • Auto-scaling is not all magic
  • The Reality of Eventual Consistency in Amazon SimpleDB
    • The probability of reading updated data in SimpleDB in US West
      • An application reads data X (ms) after it has written the data
    • SimpleDB has two read operations
      • Eventually Consistent Read
      • Consistent Read
    • This pattern is consistent regardless of the time of day
    • [Figure: probability of reading updated data against time since the write, for Eventually Consistent Read and Consistent Read]
  • Consistent vs. Eventually Consistent Read
    • SimpleDB’s consistent read is guaranteed to return updated data
    • What cost do you pay for consistency?
      • RTT is the same as that of an eventually consistent read
      • Monetary cost (usage fee) is exactly the same as for an eventually consistent read
    • The trade-off is not clear! We suspect consistent read is less scalable and slower under datacenter failures. However, we have not observed any differences
  • Other Commercial NoSQL Databases
    • Google App Engine
      • Offers eventually consistent read and consistent read
      • Behavior of eventually consistent read is completely different from Amazon’s
      • In GAE, both types of reads behave exactly the same unless a data center fails
    • Windows Azure
      • Offers no options for read
      • Always consistent
  • Limitations and Quotas
    • Amazon Web Services
      • Manually set up all applications
      • Maximum 5 GB per file in S3
      • Maximum 5 seconds query execution time in SimpleDB
      • 20 On-Demand or Reserved Instances and 100 Spot Instances by default
      • 1 GB free outgoing bandwidth per month in SimpleDB, S3 and EC2
    • Microsoft Windows Azure
      • 2 deployments per service (production and staging)
      • .NET, PHP or Java programming language
      • Up to 50 GB for a SQL Azure database
      • 20 concurrent small compute instances or equivalent per month
      • 10 TB of total data transfers per month
    • Google App Engine
      • Java or Python programming language
      • Maximum 30 seconds for each request
      • 1 MB for each Datastore entity
      • Maximum 2 GB per file in Blobstore (each API call can manipulate less than 1 MB)
      • 10 web applications per user
      • 43,200,000 requests per day
      • 1 GB (1,046 GB maximum if billing enabled) incoming/outgoing bandwidth per day
      • 6.5 CPU-hours (1,729 CPU-hours maximum if billing enabled) per day
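A daily quota is easier to reason about once it is converted into a sustained per-second rate. The sketch below does that conversion for the App Engine figures listed above; the helper name is illustrative, and treating "1 GB" as 1024^3 bytes is an assumption made here, not something the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(daily_quota):
    """Sustained per-second rate that would exactly use up a daily quota."""
    return daily_quota / SECONDS_PER_DAY


# 43,200,000 requests per day, spread evenly over the day
requests_per_second = average_rate_per_second(43_200_000)

# 1 GB of free bandwidth per day, assuming 1 GB = 1024**3 bytes
free_bytes_per_second = average_rate_per_second(1024**3)

print(f"{requests_per_second:.0f} requests/sec sustained")
print(f"{free_bytes_per_second / 1024:.1f} KB/sec sustained on the free bandwidth quota")
```

The request quota works out to a sustained 500 requests per second, so an application that bursts well above that for part of the day has correspondingly less headroom for the rest of it.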
  • Performance Unpredictability in Cloud
    • Performance unpredictability is one of the major obstacles
      • Performance variance of a MapReduce job for a 50-node EC2 cluster and a 50-node local cluster
      • Examples (time as performance metric)
        • Repeatability of results for researchers
        • Time critical tasks for enterprises
  • Benchmark Details (metric: measurement; benchmark tool)
    • Instance Startup: elapsed time from the moment a request for an instance is sent to the moment that the requested instance is available
    • CPU: a single score by executing various concurrent integer and floating point calculations; Ubench
    • Memory Speed: a single score by executing random memory allocations as well as memory to memory copying; Ubench
    • Disk I/O: sequential reads/writes and random reads block I/O; Bonnie++
    • Network Bandwidth: bandwidth, delay jitter and datagram loss; Iperf
    • S3 Access: uploading a 100 MB file from one unused node of the physical cluster at Saarland University to a newly created bucket on S3
  • Benchmark Results in EC2
    • The coefficient of variation (COV) of the large instance is higher than that of the small instance. However, both are at least an order of magnitude less stable than a physical cluster.
    • The COV of S3 Access may be influenced by other traffic on the network; this experiment is shown just for completeness.
    • COV in Physical Cluster: CPU 0.1%, Memory 0.3%, Sequential Read 0.6%, Random Read 1.9%, Network 0.2%
    • COV in Small EC2: CPU 21%, Memory 8%, Sequential Read 17%, Random Read 9%, Network 19%, S3 Access 54%
    • COV in Large EC2: CPU 24%, Memory 10%, Sequential Read 20%, Random Read 13%
    • Reference - Schad, Jörg, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. 2010. Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance. In Proceedings of the 36th International Conference on Very Large Data Bases. Vol. 3, No. 1. Singapore: VLDB Endowment.
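The COV figures above are the standard deviation of repeated benchmark runs divided by their mean. A minimal sketch of that calculation follows; using the sample standard deviation (rather than the population one) is an assumption made here, and the run times in the example are made-up illustration values, not measurements from the paper.

```python
import statistics


def coefficient_of_variation(samples):
    """Return stdev / mean for a sequence of repeated measurements."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("COV is undefined when the mean is zero")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(samples) / mean


# Hypothetical run times (seconds) of the same job repeated three times
stable_runs = [100.0, 100.5, 99.5]
noisy_runs = [80.0, 100.0, 120.0]

print(f"stable: {coefficient_of_variation(stable_runs):.1%}")
print(f"noisy:  {coefficient_of_variation(noisy_runs):.1%}")
```

Because COV is normalized by the mean, it allows the variability of metrics with different units (CPU scores, disk throughput, upload times) to be compared in a single table.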
  • Distributed Transactions in Cloud
    • There is now a range of Cloud Database types
        • NOSQL (Azure Table, GAE Datastore, Amazon SimpleDB...)
          • Much more ‘shardable’ architecture; no joins, no full ACID support
        • SQL (Azure SQL, Amazon RDS, Oracle on EC2...)
          • Variable distributed transactional support compared to their traditional RDBMS counterpart
    • Experience with porting PetShop
        • Challenge with porting the data access layer
          • Some JDO interfaces are not supported by App Engine, e.g. ‘Join query’
          • No distributed transaction support in Azure SQL at the moment
  • Pricing fluctuates over space and time
    • On demand pricing (hourly, per GB, per ‘000 requests)
    • Reserved instances (1 or 3 year term + unit cost)
    • Spot pricing (typically cheaper in US-East!)
    • Similar pricing schemes observed for GAE and Azure
  • Sticky Session Support
    • Autoscaling alone does not guarantee that clients of the same session will always contact the same instance
        • Clients cannot perform a series of connected operations
    • Amazon ELB supports Session Affinity
      • Session affinity allows mapping to be created at the ELB
      • Limitations
        • Session affinity cannot handle HTTPS
        • Autoscaling down an instance with a live session
    • MS Azure advocates stateless sessions
      • If you must, store session state in e.g. table storage
    • Design issue: should the server remember the conversation context, or should the client remind it every time? How long should a session ‘stick’? Too long compromises the server’s ability to distribute load
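The design questions above (who remembers the context, and how long a session should stick) can be made concrete with a small affinity table. This is a hypothetical sketch, not how Amazon ELB or Azure implement it: `StickyBalancer`, its round-robin fallback and the explicit `now` argument are all assumptions chosen to keep the example self-contained and deterministic.

```python
class StickyBalancer:
    """Round-robin balancer that pins a session to one instance for a while."""

    def __init__(self, instances, stick_seconds):
        self.instances = list(instances)
        self.stick_seconds = stick_seconds
        self._next = 0
        self._affinity = {}  # session_id -> (instance, time last seen)

    def route(self, session_id, now):
        entry = self._affinity.get(session_id)
        if entry is not None:
            instance, last_seen = entry
            still_alive = instance in self.instances
            still_sticky = now - last_seen <= self.stick_seconds
            if still_alive and still_sticky:
                self._affinity[session_id] = (instance, now)
                return instance
        # No usable mapping: pick the next instance round-robin and pin it.
        instance = self.instances[self._next % len(self.instances)]
        self._next += 1
        self._affinity[session_id] = (instance, now)
        return instance

    def remove_instance(self, instance):
        """Scale down; sessions pinned to this instance lose their state."""
        self.instances.remove(instance)
```

A short `stick_seconds` lets load spread out again quickly, while a long one keeps traffic pinned to old instances. Removing an instance silently remaps its sessions, which is the "autoscaling down an instance with a live session" limitation listed above.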
  • Customers’ Responsibility in IaaS Cloud
    • Customers’ responsibility
      • Infrastructure Configuration (VPN, VMs, Disk, …)
      • OS/Application Security (e.g., Active Directory)
      • OS/Middleware Installation/Configuration
      • OS Patching
      • Application Installation/Configuration
      • Application Patching
      • Billing (Cost Center Charging)
      • Antivirus
      • OS Backup
      • OS Monitoring
      • App Data Backup
      • Application Monitoring
    • Amazon EC2 (IaaS providers)
      • Infrastructure Monitoring (CPU, Disk, Net, …)
      • Usage Report and Basic Billing
      • Access Control to IaaS (customers’ responsibility)
  • Secure Connection to the Cloud
  • Performance Implications
    • Low Security Option – max throughput 5.6MB/sec
    • High Security Option - connection throughput is 4MB/sec
      • Performance hit due to encryption, decryption and firewall
    • Other interesting observations:
      • VPC is only available in US-East-1 and EU-West-1
      • In a single availability zone only
      • S3 does not work well with VPC yet (very slow); EBS is a workaround
      • MS Azure VPN support next year
      • Google Secure Connector
  • Time to Get a New Instance
    • Typically takes minutes to create an instance from its image on EC2
    • Trick to “create” instances quicker
      • Create a pool of instances in advance, and stop (hibernate) them all
        • Pay no instance cost but need to pay for storage cost (for stopped instances)
      • Revive stopped instances if new instances are needed
    • Time by operating system and method
      • Windows, create from image: 10-15 minutes
      • Linux, create from image: 5-10 minutes
      • Windows, revive stopped instance: 30 seconds
      • Linux, revive stopped instance: 30 seconds
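The effect of keeping a pool of stopped instances can be estimated with the timings from this slide. The sketch below is a rough model, not an EC2 client: it makes no API calls, uses the upper bound of each quoted range, and assumes that all instances start in parallel, so the slowest start determines the wait.

```python
# Upper bounds of the ranges quoted on the slide, in seconds
CREATE_SECONDS = {"windows": 15 * 60, "linux": 10 * 60}
REVIVE_SECONDS = 30


def time_to_capacity(os_name, needed, stopped_pool_size):
    """Worst-case seconds until `needed` extra instances are available.

    Assumes every start request is issued at once and runs in parallel.
    """
    if needed <= 0:
        return 0
    if needed <= stopped_pool_size:
        # The stopped pool covers the demand: only revivals are required.
        return REVIVE_SECONDS
    # At least one instance must be created from its image.
    return CREATE_SECONDS[os_name]


print(time_to_capacity("linux", needed=3, stopped_pool_size=5))
print(time_to_capacity("windows", needed=6, stopped_pool_size=5))
```

Under these assumptions the pool only helps while demand stays within its size; one request beyond the pool brings the wait back to the full image creation time, so sizing the pool is a trade-off against the storage cost of the stopped instances.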
  • Autoscaling is Not All Magic
    • Amazon EC2
    • “… your application can automatically scale itself up and down depending on its needs.”
    • Windows Azure
    • “Optimized for scale-out applications-designed so that developers can easily build scale-out applications…”
    • Google App Engine
    • “No matter how many users you have or how much data your application stores, App Engine can scale to meet your needs”
  • Autoscaling is Not All Magical (contd)
    • Amazon EC2
      • How to scale?
        • Load balancing with Elastic Load Balancer (ELB)
        • Event processing with Autoscaling API
        • Monitoring through CloudWatch
      • Limitations
        • Load balancer is the bottleneck, hence limited throughput
        • Limited load balancing options (e.g., no hardware load balancer)
        • Limited rule support (e.g. no conjunctions allowed in rules)
        • Limited monitoring support (e.g. limited to minute granularity)
    • Windows Azure
      • How to scale?
        • Load balancing with Azure Queue Storage
        • Event processing with WF rules engine
        • Monitoring through Azure Diagnostics
        • Create/Delete instances with Management API
      • Limitations
        • Throughput limited by Azure Queue
        • Limited monitoring support (e.g. billing information not monitored)
    • Google App Engine
      • How to scale?
        • Built-in with App Engine
      • Limitations
        • No control over how it scales
        • Number of simultaneous sessions limited by per-minute (burst) quota (500 requests per sec by default), server request time-out (30 secs), etc.
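One way around the "no conjunctions allowed in rules" limitation is to evaluate combined conditions in your own monitoring code and only then call the provider's scaling API. The sketch below shows such a client-side evaluator; the `Condition` class, the metric names and the thresholds are all hypothetical, and the actual call that adds or removes instances is left out.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Condition:
    metric: str
    threshold: float
    above: bool = True  # True: fire when value > threshold; False: when below

    def holds(self, metrics: Dict[str, float]) -> bool:
        value = metrics[self.metric]
        return value > self.threshold if self.above else value < self.threshold


def scaling_decision(metrics: Dict[str, float],
                     scale_out: Sequence[Condition],
                     scale_in: Sequence[Condition]) -> int:
    """Return +1 (add an instance), -1 (remove one) or 0 (do nothing).

    Each rule is a conjunction: every condition in it must hold.
    """
    if scale_out and all(c.holds(metrics) for c in scale_out):
        return 1
    if scale_in and all(c.holds(metrics) for c in scale_in):
        return -1
    return 0


# Hypothetical rules: scale out only if CPU AND queue length are both high
scale_out_rule = [Condition("cpu", 70), Condition("queue", 100)]
scale_in_rule = [Condition("cpu", 20, above=False),
                 Condition("queue", 10, above=False)]

print(scaling_decision({"cpu": 85, "queue": 250}, scale_out_rule, scale_in_rule))
```

Requiring both conditions avoids scaling out on a CPU spike alone when the backlog is empty, at the cost of running and maintaining the evaluation loop yourself.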
  • Getting Involved
    • Linkage with National ICT Australia
        • Contract Research, Expert Advisory Services, Architecture Reviews
        • Public and In-house Training Courses
        • Market Surveys, Case Studies
        • Professional in Research Residence
          • Anna.Liu@nicta.com.au, @annaliu
          • http://blogs.unsw.edu.au/annaliu/
  • Virtual Machine ‘Stolen Time’
    • Using traditional system resource monitoring tools in cloud
      • Measuring system performance within a virtual instance (using tools such as vmstat and top) can give misleading information
      • Example: An EC2 instance (e.g. m1.small with 1 EC2 compute unit) does not go above around 40% CPU load as observed from vmstat
        • Certain percentage (around 50-60%) appears on vmstat as ‘st’
        • “st – Time stolen from a virtual machine” (from vmstat manpage)
        • Does it mean I am not getting what I paid for? No, not really
          • Amazon instances are measured by EC2 compute units
          • “One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor”
    • Monitoring system performance in cloud
      • Use Cloud monitoring tools such as CloudWatch and RightScale
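The ‘st’ column that vmstat reports comes from the steal counter on the aggregate `cpu` line of `/proc/stat` on Linux. The sketch below computes the steal percentage between two snapshots of that line; the two sample lines are made-up values chosen to resemble the 50-60% figure mentioned above, not real measurements.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters.

    Field order: user nice system idle iowait irq softirq steal [guest ...]
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(value) for value in fields[1:]]
    if len(counters) < 8:
        raise ValueError("kernel does not report a steal counter")
    return counters


def steal_percent(before_line, after_line):
    """Percentage of CPU time stolen between two /proc/stat snapshots."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    deltas = [a - b for a, b in zip(after, before)]
    # Guest time is already included in user/nice, so sum only 8 fields.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


# Hypothetical snapshots taken a short interval apart
before = "cpu 100 0 50 800 10 0 0 40 0 0"
after = "cpu 140 0 60 820 10 0 0 120 0 0"
print(f"steal: {steal_percent(before, after):.1f}%")
```

On a real instance the two lines would be read from `/proc/stat` a few seconds apart; a steadily high value reflects the hypervisor capping the instance at its purchased share rather than a fault in the guest.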
  • Limitation of Virtual Private Cloud (VPC)
    • VPC hosts are logically detached from (but physically attached to) the Amazon network
      • No direct connection to and from S3 via the Amazon local network
      • Connection via internet only
    • What happens if we need to transfer data from S3 to a VPC host?
      • E.g. if we ship removable media to Amazon, the data would be uploaded to S3. How do we transfer the data to a VPC host?
      • Option 1: Direct transfer from S3 to VPC host
        • Traffic routes through the remote side and comes back (High latency)
      • Option 2: Transfer to EBS and mount EBS to VPC host
        • Traffic routes through local network (Low latency)
  • How Long Do You Need to Wait to Read Updated Data with an Eventually Consistent Read?
    • Result of the “5 minutes run” for one week
    • t1: the first time updated data is read
    • t2: the first time 100% of reads return updated data
    • t3: the last time stale data is read
    • Mostly updated after 600 ms, but no guarantee
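Given a log of write-then-read probes, the three times defined above can be extracted as follows. This is one possible reading of the definitions, not the authors' analysis code: each sample is assumed to be a pair (delay in ms between the write and the read, whether the read returned the updated value), with several runs per delay, and the sample data is invented for illustration.

```python
from collections import defaultdict


def consistency_times(samples):
    """Return (t1, t2, t3) in ms from (delay_ms, fresh) probe samples.

    t1: smallest delay at which any read returned updated data
    t2: smallest delay at which all reads returned updated data
    t3: largest delay at which any read still returned stale data
    """
    by_delay = defaultdict(list)
    for delay_ms, fresh in samples:
        by_delay[delay_ms].append(fresh)

    t1 = t2 = t3 = None
    for delay in sorted(by_delay):
        reads = by_delay[delay]
        if t1 is None and any(reads):
            t1 = delay
        if t2 is None and all(reads):
            t2 = delay
        if not all(reads):
            t3 = delay
    return t1, t2, t3


# Hypothetical probe results: two runs at each delay
samples = [
    (0, False), (0, False),
    (100, True), (100, False),
    (200, True), (200, True),
    (300, True), (300, False),
    (400, True), (400, True),
]
print(consistency_times(samples))
```

Note that t3 can be later than t2, as in the example data: reaching 100% fresh reads at one delay does not rule out a stale read at a longer one, which matches the "no guarantee" caveat above.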