Survey Report (EE-532)
on
“AWS Cloud Including the Platform, Service Offerings and Reported Real-life Applications or Benchmark Evaluation Results”
Submitted by
Praval Panwar (ppanwar), USC ID 7202018688
Arjun Sharma (arjuns), USC ID 2105050436
AWS Cloud Including the Platform, Service Offerings and Reported
Real-life Applications or Benchmark Evaluation Results
By Praval Panwar and Arjun Sharma
Abstract
This report surveys one of the world’s most talked-about technologies: cloud computing. Amazon Web Services (a public cloud) is the evident leader in this era of cloud computing. Through the technique of virtualization, it has emerged as a computing model that reduces the need to acquire and maintain hardware and large data centers. This report introduces the AWS infrastructure, its various services and applications, and benchmark results for those services. AWS began offering its technology infrastructure platform in 2006; since then, hundreds of thousands of customers across 190 countries have used AWS in every imaginable way. Developing, managing, and operating applications requires a wide variety of technology services. This report outlines various benchmark results to address the basic questions implicit in the title: what are the AWS services, how are they used, and, most importantly, why is Amazon’s public cloud so successful? Is it really a wise decision to eliminate your hardware implementation and use a public cloud instead? Answering these questions requires a thorough study of AWS.
Within the limited scope of this report, we went through various technical papers, white papers, Internet materials, and books to address those questions at a macroscopic level.
1. Introduction to Cloud Computing
Cloud computing refers to the use of hardware and software resources delivered as a service over a network, which in most cases is the Internet.
The hardware resources are the physical servers and infrastructure, including storage, cooling, and networking. The software resources provide the services themselves: the consumable product, which comes in many different forms. Cloud computing typically has a number of characteristics. The first is the delegation of physical and management overhead. With cloud computing you select a service to use; you do not have to purchase, set up, or maintain the servers and infrastructure. You have outsourced this to the cloud service provider, who has already done it ahead of time.
Cloud computing is also both modular and compartmentalized, which allows a system to be built out of smaller, distinct, interchangeable components that work together. As a result, these services are highly elastic, allowing dynamic allocation of resources that grows and shrinks as needed. An example of elasticity is provisioning more database servers during peak demand, then removing them as traffic dies down. Additionally, the modularity provides a resilient infrastructure in case an individual component fails.
There are three different deployment models for clouds:
1. Private Cloud:
Facilities are made available to one specific organization. There are fewer restrictions on network bandwidth and security.
2. Public Cloud:
It illustrates the basic style of dynamic allocation of resources; “pay-as-you-go” services come under this header. Parameters like network bandwidth, security, and response time are pivotal to the success of a public cloud. Examples: AWS and Azure.
3. Hybrid Cloud:
It is a combination of public and private clouds, maintained by various internal and external vendors.
2. What is AWS?
Amazon Web Services (AWS) is a public cloud that provides a range of scalable, reliable, and trusted web services backed by a high SLA (Service Level Agreement). These services offer faster, more cost-efficient, and larger-scale computing than a comparable in-house hardware implementation.
Amazon reported $61 billion in revenue for the 2012 fiscal year. The company does not report revenue for AWS separately, but AWS is widely believed to be a $3.5 billion business and growing rapidly.
3. AWS Business Model and its advantages over Conventional Data Centers
Before we discuss AWS’s successful business model, let us review its usefulness and its advantages over a conventional data center. Generally speaking, we can understand the cloud as data-center hardware and software. When an organization makes its data center and utility services available to the general public, it is known as a public cloud, as discussed earlier.
AWS stands out as a front runner and pioneer in the public cloud domain, mainly due to its robust services and its strong “pay-as-you-go” business model.
For such a dynamic business model, three vital aspects needed to be covered by AWS:
The appearance of infinite resources available on demand to a user
Abolishing the need for a pre-commitment from a user, instead allowing companies to start small and grow with time
“Elasticity,” i.e., provisioning resources according to the varying needs of a user, for example processors by the hour or storage by the day
The Amazon Elastic Compute Cloud (EC2) service provides highly elastic servers. It uses statistical multiplexing to make capacity appear infinite: in EC2, using 500 instances for one hour costs the same as using 1 instance for 500 hours. This is not merely converting CapEx (capital expenditure) to OpEx (operating expenditure); it is genuinely usage-based pricing. Some might argue that pay-as-you-go is the costlier option, but studies have shown that over time the benefits of elasticity, and the avoidance of risks like over-provisioning (excess) and under-provisioning (scarcity), outweigh the raw cost economics by and large.
Figure 1: The problems of under-provisioning (dotted) and over-provisioning (solid) in a conventional data center
The previous figure illustrates the two large problems of under- and over-provisioning: the dotted demand line exceeds the total available resources, whereas the solid provisioned-capacity line is never fully utilized.
Let us look at a numeric example. Suppose we need 1,000 servers at peak time and 200 servers at midnight, so the all-day average is 600 servers. With a conventional data center we must provision for 1,000 servers; at $100/hr per server, the total daily cost would be 1000 × 24 × 100 = $2,400,000. But the actual utilization (an average of 600 servers) is worth only 600 × 24 × 100 = $1,440,000. Here the cloud proves to be the better solution.
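The arithmetic above can be sketched in a few lines of Python (the rate and server counts are the illustrative figures from the example, not real AWS prices):

```python
# Cost of provisioning for peak demand vs. paying only for actual usage.
RATE = 100            # $ per server-hour (illustrative figure from the example)
HOURS = 24            # one day

peak_servers = 1000   # capacity needed at peak
average_servers = 600 # average demand over the whole day

# Conventional data center: pay for peak capacity around the clock.
conventional_cost = peak_servers * HOURS * RATE

# Cloud (pay-as-you-go): pay only for what is actually used on average.
cloud_cost = average_servers * HOURS * RATE

print(conventional_cost)               # 2400000
print(cloud_cost)                      # 1440000
print(conventional_cost - cloud_cost)  # 960000 wasted per day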
Secondly, AWS handles both periodic and aperiodic demand variation well, whereas acquiring and installing servers in a conventional data center cannot be done quickly enough to handle uneven demand patterns. The advantages of AWS over a conventional data center are:
Effectively infinite computing resources
On-demand availability
Pay for usage, even on a short-term basis
No pre-commitment required from the user
Higher utilization through multiplexing of loads
4. Cloud Architecture and Service Model of Amazon Web Service:-
Software as a Service (SaaS)
Network as a Service(NaaS)
Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Figure 2: Service models of AWS (recreated from “A Taxonomy and Survey of Cloud Computing Systems” by B. K. Rimal and E. Choi)
There are many different cloud computing service models, each designated with a different acronym that can also serve as a buzzword. We will focus on the four primary service models recognized by the International Telecommunication Union.
I. Software as a Service (SaaS):
SaaS refers to application software installed and operated by cloud providers and used by clients. The provider manages the infrastructure and platform, and the clients just use the software.
II. Network as a Service (NaaS):
Network and transport capabilities are provided as a service, traditionally as a virtual private network and bandwidth. NaaS was introduced as a distinct service model in 2012, but it is generally not needed by basic cloud users.
III. Infrastructure as a Service (IaaS):
IaaS typically provides hosted virtual machines, where a single physical machine emulates multiple computing environments that behave like individual computers. In short, you get to use an entire computer’s resources without needing a physical computer sitting on your desk or in a rack somewhere. A VPS, or Virtual Private Server, is a common example of IaaS.
IV. Platform as a Service (PaaS):
PaaS is a complete framework that can be used to host applications written by clients. The platform is typically a web-server solution stack, including the operating system, web and database servers, and programming-language execution environments in which you can run something written in a particular language. An Infrastructure as a Service offering can be configured and packaged as a platform; an example is a PHP application server.
When we outsource the maintenance of our systems, we entrust the cloud service provider with user data, software, and so on. One has to evaluate one’s security and privacy needs before deciding whether or not cloud services are appropriate.
5. Amazon Web Service Cloud Platform:-
AWS Global Infrastructure: The Amazon Web Services cloud has several Availability Zones spread across nine distinct geographic regions where its servers are hosted: Northern Virginia (the default), Oregon, California, Ireland, Singapore, Tokyo, Sydney, São Paulo, and AWS GovCloud for the US government. In addition to those regions, there are roughly 40 edge locations. Availability Zones are isolated from each other to prevent outages from spreading between zones. Several services operate across Availability Zones (e.g., S3 and DynamoDB), while others can be configured to replicate across zones to spread demand and avoid downtime from failures.
6. Amazon Service Offerings:-
Figure 3: AWS service offerings (recreated from “Up & Running AWS” by John Peck)
Amazon offers a large number of services. To analyze them, we divide them into logical groups of products: Foundation Services, Application Platform Services, and Management & Administration Services. At the top of the service stack lies our application. As shown in Figure 3, each of these categories contains a further diverse and strong range of services, most of which we list and discuss here.
AWS Foundation Services
There are four main service areas: compute, storage, database, and networking.
Elastic Compute Cloud (EC2)
It provides virtual private servers. EC2 runs instances on its infrastructure using the Xen hypervisor as middleware. EC2 instances are the fundamental building block for computing needs in the AWS cloud. They are created from an Amazon Machine Image (AMI) and an appropriate instance type. Amazon provides a wide range of instance types useful for different use cases, grouped into families based on the target application.
General Purpose Instances
This family includes the M1 and M3 instance types. These are good for small and mid-size
database operations and data processing. M3 uses Intel Xeon E5-2670 (Sandy Bridge or Ivy
Bridge) processors. These instances offer SSD-based instance storage for fast instance store I/O
performance.
Compute-optimized Instances
This family includes the C1, CC2, and C3 instance types. The popular use cases are High-traffic
web applications, ad serving, batch processing, video encoding, and distributed analytics. Each
virtual CPU (vCPU) on C3 instances is a hardware hyper-thread from a 2.8 GHz Intel Xeon E5-
2680v2 (Ivy Bridge) processor.
GPU Instances
This family includes the G2 and CG1 instance types, which provide Intel Xeon processors and high-performance NVIDIA GPUs intended for graphics and general-purpose GPU compute applications. CG1 instances use NVIDIA Tesla M2050 GPUs, each with 448 CUDA cores and 3 GB of video memory. They are used for game streaming and 3D application streaming.
Memory-optimized Instances
This family includes the R3, M2, and CR1 instance types and is used for memory-intensive applications. R3 instances support enhanced networking. CR1 instances run on faster CPUs (Intel Xeon E5-2670 with NUMA support) with higher bandwidth. They are used for high-performance relational and NoSQL databases, distributed memory caches, and in-memory analytics.
Storage-optimized Instances
This family includes the High Storage instances (HS1) and the High I/O instances (I2 and HI1). HI1 instances feature 2.4 GHz Intel processors. The hi1.4xlarge instance includes 60.5 GiB of RAM and 1,990 GB of SSD-based instance storage. These instances are used for NoSQL databases, data warehousing, and Hadoop.
Micro Instances
This family includes the T1 type. Micro instances are a very low-cost option, providing a small amount of CPU resources with the ability to burst CPU capacity for periodic spikes. They are used for low-traffic websites and similar workloads.
One should choose an instance type carefully, according to the workload requirements; application profiling and load testing can be used for this. The EBS-optimization feature can be used as well.
Auto Scaling allows EC2 instances to be dynamically added or removed in response to monitored resource utilization.
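Auto Scaling behavior can be modeled as a simple threshold loop. The sketch below is a conceptual toy, not AWS's actual policy engine; the thresholds and instance limits are made-up illustration values:

```python
# Toy model of threshold-based auto scaling: add an instance when
# average utilization is high, remove one when it is low.
HIGH, LOW = 0.80, 0.30            # illustrative scaling thresholds
MIN_INSTANCES, MAX_INSTANCES = 1, 10

def scale(instances, demand):
    """Return the new instance count, given demand in 'instance-loads'."""
    utilization = demand / instances
    if utilization > HIGH and instances < MAX_INSTANCES:
        return instances + 1
    if utilization < LOW and instances > MIN_INSTANCES:
        return instances - 1
    return instances

# Demand ramps up during the day and dies down at night.
counts = []
instances = 1
for demand in [0.5, 1.5, 2.5, 3.5, 3.5, 1.0, 0.5, 0.2]:
    instances = scale(instances, demand)
    counts.append(instances)
print(counts)  # [1, 2, 3, 4, 5, 4, 3, 2] — capacity tracks the demand curve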
Following is the pricing table for the General Purpose instance family:
Type        vCPU  ECU   Memory (GiB)  Storage (GB)  Linux Usage
m3.medium   1     3     3.75          1 × 4 SSD     $0.070/hr
m3.large    2     6.5   7.5           1 × 32 SSD    $0.140/hr
m3.xlarge   4     13    15            2 × 40 SSD    $0.280/hr
m3.2xlarge  8     26    30            2 × 80 SSD    $0.560/hr
Table 1: Pricing for General Purpose instances (from Amazon.com)
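Using the hourly rates from Table 1, the monthly on-demand cost of an instance is simple to estimate (rates copied from the table; a 30-day month is assumed for illustration):

```python
# On-demand Linux hourly rates from Table 1 (USD).
M3_RATES = {
    "m3.medium": 0.070,
    "m3.large": 0.140,
    "m3.xlarge": 0.280,
    "m3.2xlarge": 0.560,
}

def monthly_cost(instance_type, hours=24 * 30):
    """Estimated cost of running one instance for a 30-day month."""
    return M3_RATES[instance_type] * hours

for itype in M3_RATES:
    print(f"{itype}: ${monthly_cost(itype):.2f}/month")
# e.g. an m3.medium works out to about $50.40 for a 30-day month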
Amazon Simple Storage Service (S3)
It is a highly durable store for objects ranging in size from 1 byte to 5 TB. S3 is typically used for images, style sheets, non-executable file archives, file storage, and other types of static content. One can serve content directly from Amazon S3 or use it as an origin store that pushes content to Amazon CloudFront edge locations.
According to the AWS documentation, it has the following advantages:
Secure: Provides the user full control over access. Data can be secured in any state.
Reliable: Stores data with up to 99.999999999% durability and 99.99% availability. There can be no single point of failure; all failures must be tolerated or repaired by the system without any downtime. This is a higher standard of reliability.
Scalable: It uses scale as an advantage: adding nodes to the system increases, not decreases, its availability, speed, throughput, capacity, and robustness.
Fast: Server-side latency must be insignificant relative to Internet latency. Any performance bottleneck can be fixed by simply adding nodes to the system.
Inexpensive and simple.
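The “eleven nines” durability figure can be put in perspective with a little arithmetic. Treating durability as an annual per-object loss probability is a common back-of-the-envelope simplification, not Amazon's formal definition:

```python
# S3 advertises 99.999999999% (eleven nines) durability.
durability = 0.99999999999
annual_loss_probability = 1 - durability  # about 1e-11 per object per year

# Expected number of objects lost per year out of ten million stored.
objects_stored = 10_000_000
expected_losses = objects_stored * annual_loss_probability
print(expected_losses)  # ~0.0001, i.e. roughly one object every 10,000 years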
Amazon Elastic Block Store (EBS)
Amazon Elastic Block Store, or EBS, provides persistent storage for EC2 instances, which allows application storage to be separated from the virtual machine. This is useful if an EC2 instance experiences a failure and goes down: the EBS volume can be attached to a different instance to resume service. EBS is typically used for databases and file systems.
Finally, the AWS Storage Gateway is a service that connects local file servers, such as Network-Attached Storage, Direct-Attached Storage, or a Storage Area Network, to the S3 service to store encrypted files. This provides a secure mechanism for scalable off-site storage and backups.
Amazon Relational Database Service (RDS)
It is a simple-to-deploy database web service that gives access to the capabilities of a familiar MySQL, Oracle, Microsoft SQL Server, or PostgreSQL database engine, so that pre-existing code, applications, and tools can be used with the service. RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on applications and business.
The highlights of the service: simple to deploy, manageable, compatible, fast, predictable performance, scalable, reliable, secure, inexpensive, and designed to work easily with other AWS services.
AWS DynamoDB
If there is no need for a relational database, DynamoDB is a scalable NoSQL solution and AWS’s fastest-growing new service. NoSQL is a non-relational database format, which means it does not support table joins; the advantage is that it is very fast, distributed, and highly redundant. It is used for data such as user messages and image metadata.
Amazon SimpleDB
It is a highly available and flexible non-relational data store. Amazon SimpleDB creates and manages
multiple geographically distributed replicas of your data automatically to enable high availability and
data durability. Its benefits are, according to Amazon Documentations, Low touch, highly available,
flexible, and simple to use, secure, inexpensive and easy to use with other AWS services.
Typical data stored in SimpleDB includes metadata about Amazon S3 objects:
Data type or format; dates the object was created, accessed, or modified
Name or location of related objects
User ratings and comments
Subject or category tags and geolocation tags
It is also used for monitoring, tracking, metering, business trend analysis, and auditing.
Amazon Virtual Private Cloud
It is similar to managing a private data center; it supports both private and public subnets.
Amazon Elastic Load Balancing
It provides fault tolerance by directing traffic away from failed servers, enabling greater levels of fault tolerance in your applications and seamlessly providing the load-balancing capacity needed to distribute application traffic.
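The core idea of Elastic Load Balancing, spreading requests across instances while routing around unhealthy ones, can be sketched in plain Python. This models the concept only; it is not how ELB is configured or implemented:

```python
from itertools import cycle

class ToyLoadBalancer:
    """Round-robin balancer that routes traffic away from failed servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._ring = cycle(self.servers)

    def mark_failed(self, server):
        """Simulate a failed health check."""
        self.healthy.discard(server)

    def route(self):
        """Return the next healthy server, or None if all have failed."""
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        return None

lb = ToyLoadBalancer(["i-1", "i-2", "i-3"])
print([lb.route() for _ in range(3)])  # ['i-1', 'i-2', 'i-3']
lb.mark_failed("i-2")
print([lb.route() for _ in range(3)])  # i-2 no longer receives traffic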
Amazon Platform Services
Amazon CloudFront
It is a content distribution service: a Content Delivery Network for files of any size, ranging from tiny files like style sheets and images to large files like installers or large media such as movies or audio.
Messaging: Amazon SNS, Amazon SQS and Amazon SES
The Amazon Simple Notification Service or SNS can be used to push messages via a number of protocols,
including HTTP, email and SMS. Amazon Simple Queue Service, or SQS, provides a mechanism for
automating workflow messages between computers.
The Amazon Simple Email Service, or SES, provides bulk and transactional email, such as notifications sent to users upon events. It can also be used for newsletters and other sorts of large mailings. SES also supports DomainKeys Identified Mail (DKIM) in association with the domain, to improve deliverability and fight spam.
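The SQS pattern, producers enqueue work and consumers pull it when ready, can be illustrated with a minimal in-memory queue. This is a conceptual stand-in only; real SQS adds visibility timeouts, retries, and distribution across servers:

```python
from collections import deque

class ToyQueue:
    """Minimal stand-in for an SQS-style work queue."""

    def __init__(self):
        self._messages = deque()

    def send_message(self, body):
        """Producer side: enqueue a unit of work."""
        self._messages.append(body)

    def receive_message(self):
        """Consumer side: return the oldest message, or None if empty."""
        return self._messages.popleft() if self._messages else None

# A web tier enqueues thumbnail jobs; worker instances drain them later,
# so the two tiers never have to talk to each other directly.
queue = ToyQueue()
queue.send_message("resize photo-1.jpg")
queue.send_message("resize photo-2.jpg")

print(queue.receive_message())  # resize photo-1.jpg
print(queue.receive_message())  # resize photo-2.jpg
print(queue.receive_message())  # None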
Amazon Elastic MapReduce (EMR)
It is a hosted Apache Hadoop open-source framework for data-intensive applications on clustered hardware running on EC2 and S3. EMR makes it easy to quickly and cost-effectively process vast amounts of data. Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.
The steps to use EMR are:
1. Develop your data-processing application
2. Upload your application and data to Amazon S3
3. Configure and launch your cluster
4. (Optional) Monitor the cluster
5. Retrieve the output
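EMR's programming model, Hadoop MapReduce, boils down to a map phase that emits key/value pairs and a reduce phase that aggregates them per key. The classic word-count example, written here in plain Python rather than on an actual EMR cluster:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# On EMR the framework shards the input across the cluster and shuffles
# pairs to reducers by key; the logic per node is the same as here.
log_lines = ["error disk full", "error network down", "ok"]
word_counts = reduce_phase(map_phase(log_lines))
print(word_counts["error"])  # 2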
Libraries & Software Development Kit
These interact with Amazon Web Services but are not services themselves. AWS provides libraries and SDKs for languages and platforms including Java, PHP, Python, Ruby, and .NET.
Tools for management and administration
Web Interface: Management Console
The Management Console is the primary consolidated web interface for managing AWS services. In addition to the web interface, there is a native mobile application, AWS Console for Android, and a series of stand-alone command-line tools.
Consolidated Billing
It is less of a service and more like a feature. It allows multiple AWS accounts to be billed to a central
account. This is useful in organizations that have multiple individuals or departments with their own
accounts.
Other important Amazon cloud services (as mentioned on Wikipedia and Amazon.com):
AWS Elastic Beanstalk provides quick deployment and management of applications in the cloud.
Amazon Identity and Access Management (IAM), an implicit service, provides the authentication
infrastructure used to authenticate access to the various services.
Amazon CloudFormation provides a file-based interface for provisioning other AWS resources.
Amazon Elastic Transcoder (ETS) provides video trans-coding of S3 hosted videos, marketed
primarily as a way to convert source files into mobile-ready versions.
Amazon Flexible Payments Service (FPS) provides an interface for micropayments.
Amazon Mechanical Turk (Mturk) manages small units of work distributed among many
persons.
Amazon Simple Workflow (SWF) is a workflow service for building scalable, resilient
applications.
AWS OpsWorks for configuration of EC2 services using Chef.
The list above gives a brief introduction to the various AWS services. The selection of a specific service depends upon the application type, workload, and framework. Most of the services are backed by high-level Service Level Agreements, while a few are still in beta. AWS is an effective virtual-machine-based cloud for SLA-aware service applications.
7. Benchmarking on AWS
Cloud performance depends on many factors. Benchmarking defines the parameters by which we judge performance, based on industry-accepted measures, with categories ranging over sizing, scaling, elasticity, availability, quality of service, and consistency.
There are standard benchmarks like the TPC family or SPEC, but these are hard to apply to the cloud and fall short in capturing the record-oriented usage of cloud services.
Different types of Benchmarking and analysis
Network Benchmarking
Network throughput and latency are important cloud performance characteristics. WIPS (web interactions per second, subject to a maximum-latency bound) is generally used as the throughput measure, and cost ($/WIPS) also plays a major role. Network latency and throughput depend on location factors and on where the customers are located. Organizations should choose a cloud service that provides good throughput and low latency to themselves and their potential users or customers. Table 2 shows a comparison of various services.
Table 2: Comparing throughput, cost/WIPS, and cost predictability (from “Advanced Topics in IT”, University of Sydney)
Performance Benchmarking
Performance benchmarking looks into the performance characteristics of the server clouds; these are an integral part of the cloud solution and affect the quality of service, which is generally specified in the SLA of the cloud vendor. These characteristics include CPU, disk I/O, memory, and other performance factors, measured for example with the HPL (LINPACK) performance of m1.small-based virtual clusters. Organizations should consider these characteristics when selecting a cloud service in order to establish reasonable expectations and plan for growth.
Figure 4: Performance analysis of EC2 cloud computing services (from “A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing” by Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema)
Uptime Monitoring
Service availability, or uptime, is a critical cloud performance characteristic. It is often assumed that cloud providers have measures in place to provide very high availability. To measure uptime, an organization monitors the service, follows up with the vendor after an extended outage, and appends supporting comments to the uptime report; scheduled maintenance is excluded. Table 3 and Figure 5 are taken from “A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing” by Ostermann et al.
Table 3: Amazon EC2 instance types, with ECUs as Amazon’s unit of CPU performance
The following figures describe overheads associated with resource acquisition and release in EC2. The
total resource acquisition time is the sum of the Install and Boot times. The Release time is the time
taken to release the resource back to EC2.
Figure 5: Resource acquisition and release overheads for acquiring single and multiple instances
The table below compares the HPCC benchmark performance of various EC2 platforms. The EC2 platform has much higher latency; this relatively low network performance means that the gap between theoretical peak performance and achieved HPL performance widens as the number of instances grows, making EC2 clusters poorly scalable.
Table 4: HPCC benchmark performance for various EC2 platforms (from “A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing” by Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema)
8. General Issues in Cloud Performance
We tend to ignore scalability when planning a cloud-based infrastructure; if latency becomes a problem, we simply add machines, on the assumption of unlimited WIPS.
$/WIPS is not clearly defined across the different cost models in the cloud.
Fault tolerance and elasticity are not measured with defined guidelines and benchmarks.
CONCLUSION
By remotely delivering services and hiding the complexity and management of the hardware and software resources, cloud computing can be used to reduce costs and overhead. This allows clients to focus on developing core products and services rather than on managing resources.
Over time, Amazon Web Services has become an ever-present and dependable mechanism for securely delivering quality services that are easy to use, scalable, consistent, resilient, safe, and economical.
Having surveyed the various performance measures of cloud services and the challenges we face today, the important question is whether the performance of clouds is sufficient with respect to the growing needs for resources and power and the ever-expanding data centers. We still fall short of performance benchmarks that would give us better QoS and cater to the transparency needs of customers. The Amazon cloud still offers some of the most scalable and cost-effective performance standards: we can scale the platform from a single client instance to many with moderate overheads. With better benchmarking measures, we will be able to perform better analyses and drive the Amazon cloud service to higher levels.
REFERENCES
1. “A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing” by Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema
2. “EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications” by Jiang Dejun, Guillaume Pierre, and Chi-Hung Chi
3. “A View of Cloud Computing”, doi:10.1145/1721654.1721672, by Michael Armbrust, Armando Fox, Rean Griffith, and team
4. “A Taxonomy of Cloud Systems” by Bhaskar Prasad Rimal and Eunmi Choi
5. “ACM Cloud-Performance” by Prof. Kai Hwang and X. Bai
6. “Top cloud IaaS providers compared” by Thoran Rodrigues in The Enterprise Cloud, August 27, 2013
7. “Fair Benchmarking for Cloud Computing Systems” by Lee Gillam, Bin Li, and John
8. “An Evaluation of Alternative Architectures for Transaction Processing in the Cloud” by Donald Kossmann, Tim Kraska, and Simon Loesing
9. http://cloudcomputing.ieee.org/
10. “Advanced Topics in IT: Cloud Computing” by Dr. Uwe Röhm and Dr. Ying Zhou, School of Information Technologies, University of Sydney, 2011
11. https://aws.amazon.com/elasticmapreduce/
12. http://aws.amazon.com/ec2/
13. http://en.wikipedia.org/wiki/Amazon_Web_Services
14. “IaaS Performance Benchmarks: AWS” by Joe Masters Emison
15. http://cloudharmony.com/services
16. White paper: “TCO Study for SAP on Amazon Web Services (AWS)” by VMS AG
17. “Up & Running AWS” by John Peck
18. “Programming Amazon Web Services: S3, EC2, SQS, FPS, and SimpleDB” by James Murty (O’Reilly)