Architecting for the Cloud: Intro, Virtualization, IaaS
Architecting for the Cloud
Len and Matt Bass
– Teaching Professor for Carnegie Mellon in SE Department
– Associate Director of Professional Software Engineering Programs for Alumni and
– Siemens: Member of the Software Architecture Group at Siemens Corporate
– SEI: Resident affiliate at Software Engineering Institute
– SEI: Member of the technical staff
– 15+ years experience as an architect and software engineer
– Domains including Medical, Building Automation, Automotive, …
Personal Introduction II
• Len Bass
• My first computer (1964)
• 25 years at the SEI
– Working on Software
Architecture since ~1990
– Wide variety of domains
• 2 ½ years at NICTA
– Working on problems
associated with operations
• Who are you?
– Current position?
– Expectations for the course?
• Does anyone remember the original etoys.com?
– It has been recently resurrected by ToysRUs
• What kind of company was this?
• What did it take to get the company off the ground?
• What were some of the issues?
• Founded in November 1996 by Toby Lenk
– He was an employee of Disney at the time
• He raised $15 million to found the company
– Remember this was during the dot.com boom
• He used this money to secure advertising deals and
create the initial infrastructure
• The company launched in Oct 1997
– It spent the upfront time building the infrastructure
• eToys had roughly $700,000 in sales in 1997
• By 1998 they had about 100 employees
• In 1998 they had about $30 million in sales
– They were, however, operating at a loss
• In 1999 they had about $150 million in sales
– With about 1000 employees
• Their break even point was about $900 million
• eToys filed for bankruptcy in 2001
• They had $257 million in debt
• One reason for their failure was the high cost
– The supply chain infrastructure was significant
– A large part of the cost, however, was the
A More Recent Example
• How many people have heard of Pinterest?
• How about Instagram?
– Instagram was founded in 2010
– The initial application was developed and
launched by the two founders
– It was purchased 2 years later for $1 Billion
• Instagram had 1 million users within 2 months of
• Within one year they had 15 million users
• By April of 2012 they had 30 million users
– 1 Billion photos uploaded
– 5 million photos per day
– 81 comments per second
• Instagram had 13 employees in September of 2012
• Launched in March 2010
• Had 10,000 users by December 2010
• By December 2011 it had 11 million visits a week
• By March of 2012 it was the 3rd largest social
– Behind Facebook and Twitter
• It had 10 employees at the time
• What are the key differences between these
– What kind of upfront investment was required?
– What technical knowledge was required?
– What resources were needed?
What Enabled This?
• eToys had to build their own infrastructure
– Required a data center
– Built their own order processing capability
• Pinterest and Instagram utilized an existing
– This infrastructure had all of the capabilities to support
• This allowed Pinterest and Instagram to focus
exclusively on their primary applications
• The Pinterest and Instagram teams could
focus exclusively on their applications
• The existing infrastructure supported
– Network capacity
• Not only does it require less upfront effort, but
it’s less of an upfront investment
• There is essentially no capital investment
• There are operational expenses only when the
system is deployed
– The operational expenses are in line with the use
• This situation exemplifies much of the promise of
– We will define cloud computing a bit later
• “Cloud Computing” promises things like:
– Economies of scale
– Reduced capital investment
– Reduced time to market
– Lower operational costs
The Benefits are Real
• Organizations have in fact seen:
– Increased productivity
– Reduced labor costs
– Reduced infrastructure costs
– Improved agility
It’s Here To Stay…
• Today 8 of 10 companies use some form of
• Estimates for annual revenue from cloud
services range from $20 billion to $100 billion*
• In 2011 cloud budgets represented 15% of
worldwide IT spending*
* CompTIA’s third annual Trends in Cloud Computing study
What Is Cloud Computing?
“Cloud computing is a model for enabling ubiquitous,
convenient, on-demand network access to a shared
pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and
services) that can be rapidly provisioned and
released with minimal management effort or
service provider interaction.”*
* National Institute of Standards and Technology - Publication 800-145
Five Characteristics – NIST Definition
• On-demand Self-Service
– A consumer can provision computing capabilities without human interaction
• Broad network access
– Computing capabilities are available over the network and accessed through
• Resource pooling
– Provider’s computing resources are pooled to serve multiple consumers with
different resources dynamically assigned according to consumers’ demands
• Rapid elasticity
– Computing capabilities can be rapidly and elastically provisioned to quickly scale
out and rapidly released to scale in
• Measured service
– Resource usage can be monitored, controlled, and reported, providing
transparency for both the provider and consumer
• With the cloud you pay only for what you use
• This is much like a utility
– Think of electricity, natural gas, or water
• The roots of this notion go back to the 1960s
– When computers were large and expensive
• Resources are “pooled” to serve multiple customers
• This means you likely don’t have dedicated hardware
– You are sharing physical resources with others
• Gives rise to things like “virtualization” and “multi-tenancy”
– As we will see later on this results in some of the tradeoffs
that need to be managed
• As demand grows and shrinks computing
capability grows and shrinks
• Infrastructure providers have to be able to
• The ability to provision these resources rapidly
is very important
• Automation is a key component of elasticity
Cloud Service Models
• There are three primary cloud service models
– Software as a Service (SaaS)
– Platform as a Service (PaaS)
– Infrastructure as a Service (IaaS)
Software as a Service
• Software delivery service where software and
associated data are hosted in the cloud
• Most of us use some form of SaaS
– Google Calendar
• The primary “customer” for SaaS is the
• These could be individuals that use email, file
sharing, or social media applications
• They could also be businesses that use customer
relationship management, accounting, or supply
chain management solutions
• The consumer doesn’t worry about the
installation, deployment, or management of the
Platform as a Service
• The vendor provides the computing platform and
solution stack as a service
• This computing platform is hosted in the cloud
• The consumer creates the software using tools
and/or libraries from the provider
• These include things like:
– Application design and development
– Testing and deployment
– Team collaboration
• The developer and administrator is the consumer of
• PaaS is used to develop, deploy, and monitor cloud
• These services typically include things like:
• These services are again not installed or administered
by the consumers
Infrastructure as a Service
• Providers offer computing resources to the
• Consumers can then deploy their applications
on these resources
• Examples include
– Amazon AWS
– Microsoft Azure
Uses of IaaS
• Infrastructure as a service is used by developers,
administrators, and organizations
• IaaS includes basic computing resources
• The consumers don’t have direct access to the
– Instead they get access to virtualized resources (more
about this in the next section)
• Today you’ll see many other “service models”
– Data as a Service (DaaS)
– Business Process as a Service (BPaaS)
• Not sure what they all mean
• This is often more marketing terminology than
Cloud Deployment Models
• Cloud services can be deployed in three
– Public deployment
– Private deployment
• We’ll look briefly at these in turn
• The “public cloud” is infrastructure that is
– This includes both the service and the network
• The resources are from a large resource pool
that is shared by many
• Amazon AWS, Windows Azure, and Google
Compute Engine are all examples of public
• As the name implies a “private cloud” is a
cloud infrastructure that’s available only to a
• The infrastructure lives within the boundaries
of the organization
– It utilizes the same technical approach as the
• Allows organizations to maintain control over
data and infrastructure
• A hybrid cloud is a set of cloud services that
live on both a public and private cloud
• It’s not uncommon for organizations to keep
some portion of their system and data in-
• They might, however, deploy another portion
in the public cloud
• Today there are many vertical segments with
similar specialized needs e.g.
– Government entities
• Providers have started to establish infrastructures
to meet the needs of given communities
– For example Amazon now has a region only for
Why aren’t all Systems in the Cloud?
• Given all the options and benefits, why aren’t all
systems in the cloud?
• Issues do exist, however, such as:
– Loss of control
– Privacy concerns
– Difficulty complying with regulations
– Security concerns
– Performance problems
– Availability issues
– Licensing issues
• Isn’t this taken care of in the cloud?
– In short, no …
• Many think the cloud will automatically provide the
scalability & availability needed
– While the cloud infrastructure can scale that doesn’t mean
your application will scale
• The cloud is built from faulty components
– At any point in time some parts of the cloud are not
• This morning (July 14th, 2014) I looked up
– Outages in China over the last few weeks
– Outages in the US last week in search
– Outages currently in Canada and Australia
• Amazon and Microsoft have similar reports
• The performance of the cloud is notoriously
• The infrastructure is shared
– This means that the demand imposed is beyond
• As a result you might not always get the
Designing for the Cloud
• Does this mean you can’t achieve properties such as
availability and scalability in the cloud?
– Again, no
• It does mean, however, that they need to be
designed into the system
– In order to achieve the objectives of the organization you
need to be explicit about designing the system to promote
• Know what the cloud is
• Understand the basic structure of the cloud
• Understand the implications of the decisions taken in
• Know what options exist when designing a system for
• Know how to evaluate the impact of specific decisions
What This Course Is NOT
• A course that focuses on the architecture process
– We have another course that focuses on this
• A course that teaches you how to use specific
– We are agnostic with respect to technologies, service hosts, and
– We don’t talk about implementation level details
Focus of the Course
• Architectural concepts
• Structure of the cloud
– Major components
– Design decisions
• Options for achieving desired properties
• The course is split into three sections
– Architecting for the Cloud
• In order to understand the decisions and related
impact we need to understand some basic concepts
• In order to make sure we all have the same
understanding we’ll be talking about:
• In this section we will describe the
infrastructure of the cloud itself
• We’ll discuss:
– The key concepts
– The key design decisions
– The benefits of these decisions
– The tradeoffs associated with these decisions
Architecting For The Cloud
• We will then talk about architecting for the cloud
• We will discuss what options exist for achieving
• We will talk about the tradeoffs and
considerations when selecting these options
• We’ll also look at operational concerns
• Double click on machine description
• Brings up Ubuntu with Firefox and Libre
• Point Firefox to the NICTA web site.
• Middle of the screen is the Ubuntu image in
the virtual machine. The rest is Windows
What do we have?
• Fully functional Ubuntu system is running within
• Downloading another machine image would result
in a different OS or different software.
• IP addresses:
– Windows 2402:1800:1:2801:4492:2f34:dd2e:1079
– Ubuntu: 08:00:27:51:7f:09
• Ubuntu system is “sandboxed” from Windows
– Cannot import or export files or data directly.
– Could probably import/export through file sharing,
e.g. Google Drive.
• Computer – “virtualized”
• Machine image – set of bits that are loaded
into the virtualized computer
• Result gets an IP address that is distinct from
the IP address of the host.
How is this different from a VM in the
• Not much.
• The “cloud” is a publicly accessible platform with
hundreds of thousands of computers.
• My Windows host is 1 computer.
• You interact with VirtualBox through desktop
– VirtualBox directly paints on the screen
• You interact with the cloud through HTTP.
– Could be through a browser
– Could be through an app on your device (desktop, laptop,
Virtual Memory address translation
• Hardware enables trapping instructions that are outside of current
Hypervisor and Virtual Machine
• Target address goes through two different page tables to fetch the
• First points to virtual machine page table
• Second points to address of next
• Hardware is set up to support this process
• Hypervisor is supervisory program that manages page tables and
scheduling of Virtual Machines
[Figure: target address of host page table points to VM]
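The two-step lookup described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the page size, the dictionary-based page tables, and the function names `translate` and `guest_to_physical` are hypothetical, and real hardware performs these steps in the memory management unit rather than in software.

```python
PAGE_SIZE = 4096  # bytes per page (assumed for illustration)


def translate(page_table, address):
    """Map an address through one page table (page number -> frame number)."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        # Real hardware would trap to the supervisory program here.
        raise LookupError(f"page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


def guest_to_physical(vm_page_table, host_page_table, guest_address):
    """Translate through the VM's page table, then through the host's."""
    intermediate = translate(vm_page_table, guest_address)
    return translate(host_page_table, intermediate)


# Hypothetical tables: the first belongs to the VM, the second is
# managed by the hypervisor on behalf of that VM.
vm_page_table = {0: 5, 1: 7}
host_page_table = {5: 42, 7: 13}

address = guest_to_physical(vm_page_table, host_page_table, 1 * PAGE_SIZE + 100)
print(address)
```

Because the second table is controlled only by the hypervisor, a VM cannot name a physical frame that was not mapped for it, which is one way to picture the private address space mentioned on the next slide.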
• Computer with bare
• Instruction set is the same as
the host computer
• Address space is guaranteed
private from other virtual
machines (through the
• Available memory may be
less than that in the host
• Processor is shared across all
virtual machines on a single
Virtual machine images
• Bare (virtual) hardware may be all that is necessary for some
uses. E.g. operating system revisions.
• For other uses it is useful to have an operating system and
possibly some applications. Application licensing is, typically,
by virtual machine.
• The cloud infrastructure provides the capability to preload a
virtual machine with an image. This image can be from a
library or from something created by the user on a previous
visit to the cloud. Sample image might be LAMP – Linux,
Apache Server, MySQL, PHP
• Furthermore, it might be that a memory image is saved by an
application to allow for restart in the case of failure.
• Virtual machine
• Virtual network
• Other related topics
• DNS server
• IP addresses
• IP messages
• IP management
Domain Name Server (DNS)
Client sends the host name from a URL to DNS
DNS takes a host name as input and returns an IP address
Client uses the IP address to send a message to the site
• In reality, transmitting messages from one
computer to another is more complicated.
• The picture showed a single DNS server.
– There are multiple DNS servers
– There is a hierarchy of DNS servers.
• The picture showed a single line from client to
– There is a network of routers to transmit messages
– Shares load
– Hierarchy based on IP number.
• Consider URL www.nicta.com.au
– If one server held all DNS -> IP mappings, it would
both get overloaded and hold over 200 million
• DNS is arranged as a hierarchy.
• There is an “authoritative” name server that
holds all of the final suffixes (e.g. .au, .edu,
• It is replicated for performance reasons
• The final suffix DNS has the IP of the .au DNS.
• The .au DNS has the IP of the .com.au DNS
• The .com.au DNS has the IP of nicta.com.au
• The nicta.com.au DNS, in turn, has IP for various
local DNSs that are under NICTA’s control.
• This allows NICTA to change the IP of the various
local DNSs without changing anything up the
• This becomes important when we discuss
business continuity options.
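The hierarchy above can be sketched as a chain of lookups in which each server knows only the next level down. Everything in this sketch is hypothetical: the server names, the `SERVERS` table, and the IP addresses (taken from the range reserved for documentation) are assumptions for illustration, not NICTA's real records.

```python
# Each hypothetical name server maps a suffix to the next server to ask.
# The last level maps the full host name to an IP address.
SERVERS = {
    "root": {"au": "au-dns"},
    "au-dns": {"com.au": "com-au-dns"},
    "com-au-dns": {"nicta.com.au": "nicta-dns"},
    "nicta-dns": {"www.nicta.com.au": "203.0.113.10"},
}


def resolve(hostname):
    """Walk the hierarchy from the final suffix down to the full name."""
    labels = hostname.split(".")
    server, visited = "root", []
    # Suffixes are tried shortest first: au, com.au, nicta.com.au, ...
    for i in range(len(labels) - 1, -1, -1):
        visited.append(server)
        suffix = ".".join(labels[i:])
        server = SERVERS[server][suffix]
    return server, visited  # after the loop, `server` holds the IP address


ip, visited = resolve("www.nicta.com.au")
print(ip, visited)

# Changing a record at the lowest level touches nothing higher up.
SERVERS["nicta-dns"]["www.nicta.com.au"] = "203.0.113.99"
new_ip, _ = resolve("www.nicta.com.au")
print(new_ip)
```

Only the `nicta-dns` entry changed in the last step, which is the property the slide points to: the organization can repoint its own names without involving the servers above it.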
IPv4 and IPv6
• An IP (Internet Protocol) address is a numerical label that
identifies a “device” on the internet.
• An IPv4 address is 32 bits long and is written as a sequence of four numbers -
• 32 bits is insufficient and so IPv6 was created in 1995 and it
has 128 bits.
• For legacy reasons, IPv6 has had a very slow adoption. IPv4
numbers have been exhausted. This is causing more
conversion to IPv6. June 8, 2011 was designated as world
IPv6 day where top websites and internet providers
provided a 24 hour test of IPv6 infrastructure. This test was
• Google publishes statistics for percentage of users that
access Google over IPv6. It is now around 3.25%
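The difference in address size can be checked with Python's standard `ipaddress` module. The two addresses below are assumptions chosen from the ranges reserved for documentation; they do not come from the slides.

```python
import ipaddress

# Example addresses from the documentation-reserved ranges (assumed here).
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

# max_prefixlen is the number of bits in an address of that version.
print(v4.version, v4.max_prefixlen)
print(v6.version, v6.max_prefixlen)

# An IPv4 address is just a 32-bit number written as four byte values.
print(int(v4), list(v4.packed))

# Size of each address space.
print(2 ** v4.max_prefixlen, 2 ** v6.max_prefixlen)
```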
Assigning IP addresses
• “Devices” on the internet include virtual
machines in a cloud.
• Every VM gets an IP address when it is created. This IP
address can be
• Private and not seen outside of the cloud.
• Public and directly addressable from outside of the cloud.
• An IP message has a header and a payload. The header
– IP address of the source
– IP address of the destination
Private and Public IP addresses
Private IP addresses:
– If IPA sends a message to IPB, i.e., IPA+payload -> IPB, a gateway can make
it look like the message comes from the gateway, i.e., IPgateway+payload -> IPB
– In this case the gateway must maintain a table so that it can
manage the response from IPB
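The gateway table mentioned above can be sketched as follows. This is a simplified illustration, not a description of any particular gateway: the `Gateway` class, the dictionary message format, and the use of a port number as the table key are all assumptions (the slide only says that a table is kept).

```python
class Gateway:
    """Rewrites the source of outbound messages and remembers who sent them."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.table = {}          # gateway port -> private source IP
        self.next_port = 40000   # arbitrary starting port (assumed)

    def outbound(self, src_ip, dst_ip, payload):
        # Record the private sender, then make the message look like
        # it comes from the gateway.
        port = self.next_port
        self.next_port += 1
        self.table[port] = src_ip
        return {"src": self.public_ip, "src_port": port,
                "dst": dst_ip, "payload": payload}

    def inbound(self, message):
        # Use the table to forward the response to the private sender.
        private_ip = self.table[message["dst_port"]]
        return {**message, "dst": private_ip}


gateway = Gateway("198.51.100.1")
sent = gateway.outbound("10.0.0.5", "203.0.113.7", "hello")

# The recipient replies to the address and port it saw on the message.
reply = {"src": "203.0.113.7", "dst": sent["src"],
         "dst_port": sent["src_port"], "payload": "hi back"}
delivered = gateway.inbound(reply)
print(sent["src"], delivered["dst"])
```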
Public IP addresses
The VM manager is given a range of IP addresses that it can assign to VM
An assignment only lasts as long as the instance does, then it can be reassigned.
Messages from the instances come from the assigned IP address and
recipient can respond directly to instance.
What does this have to do with DNS servers?
Getting a message to VM inside the
• Virtual machine is allocated space on host
• This is local disk available to the VM.
• More extensive disk space is available through
other features. We return to this when we
discuss various file options.
• One physical computer hosts multiple virtual
• Messages to/from virtual machine go through
• Host machine’s disk is shared among
hypervisor and VMs hosted on that machine.
• Multi-tenancy has implications with respect to
performance and security.
• Set of bits that constitute an execution environment.
• As with VirtualBox could be
– Operating system
– Operating system + middleware
– Operating system + middleware + application
– Operating system + middleware + application + data
• Why put data in a machine image?
– Configuration values such as image ID
– History such as where image came from
– Location of other configuration parameters
• Cloud management system
– Chooses which physical host has capacity for new VM
– Assigns IP address and keeps mapping of IP address to
physical host in internal routing table.
– Tells hypervisor to allocate new VM and sends
hypervisor IP address and pointer to machine image.
• Hypervisor on chosen physical host
– Creates page table for new VM
– Allocates disk space
– Keeps internal mapping from IP address to VM
– Loads machine image into allocated VM.
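The allocation steps above can be sketched in a few lines. This is a toy model under stated assumptions: the `CloudManager` and `Hypervisor` classes, the fixed 10 GB disk size, and the capacity counted in number of VMs are hypothetical and do not describe any real provider's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    host: str
    capacity: int                              # max VMs on this host (assumed unit)
    vms: dict = field(default_factory=dict)    # IP address -> VM record

    def create_vm(self, ip, image):
        # Create a page table, allocate disk, keep the IP -> VM mapping,
        # and load the machine image.
        self.vms[ip] = {"page_table": {}, "disk_gb": 10, "image": image}


class CloudManager:
    def __init__(self, hypervisors, ip_pool):
        self.hypervisors = hypervisors
        self.ip_pool = list(ip_pool)
        self.routing = {}                      # IP address -> physical host

    def allocate(self, image):
        # 1. Choose a physical host with capacity
        #    (next() raises StopIteration if no host has room).
        hv = next(h for h in self.hypervisors if len(h.vms) < h.capacity)
        # 2. Assign an IP address and record it in the routing table.
        ip = self.ip_pool.pop(0)
        self.routing[ip] = hv.host
        # 3. Tell the hypervisor to create the VM from the image.
        hv.create_vm(ip, image)
        return ip


manager = CloudManager(
    [Hypervisor("host-a", capacity=1), Hypervisor("host-b", capacity=1)],
    ["10.0.0.1", "10.0.0.2"],
)
first = manager.allocate("lamp-image")
second = manager.allocate("lamp-image")
print(manager.routing)
```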
• Instance removal is a matter of undoing the steps
involved in allocating an instance.
• Local disk should be cleared so that information
stored on it is no longer available
• Public IP addresses may cause a problem since IP
address may be reallocated to a different VM.
– Amazon allows you to map an IP address to different instances
under program control.
• Clients of the application that know the IP
address may use it to send messages that will
arrive at a different VM.
• Applications running in the cloud require hundreds
or thousands of configuration parameters.
– Hadoop has 206 options
– HBase has 64
• Place configuration parameters in persistent storage
• Build knowledge of location into application
• We will return to configuration parameter issue when we
discuss deployment pipeline.
• An environment for a system consists of
– The system +
– its configuration parameters +
– The external systems with which it interacts
• Now suppose all external systems are defined
through configuration parameters
• Then the environment can be changed by
changing the configuration parameters
• These are architectural decisions that need to be
• Keep all configuration parameters in a
database read at system initialization
• Then moving from one environment to
another is a matter of changing the database
from which the configuration parameters are
[Figure: Testing Environment with Test database; Production Environment with Production database]
E.g., moving from the test environment to the
production environment is a matter of
changing a single database pointer
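A minimal sketch of this idea, assuming the configuration lives in a small SQLite table with `name` and `value` columns. The table layout, the parameter names, the `.example` URLs, and the `App` class are all hypothetical; in-memory databases stand in for the test and production databases.

```python
import sqlite3


def make_config_db(params):
    """Create an in-memory database holding configuration parameters."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT)")
    db.executemany("INSERT INTO config VALUES (?, ?)", params.items())
    db.commit()
    return db


def load_config(db):
    """Read every configuration parameter once, at initialization."""
    return dict(db.execute("SELECT name, value FROM config"))


class App:
    def __init__(self, config_db):
        # The only environment-specific input is which database to read.
        self.config = load_config(config_db)

    def payment_endpoint(self):
        return self.config["payment_url"]


test_db = make_config_db({"payment_url": "https://payments.test.example"})
prod_db = make_config_db({"payment_url": "https://payments.prod.example"})

# Same code, different environment: only the database pointer changes.
print(App(test_db).payment_endpoint())
print(App(prod_db).payment_endpoint())
```

The application code is identical in both cases, since external systems are reached only through values read from the configuration database.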
Using the cloud
• The environments in which a system lives include
– Development (usually on your desktop)
– Integration (in the cloud)
– Staging for performance and user acceptance
• Keeping all configuration parameters in databases
and reading the relevant database at initialization
allows for easy movement from one environment
IaaS Issues – 1
– IaaS providers provide sophisticated reliability mechanisms.
– Instances may fail
– Consumers must perform risk analysis to determine the extent to
which they wish to supplement the provider’s reliability mechanisms with
– Multi-tenancy impacts performance compared to individual machines
because of the sharing of the CPU and the overhead of the
– Allocating additional instances for scaling is the responsibility of the
consumer either explicitly or through setting rules for allocation.
– All access to the cloud is through the internet, which introduces latency
compared with accessing data that is stored locally.
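Setting rules for allocation, as mentioned above, might look like the following threshold rule. This is a sketch only: the function name, the CPU-utilization metric, and the threshold and limit values are assumptions, and real providers expose their own rule formats.

```python
def desired_instances(current, cpu_utilization,
                      low=0.30, high=0.70, minimum=1, maximum=10):
    """Return the instance count a simple threshold rule would ask for.

    cpu_utilization is the average across instances, between 0.0 and 1.0.
    All thresholds and limits are illustrative defaults.
    """
    if cpu_utilization > high:
        target = current + 1      # scale out under heavy load
    elif cpu_utilization < low:
        target = current - 1      # scale in when mostly idle
    else:
        target = current          # stay put inside the band
    return max(minimum, min(maximum, target))


for load in (0.85, 0.50, 0.10):
    print(load, desired_instances(current=3, cpu_utilization=load))
```

The gap between the low and high thresholds keeps the rule from adding and removing instances on every small fluctuation in load.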
IaaS issues - 2
– Normal types of attacks through the internet are no
different in the cloud.
– Customers must trust the IaaS provider to respect the
privacy of data and computations.
– Multi-tenancy allows for other types of attacks based
on information leakage. E.g. a side channel attack can
use cache timing information to detect keys.
– Each cloud provider has their own set of interfaces
and standards. This introduces significant risk of
vendor lock in.
IaaS issues - 3
• Law/regulations with respect to data location.
– Some jurisdictions require that data not leave their
jurisdiction – e.g. EU has different privacy laws than
the US. Following scenarios cause concern:
• EU data stored in a US data center – disallowed by EU law?
• The same data stored in two different locations may mean
that one set of data is available to a government entity
• Disclosure laws when someone accesses protected data
differ in different jurisdictions.
• Some jurisdictions require that the cloud provider make
available keys and passwords.
• Infrastructure as a service has compelling economic
• The architecture for IaaS is based on having
– virtual machines,
– virtual networks, and
– virtual file systems.
managed by a cloud management system
• The concept of an environment for a system simplifies
moving a system from development to production
• An IaaS platform has a different set of issues from local
platforms, and the architect must be aware of these issues.