The Lean Cloud for Startups with AWS - Architectural Best Practices & Automating your Infrastructure

Architectural Best practices &
Automating your Infrastructure

Rules for ‘just making things work’

What your users want…

Fast, performant experience
Always on, accessible anywhere
Personalized and rich application
Lots of new features all of the time

Powerful web applications

Building powerful web applications

Rule 1: Service all web requests

Rule 2: Service requests as fast as possible

Rule 3: Handle requests at any scale

Rule 4: Simplify architecture with services

Rule 5: Automate operational management

Rule 6: Leverage unique cloud properties


Rule 1: Service all web requests
Or “you aren’t doing anything if you don’t answer the door…”

a) Make sure requests get to your ‘front door’

[Diagram: Request, then DNS, then Application, then Data]
Clients can’t resolve you? …then this is irrelevant


Route53
“100% Available” SLA: http://aws.amazon.com/route53/sla

Feature: Details
Global: Supported from AWS global edge locations for fast and reliable domain name resolution
Scalable: Automatically scales based upon query volumes
Latency based routing: Supports resolution of endpoints based upon latency, enabling multi-region application delivery
Integrated: Integrates with other AWS services allowing Route 53 to front load balancers, S3 and EC2
Secure: Integrates with IAM giving fine grained control over DNS record access

b) Make sure you open the door when they arrive


Elastic load balancing
Multi-availability zone
Multi-region

[Diagram: Route53 in front of an Elastic Load Balancer in each Region, each balancing across several Availability Zones]

c) Have the data to form a response

Multi-AZ RDS (Master-slave)
Inter-region replication
Read-replicas

[Diagram: Route53 in front of an Elastic Load Balancer in each Region, each spanning several Availability Zones, with the data tier replicated across them]



Rule 2: Service requests as fast as possible
Or “I’m not hanging around, so be quick…”

a) Choose the fastest route

[Diagram: a Request reaches Route53 with latencies of 16ms to Region A and 92ms to Region B; Route53 returns the Region A DNS entry]
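The latency-based choice in the diagram can be sketched in a few lines of Python. This is only an illustration of the selection idea, not how Route53 is implemented; the function name `pick_lowest_latency` is hypothetical, and the 16ms and 92ms figures are the ones shown on the slide.

```python
def pick_lowest_latency(latencies_ms):
    """Return the name of the region with the smallest measured latency."""
    if not latencies_ms:
        raise ValueError("no regions to choose from")
    # min() over the keys, ranked by each key's latency value
    return min(latencies_ms, key=latencies_ms.get)


# Latencies taken from the slide: Region A at 16ms, Region B at 92ms.
measured = {"Region A": 16, "Region B": 92}
chosen = pick_lowest_latency(measured)
print(chosen)  # Region A
```

In the real service the measurement and the answer both happen at DNS resolution time, so the application itself does not need any of this logic.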

b) Offload your application servers

CloudFront
World-wide content distribution network

Easily distribute content to end users with low latency, high data transfer speeds, and no commitments.

1 Single CNAME: www.mysite.com
2 Served from EC2: *.php
3 Served from S3: /images/*

[Diagram: edge locations in London, Paris and NY]
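The split between S3 and EC2 origins behind one CNAME can be sketched as a first-match lookup over path patterns. This is a rough illustration only: the two patterns come from the slide, while `BEHAVIORS`, `origin_for` and the choice of EC2 as the fallback origin are assumptions made for the example, not CloudFront configuration syntax.

```python
from fnmatch import fnmatchcase

# Ordered (pattern, origin) rules from the slide; the first match wins.
BEHAVIORS = [
    ("/images/*", "S3"),
    ("*.php", "EC2"),
]


def origin_for(path, default="EC2"):
    """Return which origin should serve `path` (the default is an assumption)."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    return default


print(origin_for("/images/logo.png"))  # S3
print(origin_for("/checkout.php"))     # EC2
```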


Without CloudFront
EC2 webservers/app servers loaded by user requests

With CloudFront
Load of user requests pushed into CloudFront, EC2 cluster can scale down

[Chart: Response Time and Server Load for No CDN, CDN for Static Content, and CDN for Static & Dynamic Content; each step offloads the servers so the cluster can scale down]

c) Cache it if you can

ElastiCache
Memcached compatible caching layer

Serve frequently requested & slow changing data from scalable cache clusters

Reduce load on database and other servers
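A common way to use such a caching layer is the cache-aside pattern: look in the cache first, and only fall through to the database on a miss. The sketch below uses an in-process dictionary as a stand-in for a Memcached cluster; `CacheAside`, `slow_database_lookup` and the 60 second time-to-live are hypothetical names and values chosen for illustration.

```python
import time


class CacheAside:
    """Tiny cache-aside wrapper; a dict stands in for the cache cluster."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # called only on a miss or an expired entry
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}           # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: the backend is not touched
        value = self._loader(key)  # cache miss: load, then remember
        self._store[key] = (now + self._ttl, value)
        return value


calls = []


def slow_database_lookup(key):
    """Hypothetical backend; records each call so the offload is visible."""
    calls.append(key)
    return key.upper()


cache = CacheAside(slow_database_lookup)
cache.get("user:1")
cache.get("user:1")
print(len(calls))  # 1
```

The second read is served from the cache, which is the reduction in database load the slide refers to.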

d) Single digit latencies where it matters
Database Query Performance
Desired: consistency, predictability with scale
Actual: degraded performance with scale

Management problems
Data sharding
Data caching
Provisioning
Cluster management
Fault management

DynamoDB
Low latency
Large scale
Zero admin
Predictable performance
Average single-digit milliseconds server side latencies
Runs on solid state drives, and is built to maintain consistent, fast latencies at any scale

[Chart: DynamoDB query performance compared with relational database query performance as scale increases]




Rule 3: Handle requests at any scale
Or “When they come, they REALLY come…”

a) Scale up

Vertical Scaling
From $0.02/hr

Scale up with Elastic Compute Cloud (EC2)
Basic unit of compute capacity
Range of CPU, memory & local disk options
14 Instance types available, from micro through cluster compute to SSD backed

b) Scale out

Auto-scaling
Automatic re-sizing of compute clusters based upon demand
Trigger auto-scaling policy

    as-create-auto-scaling-group MyGroup \
      --launch-configuration MyConfig \
      --availability-zones eu-west-1a \
      --min-size 4 \
      --max-size 200

Manually
Send an API call or use CLI to launch/terminate instances; only need to specify capacity change (+/-)
Preemptive manual scaling of capacity
e.g. before a marketing event add 10 more instances

By Schedule
Scale up/down based on date and time
Regular scaling up and down of instances
e.g. scale from 0 to 2 to process SQS messages every night or double capacity on a Friday night

By Policy
Scale in response to changing conditions, based on user configured real-time monitoring and alerts
Dynamic scale based upon custom metrics
e.g. SQS queue depth, Average CPU load, ELB latency

Auto-Rebalance
Instances are automatically launched/terminated to ensure the application is balanced across multiple AZs
Maintain capacity across availability zones
e.g. Instance availability maintained in event of AZ becoming unavailable
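Scaling by policy boils down to turning a metric into a target capacity and clamping it to the group's limits. The following is a minimal sketch of that decision, assuming average CPU load as the metric. The bounds of 4 and 200 are the `--min-size` and `--max-size` values from the deck's command line example; the thresholds, the step size and the function name `desired_capacity` are hypothetical.

```python
def desired_capacity(current, avg_cpu_percent, min_size=4, max_size=200,
                     scale_out_above=70.0, scale_in_below=30.0, step=2):
    """Return the instance count a simple CPU-based policy would ask for."""
    if avg_cpu_percent > scale_out_above:
        target = current + step   # under pressure: add instances
    elif avg_cpu_percent < scale_in_below:
        target = current - step   # mostly idle: remove instances
    else:
        target = current          # inside the comfortable band: no change
    # Never leave the range configured for the group.
    return max(min_size, min(max_size, target))


print(desired_capacity(4, 85.0))    # 6
print(desired_capacity(4, 10.0))    # 4
print(desired_capacity(200, 95.0))  # 200
```

The same shape works for the other metrics named above, such as SQS queue depth or ELB latency, by swapping the input and the thresholds.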

c) Dial it up

Elastic Block Store
Provisioned IOPS up to 1000 per EBS volume
Predictable performance for demanding workloads such as databases

DynamoDB
Provisioned read/write performance per table
Predictable high performance scaled via console or API

“AWS gave us the flexibility to bring a massive amount of capacity online in a short period of time and allowed us to do so in an operationally straightforward way.

AWS is now Shazam’s cloud provider of choice,”

Jason Titus,
CTO

DynamoDB: over 500,000 writes per second
Amazon EMR: more than 1 million writes per second




Rule 4: Simplify architecture with services
Or “Building new stuff is fun, but other people’s software is a drag”


On-Premise Infrastructure
30% Your Business
70% Managing All of the “Undifferentiated Heavy Lifting”

AWS Cloud-Based Infrastructure
70% More Time to Focus on Your Business
30% Configuring Your Cloud Assets


Use RDS for databases
Relational Database Service
Database-as-a-Service
No need to install or manage database instances
Scalable and fault tolerant configurations

Use DynamoDB for high performance key-value DB
DynamoDB
Provisioned throughput NoSQL database
Fast, predictable performance
Fully distributed, fault tolerant architecture


Reliable message queuing without additional software
Amazon SQS
Reliable, highly scalable, queue service for storing messages as they travel between instances
[Diagram: Processing task/processing trigger, Amazon SQS, Processing results]

Push inter-process workflows into the cloud with SWF
Simple Workflow
Reliably coordinate processing steps across applications
Integrate AWS and non-AWS resources
Manage distributed state in complex systems
[Diagram: numbered steps 1 to 3 across Task A, Task B (Auto-scaling) and Task C]
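The queue-between-workers idea can be illustrated without any AWS dependency. In this sketch Python's in-process `queue.Queue` stands in for the managed queue, which is an assumption for illustration only; it does not use the SQS API, and `run_worker` is a hypothetical helper name.

```python
import queue


def run_worker(tasks, results, handler):
    """Drain `tasks`, apply `handler`, and put each outcome on `results`."""
    processed = 0
    while True:
        try:
            message = tasks.get_nowait()
        except queue.Empty:
            return processed          # nothing left to do
        results.put(handler(message))
        processed += 1


tasks = queue.Queue()     # stand-in for the queue of processing tasks
results = queue.Queue()   # stand-in for the queue of processing results
for n in (1, 2, 3):
    tasks.put(n)

count = run_worker(tasks, results, lambda n: n * n)
print(count)  # 3
```

Because producers and workers only share the queues, either side can be scaled or replaced independently, which is the decoupling the slide is pointing at.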

Don’t install search software, use CloudSearch
Cloud Search
Elastic search engine based upon Amazon A9 search engine
Fully managed service with sophisticated feature set
Scales automatically
[Diagram: Document Server, Search Server, Results]

Process large volumes of data cost effectively with EMR
Elastic MapReduce
Elastic Hadoop cluster
Integrates with S3 & DynamoDB
Leverage Hive & Pig analytics scripts
Integrates with instance types such as spot

“Amazon CloudSearch is a game-changing
product that has allowed us to deliver powerful
new search capabilities. Our customers can now
find what they are looking for faster and more
easily than ever before…

…We saved many months of re-architecture
and development time by going with Amazon
CloudSearch”

Don MacAskill
CEO & Chief Geek
SmugMug

Rule 5: Automate operational management
Or “Run it from the iPhone…”



a) Everything is programmable

Access everything via CLI, API or Console
Achieve the highest levels of automation sophistication with ease

Compute, Security, Scaling, CDN, Backup, DNS, Database, Storage, Load Balancing, Workflow, Monitoring, Networking, Messaging

b) Think disposable, one click deployments

Cloud Formation
Automate creation of ‘stacks’ in a repeatable way
Scripting framework for AWS resource creation

Feature: Details
Platform support: Support for AWS resources from EC2 to IAM
Resource creation: Creates AWS resources behind the scenes and reports on progress
Declarative: Specify stacks in JSON format and source control your environments
Customizable: Drive stack creation with parameters
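To make the declarative, parameter-driven idea concrete, here is a small sketch that builds a template as a Python dictionary and serializes it to JSON ready for source control. The logical names `WebServer` and `InstanceTypeParam`, the `t1.micro` default and the placeholder image ID are all assumptions for illustration; a real stack would need a valid AMI ID for its region.

```python
import json


def build_template(ami_id):
    """Return a minimal single-instance template as a plain dictionary."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single EC2 instance (illustrative sketch)",
        "Parameters": {
            # Supplied at stack creation time to customize the stack.
            "InstanceTypeParam": {
                "Type": "String",
                "Default": "t1.micro",
                "Description": "EC2 instance type to launch",
            }
        },
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": ami_id,
                    # Ref resolves to the parameter value when the stack is created.
                    "InstanceType": {"Ref": "InstanceTypeParam"},
                },
            }
        },
    }


template = build_template("ami-00000000")  # placeholder ID, not a real image
template_json = json.dumps(template, indent=2, sort_keys=True)
print(template_json)
```

Keeping the generated JSON in version control gives each environment a reviewable history, and the same file can recreate the stack on demand.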

c) Design for failure, implement self healing

Bootstrapping
Customize instance startup
Get instances to ask the ‘who am I?’ question on startup and be configured dynamically upon being answered

Auto-scaling
Maintain capacity of instances
Using a minimum pool size will maintain capacity in the event of instance failures

Cloud Watch
Know what’s going on, take automated actions
Use CloudWatch standard and custom metrics to create alarms
Respond with automated administration actions
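The self-healing effect of a minimum pool size can be sketched as a simple reconciliation step: count the healthy instances and launch enough replacements to get back to the minimum. The function name `instances_to_launch`, the instance IDs and the health map are hypothetical; in practice the health information would come from monitoring rather than a hard-coded dictionary.

```python
def instances_to_launch(instance_health, min_pool_size):
    """Return how many replacements are needed to restore the minimum pool."""
    healthy = sum(1 for is_healthy in instance_health.values() if is_healthy)
    return max(0, min_pool_size - healthy)


# Hypothetical fleet: two of the four instances have failed their checks.
fleet = {"i-a": True, "i-b": False, "i-c": True, "i-d": False}
print(instances_to_launch(fleet, min_pool_size=4))  # 2
```

Run repeatedly, a check like this keeps capacity steady without anyone having to notice the failure first.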

Rule 6: Leverage unique cloud properties
Or “Do awesome things, fast…”


a) Optimize costs with instance types

Instance type: memory, compute, cores
Micro: 613 MB, up to 2 ECUs (for short bursts)
Small: 1.7 GB, 1 ECU, 1 virtual core
Medium: 3.75 GB, 2 ECUs, 1 virtual core
Large: 7.5 GB, 4 ECUs, 2 virtual cores
Extra Large: 15 GB, 8 ECUs, 4 virtual cores
High-CPU Med: 1.7 GB, 5 ECUs, 2 virtual cores
High-CPU XL: 7 GB, 20 ECUs, 8 virtual cores
Hi-Mem XL: 17.1 GB, 6.5 ECUs, 2 virtual cores
Hi-Mem 2XL: 34.2 GB, 13 ECUs, 4 virtual cores
Hi-Mem 4XL: 68.4 GB, 26 ECUs, 8 virtual cores
Cluster Compute 4XL: 23 GB, 33.5 ECUs, 8 Nehalem virtual cores
Cluster Compute 8XL: 60.5 GB, 88 ECUs, 8 core 2 x Intel Xeon
Cluster GPU 4XL: 22 GB, 33.5 ECUs, 8 Nehalem virtual cores, 2 x NVIDIA Tesla “Fermi” M2050 GPUs


On-demand instances
Unix/Linux instances start at $0.02/hour
Pay as you go for compute power
Low cost and flexibility
Pay only for what you use, no up-front commitments or long-term contracts
Use Cases:
Applications with short term, spiky, or unpredictable workloads
Application development or testing

Reserved instances
1- or 3-year terms
Pay low up-front fee, receive significant hourly discount
Low Cost / Predictability
Helps ensure compute capacity is available when needed
Use Cases:
Applications with steady state or predictable usage
Applications that require reserved capacity, including disaster recovery

Spot instances
Bid on unused EC2 capacity
Spot Price based on supply/demand, determined automatically
Cost / Large Scale, dynamic workload handling
Use Cases:
Applications with flexible start and end times
Applications only feasible at very low compute prices


[Chart: Reserved Instances, On Demand and Spot; vertical axis from 0 to 7000]

b) Get insight fast with Elastic MapReduce

Elastic MapReduce
Managed, elastic Hadoop cluster
Integrates with S3 & DynamoDB
Leverage Hive & Pig analytics scripts
Integrates with instance types such as spot

Feature: Details
Scalable: Use as many or as few compute instances running Hadoop as you want. Modify the number of instances while your job flow is running
Integrated with other services: Works seamlessly with S3 as origin and output. Integrates with DynamoDB
Comprehensive: Supports languages such as Hive and Pig for defining analytics, and allows complex definitions in Cascading, Java, Ruby, Perl, Python, PHP, R, or C++
Cost effective: Works with Spot instance types
Monitoring: Monitor job flows from within the management console


[Diagram: Input data from S3 + DynamoDB and Code go into Elastic MapReduce (name node, elastic cluster, HDFS); Output goes to S3 + SimpleDB; Queries + BI via JDBC, Pig, Hive]

Features powered by Amazon Elastic
MapReduce:
People Who Viewed this Also Viewed
Review highlights
Auto complete as you type on search
Search spelling suggestions
Top searches
Ads

200 Elastic MapReduce jobs per day
Processing 3TB of data

“With AWS, our developers can now do things they
couldn’t before…

…Our systems team can focus their energies on other
challenges.”

Dave Marin
Search and data-mining engineer

c) Create a supercomputer backend when you need it

Cluster compute instances
Implement HVM process execution
Intel® Xeon® E5-2670 processors
10 Gigabit Ethernet

Network placement groups
Cluster instances deployed in a ‘Placement Group’ enjoy low latency, full bisection 10 Gbps bandwidth

Cluster Compute
80 EC2 Compute Units
60GB RAM
3TB Local Disk
10Gbps

With AWS

✔ Elastic utility capacity
✔ Highly available global coverage
✔ Agility & automated operations
✔ Cost effective storage, big data & analytics
