How can you take an existing monolith to the cloud with very minimal effort? In this talk we will explore an architecture that can help you to achieve that while focusing on scalability and resilience.
4. We are business focused technologists that deliver.
Accelerated Serverless | AI as a Service | Platform Modernisation
We are hiring: do you want to work with us?
5. I co-host a podcast about AWS with @eoins!
@loige
awsbites.com
Please, subscribe!
10. Business Summary
SaaS CMS for legal practices
1 founder + 1 developer
Bootstrapped business
Good MVP, getting attention in the market
Started a TRIAL with a big customer
11. Current problems
The company is growing
but the technology does not scale!
1 monolithic server
Frequent failures = unhappy customers
The business is at risk!
12. Desired State
More reliable & scalable infrastructure
Minimal amount of change required*
* the team is not skilled with the cloud & containers, we need to keep cognitive load low
13.
"I heard that the cloud is great but we don't
have the time and the skills to re-architect
everything as micro-services!"
16. Example use cases
A user logs in to the application and should be able to see
all their previously uploaded legal documents
A user can upload new documents and organize them by
providing specific tags (client id, case number, etc.)
A user might search for documents containing specific
keywords or tags
28. Region
A physical location around the world (e.g. North
Virginia, Ireland or Sydney) where AWS hosts a group
of data centers.

Regions help to provision infrastructure that is closer
to the customers, so that our applications can have
low latency and feel responsive.
29. Availability Zone (AZ)
One or more discrete data centers with redundant power,
networking, and connectivity in an AWS Region.

Data centers in different availability zones are
isolated from one another, so a serious outage
rarely affects more than one availability zone
at the same time.
30. Availability Zone (AZ)
It's good practice to spread redundant infrastructure
across different availability zones in a given region to
guarantee high availability.
31. VPC
A virtual (private) network provisioned in a given
region for a given AWS account.

It is logically isolated from other virtual networks in
AWS.

Every VPC has a range of private IP addresses
organised in one or more subnets.
32. Subnet
A range of IPs in a given VPC and in a given availability
zone that can be used to spin up and connect
resources within the network.

Subnets can be public or private.

A public subnet can be used to run instances that can
have a public IP assigned to them and can be
reachable from outside the VPC itself.
33. Subnet
It's good practice to keep front-facing servers (or load
balancers) in public subnets and keep everything else
(backend services, databases, etc.) in private subnets.

Traffic between subnets can be enabled through
route tables, for instance to allow a load balancer in a
public subnet to forward traffic to backend instances
in a private subnet.
34. Quick Recap
Region: physical location with data centers
Availability Zone: one or more data centers in a region
VPC: a virtual private network in a region
Subnet: range of IPs in a VPC in a given AZ
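To make the recap concrete, here is a minimal Python sketch of how a VPC address range could be carved into one public and one private subnet per availability zone. The CIDR range, the /24 subnet size and the AZ names are illustrative assumptions, not values from the talk.

```python
import ipaddress


def plan_subnets(vpc_cidr, availability_zones, new_prefix=24):
    """Split a VPC range into one public and one private subnet per AZ.

    The layout (public blocks first, then private ones) is an arbitrary
    choice made for this sketch.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # lazily yields /24 blocks
    plan = []
    for kind in ("public", "private"):
        for az in availability_zones:
            try:
                cidr = next(blocks)
            except StopIteration:
                raise ValueError("VPC range is too small for this plan") from None
            plan.append({"kind": kind, "az": az, "cidr": str(cidr)})
    return plan


if __name__ == "__main__":
    # Hypothetical values: a /16 VPC in the Ireland region with 3 AZs.
    for subnet in plan_subnets("10.0.0.0/16", ["eu-west-1a", "eu-west-1b", "eu-west-1c"]):
        print(subnet)
```

Each subnet belongs to exactly one AZ, so spreading resources across AZs means spreading them across subnets.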
40. Application Load Balancer (ALB)
The entry point to all the application traffic.

Layer 7 Load Balancer (HTTP, HTTPS, WebSocket,
gRPC).

Highly available: replicated in all our public
subnets.
41. Application Load Balancer (ALB)
Scalable: can handle millions of requests per
second.

Managed service: we don't need to configure the
OS or install software patches.

Can be integrated with ACM (AWS Certificate
Manager) to support HTTPS.
43. Application Load Balancer (ALB)
[Diagram: the load balancer calls /health on every target in the target group]
Unhealthy targets
won't get any traffic
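A health check endpoint can be very small. The sketch below uses Python's standard library purely for illustration (the talk's stack is Nginx + Node.js); the port, the `dependencies` dictionary and the JSON payload shape are assumptions. It answers 200 when every dependency probe passes and 503 otherwise, which a load balancer health check can use to tell healthy from unhealthy targets.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_health(dependencies):
    """Run every probe and return (http_status, payload)."""
    results = {}
    for name, probe in dependencies.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that crashes counts as a failed check.
            results[name] = False
    healthy = all(results.values())
    return (200 if healthy else 503), {"healthy": healthy, "checks": results}


def make_handler(dependencies):
    """Build a request handler class bound to the given probes."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            status, payload = check_health(dependencies)
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler


if __name__ == "__main__":
    # Hypothetical probe: replace with a real database or cache ping.
    probes = {"database": lambda: True}
    HTTPServer(("0.0.0.0", 8080), make_handler(probes)).serve_forever()
```

Keep the probes cheap: the load balancer calls this endpoint on every instance at a regular interval.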
44. Application Load Balancer (ALB)
Targets can be added dynamically.

We can scale targets automatically using
autoscaling groups.

E.g. add or remove instances based on the number of
in-flight requests or on the average CPU of the current
instances.
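The idea of scaling on average CPU can be sketched as a simple proportional rule. This is a simplified illustration, not the exact algorithm AWS uses; the function name, the target value and the size limits are all assumptions.

```python
import math


def desired_capacity(current_instances, avg_cpu_percent, target_cpu_percent,
                     min_size, max_size):
    """Estimate how many instances keep average CPU near the target.

    Simplified proportional rule for illustration only.
    """
    if current_instances <= 0 or target_cpu_percent <= 0:
        raise ValueError("current_instances and target_cpu_percent must be positive")
    raw = math.ceil(current_instances * avg_cpu_percent / target_cpu_percent)
    # Never go below the minimum or above the maximum group size.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    # 4 instances at 90% CPU with a 60% target: scale out.
    print(desired_capacity(4, 90, 60, min_size=2, max_size=10))
```

The minimum size matters for resilience as much as for load: it is what keeps at least one instance per AZ running when traffic is low.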
45. How does it scale?
Being a managed service, scalability is mostly
handled out of the box by AWS.
46. Resiliency
A load balancer can distribute traffic to multiple
AZs, so if one of them becomes unavailable it will
keep distributing traffic to the remaining ones.
48. EC2 - Virtual Machine
Virtual machine running all the necessary
software for the service (Nginx, Node.js, app code,
etc.)

Instances need Security Groups (to allow traffic)
and IAM Roles (to allow them to access other AWS
resources like S3).
49. EC2 - Virtual Machine
We will need to provision multiple machines
dynamically.
Challenges:
Consistency
Cattle vs Pet mindset
Stateless applications
50. Consistency
All our virtual machines have to be the same: we
need to build an AMI (Amazon Machine Image).

An AMI contains OS, libraries, software and source
code.

You can use an AMI to start a new instance.
51. Consistency
While we can build an AMI manually, it's better to
use tools to automate the work:
HashiCorp Packer
EC2 Image Builder
52. Cattle vs Pet mindset
Once an instance has been launched we shouldn't
change it anymore (e.g. update the OS, install new
software, update the code, etc.)

If we need to change something, we build a new
image and deploy new instances.

Instances are disposable!
53. Stateless
We are load balancing traffic so a user might be
served by different instances during their session.

A single instance should not store any state (e.g.
user sessions, uploaded files, etc.)

State should be stored outside instances
(ElastiCache, S3, RDS, etc).
54. Stateless
Making an application stateless might require a
good amount of code change.

A shortcut to this might be to enable sticky sessions
in the ALB, but it's not recommended for
scalability and resiliency.
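One way to picture "state outside the instance" is a session store that only talks to a shared backend. In the sketch below, `InMemoryBackend` is a hypothetical stand-in for an external store such as ElastiCache Redis, and the class and method names are assumptions made for illustration.

```python
import json
import secrets
import time


class InMemoryBackend:
    """Stand-in for an external key/value store (e.g. Redis)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, expires_at):
        self._data[key] = (value, expires_at)

    def get(self, key, now):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at <= now:
            del self._data[key]  # expired entries are dropped lazily
            return None
        return value


class SessionStore:
    """Keeps session data in the shared backend, never on the instance."""

    def __init__(self, backend, ttl_seconds=3600, clock=time.time):
        self._backend = backend
        self._ttl = ttl_seconds
        self._clock = clock

    def create(self, user_data):
        session_id = secrets.token_urlsafe(32)
        self._backend.set("session:" + session_id, json.dumps(user_data),
                          self._clock() + self._ttl)
        return session_id

    def load(self, session_id):
        raw = self._backend.get("session:" + session_id, self._clock())
        return None if raw is None else json.loads(raw)


if __name__ == "__main__":
    shared = InMemoryBackend()
    instance_a = SessionStore(shared)  # imagine two different EC2 instances
    instance_b = SessionStore(shared)
    sid = instance_a.create({"user_id": 42})
    print(instance_b.load(sid))
```

Because both "instances" read from the same backend, it does not matter which one the load balancer picks for the next request.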
55. How does it scale?
Every instance will be able to handle a certain
number of requests per second.

We can scale by adding more instances when the
traffic grows.
56. Resiliency
We should have at least 1 instance per
availability zone.

If an availability zone has an outage, the instances in the
healthy availability zones will keep handling
requests.

We can use an autoscaling group to make sure
that unhealthy instances are replaced.
58. Simple Storage Service (S3)
One of the very first AWS services and (probably)
the most famous one.

Object storage service: Allows you to store any
amount of data durably.

You need to use the SDK to read and write data.
59. Simple Storage Service (S3)
Data can be organised in logical containers called
Buckets.

Key/value model: Inside a bucket you can store
data by providing a key and the content.
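The key/value model can be sketched as follows. The key layout (`clients/<id>/cases/<number>/<file>`) is a hypothetical naming scheme for the legal documents use case, and `InMemoryBucket` is a test double; in a real deployment the `storage` argument would be an SDK client exposing a compatible `put_object(Bucket=..., Key=..., Body=...)` call, such as the one boto3 provides.

```python
import posixpath


def document_key(client_id, case_number, filename):
    """Build the S3 key for an uploaded document (hypothetical layout)."""
    # Keep only the file name, dropping any directory part of the upload path.
    safe_name = posixpath.basename(filename.replace("\\", "/"))
    if not safe_name:
        raise ValueError("filename must not be empty")
    return f"clients/{client_id}/cases/{case_number}/{safe_name}"


class InMemoryBucket:
    """Test double mimicking the put_object call of an S3 client."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body


def save_document(storage, bucket, client_id, case_number, filename, content):
    """Store the content under its key and return the key."""
    key = document_key(client_id, case_number, filename)
    storage.put_object(Bucket=bucket, Key=key, Body=content)
    return key


if __name__ == "__main__":
    storage = InMemoryBucket()
    print(save_document(storage, "legal-docs", "c42", "2023-001",
                        "contract.pdf", b"%PDF-..."))
```

Keys are just strings: the "folders" in the key are a naming convention, not real directories.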
61. Simple Storage Service (S3)
Too much code to change?
A first migration could be done by using
something like s3fs-fuse to create a "virtual
filesystem" that allows you to read/write to S3
seamlessly.
62. How does it scale?
S3 is a managed service which automatically
scales to thousands of read/write operations per
second.
63. Resiliency
S3 is provisioned in multiple AZs by default and it
makes multiple copies of your data.

All of this happens transparently, no special
configuration required.
65. Relational Database Service (RDS)
Managed relational database service for MySQL,
PostgreSQL, MariaDB, Oracle & SQL Server.

Being a managed service, AWS takes care of most
common concerns like backups and updates
(configurable).
66. How does it scale?
RDS PostgreSQL supports Read Replicas: you can
provision additional instances to which you can
distribute heavy read-only queries.
67. Resiliency
RDS PostgreSQL can be configured to work in
Multi-AZ mode: this means that there will be one
or two standby copies of the database in different
AZs.

If the primary DB instance or the primary AZ has
an outage, one of the standby copies is
promoted to become "the primary" instance.
68. Resiliency
Failover is fast but not instantaneous (60-120
seconds), so we need to plan for
possible connectivity failures in our app and
show clear error messages to the users.
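Planning for connectivity failures often means retrying with a growing delay and giving up with a clear error. Below is a minimal sketch; the function name, the default delays and the choice of `ConnectionError` as the retryable error are assumptions, and a real database driver will raise its own exception types.

```python
import time


def call_with_retry(operation, attempts=5, base_delay=1.0, max_delay=30.0,
                    retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller show an error
            # Delays grow as 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_query():
        # Hypothetical query that fails twice, as it might during a failover.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("database is failing over")
        return ["row-1", "row-2"]

    print(call_with_retry(flaky_query, sleep=lambda seconds: None))
```

With a failover window of 60-120 seconds, a request handler should usually give up well before that and return a clear error, while background jobs can afford to retry for longer.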
70. ElastiCache
Managed in-memory caching service supporting
Redis and Memcached.

Meant to be used for use cases that don't require
durability like data cache, session stores, gaming
leaderboards, streaming, and analytics.

AWS takes care of maintenance.
71. How does it scale?
A single instance of Redis (with enough memory)
can scale to significant amounts of traffic.

If you need more, you can run ElastiCache Redis in
Cluster Mode and shard your data across
multiple Redis instances.
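Sharding in Redis Cluster is based on hash slots: every key maps to one of 16384 slots (CRC16 of the key modulo 16384), and each shard owns a range of slots. The sketch below computes the slot for a key, including the `{hash tag}` rule. The even split of slots across nodes in `node_index_for_key` is an assumption for illustration; in a real cluster the slot ranges come from the cluster configuration and the client library does the routing.

```python
import binascii

TOTAL_SLOTS = 16384


def key_slot(key):
    """Return the Redis Cluster hash slot for a key."""
    data = key.encode("utf-8")
    # Hash tag rule: if the key contains "{...}" with a non-empty body,
    # only the text between the first "{" and the next "}" is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % TOTAL_SLOTS


def node_index_for_key(key, node_count):
    """Map a key to a node, assuming slots are split into equal ranges."""
    return key_slot(key) * node_count // TOTAL_SLOTS


if __name__ == "__main__":
    for key in ("session:abc", "{user42}:profile", "{user42}:documents"):
        print(key, key_slot(key), node_index_for_key(key, 3))
```

Keys sharing a hash tag land in the same slot, which is what allows multi-key operations on related data in cluster mode.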
72. Resiliency
ElastiCache Redis can operate in Multi-AZ mode.

Similarly to RDS, in case of failures, there might be
some downtime while the new master is
promoted.

We need to make sure the app accounts for Redis
connection failures.
74. Route53
Highly available and scalable cloud DNS service.

Can be used to direct traffic on a given domain to
our Application Load Balancer.
76. Infrastructure as Code (IaC)
We could provision everything "manually" from
the web console, but...

It will be hard to create consistent
environments for development and QA
It will be hard to change things incrementally
How would we test and review changes before
applying them in production?
77. Infrastructure as Code (IaC)
It's better to define all the infrastructure using code.
There are several tools that can help us with that:
CloudFormation
HashiCorp Terraform
Cloud Development Kit (CDK)
Pulumi
78. {
"AWSTemplateFormatVersion" : "2010-09-09",
"Description" : "AWS CloudFormation Sample Template EC2InstanceWithSecurityGroupSample: Create an Amazon EC2 instance running the A
"Parameters" : {
"KeyName": {
"Description" : "Name of an existing EC2 KeyPair to enable SSH access to the instance",
"Type": "AWS::EC2::KeyPair::KeyName",
"ConstraintDescription" : "must be the name of an existing EC2 KeyPair."
},
"InstanceType" : {
"Description" : "WebServer EC2 instance type",
"Type" : "String",
"Default" : "t2.small",
"AllowedValues" : [ "t1.micro", "t2.nano", "t2.micro", "t2.small", "t2.medium", "t2.large", "m1.small", "m1.medium", "m1.large"
,
"ConstraintDescription" : "must be a valid EC2 instance type."
},
"SSHLocation" : {
"Description" : "The IP address range that can be used to SSH to the EC2 instances",
"Type": "String",
"MinLength": "9",
"MaxLength": "18",
"Default": "0.0.0.0/0",
"AllowedPattern": "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})/(\\d{1,2})",
"ConstraintDescription": "must be a valid IP CIDR range of the form x.x.x.x/x."
}
},
"Mappings" : {
"AWSInstanceType2Arch" : {
"t1.micro" : { "Arch" : "HVM64" },
"t2.nano" : { "Arch" : "HVM64" },
"t2.micro" : { "Arch" : "HVM64" },
Example of CloudFormation template
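A template like the one above is just structured data, so it can also be generated from code. The following is a minimal sketch using only the Python standard library; the bucket prefix, the `Environment` parameter and the resource name `DocumentsBucket` are assumptions made for illustration, and tools like the CDK offer a much richer way to do the same thing.

```python
import json


def build_template(bucket_prefix="legal-docs"):
    """Build a small CloudFormation template as a plain dictionary."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Document storage bucket (illustrative sketch)",
        "Parameters": {
            "Environment": {
                "Type": "String",
                "Default": "dev",
                "AllowedValues": ["dev", "qa", "prod"],
            }
        },
        "Resources": {
            "DocumentsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # One bucket per environment, e.g. legal-docs-qa
                    "BucketName": {"Fn::Sub": bucket_prefix + "-${Environment}"},
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }


def render(template):
    """Serialise the template to JSON, ready to be saved to a file."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_template()))
```

The same template can then be deployed to dev, QA and production by changing only the parameter value, which is what makes the environments consistent.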
84. Streamlined data migration
AWS Database Migration Service (DMS) allows
you to replicate all the data from the old
database to the new one.

It will also keep the 2 databases in sync
during the switch-over!
85. Switching traffic
Request a new certificate using AWS
Certificate Manager (ACM).

Can be validated by email or DNS.

Point your DNS to the new Load Balancer
in AWS!
88. New opportunities
We can scale dynamically!
As the team grows and the system gets more
complicated we can start to think about micro-services.
We can start to play with other AWS services (e.g.
SQS + Lambda for background task processing).
91. Cost
Cost estimates are always a bit of a "gamble"...
I selected some arbitrary instance sizes (EC2, RDS, ElastiCache).
I am not accounting for auto-scaling.
I am not accounting for network traffic.
Better to look at cost in production and try to optimise when
needed.
Rule of thumb: try to balance cost with your revenue.
Rule of thumb (2): consider the total cost of ownership!
92. Bonus: a TODO list for the migration
- Create an AWS Account
- Select a tool for IaC
- Create and configure a VPC in a region (3 AZs, Public / Private subnets)
- Create an S3 bucket
- Update the old codebase to save every new file to S3
- Copy all the existing files to S3
- Spin up the database in RDS (Multi-AZ)
- Migrate the data using Database Migration Service
- Provision the ElastiCache Redis Cluster (Multi-AZ)
- Create an AMI for the application
- Create a security group and an IAM policy for EC2
- Make the application stateless
- Create a health check endpoint
- Create an autoscaling group to spin up the instances
- Create a certificate in ACM
- Provision an Application Load Balancer (public subnets)
- Configure HTTPS, Targets and Health Checks
- Configure Route53
- Traffic switch-over through DNS
Great guide to cloud migrations: 6 strategies for migrating applications to the cloud
93. The cloud is a journey
not a destination