This document discusses the process of refactoring a monolithic backup and recovery application into microservices using AWS Step Functions and Lambda functions. Key points:
1. The application was originally a monolithic SaaS application for backing up and recovering cloud infrastructure.
2. Recovery procedures were long-running and needed to be decomposed and orchestrated in a maintainable way.
3. AWS Step Functions was chosen as it allows modeling workflows as state machines using tasks, choices, and waits.
4. Lambda functions were used to implement tasks, with business logic separated into layers accessing data and AWS services.
5. The recovery workflow for an RDS instance is shown as an example, using
Decompose Monolith into AWS Step Functions Serverless Workflow
1. Eric Villa
DevOps Engineer at beSharp
Decompose the monolith
into AWS Step Functions
Serverless On Stage /
Serverless User Group
April 23rd, 2019
2. where the journey began
Noovolari Smart Backup
Noovolari Smart Backup is a SaaS
application designed to backup and recover
your cloud infrastructure through an easy
to use interface.
Key Features:
• Backup of EC2 instances, RDS instances
and RDS Aurora clusters;
• One Click Recovery of EC2 instances,
RDS instances and RDS Aurora clusters;
• File Level Recovery.
First of all, we focused on refactoring One Click Recovery procedures!
3. Problem: orchestrate each phase of
these long running procedures in a
smart and maintainable way, taking into
account execution state, retries and
rollbacks.
but we could
keep it ins…
Shut up and look
for an orchestration
tool!!!
4. Who comes to the rescue?
AWS Step Functions
It lets you model complex and long running workflows as state machines.
5. How does it work?
AWS Step Functions state machines are defined using a JSON-based specification language,
the Amazon States Language, which lets you specify States, Input and Output processing,
and Error Handling.
7 types of States available: Task, Wait, Choice, Pass, Succeed, Fail, Parallel.
"TaskState": {
"Type": "Task",
"Resource": <RESOURCE-ARN>,
"Next": "NextState",
"Retry": [
{
"ErrorEquals": [ "ErrorA", "ErrorB" ],
"IntervalSeconds": 1,
"BackoffRate": 2.0,
"MaxAttempts": 2
},
{
"ErrorEquals": [ "ErrorC" ],
"IntervalSeconds": 5
}
],
"Catch": [
{
"ErrorEquals": [ "ErrorA", "ErrorB", "ErrorC" ],
"Next": "RecoveryState"
},
{
"ErrorEquals": [ "States.ALL" ],
"Next": "TerminateMachine"
}
]
}
State input and state output are JSON documents.
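To make the Retry rules above concrete, here is a minimal Ruby sketch of the waits they produce. The helper name retry_delays is hypothetical (it is not part of any AWS SDK), and it assumes the Amazon States Language defaults of IntervalSeconds 1, BackoffRate 2.0 and MaxAttempts 3 for any field a retrier omits.

```ruby
# Hypothetical helper, not an AWS API: lists the wait (in seconds) that
# precedes each retry attempt for a single Retry rule.
# Keyword defaults mirror the Amazon States Language defaults.
def retry_delays(interval_seconds: 1, backoff_rate: 2.0, max_attempts: 3)
  (0...max_attempts).map { |attempt| interval_seconds * backoff_rate**attempt }
end

# First retrier on the slide (ErrorA, ErrorB): two attempts, doubling wait.
p retry_delays(interval_seconds: 1, backoff_rate: 2.0, max_attempts: 2)

# Second retrier (ErrorC): only IntervalSeconds is set, the rest are defaults.
p retry_delays(interval_seconds: 5)
```

Once the attempts are exhausted, the error falls through to the Catch rules, which is where RecoveryState or TerminateMachine come into play.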
6. Task state
A Task state performs work by using an activity or an AWS Lambda function, or by passing
parameters to the API actions of other services.
We needed something with the following characteristics:
• fine-grained services, relatively simple in complexity;
• high scalability;
• simplified infrastructure management.
That’s why we opted for AWS Lambda function tasks.
7. λ function task architecture
Logic layer: it encapsulates
step function's execution state
management and Recovery
business logic.
AWS services access layer: it
encapsulates
AWS SDK for Ruby wrappers,
which provide high-level functions
suitable for Noovolari Smart
Backup needs.
Data access layer: it encapsulates
object-relational mapping,
provided by Ruby’s ActiveRecord
library.
[Diagram: Logic layer, AWS services access layer, Data access layer (MySQL)]
8. Access layers
Instead of duplicating the access layers’
code, we decided to create two
Ruby gems (libraries) that each
logic layer can use.
Each gem is hosted and versioned
on a CodeCommit Git repository.
[Diagram: the EC2 instance, RDS instance and RDS Aurora cluster recovery logics all use the AWS services access gem and the Data access gem]
9. Logic Layer - Step 1
Extraction of atomic services from the
monolithic recovery procedures.
These atomic services are best suited to
be mapped to AWS Step Functions’ Tasks.
[Diagram: monolithic recovery procedure split into atomic tasks]
10. Logic Layer - Step 2
Design of a Ruby object that represents the
Step Function’s execution state.
Step Function’s execution state:
• Input
Step Function’s input.
• Context
Data used throughout the execution,
derived from the input and gathered
from MySQL and MongoDB data
sources.
• State Outputs
They represent the output of lambda
function tasks that compose the
recovery procedure.
• Counters
They represent the number of iterations
reached by a given Waiter (we’ll see it
later).
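The four parts of the execution state listed above could be modelled along these lines. This is only a sketch: the class name ExecutionState, its method names and the hash keys are assumptions for illustration, not the actual Noovolari code. The newest output is pushed to the front of the array so that it is always reachable at index 0.

```ruby
require 'json'

# Hypothetical sketch of the execution state object; the four fields
# follow the slide, everything else is an illustrative assumption.
class ExecutionState
  attr_reader :input, :context, :state_outputs, :counters

  def initialize(input:, context: {}, state_outputs: [], counters: {})
    @input = input
    @context = context
    @state_outputs = state_outputs
    @counters = counters
  end

  # Rebuild the state from the plain Hash a Lambda task receives.
  def self.from_h(hash)
    new(
      input: hash.fetch('input'),
      context: hash.fetch('context', {}),
      state_outputs: hash.fetch('state_outputs', []),
      counters: hash.fetch('counters', {})
    )
  end

  # Newest output goes first, so the last Task's output sits at index 0.
  def record_output(task_name, output)
    @state_outputs.unshift('task' => task_name, 'output' => output)
  end

  # Counts the iterations reached by a given Waiter.
  def increment_counter(name)
    @counters[name] = @counters.fetch(name, 0) + 1
  end

  def to_h
    { 'input' => input, 'context' => context,
      'state_outputs' => state_outputs, 'counters' => counters }
  end
end

state = ExecutionState.from_h('input' => { 'db_instance_id' => 'example-db' })
state.record_output('rename_db_instance', true)
state.increment_counter('rename_db_instance')
puts JSON.generate(state.to_h)
```

Keeping the object convertible to and from a plain Hash is what lets it travel between Lambda invocations as JSON.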
11. Logic Layer - Step 3
Design of a Ruby abstract object that
represents a Task, providing a template that
allows the invocation of a specific service and
the update of the execution state.
Template Method Design Pattern
run() defines the interaction between
invoke_service() and update_state()
methods.
invoke_service() and update_state()
must be implemented by each Task that takes
part in the recovery procedure.
[Class diagram: abstract Task with invoke_service(), update_state() and run(); ConcreteTask1 implements invoke_service() and update_state()]
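A minimal Ruby sketch of the Template Method pattern described here. The method names run, invoke_service and update_state come from the slide. The RenameDbInstanceTask class, the plain-Hash state and the stubbed service call are assumptions made for illustration.

```ruby
# Abstract Task: run() fixes the order of the two steps (template method).
class Task
  def initialize(state)
    # Assumed here to be a plain Hash holding a 'state_outputs' array.
    @state = state
  end

  # The template: call the service, then fold its result into the state.
  def run
    result = invoke_service
    update_state(result)
    @state
  end

  private

  def invoke_service
    raise NotImplementedError, "#{self.class} must implement invoke_service"
  end

  def update_state(_result)
    raise NotImplementedError, "#{self.class} must implement update_state"
  end
end

# Hypothetical concrete task. A real one would call the AWS services
# access gem instead of returning a canned value.
class RenameDbInstanceTask < Task
  private

  def invoke_service
    true # stand-in for the real rename call
  end

  def update_state(result)
    @state['state_outputs'].unshift('output' => result)
  end
end

state = RenameDbInstanceTask.new('state_outputs' => []).run
p state
```

Each new recovery step then only needs the two hook methods, while the sequencing stays in one place.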
12. λ function handler
The handler represents the entry point for the Lambda.
Our handler is responsible for:
1. deserializing the step function execution’s state object;
2. instantiating and invoking the right Task object (factory);
3. serializing the Task object’s response and generating the λ function output.
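The three responsibilities above might look like this in a Ruby Lambda handler. The handler(event:, context:) signature is the one the Lambda Ruby runtime calls. The TASKS lookup table, the RenameDbInstance stand-in class and the { 'data' => ... } output shape are assumptions for this sketch.

```ruby
# Hypothetical stand-in for a real Task object.
class RenameDbInstance
  def initialize(state)
    @state = state
  end

  def run
    @state['state_outputs'].unshift('output' => true) # pretend the rename worked
    @state
  end
end

# Factory table (an assumption): maps the "action" value to a Task class.
TASKS = {
  'rename_db_instance' => RenameDbInstance
}.freeze

# Entry point invoked by the Lambda Ruby runtime. The runtime has already
# parsed the JSON input into a Hash with string keys.
def handler(event:, context:)
  state = event.fetch('data')                     # 1. read the execution state
  task_class = TASKS.fetch(event.fetch('action')) # 2. pick the Task (factory)
  new_state = task_class.new(state).run
  { 'data' => new_state }                         # 3. build the function output
end

event = { 'action' => 'rename_db_instance', 'data' => { 'state_outputs' => [] } }
p handler(event: event, context: nil)
```

In this sketch an unknown action raises KeyError, which makes the Lambda invocation fail and hands the error to the state machine's own error handling.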
to summarize…
"RenameDbInstance": {
"Type": "Task",
"Resource": <LAMBDA_FUNCTION_ARN>,
"Next": "ConfigureRenameDbInstanceCounter",
"Parameters": {
"action": "rename_db_instance",
"data.$": "$.data"
}
}
[Diagram: λ function handler → Task object → invoke_service(), update_state()]
The action parameter is necessary to determine the Task object that
has to be invoked.
13. Orchestration
the music we want the orchestra to play
[Workflow diagram, from START to END; phases shown: stop original RDS instance, rename RDS instance, restore RDS instance from snapshot]
let’s focus on this phase!
16. Orchestration
RenameDbInstance task
"RenameDbInstance": {
"Type": "Task",
"Resource": "arn:aws:lambda:eu-west-1:111111111111:function:bernie-dev-rdsRecovery",
"Next": "ConfigureRenameDbInstanceCounter",
"Parameters": {
"action": "rename_db_instance",
"data.$": "$.data"
},
"Catch": [{
"ErrorEquals": ["States.ALL"],
"Next": "CloseSession",
"ResultPath": "$.error-info"
}]
}
The λ function’s input is specified by Parameters.
Catch, together with the States.ALL wildcard, catches any kind of
error and routes the State Machine to the rollback.
ConfigureRenameDbInstanceCounter, WasDbInstanceRenamed and
IncrementRenameDbInstanceCounter Task states are implemented in the same way.
17. Orchestration
CheckWasDbInstanceRenamed Choice
"CheckWasDbInstanceRenamed": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.data.state_outputs[0].output",
"BooleanEquals": true,
"Next": "RetrieveDbSnapshotIdentifier"
},
{
"Variable": "$.data.state_outputs[0].output",
"BooleanEquals": false,
"Next": "IncrementRenameDbInstanceCounter"
}
]
}
$.data is the path to the Step Function’s execution state.
The last Task’s output is the first element of the state_outputs array:
$.data.state_outputs[0].output
The BooleanEquals comparison operator routes the step function to the next step
based on that last boolean output.
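To illustrate how the Choice state above picks its next step, here is a small Ruby simulation. It is not how Step Functions is implemented: resolve_path and next_state are hypothetical helpers, and the path resolver handles only the tiny JSONPath subset used on the slide.

```ruby
# Resolves only paths shaped like "$.a.b[0].c" (hypothetical helper).
def resolve_path(document, path)
  tokens = path.sub(/\A\$\.?/, '').scan(/([^.\[\]]+)|\[(\d+)\]/)
  tokens.reduce(document) do |node, (key, index)|
    key ? node.fetch(key) : node.fetch(Integer(index))
  end
end

# Returns the "Next" of the first rule whose BooleanEquals value matches,
# or nil when no rule matches.
def next_state(choices, document)
  rule = choices.find do |choice|
    resolve_path(document, choice.fetch('Variable')) == choice.fetch('BooleanEquals')
  end
  rule && rule.fetch('Next')
end

choices = [
  { 'Variable' => '$.data.state_outputs[0].output', 'BooleanEquals' => true,
    'Next' => 'RetrieveDbSnapshotIdentifier' },
  { 'Variable' => '$.data.state_outputs[0].output', 'BooleanEquals' => false,
    'Next' => 'IncrementRenameDbInstanceCounter' }
]

renamed = { 'data' => { 'state_outputs' => [{ 'output' => true }] } }
pending = { 'data' => { 'state_outputs' => [{ 'output' => false }] } }

puts next_state(choices, renamed) # prints RetrieveDbSnapshotIdentifier
puts next_state(choices, pending) # prints IncrementRenameDbInstanceCounter
```

One difference from the real service: a Choice state with no matching rule and no Default fails with States.NoChoiceMatched, whereas this sketch simply returns nil.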
19. Deploy
what are we going to deploy?
[Diagram: three Step Functions (EC2 instance Recovery, RDS instance Recovery, RDS Aurora cluster Recovery), each invoking its own λ function; every λ function pulls in the AWS Services Access and Data Access λ layer]
20. Deploy
what is a λ layer?
It is a .zip file that contains dependencies which your λ function relies on.
It lets you keep your deployment package small, since you only have to
reference the dependencies.
λ layers are extracted to the /opt directory in the function execution environment.
Each runtime looks for dependencies in a different path under /opt.
Ruby 2.5.0 runtime looks for dependencies in /opt/ruby/gems/2.5.0
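As a small illustration of the path rule above, this Ruby sketch builds the directory in which a Ruby runtime would look for layer gems. The helper layer_gem_dir is hypothetical. It relies on RbConfig::CONFIG['ruby_version'], which holds the ABI version (for example 2.5.0 for any Ruby 2.5.x).

```ruby
require 'rbconfig'

# Hypothetical helper: where layer gems are expected once the layer
# has been extracted under /opt.
def layer_gem_dir(ruby_version = RbConfig::CONFIG['ruby_version'])
  File.join('/opt', 'ruby', 'gems', ruby_version)
end

puts layer_gem_dir('2.5.0') # prints /opt/ruby/gems/2.5.0
puts layer_gem_dir          # path for the Ruby running this script
```

It follows that the gems inside the layer .zip need to sit under ruby/gems/2.5.0 for the Ruby 2.5 runtime to find them.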