Journey towards serverless infrastructure

From the trenches of opintopolku.fi

Briefly about me
• Project management and ops-oriented software
engineering at Gofore since 2013
• Amazon Web Services since 2010
• Cloud, server automation, CI/CD build pipelines
• Favorite tool is Ansible
LEAD CONSULTANT, CLOUD SERVICES
Ville Seppänen @Vilsepi

Opintopolku.fi / Studyinfo.fi
• Applicants search for and apply to further education (after basic
comprehensive school)
• Education providers manage applications and promote their offering
• …and much more, with ~30 integrations to e.g. Kela
Why AWS?
• Need for several, short-term testing environments
• Need for faster software development
• Need for lower infrastructure and operating costs

Big bang problem
• 50 services with plenty of
dependencies and low-latency
calls between them
• Big bang migration 

Big bang problem
• Applications open all year
round, student results are
calculated and institutions plan
future applications
• Never a good moment 

Big bang problem
• Long leap from traditional
infrastructure to ideal cloud-
native one
• Double operating costs until
migration is done
• As time passes, the
environments deviate
• We need to hurry 

Big bang solution
• Cloud-native core
infrastructure that stands the test of time
• Lift & shift satellite services
that might become obsolete
• “Let’s get there first, and fix it
later”
[Architecture diagram: ELB, ALB, Nginx, ECS cluster, WordPress, Shibboleth, MongoDB, RDS Postgres, ElastiCache Redis, S3 buckets, Prometheus, Grafana, CloudWatch]

Containerize
• Create base image: Build an Alpine Linux based image with Packer
• Build containers: Create a parallel Bamboo build pipeline that builds
service containers and pushes them to ECR
• Refactor services: Local disk usage to S3, caching, logging, port usage,
memory limits
• Iterate memory limits: “Hello OutOfMemory, my old friend”
All this while being backwards-compatible
with the old environments

Infrastructure as code
• CloudFormation templates generated with Troposphere
• Troposphere allows for-loops, complex logic, “compile-time” checks
• Same templates for all environments, parametrized
• Conditionals allow skipping parts of infrastructure
[Diagram: Code, Global params, Templates, Env params, Stack]
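
The bullets above can be sketched without Troposphere at all. The example below is a minimal stand-in that builds a CloudFormation template as a plain Python dict, so it does not use Troposphere's API. It assumes a hypothetical `EnableLogGroups` parameter and a 30-day retention; only the `CloudWatchLogGroup<service>` naming and the `AWS::Logs::LogGroup` type come from the change set output shown later in the deck.

```python
import json


def build_template(services):
    """Return a CloudFormation template (as a dict) with one log group per service."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Same template for every environment; only parameter values differ.
            "EnvName": {"Type": "String"},
            # Hypothetical switch used to skip this part of the infrastructure.
            "EnableLogGroups": {
                "Type": "String",
                "AllowedValues": ["true", "false"],
                "Default": "true",
            },
        },
        "Conditions": {
            "CreateLogGroups": {"Fn::Equals": [{"Ref": "EnableLogGroups"}, "true"]},
        },
        "Resources": {},
    }
    for service in services:
        # "Compile-time" check: fail before CloudFormation ever sees the template.
        if not (service.isascii() and service.isalnum()):
            raise ValueError(f"logical IDs must be alphanumeric: {service!r}")
        template["Resources"][f"CloudWatchLogGroup{service}"] = {
            "Type": "AWS::Logs::LogGroup",
            "Condition": "CreateLogGroups",
            "Properties": {
                "LogGroupName": {"Fn::Sub": "${EnvName}-" + service},
                "RetentionInDays": 30,  # assumed value, not from the talk
            },
        }
    return template
```

Calling `json.dumps(build_template(["wordpress", "koodisto"]), indent=2)` gives a template body that could be fed to a change set. The for-loop and the early `ValueError` are the parts that plain JSON or YAML templates cannot express.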

CloudFormation stack structure
• ECS & ALB produce lots of resources per service (listener rules, task
definition…), while a stack has a hard limit of 200 resources
[Diagram comparing three stack layouts: stack per infra component type, end-to-end stack per service, and hybrid. Box labels: base, front, services, service]
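
To see why a single stack for all services runs into the resource limit, here is a rough sketch that packs services into stacks greedily. The per-service resource count of 6 used in the usage note is a made-up figure for illustration; the limit of 200 is the one quoted above.

```python
STACK_RESOURCE_LIMIT = 200  # per-stack limit quoted on the slide


def plan_stacks(resources_per_service, limit=STACK_RESOURCE_LIMIT):
    """Greedily group services into stacks so that no stack exceeds the limit.

    resources_per_service maps a service name to the number of
    CloudFormation resources that service needs.
    """
    stacks, current, used = [], [], 0
    for name, count in resources_per_service.items():
        if count > limit:
            raise ValueError(f"{name} alone needs {count} resources (limit {limit})")
        if used + count > limit:
            # The current stack is full: close it and start a new one.
            stacks.append(current)
            current, used = [], 0
        current.append(name)
        used += count
    if current:
        stacks.append(current)
    return stacks
```

With 50 services at an assumed 6 resources each (300 in total), `plan_stacks` returns two stacks of 33 and 17 services. One stack per service avoids the packing question entirely, at the cost of more stacks to operate.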

Knee-deep in nested stacks
• Nesting = CloudFormation stack has substacks as resources
• “Nested stacks should be updated via parent”
• Obscure change sets: “All your substacks will change somehow”
• Pending stack updates/rollbacks block all operations to siblings
• Blast radius made us nervous

Replace nesting with tooling
./cloudformation.py pallero log create-change-set
ADD AWS::Logs::LogGroup CloudWatchLogGroupwordpress
MODIFY AWS::Lambda::Function LogexportLambdaFunction
MODIFY AWS::Events::Rule LogexportRule
MODIFY AWS::Lambda::Permission LogexportRulePermission
./cloudformation.py pallero log execute-change-set
Waiting for stack pallero-log...
CloudFormation complete. Some operations may still be
running, check AWS Console for more information.

Replace nesting with tooling
# Update all service stacks
./cloudformation.py hahtuva services create-change-set
# Update just one service stack
./cloudformation.py hahtuva services -s koodisto create-change-set
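
The real `cloudformation.py` is not shown in the deck, so the following is only a guess at how such a wrapper might map its arguments to stack names. The `pallero-log` name matches the output above; the `<env>-services-<service>` naming and the `known_services` list are assumptions. No AWS calls are made here.

```python
import argparse

ACTIONS = ("create-change-set", "execute-change-set")


def parse_args(argv):
    """Parse: <environment> <component> [-s SERVICE ...] <action>."""
    parser = argparse.ArgumentParser(prog="cloudformation.py")
    parser.add_argument("environment")
    parser.add_argument("component")
    parser.add_argument("-s", "--service", action="append", default=[],
                        help="limit a 'services' run to this service (repeatable)")
    parser.add_argument("action", choices=ACTIONS)
    return parser.parse_args(argv)


def target_stacks(args, known_services):
    """Return the names of the stacks the command should operate on."""
    if args.component != "services":
        # One stack per infra component, e.g. pallero-log.
        return [f"{args.environment}-{args.component}"]
    selected = args.service or list(known_services)
    unknown = sorted(set(selected) - set(known_services))
    if unknown:
        raise ValueError(f"unknown services: {unknown}")
    # Hypothetical naming scheme: one independent stack per service.
    return [f"{args.environment}-services-{name}" for name in selected]
```

For example, `target_stacks(parse_args(["hahtuva", "services", "-s", "koodisto", "create-change-set"]), ["koodisto", "wordpress"])` yields a single stack name, while omitting `-s` yields one per known service. Because each stack is updated on its own, a stuck rollback in one does not block its siblings.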

7 envs, 50 services
• “What do we have running?”
• We are building a radiator that
shows how environments
deviate
• First tried storing metadata in
stack tags, but that caused every
resource to change, so we
moved it to SSM
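
The core of such a radiator is a comparison across environments. The sketch below assumes the stored metadata is a deployed version per service, which the deck does not actually specify, and that it has already been read (for example from SSM) into plain dicts. The version strings in the usage note are invented.

```python
def find_deviations(versions_by_env):
    """Return the services whose version differs between environments.

    versions_by_env: {environment: {service: version}}
    Result: {service: {environment: version or None}}, where None means
    the service is not deployed in that environment.
    """
    services = sorted({s for versions in versions_by_env.values() for s in versions})
    deviations = {}
    for service in services:
        per_env = {env: versions.get(service)
                   for env, versions in versions_by_env.items()}
        if len(set(per_env.values())) > 1:
            deviations[service] = per_env
    return deviations
```

Given `{"pallero": {"koodisto": "1.4", "wordpress": "2.0"}, "hahtuva": {"koodisto": "1.5", "wordpress": "2.0"}}`, only `koodisto` is reported, with its version in each environment.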

CloudWatch Logs metric filters

Towards serverless
• Only infra-related Lambda functions for now: backups, log exports,
SES reputation, radiators, service discovery…
• Environment-specific Lambdas with the uniform CF process
• Account-level Lambdas with Serverless framework
• AWS SAM with recent additions might fit our CF tools more easily

Ways of scheduling
1. Pure Lambda: small simple operations (log export)
2. Lambda launches temporary instance: file operations (backups)
3. ECS Scheduled task: semi-complex business logic (e-mail)
4. ECS Service handles scheduling internally with DB locks: legacy
business logic, frequently running schedules
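
Option 4 relies on a database lock so that only one container instance runs a given job. The deck does not say which database or locking scheme is used, so the following is a generic illustration using SQLite from the Python standard library: a row with a primary key acts as the lock, and an expiry time lets another instance take over if the holder dies. The table and column names are made up.

```python
import sqlite3


def init_locks(conn):
    """Create the lock table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_locks ("
        "job TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )


def try_acquire(conn, job, owner, now, ttl=60.0):
    """Return True if `owner` obtained the lock for `job`, else False."""
    with conn:  # commit on success, roll back on an unhandled exception
        # Drop a stale lock left behind by an instance that died mid-run.
        conn.execute(
            "DELETE FROM job_locks WHERE job = ? AND expires_at <= ?", (job, now)
        )
        try:
            conn.execute(
                "INSERT INTO job_locks (job, owner, expires_at) VALUES (?, ?, ?)",
                (job, owner, now + ttl),
            )
        except sqlite3.IntegrityError:
            # The primary key already exists: someone else holds the lock.
            return False
    return True
```

Each instance would call `try_acquire` before running a scheduled job and skip the run when it returns `False`. A production version on a shared database would also need to release the lock after the job and account for clock differences between instances.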

Fargate = Serverless containers
• No more EC2 Auto Scaling Groups and instances to manage in ECS/EKS
• All “server” configuration is done in the ECS Task Definition
• What about CloudWatch file agent? EC2 Reserved instances?
• “Deploying Fargate services using CloudFormation - The guide I wish I
had” blog.devopspro.co.uk
• CloudFormation Templates for AWS Fargate deployments
GitHub nathanpeck/aws-cloudformation-fargate
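
As an illustration of "all server configuration is done in the ECS Task Definition", here is a sketch of a Fargate-compatible `AWS::ECS::TaskDefinition` resource built as a plain dict. The function name and every argument value in the usage note are placeholders, not taken from the opintopolku.fi setup. The `awslogs` log driver is shown as one possible answer to the log agent question, since the container's stdout then goes to CloudWatch Logs without an agent on the host.

```python
def fargate_task_definition(family, image, cpu, memory, execution_role_arn,
                            port, log_group, region):
    """Return a CloudFormation resource dict for a single-container Fargate task."""
    return {
        "Type": "AWS::ECS::TaskDefinition",
        "Properties": {
            "Family": family,
            "RequiresCompatibilities": ["FARGATE"],
            "NetworkMode": "awsvpc",  # the network mode Fargate requires
            # On Fargate, CPU and memory are set for the whole task.
            "Cpu": str(cpu),
            "Memory": str(memory),
            # Role that lets ECS pull the image and write logs for the task.
            "ExecutionRoleArn": execution_role_arn,
            "ContainerDefinitions": [{
                "Name": family,
                "Image": image,
                "PortMappings": [{"ContainerPort": port}],
                "LogConfiguration": {
                    "LogDriver": "awslogs",
                    "Options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": family,
                    },
                },
            }],
        },
    }
```

A call such as `fargate_task_definition("koodisto", "<account>.dkr.ecr.<region>.amazonaws.com/koodisto:latest", 256, 512, "<execution-role-arn>", 8080, "pallero-koodisto", "eu-west-1")` produces a resource that can be placed under `Resources` in a template. Only certain CPU and memory combinations are accepted by Fargate, so the values should be checked against the current AWS documentation.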

It was worth it
• Most problems are auto-healed with container restarts
• Developers have better visibility on how their code is running
• A single person can spin up an environment in a few hours
• LessOps: email, DNS, SSL certificates, databases as a service
In the future
• Get rid of all non-ASG EC2 instances, assess Fargate
• Split into further AWS accounts and ECS clusters

Big thanks to
• Finnish National Agency for Education
• Gofore
• Cybercom
• Reaktor
• CSC
• Siili
• Comiq
• Nixu

Questions?
• Design scalable architecture
• Avoid a big bang migration
• Containerize early
Ville Seppänen @Vilsepi
