From the trenches of opintopolku.fi
Journey towards serverless infrastructure
Briefly about me
• Project management and ops-oriented software
engineering at Gofore since 2013
• Amazon Web Services since 2010
• Cloud, server automation, CI/CD build pipelines
• Favorite tool is Ansible
LEAD CONSULTANT, CLOUD SERVICES
Ville Seppänen @Vilsepi
Opintopolku.fi / Studyinfo.fi
• Applicants search for and apply to further education (after basic
comprehensive school)
• Education providers manage applications and promote their offering
• …and much more, with ~30 integrations to e.g. Kela
Why AWS?
• Need for several short-term testing environments
• Need for faster software development
• Need for lower infrastructure and operating costs
Big bang problem
• 50 services with plenty of
dependencies and low-latency
calls between them
• Big bang migration
Big bang problem
• Applications open all year
round, student results are
calculated and institutions plan
future applications
• Never a good moment
Big bang problem
• Long leap from traditional
infrastructure to an ideal
cloud-native one
• Double operating costs until
migration is done
• As time passes, the
environments deviate
• We need to hurry
Big bang solution
• Cloud-native core
infrastructure that stands the test of time
• Lift & shift satellite services
that might become obsolete
• “Let’s get there first, and fix it
later”
[Architecture diagram: ELB, ALB, WordPress, RDS Postgres, S3 buckets, ElastiCache Redis, Grafana, Prometheus, CloudWatch, MongoDB, Shibboleth, ECS cluster, Nginx]
Containerize
• Create base image: Build a base image on Alpine Linux with Packer
• Build containers: Create a parallel Bamboo build pipeline that builds
service containers and pushes them to ECR
• Refactor services: Local disk usage to S3, caching, logging, port usage,
memory limits
• Iterate memory limits: “Hello OutOfMemory, my old friend”
All this while being backwards-compatible
with the old environments
Infrastructure as code
• CloudFormation templates generated with Troposphere
• Troposphere allows for-loops, complex logic, “compile-time” checks
• Same templates for all environments, parametrized
• Conditionals allow skipping parts of infrastructure
[Diagram labels: Code, Global params, Templates, Env params, Stack]
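The idea of generating parametrized templates from code can be sketched without Troposphere itself. The minimal example below uses plain Python dicts rather than the Troposphere API, so it is a stand-in for the technique, not the project's actual code. The parameter names (`services`, `log_retention_days`, `export_logs`) and the function `build_log_template` are hypothetical; the environment names and logical IDs are borrowed from later slides.

```python
import json

# Hypothetical global parameters shared by every environment.
GLOBAL_PARAMS = {"services": ["koodisto", "wordpress"], "log_retention_days": 30}

# Hypothetical per-environment parameters.
ENV_PARAMS = {
    "pallero": {"export_logs": True},
    "hahtuva": {"export_logs": False},
}


def build_log_template(env_name):
    """Return a CloudFormation template (as a dict) for one environment."""
    params = {**GLOBAL_PARAMS, **ENV_PARAMS[env_name]}
    resources = {}

    # A for-loop produces one log group per service.
    for service in params["services"]:
        resources[f"CloudWatchLogGroup{service}"] = {
            "Type": "AWS::Logs::LogGroup",
            "Properties": {
                "LogGroupName": f"{env_name}-{service}",
                "RetentionInDays": params["log_retention_days"],
            },
        }

    # A conditional skips a part of the infrastructure in some environments.
    if params["export_logs"]:
        resources["LogexportRule"] = {
            "Type": "AWS::Events::Rule",
            "Properties": {"ScheduleExpression": "rate(1 day)"},
        }

    # A "compile-time" check: fail before anything is sent to CloudFormation.
    for logical_id in resources:
        if not logical_id.isalnum():
            raise ValueError(f"Logical ID must be alphanumeric: {logical_id}")

    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    print(json.dumps(build_log_template("pallero"), indent=2))
```

The same function serves every environment; only the parameter dictionaries differ.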
CloudFormation stack structure
• ECS & ALB produce lots of resources per service (listener rules, task
definition…), while a stack has a hard limit of 200 resources
[Diagram: three stack layouts compared: stack per infra component type, end-to-end stack per service, and hybrid. Box labels: base, front, services, service.]
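A little arithmetic shows why one stack for all services does not fit. The sketch below uses the 200-resource limit and the 50 services from the slides; the figure of 6 resources per service is an assumption made purely for illustration, and `split_into_stacks` is a hypothetical helper.

```python
STACK_RESOURCE_LIMIT = 200  # hard limit quoted on the slide

# Assumption for illustration only: resources created per service
# (listener rule, task definition, and so on).
RESOURCES_PER_SERVICE = 6


def split_into_stacks(services, per_service, limit=STACK_RESOURCE_LIMIT):
    """Group services into stacks so that no stack exceeds the limit."""
    if per_service > limit:
        raise ValueError("a single service does not fit in one stack")
    per_stack = limit // per_service
    return [services[i:i + per_stack] for i in range(0, len(services), per_stack)]


services = [f"service{n}" for n in range(50)]
total = len(services) * RESOURCES_PER_SERVICE
stacks = split_into_stacks(services, RESOURCES_PER_SERVICE)
print(f"{total} resources in total, at least {len(stacks)} stacks needed")
```

Under these assumptions a single "services" stack would already exceed the limit, which is one argument for the per-service and hybrid layouts.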
Stack overflow
Knee-deep in nested stacks
• Nesting = CloudFormation stack has substacks as resources
• “Nested stacks should be updated via parent”
• Obscure change sets: “All your substacks will change somehow”
• Pending stack updates/rollbacks block all operations to siblings
• Blast radius made us nervous
Replace nesting with tooling
./cloudformation.py pallero log create-change-set
ADD AWS::Logs::LogGroup CloudWatchLogGroupwordpress
MODIFY AWS::Lambda::Function LogexportLambdaFunction
MODIFY AWS::Events::Rule LogexportRule
MODIFY AWS::Lambda::Permission LogexportRulePermission
./cloudformation.py pallero log execute-change-set
Waiting for stack pallero-log...
CloudFormation complete. Some operations may still be
running, check AWS Console for more information.
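The slides do not show how `cloudformation.py` is implemented. As a rough sketch, summary lines like the ones above could be derived from the `Changes` list that CloudFormation's DescribeChangeSet operation returns. The function name `summarize_changes` is hypothetical, and the sample response is hand-written rather than real output.

```python
def summarize_changes(change_set):
    """Format the Changes list of a DescribeChangeSet response as text lines."""
    lines = []
    for change in change_set.get("Changes", []):
        rc = change["ResourceChange"]
        lines.append(
            f"{rc['Action'].upper()} {rc['ResourceType']} {rc['LogicalResourceId']}"
        )
    return lines


# Hand-written sample in the shape of a DescribeChangeSet response.
sample = {
    "Changes": [
        {"Type": "Resource", "ResourceChange": {
            "Action": "Add",
            "ResourceType": "AWS::Logs::LogGroup",
            "LogicalResourceId": "CloudWatchLogGroupwordpress"}},
        {"Type": "Resource", "ResourceChange": {
            "Action": "Modify",
            "ResourceType": "AWS::Lambda::Function",
            "LogicalResourceId": "LogexportLambdaFunction"}},
    ]
}

for line in summarize_changes(sample):
    print(line)
```

In a real tool the dictionary would come from an AWS SDK call instead of a literal.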
Replace nesting with tooling
# Update all service stacks
./cloudformation.py hahtuva services create-change-set
# Update just one service stack
./cloudformation.py hahtuva services -s koodisto create-change-set
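How the command line maps to individual stacks can be sketched with `argparse`. This is an assumption about the tool's structure, not its actual source: the `<env>-<group>` stack name follows the "pallero-log" output shown earlier, while the `<env>-services-<name>` pattern and the `ALL_SERVICES` list are hypothetical.

```python
import argparse

# Hypothetical list; the real environments run about 50 services.
ALL_SERVICES = ["koodisto", "wordpress"]


def build_parser():
    parser = argparse.ArgumentParser(prog="cloudformation.py")
    parser.add_argument("environment")
    parser.add_argument("stack_group")
    parser.add_argument("-s", "--service", default=None,
                        help="limit the operation to one service stack")
    parser.add_argument("action",
                        choices=["create-change-set", "execute-change-set"])
    return parser


def target_stacks(args):
    """Return the names of the stacks that the command should operate on."""
    if args.stack_group != "services":
        return [f"{args.environment}-{args.stack_group}"]
    selected = [args.service] if args.service else ALL_SERVICES
    # Assumed naming pattern for per-service stacks.
    return [f"{args.environment}-services-{name}" for name in selected]


args = build_parser().parse_args(
    ["hahtuva", "services", "-s", "koodisto", "create-change-set"])
print(target_stacks(args))
```

Because each service has its own independent stack, a failed update blocks only that stack instead of all of its siblings.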
7 envs, 50 services
• “What do we have running?”
• We are building a radiator that
shows how environments
deviate
• First tried storing metadata in
stack tags, but that caused every
resource to change, so we
moved it to SSM
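The comparison behind such a radiator might look like the sketch below. It assumes the per-environment metadata has already been read (for example from SSM) into a dictionary; the version numbers and the function `find_deviations` are made up for illustration.

```python
# Hypothetical metadata: environment -> service -> deployed version.
deployed = {
    "pallero": {"koodisto": "1.4.0", "wordpress": "5.2"},
    "hahtuva": {"koodisto": "1.3.2", "wordpress": "5.2"},
}


def find_deviations(deployed):
    """Return services whose version differs between environments."""
    services = sorted({name for env in deployed.values() for name in env})
    deviations = {}
    for service in services:
        # A service missing from an environment shows up as None.
        by_env = {env: running.get(service) for env, running in deployed.items()}
        if len(set(by_env.values())) > 1:
            deviations[service] = by_env
    return deviations


for service, by_env in find_deviations(deployed).items():
    print(service, by_env)
```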
Staying aware with Grafana
CloudWatch Logs metric filters
Towards serverless
• Only infra-related Lambda functions for now: backups, log exports,
SES reputation, radiators, service discovery…
• Environment-specific Lambdas with the uniform CF process
• Account-level Lambdas with Serverless framework
• AWS SAM with recent additions might fit our CF tools more easily
Ways of scheduling
1. Pure Lambda: small simple operations (log export)
2. Lambda launches temporary instance: file operations (backups)
3. ECS Scheduled task: semi-complex business logic (e-mail)
4. ECS Service handles scheduling internally with DB locks: legacy
business logic, frequently running schedules
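Option 4 relies on the database to make sure that only one container runs a given schedule. The slides do not say which database or locking scheme is used; the sketch below uses SQLite from the standard library as a stand-in and an atomic conditional UPDATE as the lock. The table `job_lock`, the job name and the function `try_claim` are all hypothetical.

```python
import sqlite3
import time


def try_claim(conn, job_name, owner, interval_seconds, now=None):
    """Claim the next run of a job; return True only for the caller that wins."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO job_lock (job_name, last_run, owner) "
            "VALUES (?, 0, NULL)",
            (job_name,),
        )
        # The conditional UPDATE succeeds for one caller per interval.
        cursor = conn.execute(
            "UPDATE job_lock SET last_run = ?, owner = ? "
            "WHERE job_name = ? AND last_run <= ?",
            (now, owner, job_name, now - interval_seconds),
        )
        return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_lock (job_name TEXT PRIMARY KEY, last_run REAL NOT NULL, owner TEXT)"
)

first = try_claim(conn, "send-email", "container-a", 60, now=1000.0)
second = try_claim(conn, "send-email", "container-b", 60, now=1010.0)
later = try_claim(conn, "send-email", "container-b", 60, now=1060.0)
print(first, second, later)
```

In a real deployment each container would hold its own connection to a shared database; the single conditional statement is what keeps two containers from claiming the same run.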
Fargate = Serverless containers
• No more EC2 Auto Scaling Groups and instances to manage in ECS/EKS
• All “server” configuration is done in the ECS Task Definition
• What about CloudWatch file agent? EC2 Reserved instances?
• “Deploying Fargate services using CloudFormation - The guide I wish I
had” blog.devopspro.co.uk
• CloudFormation Templates for AWS Fargate deployments
GitHub nathanpeck/aws-cloudformation-fargate
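To illustrate how the "server" configuration moves into the task definition, here is a minimal sketch of a Fargate-compatible `AWS::ECS::TaskDefinition` resource built as a plain Python dict. The service name, image and port are placeholders, and the helper `fargate_task_definition` is hypothetical; a real definition would normally also need an execution role and logging configuration, which are left out here.

```python
import json


def fargate_task_definition(service, image, cpu="256", memory="512", port=8080):
    """Return a CloudFormation resource for a Fargate task definition."""
    return {
        "Type": "AWS::ECS::TaskDefinition",
        "Properties": {
            "Family": service,
            # Fargate tasks must declare the launch type compatibility,
            # use awsvpc networking and set CPU and memory at task level.
            "RequiresCompatibilities": ["FARGATE"],
            "NetworkMode": "awsvpc",
            "Cpu": cpu,
            "Memory": memory,
            "ContainerDefinitions": [
                {
                    "Name": service,
                    "Image": image,  # placeholder image reference
                    "PortMappings": [{"ContainerPort": port}],
                }
            ],
        },
    }


task = fargate_task_definition("koodisto", "example/koodisto:latest")
print(json.dumps(task, indent=2))
```

The memory limits that were previously tuned per container on shared EC2 instances become part of this one definition.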
It was worth it
• Most problems are auto-healed with container restarts
• Developers have better visibility on how their code is running
• A single person can spin up an environment in a few hours
• LessOps: email, DNS, SSL certificates, databases as a service
In the future
• Get rid of all non-ASG EC2 instances, assess Fargate
• Split into further AWS accounts and ECS clusters
Big thanks to
• Finnish National Agency for Education
• Gofore
• Cybercom
• Reaktor
• CSC
• Siili
• Comiq
• Nixu
Questions?
• Design scalable architecture
• Avoid a big bang migration
• Containerize early
Ville Seppänen @Vilsepi
