This document provides an overview of Digio and Mantel Group, two Australian technology consulting companies. It then discusses the problem of managing GitLab runners at scale and the solution developed using AWS services such as Lambda, EventBridge, and auto-scaling groups to dynamically scale the runners based on workload. Cost estimates show the solution can run for roughly $0.19-$0.45 USD per month depending on usage. Benefits over alternative approaches such as Docker Machine are also outlined. The document concludes with some Terraform best practices and a demonstration of the GitLab runner scaling solution.
3. ● Principal Engineer at Digio
● Focus on platform engineering
● Background in development
● 10 years of AWS experience
● ~2 years each in Azure and GCP
● 12 years of Infrastructure as Code
● Passion for automating things
● 4 years of Terraform experience
● Terraform Associate certified
● Previously AWS Associate certified, but now I’m lazy
5. Digio and Mantel Group
Melbourne, Sydney, Brisbane, Perth, Adelaide, Hobart, Magnetic Island, Auckland, Queenstown
We’re an Australian-owned, principle-based, technology-led consulting business founded in Melbourne.
Digio is Australia’s premier digital services provider from concept to production, continually evolving alongside technologies and methods.
We are a dynamic business established in November 2017 that has grown to a team of over 200 across Australia and New Zealand.
We are part of the broader Mantel Group, currently comprised of 9 technology brands and a total team size of over 800. As a group we have been recognised in the AFR’s 2020 fastest-growing companies, achieved #1 Best Place to Work for 2021 and 2022 in the Great Place to Work survey, and were awarded AWS Services Partner and Migration Partner of the Year for 2022.
6. Mantel Group Brands
Working with Mantel Group enables access to expertise not only within Digio, but across all current and future brands.
A broad end-to-end capability that is vendor-agnostic, yet has deep specialisations…
[Slide graphic: per-brand capability columns, including Software Engineering (API, QA, .NET, Web, Mobile, iOS, Android), Platform Enablement, Security & Identity, Managed Services, Data & Analytics (Data Strategy, Data Engineering, Data Architecture, Analytics & BI, Advanced Analytics), Technology Strategy & Advisory, Application Modernisation and Transformation, Cloud Native, Migration, Automation & DevOps, Cloud Computing, Analytics & Machine Learning, ML Engineering, MarTech, Digital Workplace, Collaboration & Productivity, Training & Certification, Pursuit Model, Discovery Sprints, Rapid Prototyping, Service Design, UX/UI Design, Native Mobile Technology Strategy, Native Mobile Product / Design Strategy, Delivery & Method, Coaching & Training.]
16. Function URL vs EventBridge with polling
The webhook (Function URL) is:
● Faster to respond to events, as it runs near-instantly
● Zero AWS cost to enable
● Cheaper if repository / runner activity is low
● Open to abuse by third parties invoking the function without authentication
EventBridge is:
● More predictable in terms of AWS spend
● 14 million free invocations
● Slower to respond (bounded by the polling interval)
● Lower cost if GitLab project activity is high
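As a rough sketch, the polling option is just a scheduled EventBridge rule targeting the scaling Lambda. Resource names here are illustrative, and `aws_lambda_function.scaler` is assumed to be defined elsewhere in the module:

```hcl
# Hypothetical sketch: poll GitLab once a minute via EventBridge
# instead of exposing a public Lambda Function URL.
resource "aws_cloudwatch_event_rule" "poll" {
  name                = "gitlab-runner-poll"
  schedule_expression = "rate(1 minute)"
}

resource "aws_cloudwatch_event_target" "lambda" {
  rule = aws_cloudwatch_event_rule.poll.name
  arn  = aws_lambda_function.scaler.arn
}

# Allow EventBridge to invoke the function.
resource "aws_lambda_permission" "events" {
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.scaler.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.poll.arn
}
```

Unlike a Function URL, only EventBridge can trigger the function here, which removes the third-party abuse concern at the cost of response latency.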
17. Scaling Out
● Make use of:
○ CloudWatch metrics and CloudWatch alarms
○ Triggers on the auto-scaling group
○ Scaling policies to determine how many instances to add
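The pieces above might fit together as follows. This is a sketch, not the module's actual code: the `GitLab/Runners` namespace, the `PendingJobs` metric, and the referenced `aws_autoscaling_group.runners` are illustrative assumptions:

```hcl
# Add one instance when the alarm fires.
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "gitlab-runner-scale-out"
  autoscaling_group_name = aws_autoscaling_group.runners.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 120
}

# Fire when jobs have been queued for two consecutive minutes.
resource "aws_cloudwatch_metric_alarm" "jobs_pending" {
  alarm_name          = "gitlab-jobs-pending"
  namespace           = "GitLab/Runners"
  metric_name         = "PendingJobs"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 2
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  alarm_actions       = [aws_autoscaling_policy.scale_out.arn]
}
```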
18. Scaling In
● Requires multiple inputs and considerations
○ Avoid churn of runners
● Scale in based on load (number of active runners and jobs in the queue)
● Make use of CloudWatch’s protection against premature transitions to the alarm state (see "Avoiding premature transitions to alarm state" in the AWS docs)
○ AWS alarms include logic to try to avoid false alarms
○ CloudWatch waits the full N evaluation periods before alarming
○ Any time the metric goes above the threshold, the alarm "timer" is effectively reset
● The tradeoff: longer idle time means additional cost
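The scale-in side mirrors the scale-out sketch: a negative scaling adjustment, and an alarm that only fires after the full run of idle datapoints. Again the names, namespace, and metric are illustrative assumptions, with `aws_autoscaling_group.runners` assumed to exist elsewhere:

```hcl
# Remove one instance; the negative value is what makes this scale in.
resource "aws_autoscaling_policy" "scale_in" {
  name                   = "gitlab-runner-scale-in"
  autoscaling_group_name = aws_autoscaling_group.runners.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = -1
  cooldown               = 300
}

# Fire only after 10 consecutive idle minutes; a single busy datapoint
# within the window effectively resets the "timer".
resource "aws_cloudwatch_metric_alarm" "runners_idle" {
  alarm_name          = "gitlab-runners-idle"
  namespace           = "GitLab/Runners"
  metric_name         = "ActiveJobs"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 10
  datapoints_to_alarm = 10
  threshold           = 0
  comparison_operator = "LessThanOrEqualToThreshold"
  alarm_actions       = [aws_autoscaling_policy.scale_in.arn]
}
```

Lengthening `evaluation_periods` trades extra idle cost for less runner churn, which is the tradeoff the slide describes.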
19. Cost Estimation
Lambda
● Running the Lambda (in the ap-southeast-2 region) with:
○ x86 architecture
○ 1 request per minute
○ 2000 ms duration
○ 128 MB memory allocated
○ 512 MB ephemeral storage (default)
● With the free tier: $0.00 a month
● Without the free tier: $0.19 USD (43,800 invocations)
Runner (EC2)
● t3.medium spot instance(s) running 5 hours over the month at the average price of $0.0158/hour: $0.079 a month
● t3.medium on-demand instance(s) running 5 hours over the month at $0.0528/hour: $0.264 a month
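The $0.19 Lambda figure can be sanity-checked against the published ap-southeast-2 x86 rates (about $0.20 per million requests and $0.0000166667 per GB-second):

```
Invocations: 60/hour × 730 hours          ≈ 43,800 per month
Requests:    43,800 × $0.20 / 1,000,000   ≈ $0.009
Compute:     43,800 × 2 s × 0.125 GB      = 10,950 GB-seconds
             10,950 × $0.0000166667/GB-s  ≈ $0.183
Total:                                    ≈ $0.19 USD / month
```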
20. Cost Estimation vs Docker Machine
This solution:
● Trades off speed to respond due to runner startup time
● Likely not ideal for high-activity pipelines
● Suits small pipelines that trigger after hours
● Nice to just have it work
GitLab Runner autoscaling with Docker Machine:
● Install and register a GitLab Runner for autoscaling with Docker Machine
○ ~$10 a month for a pilot instance running 24/7
● Requires patching and maintenance, verification, and troubleshooting
● Internally we had issues with SSH access
● Overall cost becomes a lot higher
Hi, I’m Anthony Scata, and I’m going to talk about some of my experiences, lessons, and coding tips and tricks from my journey writing a module for deploying GitLab runners in a cost-effective manner. We will see how things go; I may even show some live demos.
I’ll start by saying Happy Valentine’s Day. Hopefully by saying this I can gain some good karma from my wife, who is likely sitting at home, angrily watching TV and wondering where I am. I did ask her to join us but she wasn’t keen.
As a good consultant, I cannot start a presentation without talking about where I work.
Working in a consultancy, we often have internal projects, some of which are hosted in AWS. They are not business critical, but may be a small application used by a few people, an internal project, or a solution accelerator showcasing the latest technologies to clients.
The issue is that we don’t make money from these; as a professional services consultancy, our team members are billable to clients. This means most people are very busy on client projects and can be pulled off internal work for higher-value work.
It also means people are busy, trying to work internally and just get things done; typically automation and infrastructure as code end up on the back burner. It is quite ironic that a company that works so much in the CI/CD space has very little maturity internally, but as mentioned, this isn’t how we make money. As time is tough to come by and client projects can pop up, consistency for internal projects is often an issue, and a lot of projects become orphaned with little to no support.
Engineers come onto these projects and implement something simple, rarely with time to make it better or easier, just doing what they know. Over time this leads to a large mess of reinvented solutions, especially in infrastructure as code, all bandaged together. Automation is usually not people’s expertise and, as most of us know, gets overlooked until the next person comes along, sees the dumpster fire of a setup, and continues the cycle.
If people do look at automation, it often gets expensive, both in time to set up and then to maintain. Any system left running needs to be patched, verified, validated, and monitored. We have found this to be a large sink of money, especially for projects that are rarely touched. We often float the idea of centrally managed runners, but then we hit issues with ownership, cost allocation, debugging, and general usage; it becomes very painful.
The solution needed to be low cost, easy to build, maintainable, and reproducible, so that it could be deployed into any AWS account or region.
It also needed to be high quality, so it can be reused on another project without falling over every few months. Serverless technologies provide a great approach here: there is no running system to patch or upgrade, they are usually low cost (or at least lower), and they present a small attack surface for malicious actors.
The idea is that it also works well for small projects that can scale into something larger: you don’t have to spend a lot of money on CI/CD runners, but if the project grows, you aren’t required to set up a whole new process or implementation.
So this is where it led: a Terraform module automating the process of building GitLab runners.
EC2 costs ~$15 a month, plus extra costs.
Now for some of the more technical tips and tricks I learnt along the way. These help the next person picking up the code, which, again, is one of the core issues. Build and document as if you were the one looking at this for the first time, and ask what would really help.
As with anything, architecture diagrams, and documentation as a whole, are important. Nothing says "this is a well-maintained piece of code" like documentation that is factual and thorough. Diagrams can really help paint a picture of what will be deployed. Why have consumers extract details by reading the code when they can see them at a high level? Coupled with examples of working code, this makes it easier for people to try. You want to lower the barrier to entry for any piece of software, and you may need the examples yourself, as they provide a good guide.
And lastly, explain in the documentation why decisions were made. At times we focus on what was done but not the motivation or limitation behind it. The how can mostly be seen from the code and reasoned about; the why is more abstract and less obvious. "We use this CloudWatch setting because…", "this is set to a negative value so that…". This helps the next person who thinks "why was this done? Let me change it to something that makes more sense to me", only to end up in the same situation and rabbit hole you did. Be kind to your future self and fellow engineers.
I want my code to be well documented, with those interested able to look at it if necessary, the key word being necessary. terraform-docs provides the ability to automatically generate resource, variable, input, provider, and other docs based on the code. This means less reading of the code if you are new to the module, and provides a better snapshot. Now I can see whether this works with the AWS provider version I need for another module, whether it uses a resource type my organisation does not allow, and, more importantly, which input variables I need to supply, why, and how. With the validation mentioned earlier plus the docs, a consumer shouldn’t need to view the code to see how a variable will be used, making it easier for less experienced engineers.
As of Terraform 0.13.0 you have the ability to validate variables: to check that the contents of a variable are, for example, within a certain number range, match a regex, or form a valid JSON string. Sometimes a plan does not catch these incompatibilities due to the provider; we only find them when running the apply, which is likely too late. Let’s check before the plan to ensure we have a consistent and working environment.
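A minimal sketch of the validation block described here; the variable names, bounds, and error messages are illustrative, not the module's actual inputs:

```hcl
variable "idle_minutes" {
  type        = number
  description = "How long a runner may sit idle before scaling in."

  validation {
    condition     = var.idle_minutes >= 1 && var.idle_minutes <= 60
    error_message = "idle_minutes must be between 1 and 60."
  }
}

variable "runner_tags" {
  type        = string
  description = "JSON-encoded list of tags to register the runner with."

  validation {
    # Fails at plan time rather than mid-apply.
    condition     = can(jsondecode(var.runner_tags))
    error_message = "runner_tags must be a valid JSON string."
  }
}
```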
One advantage of the variable map with optionals, as mentioned before, is that we can check multiple variable values together, for example that the min is less than the max and the desired sits somewhere in between. If the variables are defined separately, this cannot be done.
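Grouping the values into one object variable makes the cross-field check possible, since a validation condition can reference every attribute of its own variable. An illustrative sketch (note that `optional()` with a default requires Terraform 1.3+):

```hcl
variable "scaling" {
  type = object({
    min     = number
    max     = number
    desired = optional(number, 1)
  })

  validation {
    # Cross-field check: impossible if min/max/desired were
    # three separate variables.
    condition = (
      var.scaling.min <= var.scaling.desired &&
      var.scaling.desired <= var.scaling.max
    )
    error_message = "scaling.desired must sit between scaling.min and scaling.max."
  }
}
```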
This may seem minor, but it really helps others who are viewing or changing your code. Some resources may use 10 or 20+ attributes, and it can be hard to comprehend what is being used. Sorting the attributes alphabetically makes it easy for others to see where an attribute is placed and then how it is used, reducing the cognitive load of making changes and decisions: where does this go, should I put this here?
This includes the resources themselves. Although Terraform does not run in sequential order, ordering helps us humans comprehend change and find resources.
We have all seen code with hundreds or thousands of lines and thought, "oh god, not this file again". This adds extra stress and cognitive load to changes. You are much better off splitting the files by higher-level resource type, for example autoscaling or cloudwatch, and then adding locals specific to that set of resources into the file. Keeping the resources somewhat contained helps to facilitate change. This may sound contradictory to the earlier point about logical ordering, and it does depend on how many resources you are creating, but anything more than about 10 resources per file starts to get unwieldy.
The use of locals makes it easier to reuse strings or data without hard-coding them in multiple places. Sometimes people make these variables with defaults, which can be messy as it gives consumers the ability to change them. Utilise locals where possible to reduce duplication of magic strings: used once, twice, three times, extract it into a local. Locals can also be used to move the complexity of how a value is computed out of the resource definition. Resource definitions can be large enough, let alone when you add in a join, compact, concat, split, tostring, or try. Move this out, make it a local, and reference it when needed.
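A small sketch of both uses, with hypothetical names (`var.environment`, the subnet variables, and the log group are illustrative, not the module's real resources):

```hcl
locals {
  # Single source for the naming prefix instead of repeating the string.
  name_prefix = "gitlab-runner-${var.environment}"

  # The join/compact complexity lives here, not in the resource block.
  subnet_ids = compact(concat(var.private_subnet_ids, var.extra_subnet_ids))
}

resource "aws_cloudwatch_log_group" "runner" {
  name              = "/aws/ec2/${local.name_prefix}"
  retention_in_days = 30
}
```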
Try to infer values where possible rather than having the consumer pass them in. For example, the caller account: there is no need to pass in an account ID when we are deploying into that account; just grab the ID with a data source, and the same for region. This reduces the duplication of variables and the possibility that the consumer changes region and forgets to update the variable.
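The two data sources in question, with an assumed example of how the inferred values might be used to build an ARN:

```hcl
# Infer the deployment account and region instead of asking for them.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  # Illustrative use: build ARNs without extra input variables.
  lambda_arn_prefix = "arn:aws:lambda:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:function"
}
```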
Thank you for listening to my presentation; I hope you gained something useful for your Terraform and Infrastructure as Code journey.