Scaling a conventional CI infrastructure in the public cloud
Mikhail Advani
Who am I
● DevOps Lead Architect at EdCast
● IaC practitioner
● Security enthusiast
Historical architecture
● Single-VM Jenkins master in AWS
● Multiple slaves that had been terminated for lack of maintenance but were never cleaned up
Problems with historical architecture
● Resource starvation when multiple builds run in parallel
● Build agents running continuously even when no jobs were running
● UI-driven job definitions without change tracking
● Risky upgrades
● Single point of failure
● Manual maintenance such as disk cleanups
Solution principles
● Immutable infrastructure
● Self-healing
● On-demand scaling up & down
Technology stack
● AWS
● Terraform for AWS automation
● Docker for workspace tooling configuration
● Kubernetes with Helm for container orchestration
● Jenkins
Architecture: Node distribution
Architecture: Isolation
Solution: Job definition & templatization
● Put job definitions in version control
● Define once & re-use multiple times
● Centralised control for job definitions
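The "define once & re-use" idea can be sketched as a single pipeline template rendered per repository. This is only an illustration: the talk does not name the templating tool, and the template body, `DEFAULTS`, and `render_job` are hypothetical names invented for this sketch.

```python
from string import Template

# Hypothetical shared pipeline template kept in version control.
# Stage names and commands are placeholders, not taken from the talk.
PIPELINE_TEMPLATE = Template("""\
pipeline {
  agent { label '$agent_label' }
  stages {
    stage('Checkout') { steps { git url: '$repo_url', branch: '$branch' } }
    stage('Build')    { steps { sh '$build_command' } }
  }
}
""")

# Central defaults: changing a value here changes every job that
# does not override it.
DEFAULTS = {
    "branch": "main",
    "agent_label": "default",
    "build_command": "make build",
}


def render_job(name, repo_url, **overrides):
    """Return (job name, rendered pipeline text) for one repository."""
    params = {**DEFAULTS, **overrides, "repo_url": repo_url}
    return name, PIPELINE_TEMPLATE.substitute(params)


if __name__ == "__main__":
    for job in (
        render_job("api", "https://example.com/api.git"),
        render_job("web", "https://example.com/web.git", build_command="npm ci"),
    ):
        print(f"--- {job[0]} ---\n{job[1]}")
```

Because every job is produced from the same template, a change to the template or to `DEFAULTS` is reviewed once in version control and applies to all jobs.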
Solution: Master & slave management
● Helm chart to deploy master
● Multi-container pod templates as per build requirements
● Implicit slave sidecar responsible for master-slave communication
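A multi-container pod template might look roughly like the manifest built below. It assumes the sidecar is the agent container that the Jenkins Kubernetes plugin conventionally names `jnlp`, and that `jenkins/inbound-agent` is its image; both are assumptions, as are the function name `build_agent_pod` and the `jenkins-agent` label key.

```python
import json


def build_agent_pod(label, build_containers, agent_image="jenkins/inbound-agent"):
    """Build a Pod manifest with one agent sidecar plus tool containers.

    build_containers maps a container name to its image, for example
    {"maven": "maven:3-jdk-11"}.
    """
    # Sidecar that talks to the master (name and image are assumptions).
    containers = [{"name": "jnlp", "image": agent_image}]
    for name, image in build_containers.items():
        containers.append({
            "name": name,
            "image": image,
            # Keep the tool container idle so build steps can exec into it.
            "command": ["cat"],
            "tty": True,
        })
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"labels": {"jenkins-agent": label}},
        "spec": {"containers": containers, "restartPolicy": "Never"},
    }


if __name__ == "__main__":
    pod = build_agent_pod("java-node", {"maven": "maven:3-jdk-11", "node": "node:18"})
    print(json.dumps(pod, indent=2))
```

Each build type gets its own set of tool containers, while the sidecar stays the same across all templates.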
Solution: Master + plugin configuration
● Plugin installation & upgrade management
Solution: Credentials provisioning
● Kubernetes secrets
● Git-crypt
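As a rough sketch of the Kubernetes side, a credential can be turned into a Secret manifest whose `data` values are base64-encoded, as Kubernetes requires. The function name, secret name, and namespace are hypothetical; the assumption that plaintext values come from files kept encrypted in the repository with git-crypt is a guess at how the two bullets fit together.

```python
import base64
import json


def build_secret(name, namespace, values):
    """Build an Opaque Secret manifest from plaintext key/value pairs."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


if __name__ == "__main__":
    # Placeholder values; real ones would never be hard-coded.
    secret = build_secret("git-creds", "jenkins-team-a", {"username": "ci"})
    print(json.dumps(secret, indent=2))
```

Note that base64 is an encoding, not encryption, which is why the values still need protection at rest in the repository.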
Solution: Artifact storage
● S3 backed artifact repository
Scaling to multiple masters
● Team-specific Jenkins masters
● One Kubernetes namespace per master
● Shared node-pools
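One way to picture "one namespace per master" is a loop that produces one Helm release per team. This sketch assumes Helm 3 (for `--create-namespace`); the chart reference `jenkins/jenkins`, the `jenkins-<team>` naming scheme, and the `values/<team>.yaml` layout are all hypothetical.

```python
def helm_commands(teams, chart="jenkins/jenkins"):
    """Return one `helm upgrade --install` argv list per team."""
    commands = []
    for team in teams:
        release = f"jenkins-{team}"
        namespace = f"jenkins-{team}"  # one namespace per master
        commands.append([
            "helm", "upgrade", "--install", release, chart,
            "--namespace", namespace, "--create-namespace",
            "--values", f"values/{team}.yaml",  # per-team overrides
        ])
    return commands


if __name__ == "__main__":
    for argv in helm_commands(["payments", "search"]):
        # Printed rather than executed; pass argv to subprocess.run to apply.
        print(" ".join(argv))
```

The node pools stay shared because nothing in the release is tied to a specific set of nodes; only the namespace and values file differ per team.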
Gotchas
● Exposing Docker in Docker
● Multi-AZ ASGs & cluster-autoscaler
Peripheral services
● Cluster autoscaling
● Traffic management
● DNS management
● TLS certificate management
● Monitoring
Thank you
https://linkedin.com/in/mikhailadvani
https://github.com/mikhailadvani
