Brian Moyles and Gareth Bowles from Netflix describe the continuous integration system that lets Netflix build and deploy its streaming service fast and at scale.
3. What We Build
Large number of loosely-coupled Java web services
Common code in libraries that can be shared across apps
Each service is “baked”: installed onto a base Amazon Machine Image and then created as a new AMI ...
... and then deployed into a Service Cluster (a set of Auto Scaling Groups running a particular service)
9. Jenkins Statistics
1600 job definitions, 50% SCM triggered
2000 builds per day
Common Build Framework updates trigger 800 rebuilds; by scaling up to 20 cloud slaves we can complete the flood of new builds in 30 minutes
2TB of build data
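As a back-of-the-envelope check on the rebuild-flood numbers above (the 4 executors per slave figure comes from the architecture notes later in this talk; the ~3 minute average build time is an illustrative assumption chosen to match the observed total):

```python
# Rough sketch of the rebuild-flood arithmetic; the average build time
# is an assumption, not a number given in the talk.
rebuilds = 800
slaves = 20
executors_per_slave = 4            # standard slaves run 4 simultaneous builds

concurrent_builds = slaves * executors_per_slave   # 80 builds at a time
waves = rebuilds / concurrent_builds               # 10 successive waves
avg_build_minutes = 3.0                            # assumed
total_minutes = waves * avg_build_minutes
print(total_minutes)  # → 30.0
```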
10. Jenkins Architecture
[Architecture diagram: a single Jenkins master (Red Hat Linux, 2x quad-core x86_64, 26 GB RAM) in the Netflix data center; standard x86_64 slave groups (Amazon Linux, m1.xlarge) in the us-west-1 VPC; ~40 custom slave groups maintained by product teams; and ad-hoc slaves running misc. O/S and architectures in the Netflix data centers and office.]
11. Other Uses of Jenkins
Monitoring of our test and production Cassandra clusters
Automated integration tests, including bake and deploy
Production bake and deployment
Housekeeping of the build / deploy infrastructure:
Reap unreferenced artifacts in Artifactory
Disable Jenkins jobs with no recent successful builds
Mark Jenkins builds as permanent if they are used by an active deployment in prod or test
Alert owners when slaves get disconnected
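The housekeeping rules above are simple policies over Jenkins metadata. As an illustrative sketch (in Python, not the actual system Groovy scripts, and with an assumed staleness threshold), the "disable jobs with no recent successful builds" rule might look like:

```python
# Sketch of the "disable stale jobs" housekeeping rule. The 90-day cutoff
# is an assumption; the real threshold isn't given in the talk.
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=90)

def jobs_to_disable(jobs, now):
    """jobs: iterable of (name, last_success), where last_success is the
    datetime of the most recent successful build, or None if none exists."""
    stale = []
    for name, last_success in jobs:
        if last_success is None or now - last_success > STALE_AFTER:
            stale.append(name)
    return stale

now = datetime(2012, 6, 1)
jobs = [
    ("api-build", datetime(2012, 5, 30)),    # recently green: keep
    ("legacy-tool", datetime(2011, 11, 1)),  # stale: disable
    ("never-green", None),                   # never succeeded: disable
]
print(jobs_to_disable(jobs, now))  # → ['legacy-tool', 'never-green']
```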
12. Jenkins Scaling Challenges
Flood of simultaneous builds can quickly exhaust all build executors and clog the pipeline
Flood of simultaneous builds can hammer the rest of the infrastructure (especially Artifactory)
Making global changes to all jobs
Some plugins don’t scale to our number of jobs / builds
Hard to test every job before upgrading the master or plugins
Large amount of state encapsulated in build data makes restoring from backup time-consuming
13. Netflix Extensions to Jenkins
Job DSL plugin: allows jobs to be set up with minimal definition, using templates and a Groovy-based DSL
Housekeeping and maintenance processes implemented as Jenkins jobs and system Groovy scripts
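The actual Job DSL plugin uses a Groovy-based DSL; as a rough Python sketch of the underlying template idea (the field names and SCM path scheme here are hypothetical), a minimal job definition expands against a shared template into a full job configuration:

```python
# Hypothetical sketch of "minimal definition + template" job setup.
# The real plugin is Groovy; these keys are illustrative only.
TEMPLATE = {
    "scm": "perforce://depot/{name}",   # hypothetical SCM path scheme
    "build_framework": "CBF",           # the Common Build Framework
    "targets": ["clean", "build"],
    "slave_label": "standard",
}

def expand_job(minimal):
    """The minimal definition supplies only the project name and any
    overrides; everything else comes from the shared template."""
    job = dict(TEMPLATE)
    job.update(minimal)
    job["scm"] = job["scm"].format(name=minimal["name"])
    return job

job = expand_job({"name": "api-server", "targets": ["clean", "test", "build"]})
print(job["scm"])      # → perforce://depot/api-server
print(job["targets"])  # → ['clean', 'test', 'build']
```

The point of the template is consistency: a one-line change to the template propagates to every generated job, instead of hand-editing 1600 job definitions.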
15. The DynaSlave Plugin
Genesis
Original build fleet: 15 VMs on datacenter hardware, 8G RAM, single vCPU, 2 executors per node
Many jobs build on SCM change. Changes to our common build framework create a massive thundering herd, since everything depends on it.
Ask for more VMs? Modify the CBF less frequently?
16. The DynaSlave Plugin
What We Wanted
Leverage our extensive AWS infrastructure, tooling, and experience
No manual fiddling with machines once they launch
Quick and easy to maintain a fixed pool of slave nodes that can grow/shrink to meet build demand
17. The DynaSlave Plugin
What We Have
Exposes a new endpoint in Jenkins that EC2 instances in the VPC use for registration
Allows a slave to name itself, label itself, and tell Jenkins how many executors it can support
EC2 == ephemeral: disconnected nodes that are gone for > 30 mins are reaped
Sizing handled by EC2 ASGs; tweaks passed through via user data (labels, names, etc.)
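The registration handshake above can be sketched as follows: an instance reads the tweaks passed through its user data and reports its name, labels, and executor count to the new Jenkins endpoint. This is an illustrative Python sketch; the key names and user-data encoding are assumptions, not the plugin's actual wire format.

```python
# Hypothetical sketch of a DynaSlave self-registration payload built from
# EC2 user data. Key names and the 'key=value&...' encoding are assumed.
def build_registration(user_data, instance_hostname):
    """user_data: 'key=value' pairs passed through the ASG launch config."""
    params = dict(pair.split("=", 1) for pair in user_data.split("&"))
    return {
        "name": params.get("slave_name", instance_hostname),
        "labels": params.get("labels", "").split(","),
        "executors": int(params.get("executors", "1")),
    }

payload = build_registration(
    "slave_name=dynaslave-01&labels=standard,x86_64&executors=4",
    "ip-10-0-0-1")
print(payload)
# → {'name': 'dynaslave-01', 'labels': ['standard', 'x86_64'], 'executors': 4}
```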
18. The DynaSlave Plugin
What’s Next
Dynamic resource management: have Jenkins respond to build demand and manage its own slave pools
Slave groups: allows us to create specialized pools of build nodes (isolated from the genpop)
Refresh mechanism for slave tools (JDKs, Ant versions, etc.)
Enhanced security/registration of nodes
Give it back to the community (watch techblog.netflix.com!)
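The dynamic-resource-management idea reduces to a feedback rule: grow the pool when the build queue backs up, shrink it gently when executors sit idle. A minimal sketch, using the 15–30 node range and 4 executors per slave mentioned elsewhere in the talk (the idle threshold is an assumption):

```python
# Sketch of a pool-sizing decision driven by Jenkins queue state.
# Bounds come from the talk; the idle-shrink threshold is assumed.
def desired_pool_size(current, queued_builds, idle_executors,
                      executors_per_slave=4, min_size=15, max_size=30):
    if queued_builds > 0:
        # Add enough slaves to absorb the queue, capped at the pool maximum.
        needed = (queued_builds + executors_per_slave - 1) // executors_per_slave
        return min(current + needed, max_size)
    if idle_executors >= 2 * executors_per_slave:
        return max(current - 1, min_size)   # shrink gently during lulls
    return current

print(desired_pool_size(20, queued_builds=40, idle_executors=0))  # → 30
print(desired_pool_size(20, queued_builds=0, idle_executors=8))   # → 19
```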
Abstract: Over the last couple of years Netflix's streaming service has become almost completely cloud-based, using Amazon's AWS. This talk will delve into our build and deployment architecture, detailing the evolution of the continuous integration systems that helped prepare us for the cloud move.
We work on the Engineering Tools team at Netflix. Both of us came a long way to be here.

Our team is all about creating tools and systems for our engineers to use to build, test, and deploy their apps to the cloud (and the DC if they reaaaaally have to :)).

I’ll give an overview of our continuous integration system and how Jenkins fits into it, then Brian will talk about how we’ve extended Jenkins and some of the challenges we’ve found running it at such a large scale.
To get to the cloud, we rearchitected the Netflix streaming service into many individual modules implemented as web services, usually web applications or shared libraries (jars).

Our team was responsible for creating a set of easy-to-use tools to simplify and automate the builds of those applications and shared libraries. We were also responsible for building the base machine image, creating the architecture for automating the assembly (aka baking, nothing to do with Qwikster!) of the individual application images, and building the web-based tool used to deploy and manage the application clusters. But we’ll concentrate on our build process for this talk.

Note that a key aspect of using so many shared services is that each service team has to rebuild often in order to pick up changes from the other services they depend on. This is the CONTINUOUS part of continuous integration, and it’s where Jenkins comes in.
Here are a few details on how we build all those cloud services.
We wrote a Common Build Framework, based on Ant with some custom Groovy scripting, that’s used by all our development teams to build different kinds of libraries and apps.

For the continuous integration server to run all those builds, we picked Jenkins because it’s very feature-rich, easy to extend, and has a very active community.

We use Perforce for our version control system, as it’s arguably the best centralized VCS available. But we’re making increasing use of Git; for example, our many open-sourced projects are all hosted on GitHub, and we use Jenkins to build them.

We publish library JARs and application WAR files to the Artifactory binary repository tool. This gives us access to the build metadata and lets us add Ivy to Ant to abstract the build and runtime jars into a dynamic dependency graph, so each project only has to know about its immediate dependencies.

Unlike many shops, we don’t use Jenkins plugins to do build tasks such as publishing to Artifactory; these are implemented in our Common Build Framework to give us finer-grained control over functionality without having to patch a bunch of plugins.
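The reason each project only needs its immediate dependencies is that the dependency manager (Ivy, in our setup) walks the graph transitively at build time. A minimal sketch of that walk, with hypothetical project names:

```python
# Sketch of transitive dependency resolution over declared *immediate*
# dependencies. Project names are hypothetical examples.
def transitive_deps(project, declared, seen=None):
    """declared: project -> list of its immediate dependencies."""
    if seen is None:
        seen = set()
    for dep in declared.get(project, []):
        if dep not in seen:
            seen.add(dep)
            transitive_deps(dep, declared, seen)  # recurse into deps of deps
    return seen

declared = {
    "videometadata": ["platform", "netflix-commons"],
    "platform": ["netflix-commons"],
    "netflix-commons": [],
}
# videometadata declares only two deps, but resolution pulls the full closure.
print(sorted(transitive_deps("videometadata", declared)))
# → ['netflix-commons', 'platform']
```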
Here is all you need to do in Jenkins to set up a typical project’s build job. You just tell Jenkins where to find the source code, add in the Common Build Framework, then specify what targets to call from your Ant build file.
And here is most of a typical project’s Ant and Ivy files. You can see the Ant code simply pulls in one of the standard framework entry points, like library, webapplication, etc.

The Ivy file then specifies what needs to get built and what the dependencies are. We have some extra Groovy code added to our Ant scripts that can drive Ant targets based on the Ivy artifact definitions. This helps make the build definition declarative yet flexible.

Yes, XML makes your eyes bleed, and there is a lot of redundancy here. But at least it’s small and manageable.
Let’s take a closer look at how we use Jenkins as the core of our build infrastructure, plus a few other interesting uses we’ve come up with.
The other 50% of jobs are triggered manually or run on a fixed schedule.
Our Jenkins master runs on a physical server in our data center. The master provides the UI for defining build jobs, plus controlling and monitoring their execution.

Slave servers are used to execute the actual builds. Our standard slaves can each run 4 simultaneous builds. Custom slave groups are set up for requirements such as C/C++ builds or jobs with high CPU or memory needs.

We vary the number of slaves from 15 to 30 depending on demand. This is currently a manual operation, but we’re working on autoscaling.

Our cloud slaves are set up in an AWS Virtual Private Cloud (VPC), which provides common network access between our data center and AWS. Amazon’s us-west-1 region is physically located close to our data center, so latency is not an issue.

Ad-hoc slaves in our DC or office are used by individual teams if they need an O/S variant other than those on our standard slaves, or a specific tool or licensed app.

We keep our standard slaves updated by maintaining a common set of tools (JDKs, Ant, Groovy, etc.) on the master and syncing the tools to the slaves when they are restarted. Custom slaves can also use this mechanism if they choose.
At its heart Jenkins is just a really nice job scheduler, so we’ve found lots of other uses for it. Here are some of the main ones; in the interest of time I’m not going to describe each one in detail, but please hit us up with questions if you’re interested.

Housekeeping jobs usually use system Groovy scripts for access to the Jenkins runtime. We’re looking at posting some of these to the public scripts repository.

Now I’ll hand it over to Brian, who is going to talk about some scalability challenges and how we’re addressing them.
We’ve run into a number of scaling challenges as we’ve evolved our build pipeline: thundering herd problems, modifying and managing 1600 jobs, making sure those 1600 jobs work from Jenkins version to Jenkins version, plugin version to plugin version, and so on.

Our goal, of course, is to have one-button build/test/deploy with as little human intervention as possible, and to make the developer’s life as pain-free as we can. All of these get in our way.
We’ve enhanced Jenkins with a few plugins and odd jobs:
- We’re working on a job DSL that will allow us to create job templates and simplify the process of configuring new jobs
- We’ve got a number of housekeeping and maintenance jobs running via Jenkins and system Groovy scripts, doing everything from disabling builds that consistently fail for a long period of time with no intervention (abandoned jobs) to enforcing consistency in job configuration
And we created the DynaSlave plugin, our cloud-based army of build nodes, to directly address one of our scalability problems: executor exhaustion and deep build queues during thundering herds/build storms.
When we started the project, our build node fleet was a set of virtual machines in our datacenter.

As I mentioned, when we change the build framework, everything tries to rebuild, which sounds crazy but is a good thing: *continuous integration*. The sooner we can find a problem, the sooner we can fix it.

We could’ve bounded our changes, restricted them to off hours, but at Netflix there isn’t really such a thing as off hours, and you’re bound to get in someone’s way! We could’ve deployed more VMs, but that involves other teams and leaves us with excess capacity and wasted resources during lulls...
Plus we had this great platform built on top of AWS and EC2. Why not leverage that? We get to take advantage of our tooling and our experience with the service, we can add and remove capacity on demand, and maybe even make Jenkins master of its own domain and let it control the build node population directly.

At the time we started building this (mid-2011), nothing plugin-wise we found could maintain a small fixed fleet of AWS resources for us. Plugins seemed to take aim at using EC2 for nothing but spikes in demand, whereas we wanted to forklift the whole fleet into the cloud.
We put together a plugin that accomplished some of those goals. The DynaSlave plugin currently allows an EC2 node to launch and register itself with Jenkins, totally hands-free. Each slave can tell Jenkins details about what it wants to be, what it can build, and so on. We can tailor nodes to specific needs and create custom pools of nodes with different instance sizes. The plugin, today, has no idea these nodes are even in EC2; pool sizing is managed by AWS ASGs and our cloud management tools like Asgard, our Amazon management console (soon to be open sourced!).
We’re not done, though. We have a number of enhancements in the pipeline, but one of the bigger bits is dynamic resource management.

We’re still doing some things manually, like controlling the pool size. If someone wants to make a change to our framework, they have to remember to scale the pool up, but not too big, as that can kill other systems by proxy. They also have to remember to scale down after the event, which is EXTRA tedious because resizing ASGs will swat away nodes that are still executing jobs.

Jenkins knows what the queue looks like and how many slaves are doing work, so we want to make the plugin intelligent enough to manage its own pools. When it scales a pool down, Jenkins can pause nodes that are idle and make sure those are the ones pulled by the ASG, as well as bleed off traffic from busy nodes that need to be reaped.

We’re planning on giving this back, so keep an eye on our blog at techblog.netflix.com for announcements to that effect.
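The scale-down ordering described above can be sketched simply: when shrinking the pool, remove idle nodes first so the ASG resize never swats away slaves that are still executing jobs. A minimal sketch (not the plugin's actual logic):

```python
# Sketch of idle-first node selection for pool scale-down.
def choose_nodes_to_remove(nodes, count):
    """nodes: list of (name, busy_executors). Idle nodes (0 busy) are
    removed first; among busy nodes, the least-loaded are preferred,
    since they are the easiest to drain before termination."""
    ranked = sorted(nodes, key=lambda n: n[1])
    return [name for name, _ in ranked[:count]]

nodes = [("a", 3), ("b", 0), ("c", 1), ("d", 0)]
print(choose_nodes_to_remove(nodes, 2))  # → ['b', 'd']
```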
Here are some places to look for more info. Adrian’s presentations on Slideshare are a great resource if you want to know more about our cloud architecture in general.

We’re hiring!