Find out how Netflix’s DevProd organization focuses on the local development loop and on ways to shrink the time developers spend purely waiting. Points of discussion include:
● From enabling to managing constant change
○ Keeping projects up to date with the latest opinions and practices
○ Validating changes before shipping to consumers
● Improving the build performance and consistency experience
○ Dependency resolution time reduction
○ Build & artifact caching
○ Remote test execution
● Improving self-serviceability
○ Enhanced build insights for a given build
○ Flaky test detection
● What’s on the horizon
○ Predictive Test Selection
○ Container experience for Integration Testing
○ IDE experience enhancements
DPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
1. A More Integrated Build and CI to Accelerate Builds at Netflix
With Aubrey Chipman and Roberto Perez Alcolea
2. Aubrey Chipman: Senior Software Engineer, JVM Ecosystem
Roberto Perez Alcolea: Senior Software Engineer, JVM Ecosystem
3. Agenda
1. Introduction
2. Common problems in the local and CI tool experience
3. From enabling to managing constant change
4. Improving the build performance and consistency experience
5. Improving self-serviceability
6. What’s on the horizon
7. Q&A
4. Section 1: Introduction
5. The practice of continuous integration
[Diagram: commit changes -> source control server -> fetch changes -> continuous integration server -> build -> test -> succeed / fail -> notify failure or success]
6. Netflix JVM landscape
● ~3.2k repositories
● Microservices with fat clients
● Hundreds to thousands of engineers
● ~191k JVM builds per week
● ~63k artifacts (JARs) published per week
7. Netflix Jenkins landscape
● 35 Jenkins controllers
● ~45k job definitions
● ~600k builds per week
● 650-1500 agents
● 1-100 executors per agent
8. Section 2: Common problems in the local and CI tool experience
9. Slow feedback loop
● Slow builds
○ Slow tests
○ Dependency downloading
● Flaky tests
● Compute resources vary per engineer
○ IDE
○ Local machine
○ CI machine
○ Distributed machine
○ Different operating systems (Linux, macOS, Windows)
10. Test times
Prior to rolling out Gradle Enterprise and its associated performance benefits, we saw in one large project:
- 88% of all build time was spent in tests
11. Inconsistencies across environments
● Missing tools
● Network Access
● Security Policies
● Docker Availability
● OS/Architecture related issues
12. Constant change
● Keep projects up to date with Platform-recommended practices
● New security vulnerabilities surface every day
● How can we migrate a project from A to B with little friction for the project owner?
13. Section 3: From enabling to managing constant change
14. Build hygiene
15. In the past…
● Each project got its own Jenkins jobs for Gradle and dependency updates
● ~6.5k Jenkins jobs with no centralized view
16. We introduced Rocket CI
● Rocket CI is a custom Jenkins plugin and Bitbucket plugin that combines a suite of tools for continuous integration
● Every JVM-based project at Netflix gets a set of Rocket enrollments
● Aspects of configuration as code
17. Rocket CI sends Pull Requests
18. User gets Pull Requests
19. As a platform team, we have more visibility
20. Validating changes even before making a PR
21. Validating changes pre-PR
22. Section 4: Improving the build performance and consistency experience
23. Shared dependency cache on CI agents
24. Distributing dependency cache
25. Cache usage examples
[Screenshots: CI build with shared dependency cache; local build with new Gradle home]
26. Build & artifact caching
27. Varnish
Problems to solve:
- Lighten load on artifact storage
- Caching 404s
Using this for…
- Fast dependency cache!
- Build task output cache!
28. Varnish
Challenges:
- Cross-region replication
- Cache eviction with broadcasting PURGE
- Evicting many build-specific cache keys
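The two Varnish slides describe caching both artifacts and 404s, and evicting entries by broadcasting PURGE to every cache node. The sketch below models that behavior in plain Python. It is a toy stand-in under stated assumptions, not Varnish configuration and not the setup used at Netflix: `ArtifactCache`, `broadcast_purge`, and the TTL values are all hypothetical names and numbers chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Optional[bytes]]  # (HTTP status, body)


@dataclass
class Entry:
    status: int
    body: Optional[bytes]
    expires_at: float


class ArtifactCache:
    """Toy stand-in for one cache node in front of artifact storage."""

    def __init__(self, fetch: Callable[[str], Response],
                 ttl_ok: float = 3600.0, ttl_missing: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch              # call to the artifact storage backend
        self._ttl_ok = ttl_ok            # assumed TTL for found artifacts
        self._ttl_missing = ttl_missing  # assumed (shorter) TTL for 404s
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def get(self, path: str) -> Response:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry.expires_at > now:
            return entry.status, entry.body  # served without touching storage
        status, body = self._fetch(path)
        if status == 200:
            self._entries[path] = Entry(status, body, now + self._ttl_ok)
        elif status == 404:
            # Negative caching: remember the miss briefly so repeated
            # lookups for absent artifacts do not reach storage either.
            self._entries[path] = Entry(status, None, now + self._ttl_missing)
        return status, body

    def purge(self, path: str) -> bool:
        """Evict one path; return True if this node held an entry for it."""
        return self._entries.pop(path, None) is not None


def broadcast_purge(nodes: List[ArtifactCache], path: str) -> int:
    """Evict `path` from every node; return how many nodes held it."""
    return sum(1 for node in nodes if node.purge(path))
```

In this model, publishing a new artifact would be followed by `broadcast_purge(nodes, path)` so that a cached 404 for that path does not hide the freshly published file until its TTL runs out.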
29. Artifact Cache Effectiveness
30. Remote test execution
31. Remote test execution
Takes your existing test suites and distributes them across remote agents to execute them faster.
The tests and their supporting files are transferred to each agent and executed, with their logging and results streamed back to the build in real time.
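To make the distribution step concrete, here is a minimal sketch of one way to split test classes across agents: a greedy "longest test first" assignment based on historical durations. This is an assumption for illustration only and not the scheduler of any real product; `partition_tests` and the sample durations are hypothetical.

```python
import heapq
from typing import Dict, List


def partition_tests(durations: Dict[str, float], agents: int) -> List[List[str]]:
    """Assign each test class to the currently least-loaded agent.

    `durations` maps a test class name to its historical run time in seconds.
    Classes are placed longest first, which keeps agent loads roughly even.
    """
    if agents < 1:
        raise ValueError("at least one agent is required")
    heap = [(0.0, index) for index in range(agents)]  # (load so far, agent index)
    buckets: List[List[str]] = [[] for _ in range(agents)]
    # Sort by duration descending, then by name for a deterministic result.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return buckets


if __name__ == "__main__":
    history = {"A": 10.0, "B": 7.0, "C": 5.0, "D": 3.0, "E": 2.0}
    print(partition_tests(history, agents=2))
```

Run serially, the five sample classes take 27 seconds; split this way across two agents, the slower agent finishes after 14 seconds, ignoring the cost of transferring files to each agent.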
32. Achieved benefits
● Reduced usage of local compute resources (local development)
● Consistent test behavior across environments (local/CI)
● Faster test execution through better parallelization across available remote agents
33. One project before remote test execution
34. One project with remote test execution
35. Section 5: Improving self-serviceability
36. Enhanced build insights
37. Enhanced build insights
38. Enhanced build insights
Easy access to build scans from Jenkins
39. Enhanced build insights
Custom tags provide more filtering options
40. Exported data & insights
Nebula is our wrapper over Gradle
41. Exported data & insights
Where are builds running?
42. Exported data & insights
IntelliJ version skew
43. Flaky tests
44. Flaky tests: what are they?
Test report view
45. Flaky tests: aggregated view
Sunday to Saturday view: Oct 16 to Oct 22, 2022
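The slides show flaky tests only as screenshots, so the sketch below spells out one common working definition, assumed here: a test is flaky in a build if it both failed and passed within that same build, for example after a retry. The function then produces a simple aggregated view by counting flaky builds per test. `flaky_counts` and the record layout are hypothetical, not the reporting tool's data model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# One record per test execution: (build id, test name, outcome),
# where outcome is "passed" or "failed". This layout is an assumption.
Execution = Tuple[str, str, str]


def flaky_counts(executions: Iterable[Execution]) -> Counter:
    """Count, per test, the builds in which it both failed and passed."""
    outcomes: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for build_id, test, outcome in executions:
        outcomes[(build_id, test)].add(outcome)

    counts: Counter = Counter()
    for (_build_id, test), seen in outcomes.items():
        if {"passed", "failed"} <= seen:  # mixed outcomes in one build
            counts[test] += 1
    return counts


if __name__ == "__main__":
    week = [
        ("build-1", "LoginTest", "failed"),
        ("build-1", "LoginTest", "passed"),   # passed on retry: flaky
        ("build-1", "CartTest", "passed"),
        ("build-2", "LoginTest", "failed"),
        ("build-2", "LoginTest", "passed"),   # flaky again
        ("build-2", "SearchTest", "failed"),  # consistently failing, not flaky
        ("build-2", "SearchTest", "failed"),
    ]
    print(flaky_counts(week).most_common())
```

A test that fails on every attempt is a plain failure under this definition, which keeps real regressions out of the flaky report.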
46. Flaky tests: individual
Real projects
47. Flaky tests: our asks
- We ask teams to review their own flaky tests and address them at their own cadence.
- Don’t just leave these flaky forever :)
48. Section 6: What’s on the horizon
49. Predictive test selection
50. Predictive Test Selection
Increases developer productivity by automatically and intelligently selecting and executing the subset of tests that are most relevant to a code change, providing faster feedback.
51. When do we use Predictive Test Selection?
● Local builds that are not executing a publishing task
● CI builds that are Pull Requests
● CI builds where the branch is neither the default/main branch nor a release branch
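The three conditions above can be read as a single predicate over the build's context. The sketch below is one hedged reading of the slide, not the actual Netflix build logic: `BuildContext`, its field names, and the default branch name `main` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildContext:
    """Facts about a build; every field name here is an assumption."""
    is_ci: bool
    is_pull_request: bool = False
    branch: str = ""
    default_branch: str = "main"
    is_release_branch: bool = False
    runs_publishing_task: bool = False


def use_predictive_test_selection(ctx: BuildContext) -> bool:
    """Return True when the slide's conditions for enabling selection hold."""
    if not ctx.is_ci:
        # Local build: select tests unless something is being published.
        return not ctx.runs_publishing_task
    if ctx.is_pull_request:
        return True
    # Other CI builds: only on branches that are neither default nor release.
    return ctx.branch != ctx.default_branch and not ctx.is_release_branch


if __name__ == "__main__":
    print(use_predictive_test_selection(BuildContext(is_ci=False)))
    print(use_predictive_test_selection(BuildContext(is_ci=True, branch="main")))
```

Under this reading, builds of the default branch, release branches, and anything that publishes always run the full test suite, so selection only trims tests where a missed failure is cheap to catch later.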
52. Simulations are promising
53. One project with predictive test selection
54. Container experience for Integration Testing
55. Container Experience for Integration Testing
● Explore AtomicJar Testcontainers Cloud to reduce operational complexity
○ Architecture differences between Local and CI
■ ARM vs x64
○ IPv6 support
○ Compute resources for Docker
■ Memory
■ Disk Space
○ Cleanup of Docker containers and images
○ Remove VM-based Remote Test Execution Agent Pools
■ Slower to scale up/down
■ No more Docker management
56. IDE enhancements
57. IDE Enhancements
- Shared indexes for internal projects
- Manage IDE versions in use
- Testing team plugins against the latest IDE versions
58. Key takeaways
- Avoid work at every level
  - Read-only dependency caches
  - Task caches
- Streamline work
  - Test distribution
  - Test selection
- There’s always low-hanging fruit
- Local and CI experience should be consistent
59. Come see us at these other talks as well!
“How we Built a Distributed Testing Platform” with Marc Philipps and Roberto Perez Alcolea at 03:00 pm in Room Ride
Live DPE Showdown with Aubrey Chipman at 04:00 pm in Room Ride
60. Thank you!
61. Section 7: Q&A