Building a Distributed
Build System
at Scale
Aysylu Greenberg
September 17, 2016
@aysylu22
Building
Distributed Build System
at Google Scale
What does a build system do?
Photo by smemon / CC BY
Build Scenario
Project with dependencies
Makefile
main.o : main.c defs.h
	cc -c main.c
main.c :
	curl -O http://remote/main.c
Build Scenario
Project with dependencies
Find dependencies
Build project with the dependencies
Download build artifacts
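The build scenario above (find the dependencies, then build the project with them) can be sketched as a topological sort over a dependency graph. This is only an illustration, not how Blaze or BuildRabbit is implemented; the `deps` mapping and the `build_order` helper are hypothetical names, and the targets are taken from the Makefile on the slide.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph mirroring the Makefile on the slide:
# each target maps to the set of files it depends on.
deps = {
    "main.o": {"main.c", "defs.h"},
    "main.c": set(),
    "defs.h": set(),
}


def build_order(graph):
    """Return targets ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(deps)
print(order)  # main.c and defs.h appear before main.o
```

A real build tool would also detect which targets are out of date; this sketch only orders them.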
Test Scenario
Project with dependencies
Find dependencies
Build project with the dependencies
Download build artifacts
Run the test
Get the results of the test
Scale
• Engineers: >30,000 developers in 40+ offices
• Commits: 15K by humans + 30K by robots/day
• Source code: 2 billion LOC
• Builds and tests: 5M per day through BuildRabbit
• Petabytes of output artifacts
• 1 repository
Working in One Repository
Photo by flyingblogspot / CC BY SA
Repository of Artifacts vs Building from Source
Photo by flyingblogspot / CC BY SA
Working in One Repository
• Linear revision history
• Extensive code sharing & reuse
    • Everything can be cross-referenced
    • Large-scale refactoring for codebase modernization
    • Easier collaboration across teams
• Simplified dependency management
    • No confusion about authoritative version of the file
    • No forking of shared libraries
    • Components for library releases = Git subtree or Git subcomponents to separate release from WIP versions
• Predictable, repeatable builds from source
• Optimizations to avoid compiling same artifacts
• Decouple each team's processes as much as possible
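One common way to avoid compiling the same artifacts twice is to key each build action by a hash of its command and input contents, and to reuse the stored output when the key matches. The slides do not say how BuildRabbit does this, so the sketch below is an assumption: `action_key`, `ArtifactCache`, and `fake_compile` are hypothetical names, and an in-memory dict stands in for real artifact storage.

```python
import hashlib


def action_key(command, inputs):
    """Hash the command line and the content of every input file."""
    h = hashlib.sha256()
    h.update(command.encode())
    for name in sorted(inputs):  # sorted so the key is deterministic
        h.update(name.encode())
        h.update(hashlib.sha256(inputs[name]).digest())
    return h.hexdigest()


class ArtifactCache:
    """In-memory stand-in for shared build artifact storage (assumption)."""

    def __init__(self):
        self._store = {}
        self.compiles = 0  # how many times we actually had to compile

    def build(self, command, inputs, compile_fn):
        key = action_key(command, inputs)
        if key not in self._store:
            self._store[key] = compile_fn(command, inputs)
            self.compiles += 1
        return self._store[key]


def fake_compile(command, inputs):
    # Placeholder for running a real compiler.
    return b"object code for " + command.encode()


cache = ArtifactCache()
src = {"main.c": b"int main(void) { return 0; }", "defs.h": b"#define N 1"}
first = cache.build("cc -c main.c", src, fake_compile)
second = cache.build("cc -c main.c", src, fake_compile)  # same inputs: cache hit
changed = dict(src, **{"defs.h": b"#define N 2"})
third = cache.build("cc -c main.c", changed, fake_compile)  # header changed: recompiled
```

Because the key depends only on the command and the file contents, two engineers building the same revision would hit the same cache entry, which is where repeatable builds from source pay off.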
Build System Story
Photo by imanka / CC BY
Towards Distributed Build System
BuildRabbit: Distributed Build System
IN THE CLOUD
What does BuildRabbit do?
[Diagram components: BuildRabbit; Continuous integration system; Source System; Google engineers & teams; Release infrastructure; Integration testing infrastructure; Build artifact storage; Blaze]
From Push to Pull: Client / Server
[Diagram components: Client for User; BuildRabbit Scheduler; BuildRabbit Worker]
From Push to Pull: Build Service
[Diagram components: User; Persistent Queue; BuildRabbit Worker; Build Artifacts; Build Progress Info. Connection labels: RPC; event stream]
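The pull model in the Build Service diagram can be sketched as workers taking requests from a queue, publishing progress events, and requeueing failed work so the user does not have to retry. This is a simplified, single-process illustration under stated assumptions: `BuildService`, `work_once`, and `flaky_build` are hypothetical names, `queue.Queue` stands in for the persistent queue, and a plain list stands in for the build progress event stream.

```python
import queue


class BuildService:
    def __init__(self, max_attempts=3):
        self.pending = queue.Queue()  # in-memory stand-in for the persistent queue
        self.events = []              # stand-in for the build progress event stream
        self.artifacts = {}           # stand-in for build artifact storage
        self.max_attempts = max_attempts

    def submit(self, target):
        self.pending.put((target, 1))
        self.events.append((target, "queued"))

    def work_once(self, run_build):
        """One worker iteration: pull a request, build it, requeue on failure."""
        try:
            target, attempt = self.pending.get_nowait()
        except queue.Empty:
            return False  # nothing left to pull
        try:
            self.artifacts[target] = run_build(target)
            self.events.append((target, "done"))
        except Exception:
            if attempt < self.max_attempts:
                self.pending.put((target, attempt + 1))
                self.events.append((target, "retrying"))
            else:
                self.events.append((target, "failed"))
        return True


calls = {"n": 0}


def flaky_build(target):
    # Fails on the first call to simulate a lost worker, then succeeds.
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("worker lost")
    return "artifact:" + target


service = BuildService()
service.submit("main.o")
while service.work_once(flaky_build):
    pass
```

The retry lives in the service rather than in the client, so the user only sees the event stream move from queued to retrying to done.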
[Diagram: Client / Server (Client for User; BuildRabbit Scheduler; BuildRabbit Worker) side by side with Build Service (User; Persistent Queue; BuildRabbit Worker; Build Progress Info; Build Artifacts). Overlay label: RESILIENCE LOGIC]
Migration with Zero Downtime
Photo by moonjazz / CC BY SA
• MIGRATE BACKENDS FIRST
• FOCUS ON MIXED MODE
• TARGET LAUNCH-FRIENDLY CLIENTS
• DECOUPLE LAUNCH OF SERVICES
• MAXIMUM VISIBILITY INTO SYSTEM STATE
• PRACTICE THE LAUNCH, A LOT
• HAVE A SOLID ROLLBACK PLAN
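Running in mixed mode with a rollback plan is often done by routing a configurable share of clients to the new backend while the old one stays live. The slides do not describe the mechanism actually used, so this is only a sketch under that assumption; the `Rollout` class, the `"old"`/`"new"` backend names, and the client IDs are all hypothetical.

```python
import zlib


class Rollout:
    def __init__(self, percent_new=0):
        self.percent_new = percent_new  # share of clients sent to the new backend

    def backend_for(self, client_id):
        # crc32 gives each client a stable bucket in [0, 100), so a client
        # keeps hitting the same backend until the percentage changes.
        bucket = zlib.crc32(client_id.encode()) % 100
        return "new" if bucket < self.percent_new else "old"

    def rollback(self):
        """Send every client back to the old backend."""
        self.percent_new = 0


clients = ["team-%d" % i for i in range(200)]

rollout = Rollout(percent_new=25)
mixed = [rollout.backend_for(c) for c in clients]       # some old, some new
repeat = [rollout.backend_for(c) for c in clients]      # same answers again

rollout.rollback()
after_rollback = [rollout.backend_for(c) for c in clients]
```

Raising `percent_new` in small steps gives launch-friendly clients the new path first, and `rollback()` is a single state change that can be rehearsed before the real launch.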
Distributed System Design Patterns
Photo by oslointhesummertime / CC BY
Distributed vs Centralized Control Plane
Photo by oslointhesummertime / CC BY
Distributed Design Patterns: Distributed Control
Design and implementation are scaled up from 1 machine
Bigger overhead during evolution
Distributed Design Patterns: Centralized Control
Distributed design from the start
Bigger overhead during design phase

Building A Distributed Build System at Google Scale (StrangeLoop 2016)

Editor's Notes

  • #9  Package and deploy if necessary
  • #15  1/2 of the codebase changes daily
  • #19  Rachel Potvin’s talk @Scale
  • #21  1 source of truth, Atomic changes
  • #22  Increased visibility, flexible team boundaries
  • #23  No painful cross-repository merging of copied code
  • #42  utilization + routing, client has to have retry logic, 1 connection to get progress info and outputs, if connection broken, retry
  • #43, #44  API built to last, using old facade at first; Handle retries for the user; Smarter scheduling + incrementality + utilization
  • #47  2nd: runs the same version; CENTRALIZED control over build system components