SlideShare a Scribd company logo
1 of 26
Download to read offline
Build Systems at Scale
Andrey Mokhov, Neil Mitchell,
Simon Peyton Jones, Simon Marlow
Systems Research Challenges Workshop, January 2017
Build systems
Build system
Dynamic
dependencies
Cloud
builds
Supports non-
determinism
Skips some
intermediates
Bazel (Google) ✓
Buck (Facebook) ✓
A new build system by
another big company
✓ ✓ ✓
Jenga (Jane Street) ✓ ✓
Shake (originally by
Standard Chartered)
✓ ✓
Make (Bell Labs, 1976) ✓
Meet Hadrian:
a new build
system for the
Glasgow Haskell
Compiler (GHC)
The GHC and the build system
Glasgow Haskell Compiler:
– 25 years old
– 100s of contributors
– 10K+ source files
– 1M+ lines of Haskell code
– 3 build stages
– 18 build ways
– 30 build programs: alex, ar,
gcc, ghc, ghc-pkg, happy, …
The current build system:
– Non-recursive Make
– Fourth major rewrite
– 200 makefiles
– 10K+ lines of code
– 3 build phases
– Highly user-customisable
– And it works! But…
The result of 25 years of development
$1/$2/build/%.$$($3_osuf) : $1/$4/%.hs $$(LAX_DEPS_FOLLOW) 
$$$$($1_$2_HC_DEP) $$($1_$2_PKGDATA_DEP)
$$(call cmd,$1_$2_HC) $$($1_$2_$3_ALL_HC_OPTS) -c $$< -o $$@ 
$$(if $$(findstring YES,$$($1_$2_DYNAMIC_TOO)), 
-dyno $$(addsuffix .$$(dyn_osuf),$$(basename $$@)))
$$(call ohi-sanity-check,$1,$2,$3,$1/$2/build/$$*)
Make uses a global namespace of mutable string variables
– Numbers, arrays, associative maps are encoded in strings
– No encapsulation and implementation hiding
– Variable references are spliced into Makefiles: avoid spaces/colons
– To expand a variable use $; to get $ use $$; to get $$ use $$$$…
There are other problems
1. A global namespace of mutable string variables
2. Dynamic dependencies
3. Build rules with multiple outputs
4. Concurrency reduction
5. Fine-grain dependencies
6. Computing command lines, essential complexity
Solution: use FP to design scalable abstractions
– To solve 1-5: we use Shake, a Haskell library for writing build systems
– To solve 6: we develop a small EDSL for building command lines
Accidental
complexity
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
Run Shake with
default options
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
A list of top-level
targets and build rules.
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
Build foo.o target
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
Pattern rule to build object
files *.o from C sources *.c
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
Action to build
the file obj
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
Compute the source file
name (replace extension)
(-<.>) :: FilePath -> String
-> FilePath
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
Declare a dependency
and build it if required
need :: [FilePath] -> Action ()
Shake
import Development.Shake
import System.FilePath
main :: IO ()
main = shake shakeOptions $ do
want ["foo.o"]
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
Build obj from src by
running gcc with
appropriate arguments
From Make to Shake
"*.o" %> obj -> do
let src = obj -<.> "c"
need [src]
run "gcc" ["-c", src, "-o", obj]
%.o : %.c
gcc -c $< -o $@
Make:
Shake:
Yes, Make works really great for such simple build systems!
Shake is more verbose in this case, but scales much better
Quick wins with Shake
Dynamic dependencies
Post-use & order-only dependencies
Polymorphic/fine-grain dependencies
Concurrency reduction
Tracking file contents
Avoiding external tools
…
Read the paper!
Mokhov, Mitchell, Peyton Jones, Marlow: “Non-recursive Make
Considered Harmful: Build Systems at Scale”, Haskell Symposium, 2016
Dynamic dependencies
Build target t:
– Lookup t‘s dependencies
{d1, …, dn} in the database
– If the lookup fails
or t doesn’t exist
or t has changed
or some dk is not up to date
then:
• Find the build rule matching t
• Run the action, recording need’s
• Update the database with newly
recorded dependencies
Problem 6: Computing command lines
Build command lines introduce a lot of complexity, e.g.:
Hard to extend: file-specific flags require WAY_p_file_HC_OPTS
Solution: develop an EDSL for computing command lines
WAY_p_HC_OPTS = -static –prof
...
base_stage1_HC_OPTS = -this-unit-id base
...
$1_$2_$3_HC_OPTS = $$(WAY_$3_HC_OPTS) 
$$($1_$2_HC_OPTS) 
...plus 30 more configuration patterns
Computing command line for a target
preludeTarget = Target
{ context = Context Stage1 base profiling
, builder = Ghc Stage1
, inputs = ["libraries/base/Prelude.hs"]
, outputs = ["build/stage1/libraries/base/Prelude.p_o"] }
Given preludeTarget how to compute the build command for it?
inplace/bin/ghc-stage1 -O2 -prof -c libraries/base/Prelude.hs
-o build/stage1/libraries/base/Prelude.p_o
commandLine :: Target -> Action [String]
Describing command lines compositionally
ghcArgs :: Args
ghcArgs = builder Ghc ? mconcat
[ arg "-O2"
, way profiling ? arg "-prof"
, arg "-c", arg =<< getInput
, arg "-o", arg =<< getOutput ]
args :: Args
args = mconcat [ ghcArgs, ..., userArgs ]
Results
Qualitative analysis:
– We studied 11 common use-cases of GHC build system, such as
“edit a source file and rebuild”, “add a new build command line
argument and rebuild”, “git branch and rebuild”, etc.
– The old build system performs a lot of unnecessary rebuilds in
many cases, whereas Hadrian correctly handles most cases.
Quantitative benchmarks: Hadrian is faster
– Zero build: 2.2s vs 2.0s (Linux), 12.3s vs 2.1s (Windows)
– Full build: 649s vs 578s (Linux), 1266s vs 737s (Windows)
What’s next?
Towards Cloud Build
Systems with Dynamic
Dependency Graphs
Challenges at large scale
Dynamic dependencies:
– Neil Mitchell’s Shake: http://shakebuild.com/
– Jane Street’s Jenga: https://github.com/janestreet/jenga
Large-scale build systems require sharing of build results:
– Facebook’s Buck: https://buckbuild.com/
– Google’s Bazel: https://www.bazel.io/
Can we have both? Not today. (Although there is an extension of nix
that can apparently do both. I haven’t checked that yet.)
Research opportunity?
Formalise the semantics of build systems:
– Dynamic dependencies
– Cloud builds
– Non-determinism
– Skipping intermediate build products
Applied for Royal Society Industry Fellowship:
https://blogs.ncl.ac.uk/andreymokhov/cloud-and-dynamic-builds/
Build systems-at-scale

More Related Content

Recently uploaded

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Featured

How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellSaba Software
 
Introduction to C Programming Language
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming LanguageSimplilearn
 

Featured (20)

How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
 
Introduction to C Programming Language
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming Language
 

Build systems-at-scale

  • 1. Build Systems at Scale Andrey Mokhov, Neil Mitchell, Simon Peyton Jones, Simon Marlow Systems Research Challenges Workshop, January 2017
  • 2. Build systems Build system Dynamic dependencies Cloud builds Supports non- determinism Skips some intermediates Bazel (Google) ✓ Buck (Facebook) ✓ A new build system by another big company ✓ ✓ ✓ Jenga (Jane Street) ✓ ✓ Shake (originally by Standard Chartered) ✓ ✓ Make (Bell Labs, 1976) ✓
  • 3. Meet Hadrian: a new build system for the Glasgow Haskell Compiler (GHC)
  • 4. The GHC and the build system Glasgow Haskell Compiler: – 25 years old – 100s of contributors – 10K+ source files – 1M+ lines of Haskell code – 3 build stages – 18 build ways – 30 build programs: alex, ar, gcc, ghc, ghc-pkg, happy, … The current build system: – Non-recursive Make – Fourth major rewrite – 200 makefiles – 10K+ lines of code – 3 build phases – Highly user-customisable – And it works! But…
  • 5. The result of 25 years of development $1/$2/build/%.$$($3_osuf) : $1/$4/%.hs $$(LAX_DEPS_FOLLOW) $$$$($1_$2_HC_DEP) $$($1_$2_PKGDATA_DEP) $$(call cmd,$1_$2_HC) $$($1_$2_$3_ALL_HC_OPTS) -c $$< -o $$@ $$(if $$(findstring YES,$$($1_$2_DYNAMIC_TOO)), -dyno $$(addsuffix .$$(dyn_osuf),$$(basename $$@))) $$(call ohi-sanity-check,$1,$2,$3,$1/$2/build/$$*) Make uses a global namespace of mutable string variables – Numbers, arrays, associative maps are encoded in strings – No encapsulation and implementation hiding – Variable references are spliced into Makefiles: avoid spaces/colons – To expand a variable use $; to get $ use $$; to get $$ use $$$$…
  • 6. There are other problems 1. A global namespace of mutable string variables 2. Dynamic dependencies 3. Build rules with multiple outputs 4. Concurrency reduction 5. Fine-grain dependencies 6. Computing command lines, essential complexity Solution: use FP to design scalable abstractions – To solve 1-5: we use Shake, a Haskell library for writing build systems – To solve 6: we develop a small EDSL for building command lines Accidental complexity
  • 7. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj]
  • 8. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Run Shake with default options
  • 9. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] A list of top-level targets and build rules.
  • 10. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Build foo.o target
  • 11. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Pattern rule to build object files *.o from C sources *.c
  • 12. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Action to build the file obj
  • 13. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Compute the source file name (replace extension) (-<.>) :: FilePath -> String -> FilePath
  • 14. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Declare a dependency and build it if required need :: [FilePath] -> Action ()
  • 15. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Build obj from src by running gcc with appropriate arguments
  • 16. From Make to Shake "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] %.o : %.c gcc -c $< -o $@ Make: Shake: Yes, Make works really great for such simple build systems! Shake is more verbose in this case, but scales much better
  • 17. Quick wins with Shake Dynamic dependencies Post-use & order-only dependencies Polymorphic/fine-grain dependencies Concurrency reduction Tracking file contents Avoiding external tools … Read the paper! Mokhov, Mitchell, Peyton Jones, Marlow: “Non-recursive Make Considered Harmful: Build Systems at Scale”, Haskell Symposium, 2016
  • 18. Dynamic dependencies Build target t: – Lookup t‘s dependencies {d1, …, dn} in the database – If the lookup fails or t doesn’t exist or t has changed or some dk is not up to date then: • Find the build rule matching t • Run the action, recording need’s • Update the database with newly recorded dependencies
  • 19. Problem 6: Computing command lines Build command lines introduce a lot of complexity, e.g.: Hard to extend: file-specific flags require WAY_p_file_HC_OPTS Solution: develop an EDSL for computing command lines WAY_p_HC_OPTS = -static –prof ... base_stage1_HC_OPTS = -this-unit-id base ... $1_$2_$3_HC_OPTS = $$(WAY_$3_HC_OPTS) $$($1_$2_HC_OPTS) ...plus 30 more configuration patterns
  • 20. Computing command line for a target preludeTarget = Target { context = Context Stage1 base profiling , builder = Ghc Stage1 , inputs = ["libraries/base/Prelude.hs"] , outputs = ["build/stage1/libraries/base/Prelude.p_o"] } Given preludeTarget how to compute the build command for it? inplace/bin/ghc-stage1 -O2 -prof -c libraries/base/Prelude.hs -o build/stage1/libraries/base/Prelude.p_o commandLine :: Target -> Action [String]
  • 21. Describing command lines compositionally ghcArgs :: Args ghcArgs = builder Ghc ? mconcat [ arg "-O2" , way profiling ? arg "-prof" , arg "-c", arg =<< getInput , arg "-o", arg =<< getOutput ] args :: Args args = mconcat [ ghcArgs, ..., userArgs ]
  • 22. Results Qualitative analysis: – We studied 11 common use-cases of GHC build system, such as “edit a source file and rebuild”, “add a new build command line argument and rebuild”, “git branch and rebuild”, etc. – The old build system performs a lot of unnecessary rebuilds in many cases, whereas Hadrian correctly handles most cases. Quantitative benchmarks: Hadrian is faster – Zero build: 2.2s vs 2.0s (Linux), 12.3s vs 2.1s (Windows) – Full build: 649s vs 578s (Linux), 1266s vs 737s (Windows)
  • 23. What’s next? Towards Cloud Build Systems with Dynamic Dependency Graphs
  • 24. Challenges at large scale Dynamic dependencies: – Neil Mitchell’s Shake: http://shakebuild.com/ – Jane Street’s Jenga: https://github.com/janestreet/jenga Large-scale build systems require sharing of build results: – Facebook’s Buck: https://buckbuild.com/ – Google’s Bazel: https://www.bazel.io/ Can we have both? Not today. (Although there is an extension of nix that can apparently do both. I haven’t checked that yet.)
  • 25. Research opportunity? Formalise the semantics of build systems: – Dynamic dependencies – Cloud builds – Non-determinism – Skipping intermediate build products Applied for Royal Society Industry Fellowship: https://blogs.ncl.ac.uk/andreymokhov/cloud-and-dynamic-builds/