Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Build systems-at-scale

284 views

Published on

Systems Research Challenges Workshop, 16th-17th January

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Build systems-at-scale

  1. 1. Build Systems at Scale Andrey Mokhov, Neil Mitchell, Simon Peyton Jones, Simon Marlow Systems Research Challenges Workshop, January 2017
  2. 2. Build systems Build system Dynamic dependencies Cloud builds Supports non- determinism Skips some intermediates Bazel (Google) ✓ Buck (Facebook) ✓ A new build system by another big company ✓ ✓ ✓ Jenga (Jane Street) ✓ ✓ Shake (originally by Standard Chartered) ✓ ✓ Make (Bell Labs, 1976) ✓
  3. 3. Meet Hadrian: a new build system for the Glasgow Haskell Compiler (GHC)
  4. 4. The GHC and the build system Glasgow Haskell Compiler: – 25 years old – 100s of contributors – 10K+ source files – 1M+ lines of Haskell code – 3 build stages – 18 build ways – 30 build programs: alex, ar, gcc, ghc, ghc-pkg, happy, … The current build system: – Non-recursive Make – Fourth major rewrite – 200 makefiles – 10K+ lines of code – 3 build phases – Highly user-customisable – And it works! But…
  5. 5. The result of 25 years of development $1/$2/build/%.$$($3_osuf) : $1/$4/%.hs $$(LAX_DEPS_FOLLOW) $$$$($1_$2_HC_DEP) $$($1_$2_PKGDATA_DEP) $$(call cmd,$1_$2_HC) $$($1_$2_$3_ALL_HC_OPTS) -c $$< -o $$@ $$(if $$(findstring YES,$$($1_$2_DYNAMIC_TOO)), -dyno $$(addsuffix .$$(dyn_osuf),$$(basename $$@))) $$(call ohi-sanity-check,$1,$2,$3,$1/$2/build/$$*) Make uses a global namespace of mutable string variables – Numbers, arrays, associative maps are encoded in strings – No encapsulation and implementation hiding – Variable references are spliced into Makefiles: avoid spaces/colons – To expand a variable use $; to get $ use $$; to get $$ use $$$$…
  6. 6. There are other problems 1. A global namespace of mutable string variables 2. Dynamic dependencies 3. Build rules with multiple outputs 4. Concurrency reduction 5. Fine-grain dependencies 6. Computing command lines, essential complexity Solution: use FP to design scalable abstractions – To solve 1-5: we use Shake, a Haskell library for writing build systems – To solve 6: we develop a small EDSL for building command lines Accidental complexity
  7. 7. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj]
  8. 8. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Run Shake with default options
  9. 9. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] A list of top-level targets and build rules.
  10. 10. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Build foo.o target
  11. 11. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Pattern rule to build object files *.o from C sources *.c
  12. 12. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Action to build the file obj
  13. 13. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Compute the source file name (replace extension) (-<.>) :: FilePath -> String -> FilePath
  14. 14. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Declare a dependency and build it if required need :: [FilePath] -> Action ()
  15. 15. Shake import Development.Shake import System.FilePath main :: IO () main = shake shakeOptions $ do want ["foo.o"] "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] Build obj from src by running gcc with appropriate arguments
  16. 16. From Make to Shake "*.o" %> obj -> do let src = obj -<.> "c" need [src] run "gcc" ["-c", src, "-o", obj] %.o : %.c gcc -c $< -o $@ Make: Shake: Yes, Make works really great for such simple build systems! Shake is more verbose in this case, but scales much better
  17. 17. Quick wins with Shake Dynamic dependencies Post-use & order-only dependencies Polymorphic/fine-grain dependencies Concurrency reduction Tracking file contents Avoiding external tools … Read the paper! Mokhov, Mitchell, Peyton Jones, Marlow: “Non-recursive Make Considered Harmful: Build Systems at Scale”, Haskell Symposium, 2016
  18. 18. Dynamic dependencies Build target t: – Lookup t‘s dependencies {d1, …, dn} in the database – If the lookup fails or t doesn’t exist or t has changed or some dk is not up to date then: • Find the build rule matching t • Run the action, recording need’s • Update the database with newly recorded dependencies
  19. 19. Problem 6: Computing command lines Build command lines introduce a lot of complexity, e.g.: Hard to extend: file-specific flags require WAY_p_file_HC_OPTS Solution: develop an EDSL for computing command lines WAY_p_HC_OPTS = -static –prof ... base_stage1_HC_OPTS = -this-unit-id base ... $1_$2_$3_HC_OPTS = $$(WAY_$3_HC_OPTS) $$($1_$2_HC_OPTS) ...plus 30 more configuration patterns
  20. 20. Computing command line for a target preludeTarget = Target { context = Context Stage1 base profiling , builder = Ghc Stage1 , inputs = ["libraries/base/Prelude.hs"] , outputs = ["build/stage1/libraries/base/Prelude.p_o"] } Given preludeTarget how to compute the build command for it? inplace/bin/ghc-stage1 -O2 -prof -c libraries/base/Prelude.hs -o build/stage1/libraries/base/Prelude.p_o commandLine :: Target -> Action [String]
  21. 21. Describing command lines compositionally ghcArgs :: Args ghcArgs = builder Ghc ? mconcat [ arg "-O2" , way profiling ? arg "-prof" , arg "-c", arg =<< getInput , arg "-o", arg =<< getOutput ] args :: Args args = mconcat [ ghcArgs, ..., userArgs ]
  22. 22. Results Qualitative analysis: – We studied 11 common use-cases of GHC build system, such as “edit a source file and rebuild”, “add a new build command line argument and rebuild”, “git branch and rebuild”, etc. – The old build system performs a lot of unnecessary rebuilds in many cases, whereas Hadrian correctly handles most cases. Quantitative benchmarks: Hadrian is faster – Zero build: 2.2s vs 2.0s (Linux), 12.3s vs 2.1s (Windows) – Full build: 649s vs 578s (Linux), 1266s vs 737s (Windows)
  23. 23. What’s next? Towards Cloud Build Systems with Dynamic Dependency Graphs
  24. 24. Challenges at large scale Dynamic dependencies: – Neil Mitchell’s Shake: http://shakebuild.com/ – Jane Street’s Jenga: https://github.com/janestreet/jenga Large-scale build systems require sharing of build results: – Facebook’s Buck: https://buckbuild.com/ – Google’s Bazel: https://www.bazel.io/ Can we have both? Not today. (Although there is an extension of nix that can apparently do both. I haven’t checked that yet.)
  25. 25. Research opportunity? Formalise the semantics of build systems: – Dynamic dependencies – Cloud builds – Non-determinism – Skipping intermediate build products Applied for Royal Society Industry Fellowship: https://blogs.ncl.ac.uk/andreymokhov/cloud-and-dynamic-builds/

×