A Scalable Software Build Accelerator


  1. A Scalable Software Build Accelerator: Break the Build Bottleneck with Faster, More Accurate Builds. John Ousterhout, Founder and Chairman; John Graham-Cumming, Founder.
  2. Executive Summary

For organizations that depend on software innovation, a slow software build process can be a bottleneck for the entire company. Slow build times not only reduce engineering efficiency, they also affect product quality and company agility. Furthermore, diagnosing build problems is difficult or impossible: cryptic output in Make log files is hard to decipher and hard to relate to the individual build steps in a large build.

Electric Cloud’s core products, ElectricAccelerator and ElectricInsight, solve these problems by reducing software build times dramatically and providing graphical insight into the performance and structure of builds at a level impossible with existing tools.

The solution to the build speed problem is at first glance simple: create a distributed version of industry-standard build tools (such as GNU Make or Microsoft NMAKE) that distributes individual job steps in parallel to a cluster of inexpensive servers. Over the years, many attempts have been made to create “parallel” or “distributed” build systems. However, none are in widespread use, because of dependency issues and distributed computing problems that lead to broken builds. Electric Cloud, Inc. has developed an automated dependency management system that makes parallel builds safe, scalable, and efficient.

ElectricAccelerator is a software build accelerator that significantly reduces software build times by distributing the build over a large cluster of inexpensive servers. ElectricAccelerator uses its patented dependency management system to identify and fix problems in real time that would break traditional parallel builds. ElectricAccelerator plugs seamlessly into existing Make- or Visual Studio-based infrastructures, and includes Web-based reporting and management tools.

ElectricAccelerator has been proven in some of the most demanding software organizations and against large open-source projects.
In one organization, a product took four and a half hours to build on a single system. It now builds 20x faster on a 30-node ElectricAccelerator cluster, finishing in less than 13 minutes.

v2007.06 whitepaper ©Electric Cloud, Inc. All rights reserved.
  3. In another organization, a product took 3 hours and 12 minutes to build on a single system. It now builds 16x faster on a 30-node ElectricAccelerator cluster, finishing in less than 12 minutes.

The open-source Samba file and print server takes 16 minutes to build on a single processor, but builds 16x faster, in 58 seconds, on a 20-node ElectricAccelerator cluster.

The open-source MySQL database takes over 23 minutes to build on a single processor. It builds 12x faster, in 1 minute 54 seconds, on a 20-node ElectricAccelerator cluster.

ElectricAccelerator improves the software development process by reducing build times, so development teams can reduce costs, shorten time-to-market, and improve quality and customer satisfaction.

ElectricInsight is Electric Cloud’s build visualization tool. A companion tool to ElectricAccelerator, it takes advantage of structural information recorded by ElectricAccelerator and provides a graphical display of this information that makes it easy to understand the behavior of builds, tune performance, and quickly debug broken builds.
  4. The Real Impact of Slow Builds

Every software engineer has experienced the frustration and delay caused by slow software builds. Every VP of Engineering has dreamed of rapid, accurate builds that meet the needs of overextended QA teams, enable rapid innovation to acquire new customers, and allow timely turnaround for critical bug fixes. For most organizations such builds are quite simply dreams.

Although the widespread Make-based build infrastructure used by many large software organizations has been around for over 20 years, it has lagged behind advanced IDEs, new languages, and template libraries that churn out more and more lines of code and result in slower and slower builds. As these projects grow, they are typically partitioned into a deep hierarchy of directories with dependencies hidden in the arcane language of Make. Trying to untangle a complex Makefile is a black art; it creates a vast legacy of code that any solution must support without change. Additionally, these deeply recursive Makes lead to brittleness, as dependencies between directories are often implicitly defined by the order in which jobs are run without taking advantage of Make’s dependency mechanism.

We have met with hundreds of commercial software development teams and very few have production build times of less than two hours. More than half of the projects had build times in the 5-10 hour range, and a few organizations reported that build times had reached 40 hours or more at some point. Furthermore, the build issues are compounded since most organizations must simultaneously support multiple platforms and product versions.

A large tangible cost is engineering team productivity: the time engineers spend waiting for their builds to complete. Most developers spend at least two hours per week waiting for builds to complete, and in some organizations developers spend as many as 10 hours per week waiting for builds during some development phases. Engineers are often forced to
  5. switch back to a bug fix checked in the day before because an overnight build has finally shown that there was a problem.

Another cost arises at those times in the software development process when the team is constrained by the build process. Typically, this is during the “integration storm” phase of a release. Integration storms are periods of instability that occur several times during a release cycle when developers synchronize their changes into the main code line. Inevitably there are interactions between the changes made by different developers, causing broken builds and incorrect product behavior. It can take anywhere from several days to several weeks to iron out all the problems; during this period virtually the entire engineering organization is tied up fixing problems or waiting for the code line to stabilize. If builds take overnight, the organization may not be able to fix more than one or two problems per day.

Long builds can also impact product quality. If builds take too long, developers don’t have the resources to quickly do a complete rebuild of their product before they check in. It’s not uncommon for their changes to break the build, but that problem is not discovered until the nightly build runs. If that build was intended for the QA team, then after the problem has been identified and fixed, the team is forced to wait for the next nightly build. This means one less day testing that build. Since the period between QA drop builds is often fixed, there is not enough time to execute all of the scenarios on that build. If this happens near the end of a release cycle, it’s possible for a bug to make it out to the field, where customers encounter it.

Traditional Approaches to Improving Build Performance

There have been numerous attempts to improve the performance of Make over the last two decades. They fall into two general classes: “faster” approaches that execute pieces of the build in parallel, and “smarter” approaches that avoid work entirely.
  6. SMP Hardware

One solution to build speed is to buy a large multiprocessor machine and use GNU Make’s -j switch to force it to run multiple jobs in parallel on the same machine.

Although this approach gives some speedup (typically 2-4x), it does not scale well because of the high per-CPU cost in a multiprocessor machine and because incomplete dependencies (especially in hierarchical Makes) become the build’s Achilles’ heel. With incomplete dependencies, the parallel build tends to reorder build steps in ways that break the build, leading to unpredictable and inaccurate builds.

ElectricAccelerator uses its patented dependency management system to identify and fix problems in real time that would break traditional parallel builds. With this complete dependency information, ElectricAccelerator can achieve speedups of up to 20x.

In addition, builds using GNU Make’s -j switch produce log output that differs with each run, which makes it difficult to verify and debug builds. Electric Make ensures that the build log is written in the same order every time.

Distributed Builds

A variation of the parallel build approach is distributed builds, where builds are run in parallel using a cluster of machines instead of a multiprocessor.

In addition to all of the build ordering problems of parallel builds described in the previous section, this approach is fraught with difficulties. The clocks on the remote machines must be synchronized to ensure that Make’s timestamp-based dependencies work correctly. All of the machines must mount a reliable shared file system. Any failure on an individual node will cause the build to fail. Incomplete dependency information can still cause inaccurate build results. Furthermore, the time taken to invoke a job (e.g. with ‘rsh’) can be high in traditional approaches, limiting the performance benefits.

The ElectricAccelerator architecture eliminates several distributed-systems issues that threaten the correctness or robustness of distributed builds.
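To see why unsynchronized clocks matter, recall that Make rebuilds a target only when a prerequisite’s modification time is newer than the target’s. The Python sketch below is an illustration only: the timestamps are made up, and `needs_rebuild` and `stamp` are hypothetical helpers rather than part of any Make implementation. It shows how a node whose clock runs ahead can stamp an object file so that a later source edit goes unnoticed.

```python
def needs_rebuild(target_mtime: float, prereq_mtimes: list[float]) -> bool:
    """Mimic Make's timestamp rule: rebuild if any prerequisite is newer."""
    return any(m > target_mtime for m in prereq_mtimes)


def stamp(true_time: float, clock_skew: float) -> float:
    """Timestamp recorded by a machine whose clock is off by clock_skew seconds."""
    return true_time + clock_skew


# True (wall-clock) order of events, in seconds (made-up values):
#   t=100  a cluster node compiles foo.o from foo.c
#   t=130  a developer edits foo.c on a machine with an accurate clock
source_edit = stamp(130.0, clock_skew=0.0)

# Case 1: the node's clock is accurate, so the later edit is detected.
obj_ok = stamp(100.0, clock_skew=0.0)
stale_detected = needs_rebuild(obj_ok, [source_edit])

# Case 2: the node's clock runs 60 s fast, so foo.o looks newer than the edit
# and the stale object file is silently kept.
obj_skewed = stamp(100.0, clock_skew=60.0)
stale_missed = not needs_rebuild(obj_skewed, [source_edit])

print(stale_detected, stale_missed)
```

Managing timestamps in one place, as the text goes on to describe for Electric Make, removes the skew term from this comparison altogether.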
  7. Electric Make manages timestamps centrally to avoid clock synchronization problems. It communicates with the nodes, providing all files through a reliable protocol that can self-heal if a node fails, thus eliminating the need for a mounted file system and ensuring build accuracy every time regardless of hardware or operating system failure. Additionally, Electric Make uses a fast binary protocol to send jobs to nodes to reduce overhead and help achieve massively distributed builds.

Manually Partition Makefiles

Some organizations have taken the extreme step of manually breaking a build up into a small number of steps that are run in parallel on different machines. This difficult and error-prone task requires detailed knowledge of Makefile internals and typically yields only small speedups, as partitioning the build into smaller and smaller steps requires enormous effort to ensure correct results.

ElectricAccelerator completely automates the parallelization of builds at the lowest level possible: individual job steps. It merges hierarchical builds into a single build and uses a multi-threaded architecture to run as many jobs as possible at the same time.

Build Avoidance

Another approach for improving build performance is to reduce the amount of work that must be done, either by doing better incremental builds or by sharing results between independent builds. This is typically done by trying to rely on incremental builds rather than complete builds.

Very few build organizations are willing to do incremental builds for their production software; instead they rely on complete builds for QA and release.
The risk of a broken build and the complexity of ensuring that a build is accurate across multiple product components lead most organizations to rely on full clean builds.

Summary

In summary, each of the approaches described above offers the potential for speeding up builds, but each makes the build process more brittle by increasing the risk that a build will fail or that it will be inconsistent with the sources.
  8. None of the organizations we have talked with has been able to achieve more than a 6x speedup reliable enough for production builds, and only a very few have achieved even a 3x speedup after significant investments of time and resources. Most organizations run their builds completely sequentially or with only a small speedup, in order to keep the process as reliable as possible.

ElectricAccelerator: a Highly Scalable Solution

ElectricAccelerator is a software build accelerator that takes advantage of the abundant parallelism available in builds and capitalizes on recent technology improvements in inexpensive servers and fast networks. Instead of running a build sequentially on a single processor, ElectricAccelerator executes pieces of the build in parallel on a large cluster of inexpensive servers (see Figure 1).

ElectricAccelerator has four main software components:

• Electric Make, a new version of Make that reads Makefiles, analyzes dependencies, and coordinates activities on the nodes. Electric Make also acts as a file server for the nodes in the build cluster.
• Electric File System, a special-purpose file system driver that runs on the nodes in the cluster. It monitors every file access to provide the complete dependency information that allows ElectricAccelerator to automatically detect and correct out-of-order build steps.
• Electric Agent, a user-level component that runs on the nodes, serves as an intermediary between Electric Make and Electric File System, and runs jobs at Electric Make’s request.
• Cluster Manager, a Web server that allocates nodes for individual builds and provides reporting and management tools.

To a user, ElectricAccelerator appears identical to other versions of Make or Visual Studio. Electric Make can be invoked anywhere that other versions of Make might be invoked, such as engineer workstations or dedicated build machines. Electric Make can be invoked interactively or
  9. as part of build scripts. The use of a cluster for the builds is invisible to the Electric Make user, except that the builds run much faster.

[Figure 1: The ElectricAccelerator Architecture. Labels in the diagram: Build Machine, Electric Make, File Server, Cluster Manager, Web Server, Scheduler, DB, HTTP, TCP/IP, Cluster, Node, Agent, User Level, Kernel, Electric File System.]

Massively Distributed Builds

To achieve build speedups of up to 20x, ElectricAccelerator couples a cluster of servers running as many jobs in parallel as possible with the kernel-level Electric File System. The Electric File System monitors every file access to compute dependencies automatically and ensure that build results are perfect every time.

When a build starts, Electric Make reads existing Makefiles and determines the list of jobs that need to be executed. Electric Make then communicates with the Cluster Manager component. Cluster Manager controls access to the cluster of nodes and allocates a collection of nodes to Electric Make.

The Cluster Manager not only controls access but also adjudicates between competing builds. If multiple builds are requested simultaneously on a build cluster, the Cluster Manager allocates cluster nodes fairly among the builds, taking into account the available build resources, the requirements of each build, and the build priorities and related policies set in the Cluster Manager.
  10. For example, the Cluster Manager might be configured to allow general use of the cluster with automatic sharing and allow a build manager to take over the entire cluster when a build must be rapidly produced (for example, to address a critical bug fix for a customer). Cluster Manager will ensure that a build manager is able to use the entire cluster without terminating other running builds. Low-priority builds would be placed in a wait state until the top-priority build was completed and then would be automatically continued. After continuing, normal sharing of the cluster would resume.

Electric Make then instructs each node to perform jobs on its behalf. A node running a job reads and writes a variety of files (such as source and object files), which are passed dynamically across the network via Electric Make using a fast binary protocol developed by Electric Cloud. When each job completes, it sends its results (such as files written and log output) to Electric Make for final storage on disk. File data is cached on nodes for the life of a build, in order to minimize network traffic for files that are reused.

In addition, the Electric File System running on the nodes captures every single file access performed by jobs and provides that information back to Electric Make. Using that information, coupled with the Makefiles’ dependency information, Electric Make is able to determine the exact relationship between jobs and files and fix any missing dependencies in real time.

If Electric Make detects that two jobs were performed in the wrong order because missing Makefile dependency information made them appear independent, it will automatically reschedule them for completion in the correct order and make a note of the missing dependency.

Electric Make saves this missing dependency information to increase the performance of subsequent runs. Successive builds use the additional dependency information to run the build steps in the correct order, further improving performance.
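The out-of-order check described above can be sketched in a few lines of Python. This is a simplified illustration, not Electric Make’s actual algorithm: the `JobRecord` type, its fields, and `find_missing_dependencies` are hypothetical names, and the sketch assumes that each job’s reads and writes have been recorded together with its position in the sequential (Makefile) order and the order in which it actually ran in the parallel build.

```python
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    name: str
    serial_index: int   # position the job would have in a sequential build
    run_index: int      # position in which it actually ran in the parallel build
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def find_missing_dependencies(jobs):
    """Return (reader, writer, file) triples where a job read a file before
    the job that should have produced it (per serial order) had run."""
    conflicts = []
    for reader in jobs:
        for path in sorted(reader.reads):
            # Writers of this file that precede the reader in serial order.
            writers = [j for j in jobs
                       if path in j.writes and j.serial_index < reader.serial_index]
            if not writers:
                continue
            # A sequential build would have seen the latest such writer's output.
            writer = max(writers, key=lambda j: j.serial_index)
            if writer.run_index > reader.run_index:
                conflicts.append((reader.name, writer.name, path))
    return conflicts


# Hypothetical build: the Makefile never says that main.o needs gen.h,
# so a parallel scheduler ran the compile before the header was generated.
jobs = [
    JobRecord("gen-header", serial_index=0, run_index=1, writes={"gen.h"}),
    JobRecord("compile-main", serial_index=1, run_index=0,
              reads={"main.c", "gen.h"}, writes={"main.o"}),
    JobRecord("link", serial_index=2, run_index=2,
              reads={"main.o"}, writes={"app"}),
]

conflicts = find_missing_dependencies(jobs)
print(conflicts)
```

In this sketch each reported triple corresponds to a job that would need to be rerun after its producer, and to a dependency worth remembering for the next build, which mirrors the rescheduling and note-taking behavior the text describes.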
Electric Make updates this missing dependency information after every build, so that even as Makefiles evolve, performance is automatically maintained. Other approaches to parallel builds require a large ongoing investment in Makefile maintenance to
  11. keep the build from breaking, and to ensure that parallel performance is preserved.

At the end of the build, Electric Make communicates build results to the Cluster Manager, where they are made available through a web-based interface for reporting and management.

[Screenshot: ElectricAccelerator’s Cluster Manager]

Accurate Incremental Builds

ElectricAccelerator uses its perfect information about dependencies for more than just safe parallel builds. It also uses the dependency information in a feature called eDepend, which enables accurate incremental builds. Historically, incremental builds have been unreliable: they only work if perfect dependency information is available, so that Make knows which subset of files must be regenerated after a particular file is changed. Since dependencies were not perfect, incremental builds would sometimes fail to regenerate files affected by a change, so the only safe approach was to do a complete rebuild.

Electric Make uses dependency information collected during previous builds to decide what to regenerate during incremental builds. This makes incremental builds accurate and reliable, and eliminates the need for clumsy, slow ‘make depend’ steps. eDepend is superior to dependency searching techniques (such as the free tool makedepend) because it is completely language agnostic. eDepend detects
  12. dependencies at the file level without the need to search within files and understand individual languages. This language independence means that eDepend also detects dependencies between object files (for example, eDepend can automatically detect a dependency between an executable and a library that the executable is linked with).

Hierarchical Builds

Another patented technique enables Electric Make to achieve massively distributed builds by flattening deep hierarchies of nested Makes. In a typical recursive build, a top-level Make executes a set of Makes in subdirectories to make individual components.

Electric Make treats this hierarchy as a single Make by merging dependency information from the complete hierarchy and identifying jobs that can be safely run in parallel. In this way it achieves massive concurrency with the largest number of jobs running at the same time, even from different Makes.

To ensure that the build is 100% accurate, Electric Make’s eDepend feature also analyzes file usage information from the Electric File System to automatically detect dependencies between recursive Makes that were never specified in the Makefiles.

Plug Compatible

Electric Cloud’s replacement for the Make program, known as Electric Make, is compatible with GNU Make, Microsoft NMAKE and Visual Studio. It understands the complete GNU Make and Microsoft NMAKE languages and has identical command-line options.

Starting to use Electric Make is a simple matter of changing invocations of gmake or nmake to eMake with an appropriate command-line option specifying the emulation mode. For the ultimate in slot-in deployment, if the eMake program is renamed to gmake or nmake it automatically determines whether it is implementing GNU Make or Microsoft NMAKE.

Because Electric Make requires no Makefile changes, it’s easy to deploy and even easier to verify. In addition to reading standard Makefiles and accepting standard command-line options, Electric Make produces identical log file output.
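The file-level approach behind eDepend, described earlier on this slide, can be illustrated without any knowledge of source languages. The Python sketch below is a simplification under stated assumptions, not Electric Cloud’s implementation: `usage` is a hypothetical record of which files each job read and wrote in a previous build, and `jobs_to_rerun` is an illustrative helper that walks that record to find every job affected by a changed file.

```python
from collections import deque

# Hypothetical file-usage history from a previous build:
# job name -> (files read, files written). No source parsing is involved.
usage = {
    "compile-util": ({"util.c", "util.h"}, {"util.o"}),
    "archive-lib":  ({"util.o"}, {"libutil.a"}),
    "compile-main": ({"main.c", "util.h"}, {"main.o"}),
    "link-app":     ({"main.o", "libutil.a"}, {"app"}),
    "compile-docs": ({"manual.txt"}, {"manual.html"}),
}


def jobs_to_rerun(usage, changed_files):
    """Return the set of jobs whose inputs are (transitively) affected."""
    dirty_files = deque(changed_files)
    seen_files = set(changed_files)
    rerun = set()
    while dirty_files:
        path = dirty_files.popleft()
        for job, (reads, writes) in usage.items():
            if path in reads and job not in rerun:
                rerun.add(job)
                # Anything this job writes may change as well.
                for out in writes:
                    if out not in seen_files:
                        seen_files.add(out)
                        dirty_files.append(out)
    return rerun


print(sorted(jobs_to_rerun(usage, ["util.c"])))
print(sorted(jobs_to_rerun(usage, ["manual.txt"])))
```

Note that the link step is selected after a change to `util.c` only because the recorded usage shows it reading `libutil.a`, the same kind of executable-to-library dependency the text mentions.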
  13. For example, verifying that Electric Make is doing the same work as an existing Make is a simple matter of running the diff program on the two build logs. Electric Make even produces identical error messages in the event of a broken build step.

The only difference is that Electric Make runs the build as much as 20x faster.

Because Electric Make looks just like Make, existing build scripts (such as Perl wrappers) can be run without change. Electric Make plugs right into the existing build system.

Robust

ElectricAccelerator’s three user-level components (Electric Make, Cluster Manager and Electric Agent) are in constant communication, ensuring that all are operating correctly so that failures are detected rapidly and fixed in real time.

As a build is running, Electric Make keeps track of the files and jobs running on each machine in the cluster. In the event that a cluster node fails, Electric Make automatically detects the failure and reruns the incomplete job on another node.

Electric Make communicates node failures to the Cluster Manager for reporting and management purposes, and continues the build, maintaining high speed and build accuracy.

As well as being able to handle cluster failures, ElectricAccelerator’s unique Electric File System automatically makes up for deficiencies in Makefiles where missing dependencies can cause traditional parallelization methods to create incorrect builds.

Multi-Platform

Although some organizations have the luxury of working on a single platform, many face the realities of a heterogeneous world. Electric Cloud is no different. All of the ElectricAccelerator components work on Microsoft Windows, Sun Solaris and Linux. Electric Make emulates the popular GNU Make and Microsoft NMAKE programs.
  14. Real World Performance

ElectricAccelerator has been tested in some of the most demanding enterprise software organizations and against large open-source projects.

In one organization, a product took four and a half hours to build on a single system. It now builds 20x faster on a 30-node ElectricAccelerator cluster, finishing in less than 13 minutes.

In another organization, a product took 3 hours and 12 minutes to build on a single system. It now builds 16x faster on a 30-node ElectricAccelerator cluster, finishing in less than 12 minutes.

The open-source Samba file and print server takes 16 minutes to build on a single processor, but it builds 16x faster, in 58 seconds, on a 20-node ElectricAccelerator cluster.

The open-source MySQL database takes over 23 minutes to build on a single processor. It builds 12x faster, in 1 minute 54 seconds, on a 20-node ElectricAccelerator cluster.
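The quoted figures can be checked with simple arithmetic: speedup is the single-machine time divided by the cluster time, and dividing the speedup by the node count gives a rough per-node efficiency. The Python sketch below uses only the numbers quoted above. Because several times are stated as bounds (“less than 13 minutes”, “over 23 minutes”), the computed speedups are approximate lower bounds. The labels “Organization A” and “Organization B” and the efficiency figure are additions for illustration, not terms or metrics from the whitepaper.

```python
# (label, single-machine seconds, cluster seconds, cluster nodes), from the text.
builds = [
    ("Organization A", 4.5 * 3600,         13 * 60, 30),
    ("Organization B", 3 * 3600 + 12 * 60, 12 * 60, 30),
    ("Samba",          16 * 60,            58,      20),
    ("MySQL",          23 * 60,            60 + 54, 20),
]


def speedup(serial_s, parallel_s):
    """Ratio of sequential build time to parallel build time."""
    return serial_s / parallel_s


for label, serial_s, parallel_s, nodes in builds:
    s = speedup(serial_s, parallel_s)
    # Efficiency: fraction of ideal linear scaling achieved per node.
    print(f"{label}: {s:.1f}x on {nodes} nodes, efficiency {s / nodes:.0%}")
```

The computed speedups (roughly 20.8x, 16.0x, 16.6x and 12.1x) agree with the rounded 20x, 16x, 16x and 12x figures quoted in the text.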
  15. Build Visualization

Complementing ElectricAccelerator is ElectricInsight, a graphical tool that mines the extensive build information generated when a build is run with Electric Make to provide unprecedented information about the structure of a large build.

With ElectricInsight it’s easy to get an overview of the running of a long build. This screenshot shows an example build that lasted 4,380 s (about 1h 13m) and consists of 1,457 jobs. In the terminology of ElectricInsight, a job is an individual step from a Makefile such as a compilation or link.

The ElectricInsight display shows the name of the machine running the build (in this case node0) and a horizontal bar chart. Each bar represents a single job and the bar’s length is proportional to that job’s running time. The bars are ordered from left to right in the same order as the jobs were executed by make.

Just glancing at the display gives instant information about the build. In addition to the figures displayed in the bottom right-hand corner (where you can find the total number of jobs executed and running time in seconds), the bar chart shows that:

• There are a number of very long-running jobs. Very early on in the build there’s a single job (shown as a large patch of blue) which takes up about 15% of the build.
• Areas of black are where many small jobs run in succession with tight packing. Zooming in on the display will reveal the actual jobs in detail.
  16. • There’s a gap about halfway through the build when no job is running. That’s clearly a waste of time that needs investigating.
• One of the large jobs is highlighted (by hovering the cursor over it) in pink and the bottom half of the ElectricInsight display shows the details for the job. In this case the job built build/motor/output/sharea5mass/debug/a2a5mass.so and spent 219 s (or about 5% of the total time) on it.

One job in the example build is taking 15% of the total build time. Investigating the job is trivial with a graphical tool like ElectricInsight. First, it’s obvious which jobs are taking up most of the time, and hovering over the longest one reveals information about it.

The name of the binary being built is revealed: build/bin/output/binaryserver/debug/a1binaryserver.so. And it’s taking 639.21 s: over 10 minutes to compile! ElectricInsight shows that this one job consumes 14.59% of the entire build.

If the name of the binary isn’t enough to track down the job in the Makefile, a double click on the blue bar brings up an additional window containing detailed information on the job. Here are a1binaryserver.so’s details.
  17. The Output pane shows the actual output in the Make log that’s associated with this one job. It reveals that the job consists of deleting four files using the rm command, followed by compiling the object using compile. All the arguments and options for each command can be displayed by clicking the Show Commands check box.

The Job Details display also helps narrow down the job even further by giving the name of the Makefile (in this case simply Makefile) and the line number within the Makefile (in this case 1953) where the rule for this job is defined.

ElectricInsight also provides valuable information when a build is run in parallel with the ElectricAccelerator system. The same example build, run against an ElectricAccelerator build cluster of ten nodes, is shown here.
  18. With ElectricAccelerator, this build now runs in 20 minutes. If the build manager wants to reduce the build time even more, the ElectricInsight display shows areas for further optimization:

• The same long-running jobs (a1binaryserver.so, a2motor.so, etc.) still dominate the build and optimizing them would bring down the overall time.
• There are a number of large gaps, meaning that parallelism isn’t perfect. For about the first 50% of the time, the build runs with jobs on every node consuming the CPU resources. After that, a number of large jobs block the rest of the build. This blocking effect occurs because the jobs that run at the end of the build are waiting for the large jobs to complete. Typically the blockage is caused by explicit dependency information in the Makefile (i.e. the final jobs are not permitted to run until specific objects have been built).

In summary, optimizing the long jobs will make this build faster (and much faster in the parallel case).

After optimizing the longest-running jobs (perhaps by adding more memory to the machines they are running on, or getting developer help in breaking the project apart), ElectricInsight can be used to visualize the result.
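A rough bound shows why the long jobs matter more in the parallel case. Ignoring dependencies and overhead, a build cannot finish faster than its longest job, nor faster than the total work divided evenly over the nodes. The Python sketch below applies that bound to figures quoted in this walkthrough (4,380 s of serial build time, a longest job of 639.21 s, ten nodes, a 20-minute parallel build). The function name and the assumption that the serial build time equals the total job time are illustrative simplifications, not something ElectricInsight is stated to compute.

```python
def parallel_lower_bound(total_work_s, longest_job_s, nodes):
    """Best-case wall-clock time: limited by the longest single job or by
    perfectly balanced work across the nodes, whichever is larger."""
    return max(longest_job_s, total_work_s / nodes)


total_work_s = 4380.0    # serial build time quoted in the text
longest_job_s = 639.21   # the a1binaryserver.so job
nodes = 10
observed_s = 20 * 60     # parallel build time quoted in the text

bound = parallel_lower_bound(total_work_s, longest_job_s, nodes)
serial_share = longest_job_s / total_work_s
parallel_share = longest_job_s / observed_s

print(f"lower bound: {bound:.2f} s, observed: {observed_s} s")
print(f"longest job share: serial {serial_share:.1%}, parallel {parallel_share:.1%}")
```

Under these assumptions the longest job alone sets the floor for the ten-node build, and a job that is under 15% of the serial build accounts for more than half of the 20-minute parallel build, which is consistent with the observation that optimizing the long jobs pays off most in the parallel case.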
  19. From the ElectricInsight diagram it’s clear that all the nodes in the ElectricAccelerator cluster are being used fully to run the build, leading to a much reduced build time.

Conclusion

Fast, reliable builds are now a reality. Electric Cloud’s solution provides enterprise-class software that accelerates the time-consuming and costly software build process by as much as 20x and removes the guesswork from keeping builds running accurately and quickly.

ElectricAccelerator transforms inexpensive servers into highly scalable clusters so that eight-hour builds can finish in 30 minutes.

ElectricInsight provides never-before-seen information about the structure, timing and dependencies in every build.

And eDepend puts an end to long dependency generation steps and delivers accurate incremental builds every time.
  20. With Electric Cloud, development teams can reduce costs, shorten time-to-market, and improve overall product quality.

About Electric Cloud

Electric Cloud is the leading provider of software production management solutions that automate, accelerate, and analyze the software development tasks that follow the check-in of new code. These include the software build, package, test and deploy processes. The company’s patented and award-winning solutions improve productivity in the face of increasing product complexity and time-to-market pressures for software delivery. In addition to ElectricAccelerator and ElectricInsight, Electric Cloud offers the only enterprise-class build and release management solution, called ElectricCommander. Leading companies such as Qualcomm, Intuit, and Expedia rely on Electric Cloud’s Software Production Management solutions to change software production from a liability to a competitive advantage.

For customer inquiries please contact Electric Cloud at (650) 968-2950 or www.electric-cloud.com.

Electric Cloud, ElectricInsight, ElectricAccelerator, ElectricCommander and Electric Make are trademarks of Electric Cloud. Other company and product names may be trademarks of their respective owners.
