My name is Viktor Adam and today I’d like to tell you about how we’ve reduced the build times on our developers’ laptops by almost 50% using Splunk.
I’ve been a software engineer at Atlassian since last year; previously I worked as a technical architect at a financial company in the UK. You have probably heard of Atlassian already, or you may already use some of our products, like Jira, Confluence, Bitbucket or Trello.
Our mission at Atlassian is to unleash the potential in every team. We want to help all teams everywhere to achieve great things, and my team in particular focuses on our own internal teams at Atlassian.
We look after the development experience and developer satisfaction in Jira Cloud. Part of this is the experience of building the application on developer machines while adding new features or making changes for example.
First, I’d like to talk about our growth in teams and in our codebase in Jira and some of the challenges that came with this expansion. I’ll talk about our initial approach to the problem and how Splunk helped us get on top of this. I’ll show some of our most helpful dashboards and some practical tips on how we got them to reveal the information we needed. I’ll finish with what we’ve achieved, where we failed and what we learned - something you can hopefully take away from this talk.
We have done a great job growing our teams, and recruiting talented people is one of our top priorities at Atlassian. Armed with more developers, we’ve started producing more and more code that powers Jira features.
In the last 2 and a half years, we’ve more than doubled the lines of Java code in our main Jira code repository. This contains the majority of the backend code, but of course there are many other kinds of sources, which have also seen significant growth.
In this repository, we use Maven as our build system. If you’re not familiar with Maven, it manages internal and external dependencies of a project, and has tasks to execute for each of the modules, like compiling the source code or packaging that into a runnable binary.
It arranges internal dependencies into a directed acyclic graph, then executes modules in an order where each module only starts once all of its dependencies have been processed. In this made-up example,
B and C can both start building once A has finished, and they can execute in parallel if we want them to. E would have to wait until both B and C have finished. Similarly, G needs to wait until all of D, E and F have finished completely. An interesting case is I, because if our end goal is to get to J, I doesn’t even have to be processed, since the end result doesn’t depend on it.
Hopefully, I will be able to show you how these connections were really important in our case.
Today we have about 2 and a half times more Maven modules in a single project than we had 2 and a half years ago, and we haven’t finished pulling in all the external sources that still live outside of this repository. These modules brought even more dependencies with them, and some of them have also grown significantly in codebase size during this time.
Some of the challenges this growth has brought about are increased complexity, inefficient builds for almost all teams, and growing execution times when building the project.
One could no longer execute a simple Maven build command and expect a reasonable build time with a valid target artifact; we now have to pass a good few arguments to this command to tweak what happens during the build.
All these modules living in a single monorepo also meant that every team now had to build everyone else’s code as well to produce a runnable version of Jira on their laptops - unless they were intimately familiar with all the modules and the dependencies between them, and knew exactly which ones they needed to build to get the same result.
And of course, the increased size and number of modules also meant slower builds, not only because there is more to do, but also because the added complexity made it harder to reason about what was taking so long.
To tackle these challenges, we started building some tooling to alleviate the problems.
We built a command line tool called Jmake that wraps Maven invocations and makes sure they are executed with the right set of parameters, which can also evolve as the codebase changes. It also has some additional functionality to help developers working on Jira, like a number of healthchecks to identify problems, misconfiguration or missing tools on their system, and to give them suggestions on how to resolve them.
We’ve also written a Maven extension to perform incremental builds. This extension keeps track of the previous build results and makes sure only the modules that have changed are rebuilt, while the rest are skipped, so a team working on an individual Jira plugin can simply rebuild the module for that plugin, but not for all the other plugins.
And we had multiple attempts to tackle long build times, which turned out to be harder than we initially expected. We dedicated a full quarter to this, because long build times were highlighted as the most significant pain point in our development satisfaction survey.
But first of all, we had to convert user reports like “my build is super slow” into measurements and data that we could analyze.
We started looking into maven-profiler, an open-source Maven extension that tracks the execution times of individual tasks during builds, and generates an HTML or JSON report.
This was great for highlighting individual tasks that were slow on the machine a person had used, but it didn’t help us understand the problem as a whole. It said nothing about the internal dependencies in our monolithic project, and it was also not easy to share these results with others.
So we created an internal fork of the Maven profiler and added a new reporter backend that shipped the execution-time metrics off to Datadog. These included anonymized properties about the build itself, focusing on the relevant part: the actual timings.
This enabled us to investigate how builds were performing across all our developers, and when building on our continuous integration system as well. We could see aggregated statistics about the total build time, or just the download portion of it, for example, along with additional metrics like memory usage figures. It allowed drilling down to identify the worst-performing Maven modules on average or by other aggregations, which was great but still not enough.
Metrics didn’t allow us to dissect the data as much as we wanted to, because we needed to be careful about high cardinality and sending too many properties. We realized that, once we understood where the slow parts were in general, we needed to be able to separate and analyze individual builds in isolation. Aggregate metrics therefore weren’t enough anymore; we needed structured events - and this is where Splunk came in.
We changed the Maven profiler’s reporting backend so that instead of sending metrics directly to Datadog, it submits them to an internal service we call Devmetrics Publisher, which then formats those events appropriately for each of its target systems. Splunk receives them as structured events, while Datadog receives the aggregated measurements as metrics. This new setup also allows better future integrations, making it easier to extend or change the services we can talk to.
Structured events were much better for us: you can always aggregate them later, but you don’t have to aggregate them beforehand. And since you’re not losing any of the original properties, they support ad-hoc queries very well. Plus you don’t need to worry about high cardinality either; Splunk is perfectly capable of dealing with that.
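To give you a feel for what that means in practice, here is a rough sketch of the kind of ad-hoc query this enables. The index, sourcetype and field names are made up for this example, not our real schema:

    index=devmetrics sourcetype=maven_build
    | stats count, perc90(total_duration_ms) as p90_ms by maven_version, cpu_cores
    | eval p90_minutes = round(p90_ms / 60000, 1)
    | sort - p90_minutes

Because every event still carries all of its original properties, we can group by dimensions like these after the fact, without having planned a dedicated metric for them up front.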
With this information available, we started building dashboards to surface the data. We ended up with a very busy dashboard initially, because we weren’t exactly sure what we were looking for, and we were also unsure what it was telling us. We had the same information on multiple panels with different aggregations, trying to get it to suggest or hint at something. At least having all these details in a single place in Splunk opened new ways to explore them and to look for interesting patterns.
Let’s look at how these dashboards evolved over time into something more helpful, something that revealed a lot of hints on where to focus our energy when it comes to reducing our build times.
We started advertising our main dashboard in our command line tool, which gave developers a link to see the metrics collected about the build that had just finished. This enabled developers to look at their own statistics if they were interested, and it also enabled my team to look at any of these sessions. We could use different Splunk queries to identify slow builds, pull out the session identifiers from them, and analyze them one by one.
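A query along these lines would surface them - the field names here are illustrative rather than our actual event schema:

    index=devmetrics sourcetype=maven_build
    | where total_duration_ms > 10 * 60 * 1000
    | table _time, session_id, total_duration_ms
    | sort - total_duration_ms

The session identifiers in the result are then all we need to open the per-build view for each slow build and inspect it on its own.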
Some of the panels on this initial dashboard were less helpful than others. This one, for example, looks pretty nice, but I honestly have no idea what it’s trying to tell me. The important thing is that we had the data we needed, and we displayed it in some way. Then we started iterating on it, changing the dashboard until it became increasingly helpful.
A perfect example of this was our build timeline panel. It shows when each module started and finished building, on a single, scrollable timeline view. While it had all the information we needed, it was not easy to understand or to draw conclusions from. Some modules take longer, some take less time, some start earlier or later, but what does that mean?
If we change it slightly though, it gets much more useful immediately. This is still the same information as before - it shows when each module started and stopped - but now it also shows where our build is pausing and not doing as much as we wanted it to. Each of those blocks is a module, and each row is a thread where the build executes, so we can see which of them are running in parallel. The gaps you can see told us two things: we have too many internal dependencies in our project, and increasing parallelism may not help much, because in some cases we already don’t have enough things to execute.
Internal dependencies are necessary for code reuse within our monolithic codebase, but they come at a cost. One aspect of it is that a module depending on another one will have to wait to start building until that other one has finished all its tasks, as I explained earlier. This led us to investigate breaking many of these connections, and changing the layout of our modules for example.
We’ve also realized we need to understand the critical path in our dependency chain.
This is the chain of modules that depend on each other, the ones that made the build take as long as it did.
On our timeline view, this is shown in the last row, which is basically a copy of the affected modules from the rows above it. For efficiency, we decided to compute this list at build time and send it to Splunk as-is, so we can query it more easily.
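Storing it as a single delimited field keeps the events small, and a short query can still break it apart when we want to know which modules show up on the critical path most often. As a sketch, assuming the field is a comma-separated list called critical_path:

    index=devmetrics sourcetype=maven_build
    | makemv delim="," critical_path
    | mvexpand critical_path
    | stats count as on_critical_path by critical_path
    | sort - on_critical_path
    | head 20

makemv splits the list into a multivalue field, mvexpand turns each module into its own row, and the stats call counts how frequently each module appears on the critical path across builds.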
With this dashboard, we could now see the list of modules we needed to make shorter if we wanted to reduce the overall time taken during builds.
Heatmaps were also super useful; they draw the eye to the most relevant piece of information instantly. If you’re interested in heatmaps, you can enable them in Splunk by selecting Heat map as the Data Overlay in the formatting menu of tables.
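For example, a plain table like the sketch below - again with illustrative index and field names - becomes very easy to scan once the Heat map overlay is applied to it:

    index=devmetrics sourcetype=maven_module
    | stats perc90(duration_ms) as p90_ms by module
    | sort - p90_ms

The cells with the largest values get the strongest colour, so the slowest modules jump out immediately without having to read every number.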
Based on this information, we came up with a plan.
We wanted to eliminate unnecessary or artificial dependencies. For example, integration tests are not necessary to build a runnable artifact - we only need to build those if we actually want to run them.
Splitting a Maven module into two was another good way to tackle this problem. We could either shorten a module on the critical path, or make it more concurrent so its tasks run in parallel.
And we also found some interesting connections introduced by strictly test-only dependency chains.
So what have we actually achieved? Let me show you through the changes of our timeline view I showed earlier.
These are still the same views as before, but they display all the individual tasks within the modules rather than just the modules.
We started optimizing the Maven build last December, and at the time we had quite a few gaps where we didn’t use all the processing power we had available, and we also had some smaller modules scattered around - the ones that had to wait for some other module to complete before they could start executing their own tasks. Basically, we found out that we were wasting time and processing power.
We then removed dependencies on integration tests until there were none left in our dependency chain when building the Jira web application. This meant we had to split some modules to move the tests into their own, separate child modules, which no longer had to be built by default. We now had fewer things to build and eliminated some dependencies as well.
Continuing on this path, we identified unnecessary dependencies with the help of some tooling - think of a simple script, nothing fancy.
Either because of the changing codebase, or because of configuration cloned for new modules, we had quite a few instances of dependencies being declared but not actually used. Maven doesn’t validate these and just trusts us developers, so it was happily delaying modules unnecessarily when this was the case.
In another instance, we had forced a connection between two Maven modules to avoid having their builds executed at the same time, because that was basically rendering our developer laptops unusable, but it turned out we didn’t need this “optimization” anymore.
Next, we realized - again with the help of scripts and tools - that a huge number of modules were delayed so far into the build only because of the transitive dependencies of a test-scoped dependency. These modules needed some code from a test scaffolding module, which in turn used or implemented some classes from another, bigger module, so all consumers of the test code ended up waiting on the large module as well.
By splitting the test scaffolding code so that one piece no longer requires anything from the large module in the middle, a new batch appeared early in the build - mostly the set of modules that could now start at the same time as our large module.
Now the overall build started to look better, but there were a few longer modules towards the end we wanted to split up a bit.
One of them is responsible for a lot of REST endpoints in Jira, and we generate a Swagger schema specification for them in 3 different ways, each of them being a single task. Each of these takes relatively long, but they all operate on the same compiled classes, so we just had to move the code into a new module, and the 3 executions into 3 individual modules that can now build in parallel, all at the same time.
The last thing we looked at, around March this year, was splitting the very last module, the web application itself, which builds the archive for deployment.
This module had to depend on virtually every other module that came before it so that all of those are included, but it also had some additional tasks on its own, like compiling JSP files and minifying frontend assets.
Simply extracting everything that is not packaging into a new module reduced the build time of the last module to less than 20% of what it was, with the remaining bits building much earlier, in parallel with other, unrelated modules. The small chart in the bottom right shows how long it used to take, and on the other side of the arrow, you can see how short it became.
The overall shape of the build has changed like this. There are far fewer gaps now, which means we’re using parallel execution more efficiently, mainly thanks to the reduced internal dependencies. We also have slightly fewer modules to process now, with the test-only code disappearing from the builds.
All these changes over 3 months resulted in about 40% shorter builds.
This is based on the 90th percentile of all developer builds.
And this is only when someone had to completely rebuild all the modules; otherwise our incremental build extension could now pick an even smaller set of modules because of the reduced dependencies.
We have a Splunk dashboard we look at after each morning standup, where we could see how we were tracking during the quarter. It shows a figure for our current state, with a small trendline at the bottom, and a more informative panel with a histogram as well. The shape and the peak of the histogram tell us whether we’re moving in the right direction or not - nowadays it’s fairly stable.
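The trendline behind that headline figure comes from a fairly simple query; something along these lines would produce it, with the index and field names again being illustrative:

    index=devmetrics sourcetype=maven_build build_type=full
    | timechart span=1d perc90(total_duration_ms) as p90_ms
    | eval p90_minutes = round(p90_ms / 60000, 1)

timechart gives us one 90th-percentile value per day, which is all the single-value panel with the trendline needs.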
We’ve also tried different visualizations for this. The bubble chart, for example, shows how many builds took a certain amount of time, grouped by minutes, so it is essentially a histogram, but spread over a time period. The bigger the bubble, the more builds took that amount of time. It shows that we have quite a few short ones, less than 3 minutes - those are mainly the incremental builds. Then there’s an increase in the number of builds around 6-7 minutes, as our 90th percentile showed us earlier. During the quarter, this would actually show the bigger circles gradually moving towards the bottom.
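The data behind that visualization is essentially a two-dimensional histogram; a sketch of the query, with hypothetical field names, could look like this:

    index=devmetrics sourcetype=maven_build
    | eval duration_min = round(total_duration_ms / 60000)
    | bin _time span=1d
    | stats count by _time, duration_min

Rendered as a bubble chart, _time goes on one axis, duration_min on the other, and the count drives the size of each bubble.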
And of course, not everything worked out the way we hoped for.
The problem space was quite complex, or more complex than we thought.
We found out that even the same model of MacBook can produce very different results depending on how many other applications are running, what the developer is doing during a build and so on. We could verify this based on data we could query in Splunk.
We also observed that some optimizations were completely missed by certain teams if they only rarely had to rebuild the module we had changed.
And developers working from home or from another office had very different builds from what we had here in the Sydney office. Network-intensive tasks tend to take longer for them, for example, to the point where they become a bottleneck.
The data we collected was also very noisy.
We have a lot of Jira developers building Jira a lot of times, but still not often enough to be able to properly detect outliers.
We also couldn’t easily filter out builds that were doomed to be long because the developer machine was running out of memory, happened to have network connection issues, or something else.
And we had a hard time matching the changes we made to the changes we saw (or didn’t see) on the aggregate dashboards. We added some extra tags to our metrics with each change, but due to the small sample size and the constant evolution of the codebase, correlating the data with those changes proved really difficult.
Choosing a good key metric to indicate our progress was quite difficult.
We spent a bit of time debating how to measure whether builds were getting faster - whether we wanted to optimize for the usual, the average or the overall experience - but we couldn’t come up with anything advanced that was easy to measure, so we settled on the 90th percentile, even though we didn’t necessarily think it was the right metric.
We chose to ignore the machine type in the aggregation, but builds on Linux ended up being almost twice as fast as on OS X - though we have far fewer developers on Linux laptops than on MacBooks. Again, we could use Splunk to slice and dice the information available there.
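That split is just another aggregation over the same events - a sketch, with illustrative field names, comparing the platforms:

    index=devmetrics sourcetype=maven_build build_type=full
    | stats count, perc90(total_duration_ms) as p90_ms by os
    | eval p90_minutes = round(p90_ms / 60000, 1)

One row per operating system, with the build count next to it, so the difference in sample size between Linux and OS X is visible at the same time.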
And we also had to exclude my team from this calculation, because we were doing full rebuilds all the time while working on these optimizations, and our data would have skewed the results otherwise. We did this with the help of some advanced search functionality in Splunk, called search macros.
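A search macro is essentially a reusable snippet you define once, under Settings, Advanced search, Search macros, and then reference from any query with backticks. As a sketch, with a made-up macro name and filter:

    index=devmetrics sourcetype=maven_build `exclude_build_team`
    | timechart span=1d perc90(total_duration_ms) as p90_ms

where the exclude_build_team macro expands to a filter such as NOT (host_id="..." OR host_id="..."), so every dashboard panel picks up the same exclusion without us having to repeat it.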
We’ve also overestimated our ability to change other people’s behaviours.
We made some improvements to the Maven dependencies that removed the need to clear the previous build results away before building new changes, hoping that we would see fewer full builds and more incremental builds, but this didn’t happen. We queried a number of full builds in Splunk where the data showed they could have been incremental, but weren’t. It turns out that teams have learned that the incremental build extension has its own limitations, and that it’s easier to default to starting from a clean state than to face hard-to-debug issues that can happen in some cases, like when you move between different Git branches.
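The query that told us this is a straightforward filter over the build events; as a rough sketch, assuming our extension records whether an incremental build would have been possible, with hypothetical field names:

    index=devmetrics sourcetype=maven_build build_type=full could_be_incremental=true
    | timechart span=1w count as avoidable_full_builds

The idea is simply counting, week by week, the full builds that the data says did not need to be full.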
Having this insight into the data available in Splunk was essential in understanding what to focus on and, in this case, where we needed to change course.
All things considered, it was a successful quarter for our team, and Splunk helped us massively in achieving these results. Even though the 90th percentile might not have been the perfect metric, we did improve it and developers have noticed these improvements. So my advice would be: Focus on progress over perfection.
Find a good enough metric you can commit to and don’t worry too much about whether you found the best one or not. You can always change it and course-correct if it turns out to be a bad one.
Have trust that you’re moving in the right direction and metrics will follow. Instead of fearing that you chose poorly, do what you think makes sense, then verify your results.
Collect as many structured events as you can, then you can aggregate them in lots of different ways. But it doesn’t work the other way. If you aggregate first, you’re never going to see the data you aggregated away.
Also, collect as many properties as you can. You can work out later which of them are useful to you and which aren’t.
Just make sure you don’t end up with dashboards so busy that you lose track of what’s going on in them and what they’re trying to tell you.
Build visualizations that are easy for humans to understand and look over. We are bad at remembering things if we have to scroll, but we’re excellent at pattern matching. Especially if we can see those patterns on a single screen.
And of course, we’re hiring. Come work with us on cool things and build awesome tools!