Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2UoIYPO.
Ben Hartshorne describes the transformation Honeycomb went through when it cut build times by 40% and gained the ability to track build times and asset sizes over time. He also covers techniques for accomplishing the same goals in other environments. Filmed at qconsf.com.
Ben Hartshorne is an engineer at Honeycomb. For the last 13 years, he has built monitoring, alerting, and observability systems for companies ranging from startups like Simply Hired and Parse to large organizations such as Wikimedia and Facebook. He enjoys this work and is happy to finally be building a company and product that will help tease out nuances in data in novel and powerful ways.
Presented at QCon San Francisco
www.qconsf.com
22. ● Instrumentation
○ Changed our model
○ Guided work
○ Proved results
○ Uncovered unknown variance
Part 1: Story Time
23. ● Using production-focused tools
○ Developers unfamiliar with build
use the same tools they know
to understand a different service
25. ● Go back to the beginning of the story
○ Between “observability”
and “so we instrumented it”
○ We need to define the problem
Part 2: How did we do it?
26. ● Vocab: Build System
○ Really Build and Test
○ On a trigger (usually a commit),
do some stuff
○ Can Succeed or Fail
○ Runs many shell commands
27. ● Core duties of a build system
○ Set up an isolated environment
○ Run a bunch of commands
○ Stop when one fails
○ Record the result
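The four duties above can be sketched in a few lines of shell. This is only an illustration, not how any particular CI system is implemented: the function name `mini_build` is hypothetical, and a scratch directory stands in for a properly isolated environment.

```bash
# Hypothetical mini build runner: mini_build "cmd one" "cmd two" ...
mini_build() {
  local workdir cmd status=0
  # A scratch directory is a loose stand-in for an isolated environment.
  workdir=$(mktemp -d)
  for cmd in "$@"; do
    # Run each command in the scratch directory; stop at the first failure.
    (cd "$workdir" && sh -c "$cmd") || { status=$?; break; }
  done
  rm -rf "$workdir"
  # Record the result.
  if [ "$status" -eq 0 ]; then
    echo "build: success"
  else
    echo "build: failed (exit $status)"
  fi
  return "$status"
}
```

Calling `mini_build 'make' 'make test'` would run both commands in order and report a single pass or fail for the whole build.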
28. ● Core instrumentation for builds
○ How long did it take
○ What commands were run
■ How long did each take
○ Did it succeed
■ Did each command succeed
29. ● Core instrumentation for builds
○ How long did it take
○ What commands were run
■ How long did each take
○ Did it succeed
31. ● Let’s write a small wrapper that
○ Takes a name and a command
○ Measures how long it takes to run
○ Records its exit status
■ and passes it along
○ Outputs the resulting data
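A wrapper along the lines described above might look like the following. It is a minimal sketch, not the code shown in the talk: the function name `timed_cmd`, the `EVENT_LOG` variable, and the JSON field names are all assumptions made for illustration, and the name is not escaped, so it must not contain quotes.

```bash
# Hypothetical wrapper: timed_cmd NAME COMMAND [ARGS...]
timed_cmd() {
  local name=$1 start end status=0
  shift
  start=$(date +%s)
  # Run the wrapped command and capture its exit status without aborting.
  "$@" || status=$?
  end=$(date +%s)
  # Output the resulting data as one JSON line per command.
  # EVENT_LOG is an assumed variable naming the file to append to.
  printf '{"name":"%s","duration_s":%d,"exit_status":%d}\n' \
    "$name" "$((end - start))" "$status" >> "${EVENT_LOG:-build-events.log}"
  # Pass the exit status along so the build system still sees failures.
  return "$status"
}
```

In a build config, a line such as `go test ./...` would become `timed_cmd go-test go test ./...`. Because the wrapper returns the original exit status, the build still stops when a command fails.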
35. ● Our new build config
36. ● What gets this beyond prototype?
○ Switch languages bash -> go
○ Link commands together
○ Improve our data model
○ Collect additional context
37. ● Link our commands together
○ Tie all the spans into a trace
○ Use the Build ID from the
build system
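One way to picture this: every span carries the same trace ID, and that ID comes from whatever build identifier the CI system exposes. The sketch below assumes the `CIRCLE_BUILD_NUM` and `TRAVIS_BUILD_ID` environment variables that CircleCI and Travis CI set for their jobs. The function names and the local fallback are hypothetical.

```bash
# Resolve a trace ID from whichever CI build identifier is present.
build_trace_id() {
  if [ -n "${CIRCLE_BUILD_NUM:-}" ]; then
    printf '%s\n' "$CIRCLE_BUILD_NUM"
  elif [ -n "${TRAVIS_BUILD_ID:-}" ]; then
    printf '%s\n' "$TRAVIS_BUILD_ID"
  else
    # Assumed fallback for runs outside CI: use the shell's process ID.
    printf 'local-%s\n' "$$"
  fi
}

# emit_span NAME DURATION_S: print a span tagged with the shared trace ID.
emit_span() {
  printf '{"trace_id":"%s","name":"%s","duration_s":%d}\n' \
    "$(build_trace_id)" "$1" "$2"
}
```

Since each command in a build runs as its own process, deriving the ID from the environment lets separate invocations agree on a trace without talking to each other.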
39. ● Improve our data model
○ Group commands into steps
■ Time the whole group
○ build is the whole thing
○ step is a group of commands
○ cmd is one specific command
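The build / step / cmd hierarchy can be sketched as spans that name their parent. In this illustration, which is not the buildevents implementation, `run_step` times a whole group of commands and emits one `cmd` span per command plus one `step` span for the group. The function names, the `EVENT_LOG` variable, and the field names are assumptions, and names are not escaped for JSON.

```bash
# log_span KIND NAME PARENT DURATION_S STATUS: append one span as a JSON line.
log_span() {
  printf '{"kind":"%s","name":"%s","parent":"%s","duration_s":%d,"status":%d}\n' \
    "$1" "$2" "$3" "$4" "$5" >> "${EVENT_LOG:-build-events.log}"
}

# run_step STEP_NAME "cmd one" "cmd two" ...
run_step() {
  local step=$1 status=0 step_start cmd_start now cmd
  shift
  step_start=$(date +%s)
  for cmd in "$@"; do
    cmd_start=$(date +%s)
    sh -c "$cmd" || status=$?
    now=$(date +%s)
    # Each command is a child of its step.
    log_span cmd "$cmd" "$step" "$((now - cmd_start))" "$status"
    [ "$status" -eq 0 ] || break
  done
  now=$(date +%s)
  # The step is a child of the build and covers the whole group's duration.
  log_span step "$step" build "$((now - step_start))" "$status"
  return "$status"
}
```

For example, `run_step test 'go vet ./...' 'go test ./...'` would record two `cmd` spans under a single `test` step.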
44. ● Collect additional context
○ Branch name, PR#, CI system, etc.
○ Custom fields from the developer
■ Artifact size
■ Build depth
■ Links to other data sources
■ Test results, etc.
45. ● Collect additional custom context
○ Put name=value pairs in a file
○ Put the name of that file in the env
○ Buildevents includes those fields
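The file-plus-environment handoff described above can be sketched as follows. This is a simplified illustration and not buildevents' own parser: the variable name `EXTRA_FIELDS_FILE` and the function `extra_fields` are hypothetical, every value is treated as a string, and nothing is escaped.

```bash
# Print the name=value pairs from the file named in EXTRA_FIELDS_FILE
# as JSON fields, each prefixed with a comma so they can be appended
# to an existing record. Prints nothing if the variable or file is absent.
extra_fields() {
  local file="${EXTRA_FIELDS_FILE:-}" key value out=""
  [ -n "$file" ] && [ -r "$file" ] || return 0
  # Split each line on the first '='; also handle a last line with no newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case $key in
      ''|\#*) continue ;;   # skip blank lines and comments
    esac
    out="$out,\"$key\":\"$value\""
  done < "$file"
  printf '%s' "$out"
}
```

Any build command can then contribute context by appending a line such as `artifact_size=1234` to that file. The wrapper picks the fields up when it emits its next record.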
47. ● One more trick
○ Interrogate the build system API
○ CircleCI has live API endpoints
○ Reveals the real start time
○ Other juicy data
48. ● Speaking of open source …
github / honeycombio / buildevents
Travis-CI, Google Cloud Build, GitLab, JenkinsX, CircleCI
49. ● Build systems run shell commands
● By wrapping shell commands
we hook in instrumentation
● Use the env or fs for IPC
● Pull extra context from the env
● Pull extra context from APIs
● Send that data to visualization
51. ● Build systems are part of the SSC
○ and traces easily represent builds
● What other parts would benefit from
new visualizations and tracking?
○ Commit to deploy lifecycle
○ ???
Part 3: What’s next?
52. ● Commit to deploy lifecycle
○ Many commits to one PR
○ Many review cycles before merge
○ Many builds along the way
○ Many environments in deploy
○ Many PRs in one deploy?
○ Reversions?
53. ● Commit to deploy lifecycle challenge
○ Represent individual run well
■ Show commit-forward
■ Show deploy-backward
■ Represent cycles, delays
54. ● Commit to deploy lifecycle challenge
○ Represent runs in aggregate
■ Overall lead time
■ Trends in PR review delay
■ Time from merge to deploy
55. ● Commit to deploy lifecycle challenge
○ Ease of integration into toolchain
■ Source code repositories
■ Build systems
■ Deploy systems
■ etc.