SlideShare a Scribd company logo
1 of 61
Observability in the
Software Supply Chain
Seeing into your build system
QCon SF 2019
InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
honeycomb-build-ssc/
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
Intro
Ben Hartshorne
@maplebed
ben@honeycomb.io
What’s this talk about
● Part story
● Part pulling back the curtain
● Part encouragement
Part 1 - Story Time
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
● Our builds were slow
● Well, they felt slow
● Were they actually?
● We had no idea
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
… but we believe in
Observability!!!
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
● Instrumentation
○ Changed our model
○ Guided work
Part 1: Story Time
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
https://imgs.xkcd.com/comics/compiling.png
Part 1: Story Time
● Instrumentation
○ Changed our model
○ Guided work
○ Proved results
○ Uncovered unknown variance
Part 1: Story Time
● Using production-focused tools
○ Developers unfamiliar with build
use the same tools they know
to understand a different service
Part 1: Story Time
Part 2 - How does it work?
● Go back to the beginning of the story
○ Between “observability”
and “so we instrumented it”
○ We need to define the problem
Part 2: How did we do it?
● Vocab: Build System
○ Really Build and Test
○ On a trigger (usually a commit),
do some stuff
○ Can Succeed or Fail
○ Runs many shell commands
Part 2: How did we do it?
● Core duties of a build system
○ Set up an isolated environment
○ Run a bunch of commands
○ Stop when one fails
○ Record the result
Part 2: How did we do it?
● Core instrumentation for builds
○ How long did it take
○ What commands were run
■ How long did each take
○ Did it succeed
■ Did each command succeed
Part 2: How did we do it?
● Core instrumentation for builds
○ How long did it take
○ What commands were run
■ How long did each take
○ Did it succeed
Part 2: How did we do it?
● Sample build config
Part 2: How did we do it?
● Let’s write a small wrapper that
○ Takes a name and a command
○ Measures how long it takes to run
○ Records its exit status
■ and passes it along
○ Outputs the resulting data
Part 2: How did we do it?
Part 2: How did we do it?
Part 2: How did we do it?
Part 2: How did we do it?
● Our new build config
Part 2: How did we do it?
● What gets this beyond prototype?
○ Switch languages bash -> go
○ Link commands together
○ Improve our data model
○ Collect additional context
Part 2: How did we do it?
● Link our commands together
○ Tie all the spans in to a trace
○ Use the Build ID from the
build system
Part 2: How did we do it?
Part 2: How did we do it?
● Improve our data model
○ Group commands into steps
■ Time the whole group
○ build is the whole thing
○ step is a group of commands
○ cmd is one specific command
Part 2: How did we do it?
Part 2: How did we do it?
Part 2: How did we do it?
Part 2: How did we do it?
Part 2: How did we do it?
● Collect additional context
○ Branch name, PR#, CI system, etc.
○ Custom fields from the developer
■ Artifact size
■ Build depth
■ Links to other data sources
■ Test results, etc.
Part 2: How did we do it?
● Collect additional custom context
○ Put name=value pairs in a file
○ Put the name of that file in the env
○ Buildevents includes those fields
Part 2: How did we do it?
Part 2: How did we do it?
● One more trick
○ Interrogate the build system API
○ CircleCI has live API endpoints
○ Reveals the real start time
○ Other juicy data
Part 2: How did we do it?
● Speaking of open source …
github / honeycombio / buildevents
Part 2: How did we do it?
Travis-CI Google Cloud BuildGitLab JenkinsX CircleCI
● Build systems run shell commands
● By wrapping shell commands
we hook in instrumentation
● Use the env or fs for IPC
● Pull extra context from the env
● Pull extra context from APIs
● Send that data to visualization
Part 2: How did we do it?
Part 3 - What’s next?
● Build systems are part of the SSC
○ and traces easily represent builds
● What other parts would benefit from
new visualizations and tracking?
○ Commit to deploy lifecycle
○ ???
Part 3: What’s next?
● Commit to deploy lifecycle
○ Many commits to one PR
○ Many review cycles before merge
○ Many builds along the way
○ Many environments in deploy
○ Many PRs in one deploy?
○ Reversions?
Part 3: What’s next?
● Commit to deploy lifecycle challenge
○ Represent individual run well
■ Show commit-forward
■ Show deploy-backward
■ Represent cycles, delays
Part 3: What’s next?
● Commit to deploy lifecycle challenge
○ Represent runs in aggregate
■ Overall lead time
■ Trends in PR review delay
■ Time from merge to deploy
Part 3: What’s next?
● Commit to deploy lifecycle challenge
○ Ease of integration into toolchain
■ Source code repositories
■ Build systems
■ Deploy systems
■ etc.
Part 3: What’s next?
Part 3: What’s next?
Homework!!!!!
● Instrumentation and visualization
lead to smarter work
● Build systems are an
easy first step
Summary
● The tools we use for prod
can be used in the SSC
● The SSC is ripe for
new insights and visualizations
Summary
● The ideas of observability
apply outside of production
and bring value to the SSC
Summary
Ben Hartshorne
maplebed
honeycomb.io
Thank you
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
honeycomb-build-ssc/

More Related Content

More from C4Media

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinC4Media
 

More from C4Media (20)

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with Brooklin
 

Recently uploaded

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Recently uploaded (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Observability in the SSC: Seeing Into Your Build System

  • 1. Observability in the Software Supply Chain Seeing into your build system QCon SF 2019
  • 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ honeycomb-build-ssc/
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 5. What’s this talk about ● Part story ● Part pulling back the curtain ● Part encouragement
  • 6. Part 1 - Story Time
  • 7. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 8. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png ● Our builds were slow ● Well, they felt slow ● Were they actually? ● We had no idea
  • 9. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png … but we believe in Observability!!!
  • 10. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 11. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 12. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 13. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 14. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 15. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 16. ● Instrumentation ○ Changed our model ○ Guided work Part 1: Story Time
  • 17. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 18. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 19. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 20. Part 1: Story Time https://imgs.xkcd.com/comics/compiling.png
  • 22. ● Instrumentation ○ Changed our model ○ Guided work ○ Proved results ○ Uncovered unknown variance Part 1: Story Time
  • 23. ● Using production-focused tools ○ Developers unfamiliar with build use the same tools they know to understand a different service Part 1: Story Time
  • 24. Part 2 - How does it work?
  • 25. ● Go back to the beginning of the story ○ Between “observability” and “so we instrumented it” ○ We need to define the problem Part 2: How did we do it?
  • 26. ● Vocab: Build System ○ Really Build and Test ○ On a trigger (usually a commit), do some stuff ○ Can Succeed or Fail ○ Runs many shell commands Part 2: How did we do it?
  • 27. ● Core duties of a build system ○ Set up an isolated environment ○ Run a bunch of commands ○ Stop when one fails ○ Record the result Part 2: How did we do it?
  • 28. ● Core instrumentation for builds ○ How long did it take ○ What commands were run ■ How long did each take ○ Did it succeed ■ Did each command succeed Part 2: How did we do it?
  • 29. ● Core instrumentation for builds ○ How long did it take ○ What commands were run ■ How long did each take ○ Did it succeed Part 2: How did we do it?
  • 30. ● Sample build config Part 2: How did we do it?
  • 31. ● Let’s write a small wrapper that ○ Takes a name and a command ○ Measures how long it takes to run ○ Records its exit status ■ and passes it along ○ Outputs the resulting data Part 2: How did we do it?
  • 32. Part 2: How did we do it?
  • 33. Part 2: How did we do it?
  • 34. Part 2: How did we do it?
  • 35. ● Our new build config Part 2: How did we do it?
  • 36. ● What gets this beyond prototype? ○ Switch languages bash -> go ○ Link commands together ○ Improve our data model ○ Collect additional context Part 2: How did we do it?
  • 37. ● Link our commands together ○ Tie all the spans in to a trace ○ Use the Build ID from the build system Part 2: How did we do it?
  • 38. Part 2: How did we do it?
  • 39. ● Improve our data model ○ Group commands into steps ■ Time the whole group ○ build is the whole thing ○ step is a group of commands ○ cmd is one specific command Part 2: How did we do it?
  • 40. Part 2: How did we do it?
  • 41. Part 2: How did we do it?
  • 42. Part 2: How did we do it?
  • 43. Part 2: How did we do it?
  • 44. ● Collect additional context ○ Branch name, PR#, CI system, etc. ○ Custom fields from the developer ■ Artifact size ■ Build depth ■ Links to other data sources ■ Test results, etc. Part 2: How did we do it?
  • 45. ● Collect additional custom context ○ Put name=value pairs in a file ○ Put the name of that file in the env ○ Buildevents includes those fields Part 2: How did we do it?
  • 46. Part 2: How did we do it?
  • 47. ● One more trick ○ Interrogate the build system API ○ CircleCI has live API endpoints ○ Reveals the real start time ○ Other juicy data Part 2: How did we do it?
  • 48. ● Speaking of open source … github / honeycombio / buildevents Part 2: How did we do it? Travis-CI Google Cloud BuildGitLab JenkinsX CircleCI
  • 49. ● Build systems run shell commands ● By wrapping shell commands we hook in instrumentation ● Use the env or fs for IPC ● Pull extra context from the env ● Pull extra context from APIs ● Send that data to visualization Part 2: How did we do it?
  • 50. Part 3 - What’s next?
  • 51. ● Build systems are part of the SSC ○ and traces easily represent builds ● What other parts would benefit from new visualizations and tracking? ○ Commit to deploy lifecycle ○ ??? Part 3: What’s next?
  • 52. ● Commit to deploy lifecycle ○ Many commits to one PR ○ Many review cycles before merge ○ Many builds along the way ○ Many environments in deploy ○ Many PRs in one deploy? ○ Reversions? Part 3: What’s next?
  • 53. ● Commit to deploy lifecycle challenge ○ Represent individual run well ■ Show commit-forward ■ Show deploy-backward ■ Represent cycles, delays Part 3: What’s next?
  • 54. ● Commit to deploy lifecycle challenge ○ Represent runs in aggregate ■ Overall lead time ■ Trends in PR review delay ■ Time from merge to deploy Part 3: What’s next?
  • 55. ● Commit to deploy lifecycle challenge ○ Ease of integration into toolchain ■ Source code repositories ■ Build systems ■ Deploy systems ■ etc. Part 3: What’s next?
  • 56. Part 3: What’s next? Homework!!!!!
  • 57. ● Instrumentation and visualization lead to smarter work ● Build systems are an easy first step Summary
  • 58. ● The tools we use for prod can be used in the SSC ● The SSC is ripe for new insights and visualizations Summary
  • 59. ● The ideas of observability apply outside of production and bring value to the SSC Summary
  • 61. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ honeycomb-build-ssc/