SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Successfully reported this slideshow.
Activate your 14 day free trial to unlock unlimited reading.
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
1.
Managing Software
Dependencies and the Supply
Chain
Wrangling Software Engineering Projects
MIT EM.S20
Andrew Lamb
April 6, 2022
2.
Goal
Give both a commercial and an open-source perspective on the benefits, costs,
and risks of taking on dependencies.
3.
About me
MIT Course VI-2 ‘02, MEng ‘03
17 years professional development 🤔
15 commercial enterprise software (startups at various stages)
● Oracle, DataPower/IBM, Vertica/HP, Nutonian, DataRobot
Last 2 years in open source commercial software development
● InfluxData, contributor to influxdb_iox
● Maintainer of arrow-rs, arrow-datafusion, and sqlparser-rs projects
● PMC member of Apache Arrow
4.
Software “Supply Chain” ?
Code
Contributors
Project
Management
(e.g PRs)
User (😊)
AWS
Marketplace
Apple Pay
CI / CD
system
Software
Distribution
E.g.
Dockerhub,
App Store
5.
Software Supply Chain Complexity
2005: Andrew’s First Startup (DataPower)
● C/C++, < 5 dependences (OpenSSL)
● Single binary, distributed to customers, on CD or via FTP
2022: Andrew’s Current Startup (InfluxDB)
● IOx has …. 606 dependencies
(rust alone)
Distributed as a
docker image on
GCR
6.
Dependencies?
● Software Engineering 101 (6.001 / 6.037)
● “Don’t Reinvent the Wheel”: Use a pre-existing library of code
● The number and quality of pre-existing libraries grown massively
● Example:
○ 2004: DataPower had a custom written HTTP/S implementation, url parser,
and more!
○ 2022: Most languages have a library to do it (requests for python, node,
reqwest in Rust, etc)
7.
(Dramatically) Lowers Cost of Building Software
● Low Barrier to Entry: Someone else designed the API, implemented
and (hopefully) tested it
○ E.g. can get a cross platform, secure webserver up and running almost instantly,
● Maintenance: You benefit from bugs fixed by others
● Debuggability: Source code is available, you can often even step
through it
8.
Managing Dependencies: Licensing
● Software Patent licensing is still a (huge) thing
○ IBM makes $1Bn a year on software licensing
● You need to ensure you have the legal right to use the software.
● Good news: Most organizations have figured out licensing, have
known good “approved” set of licenses.
○ As long as you stick to known good ones
● Example “Auto Approve” (permissive): MIT, BSD, Apache 2
● Example “Special Dispensation”: MongoDB server side license
● Example “Do not use”: GPL / LGPL
9.
Managing Dependencies: Quality
Quality of many Open Source dependencies is outstanding
● Crowdsourcing means more investment into bug reporting and fixing
● In theory you can look at the code to assess the quality
● You have many options to choose from
10.
Managing Dependencies: Quality
● Amount of time spent on reviewing / assessing open source is minimal (both
commercially and in open source) – think reviewing 606 packages
● No one to cry to: Maintainers have
limited time to respond to your issue
● Open source maintainers typically
stretched (very) thin
● Parable: “broke my old version, sorry”:
dtolnay/quote/#204
11.
Managing Dependencies: Security
● Somewhat terrifying to read “Backstabber's toolkit” paper
● Open source maintainers do not have loads of time
○ Open source is fundamentally based on trust but verify (in the maintainers + community)
○ Possible to abuse that trust and insert malicious code
● Surface Area: dependencies of dependencies
12.
Managing Dependencies: Build times / package bloat
● Dependencies add build time to compiled languages (C/C++, Rust)
● Add significant bloat to binary / distribution size (MBs!)
○ Parable: Dependency (python) stack in one startup was > 1.5GB package.
● “DLL Hell”: Version matching dependencies (of dependencies)
13.
Managing Dependencies: Keeping up to date
● Dependencies get upgraded with unpredictable regularity
● Things like security fixes you want/need, also features you probably don’t
Challenges
● Open source projects invest relatively less time on maintaining past releases.
○ p.s. Microsoft Windows: programs written 20+ years ago still run fine
● ⇒ bump dependencies a lot (daily)
● “Semantic versioning” - helps auto update dependencies 🤗
○ Sometimes do release incompatibilities and break builds 😖
○ Can get different binaries depending on *when* you run your build 😱
○ “Backstabbers Toolkit” 😓
14.
Managing Dependencies: Packaging
Packaging: Gathering your code and dependencies into an executable “package”
that user can run on their system
As number dependencies grow, so does challenges in packaging / DLL Hell
● Language Runtime
● Your direct dependencies (e.g. http library)
● Indirect dependencies (e.g url parser)
● System dependencies (libssl, libqt, etc)
16.
Think Twice about Adding New Dependencies
“A little copying is better than a little
dependency.”
- Rob Pike via https://go-proverbs.github.io/
E.g. One data structure from a library of data structures
Anti-example: http clients / crypto library
17.
Best Practice: CI/CD (test, test, and test some more)
CI: Run
Tests
on change
branch
Build
“Artifacts”
CD: release
/ deploy
Source
Code
(in git)
CI: Run Tests
(on main
branch)
Propose
change via Pull
Request
approve +
merge to
main
branch
CI == Continuous Integration
CD == Continuous Deployment
Likely more
tests here
Likely more
tests here
18.
Best Practice: Package Manager
❏ Use package manager built into your ecosystem:
❏ Java; maven
❏ Python: Pip
❏ Nodejs: NPM
❏ Ruby: Ruby Gems
❏ Rust: cargo
❏ …
❏ C/C++ CMake (not quite a package manager, but closer than Makefiles)
❏ Use “freeze” “shrinkwrap” or “version lock” feature to control updates
❏ Ensure you use widely used packages (wisdom of crowds)
19.
Managing Dependencies: Best Practices
❏ Invest heavily in automated testing
❏ Especially end to end tests, and key features that rely on behavior of dependencies
❏ Invest in keeping dependencies up to date
❏ Update direct dependencies (tools like Dependabot can help)
❏ Help debug and fix your dependent libraries
❏ Submit patches back upstream
❏ May need to fork / apply a fix while you wait for maintainer to release new version
20.
Managing Dependencies: Packaging
Technology to the rescue (enabler)
● Static Linking
● yum + .rpm ; apt + .deb
● FX; Electron (for Java; nodejs / desktop apps)
● Containerization (docker, et al)
● VMs (“Virtual Appliances”)