Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdf

Slides presented at an MIT seminar: Wrangling Software Projects

Provides both a commercial and an open-source perspective on the benefits, costs, and risks of taking on dependencies.

  1. Managing Software Dependencies and the Supply Chain
     Wrangling Software Engineering Projects, MIT EM.S20
     Andrew Lamb, April 6, 2022
  2. Goal
     Give both a commercial and an open-source perspective on the benefits, costs, and risks of taking on dependencies.
  3. About me
     MIT Course VI-2 ’02, MEng ’03
     17 years of professional development 🤔
     15 years in commercial enterprise software (startups at various stages)
     ● Oracle, DataPower/IBM, Vertica/HP, Nutonian, DataRobot
     Last 2 years in open source commercial software development
     ● InfluxData, contributor to influxdb_iox
     ● Maintainer of the arrow-rs, arrow-datafusion, and sqlparser-rs projects
     ● PMC member of Apache Arrow
  4. Software “Supply Chain”?
     [Diagram labels: Code Contributors; Project Management (e.g. PRs); CI / CD system; Software Distribution (e.g. Dockerhub, App Store); AWS Marketplace; Apple Pay; User (😊)]
  5. Software Supply Chain Complexity
     2005: Andrew’s First Startup (DataPower)
     ● C/C++, < 5 dependencies (OpenSSL)
     ● Single binary, distributed to customers on CD or via FTP
     2022: Andrew’s Current Startup (InfluxDB)
     ● IOx has … 606 dependencies (Rust alone)
     ● Distributed as a Docker image on GCR
  6. Dependencies?
     ● Software Engineering 101 (6.001 / 6.037)
     ● “Don’t Reinvent the Wheel”: use a pre-existing library of code
     ● The number and quality of pre-existing libraries have grown massively
     ● Example:
       ○ 2004: DataPower had a custom-written HTTP/S implementation, URL parser, and more!
       ○ 2022: Most languages have a library to do it (requests for Python, node, reqwest in Rust, etc.)
  7. (Dramatically) Lowers Cost of Building Software
     ● Low Barrier to Entry: someone else designed the API, implemented it, and (hopefully) tested it
       ○ E.g. you can get a cross-platform, secure webserver up and running almost instantly
     ● Maintenance: you benefit from bugs fixed by others
     ● Debuggability: source code is available, and you can often even step through it
  8. Managing Dependencies: Licensing
     ● Software patent licensing is still a (huge) thing
       ○ IBM makes $1Bn a year on software licensing
     ● You need to ensure you have the legal right to use the software.
     ● Good news: most organizations have figured out licensing and keep a known-good “approved” set of licenses.
       ○ As long as you stick to known-good ones
     ● Example “Auto Approve” (permissive): MIT, BSD, Apache 2
     ● Example “Special Dispensation”: MongoDB Server Side Public License (SSPL)
     ● Example “Do not use”: GPL / LGPL
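
The three tiers on this slide could be encoded as a lookup table that a build script consults for each dependency's declared license. This is a minimal sketch, assuming licenses are reported as SPDX identifiers; the `POLICY` table and the `classify_license` helper are hypothetical, and the tier assignments only mirror the slide's examples.

```python
from enum import Enum


class Tier(Enum):
    AUTO_APPROVE = "auto approve"
    SPECIAL_DISPENSATION = "special dispensation"
    DO_NOT_USE = "do not use"
    UNKNOWN = "needs review"


# Hypothetical policy table keyed by SPDX identifier.
# The tiers follow the slide's examples, not any real organization's policy.
POLICY = {
    "MIT": Tier.AUTO_APPROVE,
    "BSD-3-Clause": Tier.AUTO_APPROVE,
    "Apache-2.0": Tier.AUTO_APPROVE,
    "SSPL-1.0": Tier.SPECIAL_DISPENSATION,
    "GPL-3.0-only": Tier.DO_NOT_USE,
    "LGPL-3.0-only": Tier.DO_NOT_USE,
}


def classify_license(spdx_id: str) -> Tier:
    """Return the approval tier for a license, or UNKNOWN if it is not listed."""
    return POLICY.get(spdx_id.strip(), Tier.UNKNOWN)


if __name__ == "__main__":
    for license_id in ["MIT", "SSPL-1.0", "GPL-3.0-only", "Unlicense"]:
        print(license_id, "->", classify_license(license_id).value)
```

Anything not in the table falls into a review tier rather than being silently approved, which matches the "stick to known-good ones" advice.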
  9. Managing Dependencies: Quality
     Quality of many Open Source dependencies is outstanding
     ● Crowdsourcing means more investment into bug reporting and fixing
     ● In theory you can look at the code to assess the quality
     ● You have many options to choose from
  10. Managing Dependencies: Quality
      ● The amount of time spent on reviewing / assessing open source is minimal (both commercially and in open source): think of reviewing 606 packages
      ● No one to cry to: maintainers have limited time to respond to your issue
      ● Open source maintainers are typically stretched (very) thin
      ● Parable: “broke my old version, sorry”: dtolnay/quote/#204
  11. Managing Dependencies: Security
      ● Somewhat terrifying to read the “Backstabber's Knife Collection” paper
      ● Open source maintainers do not have loads of time
        ○ Open source is fundamentally based on trust but verify (in the maintainers + community)
        ○ It is possible to abuse that trust and insert malicious code
      ● Surface Area: dependencies of dependencies
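
The "dependencies of dependencies" surface area can be made concrete by walking the dependency graph. This is a minimal sketch, assuming the graph is already available as a mapping from package name to direct dependencies; the package names and the `transitive_dependencies` helper are hypothetical.

```python
from collections import deque


def transitive_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, ()))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited; also protects against cycles
        seen.add(name)
        queue.extend(graph.get(name, ()))
    seen.discard(root)
    return seen


# Hypothetical graph: one direct dependency pulls in three more.
GRAPH = {
    "app": ["http-client"],
    "http-client": ["url-parser", "tls"],
    "tls": ["crypto"],
    "url-parser": [],
    "crypto": [],
}

if __name__ == "__main__":
    direct = GRAPH["app"]
    everything = transitive_dependencies(GRAPH, "app")
    print(f"direct: {len(direct)}, total: {len(everything)}")
```

Each package in the returned set is code that runs with the application's privileges, whether or not anyone on the team chose it directly.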
  12. Managing Dependencies: Build times / package bloat
      ● Dependencies add build time in compiled languages (C/C++, Rust)
      ● They add significant bloat to binary / distribution size (MBs!)
        ○ Parable: the (Python) dependency stack at one startup was a > 1.5 GB package.
      ● “DLL Hell”: version matching dependencies (of dependencies)
  13. Managing Dependencies: Keeping up to date
      ● Dependencies get upgraded with unpredictable regularity
      ● Upgrades bring things like security fixes that you want/need, and also features you probably don’t
      Challenges
      ● Open source projects invest relatively less time in maintaining past releases.
        ○ p.s. Microsoft Windows: programs written 20+ years ago still run fine
      ● ⇒ bump dependencies a lot (daily)
      ● “Semantic versioning”: helps auto update dependencies 🤗
        ○ Projects sometimes do release incompatible changes that break builds 😖
        ○ You can get different binaries depending on *when* you run your build 😱
        ○ “Backstabber's Knife Collection” 😓
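
The "different binaries depending on when you run your build" point follows from how version ranges are resolved. This is a minimal sketch of caret-style semantic-version matching, following the rule Cargo documents for plain `MAJOR.MINOR.PATCH` requirements; pre-release and build-metadata tags are not handled, and the `parse`, `caret_compatible`, and `resolve` helpers and the version numbers are hypothetical.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_compatible(required, candidate):
    """True if candidate satisfies a caret requirement on required.

    ^1.2.3 allows >=1.2.3, <2.0.0
    ^0.2.3 allows >=0.2.3, <0.3.0
    ^0.0.3 allows only 0.0.3
    """
    req, cand = parse(required), parse(candidate)
    if cand < req:
        return False
    if req[0] > 0:
        return cand[0] == req[0]
    if req[1] > 0:
        return cand[:2] == req[:2]
    return cand == req


def resolve(required, available):
    """Pick the newest available version that satisfies the requirement."""
    matches = [v for v in available if caret_compatible(required, v)]
    return max(matches, key=parse) if matches else None


if __name__ == "__main__":
    available_monday = ["1.2.3", "1.3.0"]
    available_friday = available_monday + ["1.4.0", "2.0.0"]
    print(resolve("1.2.3", available_monday))  # 1.3.0
    print(resolve("1.2.3", available_friday))  # 1.4.0
```

The same requirement resolves to a different version once a new release is published, so without a lock file two builds of the same source can link different code.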
  14. Managing Dependencies: Packaging
      Packaging: gathering your code and dependencies into an executable “package” that a user can run on their system
      As the number of dependencies grows, so do the challenges in packaging / DLL Hell
      ● Language runtime
      ● Your direct dependencies (e.g. HTTP library)
      ● Indirect dependencies (e.g. URL parser)
      ● System dependencies (libssl, libqt, etc.)
  15. How to Manage
  16. Think Twice about Adding New Dependencies
      “A little copying is better than a little dependency.” - Rob Pike, via https://go-proverbs.github.io/
      E.g. one data structure from a library of data structures
      Anti-example: HTTP clients / crypto library
  17. Best Practice: CI/CD (test, test, and test some more)
      CI == Continuous Integration, CD == Continuous Deployment
      [Diagram labels: Source Code (in git); Propose change via Pull Request; CI: Run Tests on change branch; approve + merge to main branch; CI: Run Tests (on main branch); Build “Artifacts”; CD: release / deploy]
      [Diagram annotation, appears twice: Likely more tests here]
  18. Best Practice: Package Manager
      ❏ Use the package manager built into your ecosystem:
        ❏ Java: Maven
        ❏ Python: pip
        ❏ Node.js: npm
        ❏ Ruby: RubyGems
        ❏ Rust: cargo
        ❏ …
        ❏ C/C++: CMake (not quite a package manager, but closer than Makefiles)
      ❏ Use the “freeze”, “shrinkwrap”, or “version lock” feature to control updates
      ❏ Ensure you use widely used packages (wisdom of crowds)
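
One way to check that a "freeze" is actually in effect is to scan a requirements file for entries that are not pinned to an exact version. This is a minimal sketch, assuming a pip-style requirements file with one `name==version` entry per line; the `unpinned_requirements` helper and the sample file contents are hypothetical, and URL or editable installs are not handled.

```python
import re

# A requirement counts as pinned only if it has the form name==exact.version
# (optionally with extras, e.g. name[extra]==1.0), with no wildcards or ranges.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s*,;]+$"
)


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blanks and option lines such as "-r other.txt"
        requirement = line.split(";", 1)[0].strip()  # drop environment markers
        if not PINNED.match(requirement):
            loose.append(requirement)
    return loose


SAMPLE = """\
requests==2.31.0
numpy>=1.20
flask
# a comment
-r other.txt
pandas==2.*
"""

if __name__ == "__main__":
    for requirement in unpinned_requirements(SAMPLE):
        print("not pinned:", requirement)
```

A check like this could run in CI so that an unpinned entry fails the build instead of silently floating to whatever version is newest that day.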
  19. Managing Dependencies: Best Practices
      ❏ Invest heavily in automated testing
        ❏ Especially end-to-end tests, and key features that rely on the behavior of dependencies
      ❏ Invest in keeping dependencies up to date
        ❏ Update direct dependencies (tools like Dependabot can help)
      ❏ Help debug and fix your dependent libraries
        ❏ Submit patches back upstream
        ❏ You may need to fork / apply a fix while you wait for the maintainer to release a new version
  20. Managing Dependencies: Packaging
      Technology to the rescue (enabler)
      ● Static linking
      ● yum + .rpm; apt + .deb
      ● FX; Electron (for Java; Node.js / desktop apps)
      ● Containerization (Docker, et al.)
      ● VMs (“Virtual Appliances”)
  21. Thank you
      Questions?
  22. Readings (tentative):
      ● https://ieeexplore-ieee-org.libproxy.mit.edu/stamp/stamp.jsp?tp=&arnumber=242525 – software maturity
      ● https://www.oreilly.com/library/view/understanding-open-source/0596005814/ch06.html – reasonably thorough overview of software licensing
      ● https://arxiv.org/pdf/2005.09535.pdf – supply-chain attacks
      ● https://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm.html – specific example of how easy/common broad supply-chain breaks are today
      ● [optional] https://blogs.sap.com/2020/06/26/attacks-on-open-source-supply-chains-how-hackers-poison-the-well/
      ● [optional] https://www.gnu.org/licenses/license-compatibility.en.html
      ● [optional] https://www.tandfonline.com/doi/pdf/10.1080/14783360500235819?needAccess=true – software maturity
