This study reviews software dependencies and complexity, describes the dependency extraction (DepEx) framework, and conducts a case study of Ubuntu to assess its operational performance.
1. Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu
Victor Prokhorenko, Chadni Islam, Muhammad Ali Babar
2. Overview
Software dependencies and complexity
Operating Systems context
DepEx framework - Dependency extraction
Ubuntu – Findings
Future work directions and enhancements
3. Types of software dependencies
• Code libraries (source or binary)
• Network sockets
• UNIX domain sockets
• Pipes
• File-based
• Run-time code provisioning
• Downloading from external source
• Generated by external applications
• Discoverable and undiscoverable
4. Code libraries
• Available in source form at compile time
• Used through various forms of import- and include-like statements
• Can be in-lined or embedded, thus eliminating external dependency
• Dynamically loadable at run time
• Typically found in the form of .so and .dll files
• Can have recursive dependencies of their own
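The recursive nature of library dependencies can be illustrated with a minimal sketch. The dependency map below is hypothetical and hand-written; it is not data from the study, and the function name is chosen only for this example.

```python
from collections import deque

def recursive_dependencies(direct, root):
    """Return every library reachable from `root`, excluding `root` itself."""
    seen = set()
    queue = deque(direct.get(root, ()))
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue  # already visited; this also protects against cycles
        seen.add(lib)
        queue.extend(direct.get(lib, ()))
    seen.discard(root)  # a cycle may lead back to the starting point
    return seen

# Hypothetical dependency map: name -> directly required libraries
direct = {
    "app": ["libgtk", "libc"],
    "libgtk": ["libglib", "libc"],
    "libglib": ["libc"],
    "libc": [],
}

print(len(direct["app"]))                             # 2 direct dependencies
print(sorted(recursive_dependencies(direct, "app")))  # ['libc', 'libglib', 'libgtk']
```

The application names two libraries directly, yet three must be present for it to start.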
5. Binary Dynamically Loadable Libraries
• Required to be present in the system for a given application to start
• Required to be located in a discoverable “place”
• Required to contain the necessary functionality
• List of exported functions
• Versioning considerations
• Source code may be inaccessible
• Sourced from:
• Application bundle
• Pre-existing Operating System
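The "discoverable place" requirement can be sketched as a lookup over an ordered list of directories. This is a deliberate simplification: a real dynamic loader on Linux also consults sources such as LD_LIBRARY_PATH, run-path entries embedded in the binary, and the loader cache. The function name and the directory list are assumptions made for illustration.

```python
import os

def locate_library(name, search_dirs):
    """Return the first existing path for `name` in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # the library is not discoverable in the given places
```

For example, `locate_library("libc.so.6", ["/lib", "/usr/lib"])` returns a path only if the file exists in one of those two directories; the order of the list decides which copy wins when several exist.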
7. Complexity aspect: dependency-based metrics
• How do dependencies reflect application complexity?
• How do dependencies reflect library importance?
• Four metrics investigated: Presence, Coverage, Occurrence, Usage
• Developer-facing complexity vs. recursion
8. Operating system context
• Single application development phase
• Testing through compilation and execution
• Multiple tools: compiler, IDE, tests, debuggers
• Multiple applications usage phase
• Automatic dependency installation (apt install)
• Library version conflicts: versioned names
• Bundling: archive, container, VM
• Base system inflation
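Versioned names let several versions of one library coexist in the same system. A minimal sketch of splitting such a file name into a base name and a version follows; the regular expression and function names are assumptions for illustration and cover only the common `name.so.version` pattern.

```python
import re

# Matches e.g. "libssl.so.1.1" -> base "libssl", version "1.1"
_SO_NAME = re.compile(r"^(?P<base>.+?)\.so(?:\.(?P<version>[0-9.]+))?$")

def split_versioned_name(filename):
    """Return (base, version) for a shared-library file name, or None."""
    match = _SO_NAME.match(filename)
    if match is None:
        return None
    return match.group("base"), match.group("version")

def group_versions(filenames):
    """Group the versions found for each library base name."""
    groups = {}
    for filename in filenames:
        parsed = split_versioned_name(filename)
        if parsed is None:
            continue  # not a shared-library file name
        base, version = parsed
        groups.setdefault(base, []).append(version)
    return groups

print(group_versions(["libssl.so.1.1", "libssl.so.3", "README"]))
# {'libssl': ['1.1', '3']}
```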
9. System-wide dependencies observability
• Emergent high-level architecture appears as a result of combining multiple independently developed applications
• Constant system modifications and updates lead to the lack of a stable picture
• Lack of bird’s eye view:
• Are there any libraries missing that are required by executables in the system? Which ones?
• Which executables would not be able to run due to the lack of required libraries?
• What are the most popular/critical libraries (i.e. required by the largest number of executables)? Least popular?
• Which libraries are present in the system but not required by any executable?
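Once dependencies are collected in one place, the questions above reduce to simple set operations. The sketch below assumes two hypothetical inputs: a map from each executable to the libraries it directly requires, and the set of libraries present in the system. It counts direct requirements only, so a library needed solely by other libraries would be reported as unused.

```python
from collections import Counter

def system_overview(requires, present):
    """Answer the four bird's-eye-view questions for one system snapshot.

    requires: dict mapping executable -> set of directly required libraries
    present:  set of library names found in the system
    """
    needed = Counter(lib for libs in requires.values() for lib in set(libs))
    return {
        # Libraries that some executable requires but that are absent
        "missing": sorted(lib for lib in needed if lib not in present),
        # Executables that cannot start because a required library is absent
        "broken": sorted(exe for exe, libs in requires.items()
                         if any(lib not in present for lib in libs)),
        # Libraries ranked by the number of executables requiring them
        "popularity": sorted(needed.items(), key=lambda kv: (-kv[1], kv[0])),
        # Libraries present but not directly required by any executable
        "unused": sorted(present - set(needed)),
    }

# Hypothetical snapshot
requires = {
    "/bin/a": {"libc", "libm"},
    "/bin/b": {"libc", "libx"},
    "/bin/c": {"libc"},
}
present = {"libc", "libm", "libold"}
overview = system_overview(requires, present)
print(overview["missing"])  # ['libx']
print(overview["broken"])   # ['/bin/b']
print(overview["unused"])   # ['libold']
```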
10. DepEx: Dependency Extractor framework
• Plugin-based architecture
• Presence and Coverage metrics
• Scans the file system and stores discovered dependencies in a structured database
• Current development is targeted at tracking run-time file modifications
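A plugin-based scanner of this kind can be sketched in a few lines. This is not DepEx's actual code or API: the plugin interface, the table layout, and the `ShebangPlugin` class are all hypothetical, and the toy plugin only records the interpreter named in a script's first line rather than parsing binary formats.

```python
import os
import sqlite3

class ShebangPlugin:
    """Hypothetical plugin: a script depends on the interpreter it names."""
    name = "shebang"

    def extract(self, path):
        """Return a list of dependencies, or None if the file is not handled."""
        try:
            with open(path, "rb") as handle:
                first_line = handle.readline(256)
        except OSError:
            return None
        if not first_line.startswith(b"#!"):
            return None
        parts = first_line[2:].decode("utf-8", "replace").split()
        return parts[:1]  # interpreter path only, arguments dropped

def scan(root, plugins, db):
    """Walk `root`, ask each plugin in turn, and store what it reports."""
    db.execute("CREATE TABLE IF NOT EXISTS dependency "
               "(file TEXT, plugin TEXT, requires TEXT)")
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs, broken links
            for plugin in plugins:
                found = plugin.extract(path)
                if found is None:
                    continue  # this plugin does not understand the file
                db.executemany(
                    "INSERT INTO dependency VALUES (?, ?, ?)",
                    [(path, plugin.name, dep) for dep in found],
                )
                break  # first matching plugin wins
    db.commit()
```

Calling `scan(some_directory, [ShebangPlugin()], sqlite3.connect(":memory:"))` fills the table, after which questions such as library popularity become SQL queries. Supporting another executable type means adding another class with the same `extract` method.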
11. Ubuntu case study
• Why Ubuntu?
• High popularity
• Consistent archives
• Detailed release notes
• Long history – chance to find evolutionary patterns
• Technical challenges
• Disk space requirements
• Image format and compression changed over time
12. Ubuntu case study: statistics
• 84 consecutive versions (5.04 to 23.04)
• 18 years of history
• 114GB of compressed images
• 9.8 million total files
• Over 408,000 total binaries and executables
• Almost 2 million library-level dependencies extracted
18. Ubuntu: most popular libraries (direct)
Rank Library Direct uses
1 libc 4397
2 libpthread 1438
3 libglib 1037
4 libgobject 945
5 libm 836
6 librt 719
7 libgthread 660
8 libgmodule 658
9 libgtk-x11 656
10 libdl 601
[Bar chart: direct uses for each of the ten libraries above; vertical axis from 0 to 5000]
21. Conclusion
• Executables with a high number of recursive dependencies can be removed
• “Complexity” comes from a large number of subsystems: image formats, setup
• Highly popular libraries are here to stay
• A large number of libraries (up to three quarters) are not explicitly required
• Plugins – discovered and loaded through a different mechanism
• Shipped “just in case” for applications that would likely be installed
• While periodic cleanups occur in practice, averages still steadily grow
• Developer-facing complexity tends to be controlled
22. Future work directions and enhancements
• More plugins for more executable types
• Network awareness
• Run-time file activity tracking and dependency graph updating
• Real-time system health monitoring
• Missing libraries, broken executables, missing links
• Recovery/fixing recommendations
• Most required missing library?
• Last file event impact?