1. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems
16:00 Ahmed Zerouali
Public PhD Defense
“A Measurement Framework for Analyzing Technical Lag in Open-Source
Software Ecosystems”
2. A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems
Public PhD Defense
Software Engineering Lab, Université de Mons - Belgium, 4 September 2019
Presented by: Ahmed Zerouali
Advisor: Dr. Tom Mens
Jury: Dr. Olivier Delgrange, Dr. Alexandre Decan, Dr. Jesus Gonzalez-Barahona, Dr. Alexander Serebrenik
3. Belgian Excellence of Science Research Project 2018-2021
https://secoassist.github.io/
4. Background
5. Focus
Issues
Up-to-date
6. Focus
Update
Don’t update
How can we help software developers decide when and why they should update?
7. Published papers
8. Technical Lag
9. Technical lag > Background
Technical lag¹: the increasing difference between deployed software packages and the ideal available upstream packages.
➢ Ideal: stability, security, functionality, recency, etc.
➢ Difference: version updates, bugs, vulnerabilities, lines of code, commits, etc.
¹ Gonzalez-Barahona et al. "Technical Lag in Software Compilations: Measuring How Outdated a Software Deployment Is." IFIP International Conference on Open Source Systems. Springer, Cham, 2017.
10. Technical lag > Background
+20M dependencies
Credits: https://exploring-data.com/vis/npm-packages-dependencies/
11. Technical lag > Background
package.json
12. Technical lag > Background
Semantic Versioning
Examples: 0.0.1, 1.0.0, 1.2.3, 1.2.3-beta
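To make the version examples above concrete, here is a minimal Python sketch that splits such strings into major, minor, patch and an optional pre-release tag, then orders them. The names `Version`, `parse_version` and `sort_key` are illustrative, and the pattern is a simplification (build metadata is ignored and pre-release tags are compared as plain strings), not a full semantic versioning implementation.

```python
import re
from typing import NamedTuple

# Simplified pattern: MAJOR.MINOR.PATCH with an optional "-prerelease" suffix.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    prerelease: str  # empty string for a normal release


def parse_version(text: str) -> Version:
    match = _SEMVER.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch, pre = match.groups()
    return Version(int(major), int(minor), int(patch), pre or "")


def sort_key(version: Version):
    # A pre-release sorts before the normal release with the same numbers.
    return (version.major, version.minor, version.patch,
            version.prerelease == "", version.prerelease)


examples = ["1.2.3", "0.0.1", "1.2.3-beta", "1.0.0"]
ordered = sorted(examples, key=lambda v: sort_key(parse_version(v)))
print(ordered)
```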
13. Technical lag > Background
Other: *, ==1.2.3, >1.2.3, <1.2.3, 1.2.x, 1.x.x
14. Technical lag > Example
^1.0.0 = [ 1.0.0, 2.0.0 [
[Figure: dependent package P and required package D. Constraints shown: ^1.0.0 and *. Releases shown: 1.0.1, 1.2.0, 3.6.0, 4.0.0. Labels: allowed, up-to-date dependency.]
15. Technical lag > Example
^1.0.0 = [ 1.0.0, 2.0.0 [
[Figure: dependent package P and required package D. Constraints shown: ^1.0.0 and *. Releases shown: 1.0.1, 1.2.0, 2.0.0, 2.0.1, 3.6.0, 4.0.0, 4.1.0. Labels: allowed, missing updates, outdated dependency.]
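The caret range in this example, ^1.0.0 = [ 1.0.0, 2.0.0 [, can be sketched in Python as follows. The release list for D (1.0.1, 1.2.0, 2.0.0, 2.0.1) is an assumption read off the slide figure, the helper names are illustrative, and the sketch only covers caret constraints with a major version of at least 1.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_bounds(constraint: str) -> Tuple[Version, Version]:
    """Return (lower, upper) for a caret constraint, read as [lower, upper[."""
    lower = parse(constraint.lstrip("^"))
    if lower[0] == 0:
        raise ValueError("this sketch only covers caret constraints with major >= 1")
    return lower, (lower[0] + 1, 0, 0)


def split_releases(constraint: str, releases: List[str]) -> Tuple[List[str], List[str]]:
    """Split releases into those allowed by the constraint and those beyond it."""
    lower, upper = caret_bounds(constraint)
    allowed = [r for r in releases if lower <= parse(r) < upper]
    missing = [r for r in releases if parse(r) >= upper]
    return allowed, missing


# Assumed releases of the required package D (taken from the figure labels).
releases_of_d = ["1.0.1", "1.2.0", "2.0.0", "2.0.1"]
allowed, missing = split_releases("^1.0.0", releases_of_d)
installed = max(allowed, key=parse)  # highest release the constraint permits
outdated = bool(missing)             # newer releases exist outside the range
print(installed, missing, outdated)
```

With these assumed releases the constraint resolves to 1.2.0, and the dependency counts as outdated because 2.0.0 and 2.0.1 fall outside the allowed range.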
16. Technical lag > Example
17. Followed research approach
Mixed-methods approach:
➢ Qualitative analysis: surveys and interviews
➢ Technical lag formal framework
➢ Quantitative analysis
➢ Tooling
18. Mixed-methods approach: qualitative analysis (surveys and interviews), technical lag formal framework, quantitative analysis, tooling
19. A Framework for Technical Lag > Qualitative analysis
Semi-structured interviews:
➢ 5 software practitioners
➢ Place: FOSDEM 2019
➢ Highly educated interviewees with an average of 10 years of experience
Technical Lag is important, especially if we mix between the benefits of updating and the effort needed to do that.
20. A Framework for Technical Lag > Qualitative analysis
Online surveys:
➢ 17 candidates (from Facebook groups)
➢ Highly educated respondents with an average of 3 years of experience
MCQ: What would be the most appropriate (ideal) version of a software library to use?
★ Most stable (14)
★ Latest available (9)
★ Most documented (7)
★ Most secure (5)
21. Mixed-methods approach: qualitative analysis (surveys and interviews), technical lag formal framework, quantitative analysis, tooling
22. Technical lag
∆ version
∆ time
∆ bugs
∆ vulnerabilities
23. A Framework for Technical Lag
● $\mathcal{C}$ is a set of component releases
● $\mathcal{L}$ is a set of possible lag values
● ideal : $\mathcal{C} \to \mathcal{C}$ is a function returning the “ideal” component release
● delta : $\mathcal{C} \times \mathcal{C} \to \mathcal{L}$ is a function computing the difference between two component releases
● agg : $2^{\mathcal{L}} \to \mathcal{L}$ is a function aggregating the results of a set of lags
24. A Framework for Technical Lag
Given a technical lag framework $\mathcal{F}$, we define:
Technical lag: $\mathrm{techlag}_{\mathcal{F}}(c) = \mathrm{delta}(c, \mathrm{ideal}(c))$
Aggregated technical lag: let $D \subseteq \mathcal{C}$ be a set of components, then $\mathrm{agglag}_{\mathcal{F}}(D) = \mathrm{agg}(\{\mathrm{techlag}_{\mathcal{F}}(c) \mid c \in D\})$
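One way to read the framework is as three pluggable functions (ideal, delta, agg) from which the technical lag of a release and the aggregated lag of a set of releases are derived. The Python sketch below is an illustration under that reading: the class name `LagFramework` and the method name `techlag` are assumptions, and the integer-based instantiation at the end is a toy example, not one of the instantiations used in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TypeVar

C = TypeVar("C")  # type of component releases
L = TypeVar("L")  # type of lag values


@dataclass(frozen=True)
class LagFramework(Generic[C, L]):
    ideal: Callable[[C], C]          # returns the "ideal" release for a release
    delta: Callable[[C, C], L]       # difference between two releases
    agg: Callable[[Iterable[L]], L]  # aggregates a collection of lags

    def techlag(self, release: C) -> L:
        # Lag of one release: its difference to the ideal release.
        return self.delta(release, self.ideal(release))

    def agglag(self, releases: Iterable[C]) -> L:
        # Aggregated lag over a set of releases.
        return self.agg([self.techlag(r) for r in releases])


# Toy instantiation: releases are integers, release 5 is always the ideal one,
# the lag is the numeric gap, and lags are aggregated with max.
toy = LagFramework(
    ideal=lambda release: 5,
    delta=lambda used, ideal: ideal - used,
    agg=max,
)
print(toy.techlag(3), toy.agglag([3, 4, 1]))
```

Keeping ideal, delta and agg as plain callables means a time-based or version-based instantiation only swaps the three functions, which mirrors how the framework is instantiated later in the talk.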
25. Mixed-methods approach: qualitative analysis (surveys and interviews), technical lag formal framework, quantitative analysis, tooling
26. A Framework for Technical Lag > Framework Instantiation
27. Technical Lag in npm Packages
28. Technical Lag in npm Packages
Time-based instantiation of $\mathcal{F}$:
➢ Ideal: Highest available version
➢ Delta: Time lag = date(ideal) - date(used)
➢ Aggregation: Max
29. Technical Lag in npm Packages
Version-based instantiation of $\mathcal{F}$:
➢ Ideal: Highest available version
➢ Delta: Version lag = (∆Major, ∆Minor, ∆Patch)
➢ Aggregation: Sum
30. Technical Lag in npm Packages > Time-based instantiation
[Figure: releases of the required package (1.0.1, 1.1.0, 1.2.0, 2.0.0, 2.0.1) and the dependent package, with the technical lag between the used release and the latest one.]
time lag = date(2.0.1) - date(1.1.0)
31. Technical Lag in npm Packages > Version-based instantiation
[Figure: same release timeline. Technical lag so far: 1 minor.]
time lag = date(2.0.1) - date(1.1.0)
32. Technical Lag in npm Packages > Version-based instantiation
[Figure: same release timeline. Technical lag so far: 1 minor, 1 major.]
time lag = date(2.0.1) - date(1.1.0)
33. Technical Lag in npm Packages > Version-based instantiation
[Figure: same release timeline. Technical lag: 1 minor, 1 major, 1 patch.]
time lag = date(2.0.1) - date(1.1.0)
version lag = (1 major, 1 minor, 1 patch)
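The version lag in this example can be reproduced by walking the releases between the used version (1.1.0) and the ideal one (2.0.1) and counting each step as a major, minor or patch change. The sketch below assumes that stepwise counting is what the slide build illustrates; the function names are illustrative and pre-release versions are not handled.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def version_lag(used: str, ideal: str, releases: List[str]) -> Tuple[int, int, int]:
    """Count (major, minor, patch) steps from the used release up to the ideal one."""
    start, end = parse(used), parse(ideal)
    path = sorted(v for v in map(parse, releases) if start <= v <= end)
    major = minor = patch = 0
    for previous, current in zip(path, path[1:]):
        if current[0] != previous[0]:
            major += 1
        elif current[1] != previous[1]:
            minor += 1
        else:
            patch += 1
    return (major, minor, patch)


releases = ["1.0.1", "1.1.0", "1.2.0", "2.0.0", "2.0.1"]
lag = version_lag("1.1.0", "2.0.1", releases)
print(lag)
```

Here 1.1.0 to 1.2.0 counts as one minor step, 1.2.0 to 2.0.0 as one major step, and 2.0.0 to 2.0.1 as one patch step.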
34. Technical Lag in npm Packages > Example
Time-based and version-based technical lag for the package debug:
- ideal(2.6.9) = 3.1.0
- time_lag(2.6.9) = 26-09-2017 - 22-09-2017 = 4 days
- version_lag(2.6.9) = (1,1,1)
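As a quick check of the time lag in this example, the subtraction can be written with Python's standard `datetime` module. The sketch assumes the dates on the slide are in day-month-year order, with 22-09-2017 as the release date of debug 2.6.9 and 26-09-2017 as the release date of 3.1.0; the `time_lag` helper is illustrative.

```python
from datetime import date, timedelta
from typing import Dict

# Release dates as read from the slide (assumed day-month-year).
release_dates: Dict[str, date] = {
    "2.6.9": date(2017, 9, 22),
    "3.1.0": date(2017, 9, 26),
}


def time_lag(used: str, ideal: str, dates: Dict[str, date]) -> timedelta:
    """Time lag = date(ideal) - date(used)."""
    return dates[ideal] - dates[used]


lag = time_lag("2.6.9", "3.1.0", release_dates)
print(lag.days)
```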
35. Technical Lag in npm Packages > Example
Time-based technical lag for the package ms:
- time_lag(2.0.0) = 198 days
36. Technical Lag in npm Packages > Example
Aggregated time-based technical lag for the package release youtube-player@5.5.0:
- time_lag(debug@2.6.9) = 4 days
- time_lag(ms@2.0.0) = 198 days
➔ agglag({debug@2.6.9, ms@2.0.0}) = max(4 days, 198 days) = 198 days
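The aggregation step can be sketched in a few lines of Python: time lags are aggregated with max, as in the youtube-player example, and version lags are aggregated with a component-wise sum. The time lags and the version lag of debug come from the slides; the version lag used for ms is a hypothetical value added only to illustrate the sum.

```python
from datetime import timedelta
from typing import Dict, Iterable, Tuple

# Time lags of the dependencies of youtube-player@5.5.0, as given on the slides.
time_lags: Dict[str, timedelta] = {
    "debug@2.6.9": timedelta(days=4),
    "ms@2.0.0": timedelta(days=198),
}


def agg_time(lags: Iterable[timedelta]) -> timedelta:
    """Time-based instantiation: aggregate with max."""
    return max(lags)


def agg_version(lags: Iterable[Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Version-based instantiation: aggregate with a component-wise sum."""
    major = minor = patch = 0
    for lag in lags:
        major, minor, patch = major + lag[0], minor + lag[1], patch + lag[2]
    return (major, minor, patch)


aggregated_time = agg_time(time_lags.values())

# (1, 1, 1) is the version lag of debug from the slides; (0, 1, 2) is hypothetical.
aggregated_version = agg_version([(1, 1, 1), (0, 1, 2)])
print(aggregated_time.days, aggregated_version)
```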
37. Technical Lag in npm Packages
Time lag induced by direct dependencies in npm package releases:
➢ New package releases have an increased technical lag.
➢ Technical lag is induced by version constraints.
38. Technical Lag in npm Packages
Time lag induced by transitive runtime dependencies in npm package releases:
➢ Technical lag accumulates from one level to the next in the dependency tree.
39. Technical Lag in npm applications
40. Technical Lag in npm applications
Time lag induced by direct dependencies in GitHub applications:
➢ Technical lag in GitHub applications is higher than in npm package releases.
41. Docker Containers
42. Technical Lag in Docker Containers > Background
● Containers are isolated bundles of software packages
● Docker is one of the main tools for containerisation
● DockerHub is the largest repository for container images
43. Technical Lag in Docker Containers > Background
What are the biggest barriers to putting containers in a production environment? - ClusterHQ
44. Technical Lag in Docker Containers > Background
45. Technical Lag in Docker Containers > Focus
46. Technical Lag in Docker Containers > Background
[Figure: available releases of a Debian package (1.0.1, 1.1.0, 1.2.1, 2.0.0, 2.1.0), the package version included in the deployed Docker container C, and the ideal version.]
technical lag =
➢ ∆ Versions (freshness)
➢ ∆ Vulnerabilities (security)
➢ ∆ Bugs (stability)
47. Technical Lag in Docker Containers > Software Freshness
How outdated are images? (IDEAL = LATEST)
➢ The majority of packages in Debian containers are up-to-date, but most of the images contain outdated packages.
48. Technical Lag in Docker Containers > Software Freshness
What is the version lag induced by the used Debian package releases? (IDEAL = LATEST)
➢ Outdated Debian packages in Docker containers induce a median version lag of 1 version.
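The two freshness questions can be illustrated on a single container image with a short Python sketch. The package inventory below is entirely hypothetical (it is not data from the study); each value is the number of newer versions available for the installed package, taking the latest version as the ideal.

```python
from statistics import median
from typing import Dict

# Hypothetical inventory of one image: package -> number of missing versions.
missing_versions: Dict[str, int] = {
    "bash": 0,
    "coreutils": 0,
    "tar": 0,
    "zlib1g": 0,
    "openssl": 1,
    "curl": 2,
    "libc6": 1,
}

# A package is outdated when at least one newer version exists.
outdated = {name: lag for name, lag in missing_versions.items() if lag > 0}

up_to_date_count = len(missing_versions) - len(outdated)
image_is_outdated = bool(outdated)             # one outdated package is enough
median_version_lag = median(outdated.values())  # computed over outdated packages only

print(up_to_date_count, image_is_outdated, median_version_lag)
```

The toy inventory shows how both statements on the slides can hold at once: most packages are up-to-date, yet the image still counts as outdated.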
49. Technical Lag in Docker Containers > Software Security
➢ The number of vulnerabilities depends on the Debian release, and is moderately correlated with the number of outdated packages in a container.
50. Technical Lag in Docker Containers > Software Security
Can we reduce security lag in DockerHub container images? (IDEAL = Most Secure)
➢ Few packages (2.5%) need to be downgraded in order to have the most secure version.
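Choosing the most secure release as the ideal, instead of the latest one, can be sketched as follows. The version numbers reuse the release timeline from the earlier Docker background slide, while the vulnerability counts are hypothetical and chosen so that the most secure release is older than the installed one. Breaking ties in favour of the newest release is also an assumption of this sketch.

```python
from typing import List, Tuple

# Releases ordered from oldest to newest: (version, known vulnerabilities).
# The vulnerability counts are hypothetical.
releases: List[Tuple[str, int]] = [
    ("1.0.1", 3),
    ("1.1.0", 0),
    ("1.2.1", 2),
    ("2.0.0", 1),
    ("2.1.0", 1),
]


def most_secure_index(history: List[Tuple[str, int]]) -> int:
    """Index of the release with the fewest vulnerabilities (ties: newest)."""
    return max(range(len(history)), key=lambda i: (-history[i][1], i))


def security_lag(installed: str, history: List[Tuple[str, int]]) -> Tuple[str, int, bool]:
    """Return (ideal version, vulnerability difference, whether a downgrade is needed)."""
    versions = [version for version, _ in history]
    used = versions.index(installed)
    ideal = most_secure_index(history)
    lag = history[used][1] - history[ideal][1]
    return versions[ideal], lag, ideal < used


ideal_version, lag, needs_downgrade = security_lag("1.2.1", releases)
print(ideal_version, lag, needs_downgrade)
```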
51. Technical Lag in Docker Containers > Survey
52. Technical Lag in Docker Containers
Security vulnerabilities in npm JavaScript
➢ 1 out of 3 dependents of a vulnerable npm package never update their dependency and remain vulnerable.
A. Decan et al. "On the impact of security vulnerabilities in the npm package dependency network", MSR 2018.
➢ "37% of websites include a JavaScript library with a known open source vulnerability"
T. Lauinger et al. "Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web", NDSS 2017.
So what about Docker containers that include npm packages?
53. Technical Lag in Docker Containers
★ Old Node images might be missing many updates, including one major update.
★ All official Node-based images have vulnerable npm packages, with an average of 16 security vulnerabilities per image.
★ Older images are more likely to have more vulnerabilities.
54. Mixed-methods approach: qualitative analysis (surveys and interviews), technical lag formal framework, quantitative analysis, tooling
55. ConPan: A tool to analyze packages in software containers > Existing tools
56. ConPan: A tool to analyze packages in software containers > Overview
ConPan(*): “CONtainer Packages ANalyzer”
(*): https://github.com/neglectos/ConPan
57. ConPan: A tool to analyze packages in software containers > From the CLI
Example: $ conpan -p debian -c google/mysql -d ~/ConPan/data/debian/
58. ConPan: A tool to analyze packages in software containers > From the API
59. ConPan: A tool to analyze packages in software containers > From the API
60. ConPan: A tool to analyze packages in software containers > From the API
61. Summary and Outlook
62. Summary
63. Future Work
● Other instantiations of the technical lag
○ Effort needed to reduce the technical lag
● Extend and enhance ConPan
● Cross-ecosystem comparison
● Promote the use of technical lag by software developers
64. Conclusion
Technical lag could help open source software developers and deployers keep their software in healthy shape.
65. https://github.com/neglectos/PhD_Dissertation
66. Evolution of dependency constraints in npm packages
- Caret (^) usage is increasing over time.
- Caret introduction coincides with major version lag increase.
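Tracking constraint usage over time requires classifying each dependency constraint by type first. The Python sketch below shows one possible classification; the category names, the treatment of `latest` and the empty string as "any", and the regular expression for strict constraints are assumptions of this sketch rather than the classification used in the thesis. The sample constraints reuse the forms listed on the earlier background slide, plus a tilde constraint, which npm also supports.

```python
import re
from collections import Counter

# A strict constraint pins exactly one version, optionally prefixed by "=", "==" or "v".
_STRICT = re.compile(r"^(==?|v)?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def constraint_type(constraint: str) -> str:
    """Classify a dependency constraint string into a coarse category."""
    text = constraint.strip()
    if text in ("", "*", "latest"):
        return "any"
    if text.startswith("^"):
        return "caret"
    if text.startswith("~"):
        return "tilde"
    if _STRICT.match(text):
        return "strict"
    return "other"


sample = ["^1.0.0", "~1.2.3", "1.2.3", "==1.2.3", "*", ">1.2.3", "1.2.x"]
usage = Counter(constraint_type(c) for c in sample)
print(dict(usage))
```

Applying such a counter per snapshot date would give the share of caret, strict and other constraints over time.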
67. Evolution of dependency constraints in GitHub applications
- The usage of strict constraints is much higher in external applications.