Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Upcoming SlideShare
What to Upload to SlideShare
What to Upload to SlideShare
Loading in …3
×
1 of 67

PhD public defense: A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems

0

Share

Download to read offline

PhD thesis for more details: https://doi.org/10.13140/RG.2.2.14414.41287

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all

PhD public defense: A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems

  1. 1. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 1 16:00 Ahmed Zerouali Public PhD Defense “A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems”
  2. 2. A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Presented by: Ahmed Zerouali Advisor: Dr. Tom Mens Public PhD Defense Software Engineering Lab, Université de Mons - Belgium, 4 September 2019 Jury: Dr. Olivier Delgrange Dr. Alexandre Decan Dr. Jesus Gonzalez-Barahona Dr. Alexander Serebrenik
  3. 3. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Belgian Excellence of Science Research Project 2018- 2021 https://secoassist.github.io/ 3
  4. 4. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Background 4
  5. 5. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Focus IssuesUp-to-date 5
  6. 6. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Focus Don’t updateUpdate How can we help software developers to decide when and why they should update 6 ?
  7. 7. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Published papers 7
  8. 8. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag 8
  9. 9. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background Technical lag¹: the increasing difference between deployed software packages and the ideal available upstream packages. ¹ Gonzalez-Barahona, et al. "Technical Lag in Software Compilations: Measuring How Outdated a Software Deployment Is." IFIP International Conference on Open Source Systems. Springer, Cham, 2017. ➢ Ideal: stability, security, functionality, recency, etc. ➢ Difference: version updates, bugs, vulnerabilities, lines of code, commits, etc. 9
  10. 10. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background Credits: https://exploring-data.com/vis/npm-packages-dependencies/ +20M dependencies 10
  11. 11. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background package.json 11
  12. 12. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background 12 Semantic Versioning Examples: 0.0.1, 1.0.0, 1.2.3, 1.2.3-beta
  13. 13. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background Other: *, ==1.2.3, >1.2.3, <1.2.3, 1.2.x, 1.x.x 13
  14. 14. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Example 14 1.0.1 3.6.0 * 1.2.0 allowed 4.0.0 ^1.0.0 ^1.0.0 = [ 1.0.0, 2.0.0 [ dependent package P required package D up-to-date dependency
  15. 15. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Example 15 1.0.1 1.2.0 2.0.1 3.6.0 4.0.0 2.0.0 * ^1.0.0 = [ 1.0.0, 2.0.0 [ allowed 4.1.0 missing updates dependent package P required package D outdated ^1.0.0 dependency
  16. 16. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Example 16
  17. 17. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Followed research approach Qualitative analysis: surveys and interviews Tooling Quantitative analysis Technical lag formal framework Mixed-methods approach 17
  18. 18. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Qualitative analysis: surveys and interviews Tooling Quantitative analysis Mixed-methods approach Technical lag formal framework 18
  19. 19. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag > Qualitative analysis Semi-structured Interviews: ➢ 5 software practitioners ➢ Place: FOSDEM 2019 ➢ Highly educated interviewees with an average of 10 years of experience 19 Technical Lag is important, especially if we mix between the benefits of updating and the effort needed to do that.
  20. 20. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag > Qualitative analysis Online surveys: ➢ 17 candidates (from facebook groups) ➢ Highly educated interviewees with an average of 3 years of experience MCQ: What would be the most appropriate (ideal) version of a software library to use? 20 ★ Most stable (14) ★ Latest available (9) ★ Most documented (7) ★ Most secure (5)
  21. 21. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Qualitative analysis: surveys and interviews Tooling Quantitative analysis Mixed-methods approach Technical lag formal framework 21
  22. 22. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag 22 ∆ version ∆ time ∆ bugs ∆ vulnerabilities
  23. 23. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag ● is a set of component releases ● is a set of possible lag values ● ideal : → is a function returning the “ideal” component release ● delta : x → is a function computing the difference between two component releases ● agg : is a function aggregating the results of a set of lags 23
  24. 24. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag 24 Given a technical lag framework , we define: Aggregated Technical lag Technical lag Let be a set of components, then:
  25. 25. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Qualitative analysis: surveys and interviews Tooling Quantitative analysis Mixed-methods approach Technical lag formal framework 25
  26. 26. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag > Framework Instantiation 26
  27. 27. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm Packages 27
  28. 28. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 28 Technical Lag in npm Packages Time-based instantiation of : ➢ Ideal: Highest available version ➢ Delta: Time Lag = date(ideal) - date(used) ➢ Aggregation: Max
  29. 29. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 29 Technical Lag in npm Packages Version-based instantiation of : ➢ Ideal: Highest available version ➢ Delta: Version lag = (∆Major, ∆Minor, ∆Patch) ➢ Aggregation: Sum
  30. 30. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 30 Technical Lag in npm Packages > Time-based instantiation 1.0.1 1.1.0 2.0.01.2.0 2.0.1 Dependent package Technical lag time lag = date(2.0.1) - date(1.1.0) Required package
  31. 31. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 31 Technical Lag in npm Packages > Version-based instantiation 1.0.1 1.1.0 2.0.01.2.0 2.0.1 Dependent package Technical lag 1 minor time lag = date(2.0.1) - date(1.1.0) Required package
  32. 32. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 32 Technical Lag in npm Packages > Version-based instantiation 1.0.1 1.1.0 2.0.01.2.0 2.0.1 Dependent package Technical lag 1 minor time lag = date(2.0.1) - date(1.1.0) 1 major Required package
  33. 33. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 33 Technical Lag in npm Packages > Version-based instantiation 1.0.1 1.1.0 2.0.01.2.0 2.0.1 Dependent package Technical lag 1 minor time lag = date(2.0.1) - date(1.1.0) version lag = (1 major, 1 minor ,1 patch) 1 major 1 patch Required package
  34. 34. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm Packages > Example Time-based technical lag for the package debug: - ideal (2.6.9) = 3.1.0 - time_lag(2.6.9) = 26-09-2017 - 22-09-2017 = 4 days - version_lag(2.6.9) = (1,1,1) 34
  35. 35. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm Packages > Example Time-based technical lag for the package ms: - time_lag(2.0.0) = 198 days 35
  36. 36. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm Packages > Example Aggregated Time-based technical lag for the package release youtube-player@5.5.0: - time_lag(debug@2.6.9) = 4 days - time_lag(ms@2.0.0) = 198 days ➔ agglag({debug@2.6.9, ms@2.0.0}) = max (4 days, 198 days) = 198 days 36
  37. 37. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm Packages New package releases have an increased technical lag. Technical lag is induced by version constraints Time lag induced by direct dependencies in npm package releases: 37
  38. 38. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm Packages Time lag induced by transitive runtime dependencies in npm package releases: Technical lag is accumulated from a level to another in the dependency tree. 38
  39. 39. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm applications
  40. 40. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm applications Time lag induced by direct dependencies in GitHub applications: Technical lag in GitHub applications is higher than in npm package releases 40
  41. 41. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Docker Containers 41
  42. 42. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Background 42 ● Containers are isolated bundles of software packages ● Docker is one of the main tools for containerisation ● DockerHub is the largest repository for container images
  43. 43. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Background 43 What are the biggest barriers to putting containers in a production environment? - ClusterHQ
  44. 44. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Background 44
  45. 45. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Focus 45
  46. 46. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Background 1.0.1 1.1.0 2.0.01.2.1 2.1.0 Docker container C Technical lag technical lag = ∆ Versions (freshness) ∆ Vulnerabilities (security) ∆ Bugs (stability) Ideal Version deployed container Included Package version Available releases of a Debian package 46
  47. 47. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Software Freshness The majority of packages in Debian containers is up-to-date... … but most of the images contain outdated packages. How outdated are images? IDEAL = LATEST 47
  48. 48. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Software Freshness Outdated Debian packages in Docker containers induce a median version lag of 1 version. What is the version lag induced by the used Debian package releases? IDEAL = LATEST 48
  49. 49. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Software Security The number of vulnerabilities depends on the Debian release, and is moderately correlated with the number of outdated packages in a container. 49
  50. 50. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Software Security Few packages (2.5%) need to be downgraded in order to have the most secure version. Can we reduce security lag in DockerHub container images? IDEAL = Most Secure 50
  51. 51. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers > Survey 51
  52. 52. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers 52 1 out of 3 depending on a vulnerable npm package never update their dependency and remain vulnerable A. Decan et al. “On the impact of security vulnerabilities in the npm package dependency network”, MSR 2018. "37% of websites include a JavaScript library with a known open source vulnerability” T. Lauinger et al. "Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web", NDSS 2017. So what about Docker containers having npm packages? Security vulnerabilities in npm JavaScript
  53. 53. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in Docker Containers 53 ★ Old Node images might be missing many updates, including one major update. ★ All official Node-based images have vulnerable npm packages, with an average of 16 security vulnerabilities per image. ★ Older images are more likely to have more vulnerabilities.
  54. 54. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Qualitative analysis: surveys and interviews Tooling Quantitative analysis Mixed-methods approach Technical lag formal framework 54
  55. 55. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems ConPan: A tool to analyze packages in software containers > Existing tools 55
  56. 56. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems (*): https://github.com/neglectos/ConPan ConPan(*): “CONtainer Packages ANalyzer” ConPan: A tool to analyze packages in software containers > Overview 56
  57. 57. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Example: $ conpan -p debian -c google/mysql -d ~/ConPan/data/debian/ ConPan: A tool to analyze packages in software containers > From the CLI 57
  58. 58. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems ConPan: A tool to analyze packages in software containers > From the API 58
  59. 59. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems ConPan: A tool to analyze packages in software containers > From the API 59
  60. 60. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems ConPan: A tool to analyze packages in software containers > From the API 60
  61. 61. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Summary and Outlook 61
  62. 62. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Summary 62
  63. 63. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Future Work ● Other instantiations of the technical lag ○ Effort needed to reduce the technical lag ● Extend and enhance ConPan ● Cross ecosystems comparison ● Promote technical lag to be used by software developers 63
  64. 64. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Conclusion The technical lag could help open source software developers and deployers to keep their software in a healthy shape. 64
  65. 65. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems https://github.com/neglectos/PhD_Dissertation
  66. 66. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 66 Evolution of Dependency constraints in npm packages - Caret (^) usage is increasing over time. - Caret introduction coincides with Major version lag increase.
  67. 67. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 67 The usage of strict constraint is much higher in external applications Evolution of Dependency constraints in GitHub applications

×