Nurturing the Software Ecosystems of the Future
Achievements of an Inter-University Research Project
Serge Demeyer, Tom Mens, Coen De Roover & Anthony Cleve
secoassist.github.io
@seco-assist
Duration: 4 years (2018-2022)
Budget: 2.4 million euros (150k per partner per year)
An "Excellence of Science" research project
[Figure: "A software ecosystem is ...", labelled SOCIO-TECHNICAL]
SECO-ASSIST Research Goals
Today, over 80% of the software used in any IT product or service is open source
Societal challenge
Protect society from the risks and dangers of an increasing dependence on software ecosystems
Fundamental goals
Study and understand the socio-technical characteristics of software ecosystem health,
quality and sustainability over time
Predict/assist ecosystem evolution to increase long-term sustainability
Applied goal
Propose automated tools to help software development communities increase their productivity, interaction, quality and resilience over time
Improve social health
Retain key project contributors and attract new ones
Predict abandoners and suggest replacements
Identify toxic contributors
Ensure sufficient team diversity
Improve technical health
Better software tests, taking into account the software dependencies, to reduce bugs and security leaks
Increase productivity and quality by using reusable software libraries
Increase software maintainability by supporting software upgrades and migration to new technologies
Improve interactions and co-evolution between data-intensive software and their database(s)
Tom Mens, Anthony Cleve, Coen De Roover, Serge Demeyer
social networks, software testing, software reuse, database interactions
4 dimensions for analytics and recommendations
Main results – social networks & development workflows
Socio-technical analysis of software contributor communities
evolution and impact of socio-technical congruence in software packaging ecosystems [ESEC/FSE 2019]
analysis of pull request comments in GitHub repositories [BENEVOL 2019]
probabilistic forecasting model to predict future activity of software project contributors [JSS 2020]
Detecting and analysing bot usage in development projects
identification of key characteristics exhibited by bot activities [BotSE 2020]
ML-based technique for detecting bots based on the repetitiveness of their commenting activity [JSS 2021]
study on the prevalence of bots as contributors in GitHub projects [IEEE Software 2022]
Improving development workflow automation
longitudinal study on the use and evolution of Continuous Integration tools in GitHub [SANER 2022]
large-scale quantitative analysis of the GitHub Actions ecosystem [ICSME 2022]
Studying variant projects in software families
study of the prevalence and importance of project forking in GitHub [BENEVOL 2020, EMSE 2022]
motivations for launching variants and impediments to maintaining the co-existing projects [SANER 2022]
study to quantify the extent of the sub-optimal maintenance in software families [ESEC/FSE 2022]
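To make the idea of "repetitiveness of commenting activity" concrete, here is a minimal Python sketch. It is an illustration only and does not reproduce the features or the classifier of the published technique or of BoDeGHA: the function name repetitiveness, the sample comments, and the use of difflib text similarity are all assumptions chosen for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def repetitiveness(comments: List[str]) -> float:
    """Mean pairwise text similarity in [0, 1]; higher means more repetitive.

    Hypothetical measure for illustration, not the published feature set.
    """
    pairs = list(combinations(comments, 2))
    if not pairs:
        # Fewer than two comments: nothing to compare.
        return 0.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


# Invented sample data: templated comments versus free-form comments.
bot_comments = [
    "Coverage increased by 0.1% to 87.2% when pulling this branch.",
    "Coverage decreased by 0.3% to 86.9% when pulling this branch.",
    "Coverage increased by 0.5% to 87.4% when pulling this branch.",
]
human_comments = [
    "Could you add a regression test for the empty input case?",
    "LGTM, thanks for the quick fix!",
    "I think this breaks the Windows build, see the CI log.",
]

if __name__ == "__main__":
    print("bot-like account:", round(repetitiveness(bot_comments), 2))
    print("human-like account:", round(repetitiveness(human_comments), 2))
```

An account whose comments follow a template scores close to 1, while free-form comments score much lower; a real detector would combine such a signal with other features and a trained model.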
Main results – software testing
Identifying inadequate test suites
mutation coverage to measure the strength of a test suite
recommendation of extra asserts to make the suite stronger [VST 2020]
Test amplification
first demonstration of the feasibility of test amplification for dynamically typed languages
SmallAmp – a tool to strengthen test suites within the Pharo Smalltalk ecosystem [EMSE 2022]
AmPyfier – first tool to strengthen test suites within the Python ecosystem [JSEP 2022]
Test transplantation
use tests from dependent projects to increase the coverage of base packages [EASE 2022]
use test slicing to reconstruct the appropriate object states when transplanting tests [SCAM 2022]
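As an illustration of how mutation coverage measures the strength of a test suite, the following Python sketch computes a mutation score (killed mutants divided by all mutants) for a toy function. The function is_adult, the hand-written mutants and the two suites are hypothetical examples, not taken from the project's tools, which generate mutants automatically.

```python
from typing import Callable, List

Predicate = Callable[[int], bool]


def is_adult(age: int) -> bool:
    """Toy function under test (hypothetical)."""
    return age >= 18


# Hand-written mutants: small deliberate faults injected into is_adult.
MUTANTS: List[Predicate] = [
    lambda age: age > 18,   # boundary changed
    lambda age: age <= 18,  # operator flipped
    lambda age: age == 18,  # operator replaced
]


def weak_suite(impl: Predicate) -> bool:
    """Returns True when all assertions pass for the given implementation."""
    return impl(30) is True and impl(5) is False


def strong_suite(impl: Predicate) -> bool:
    """Weak suite plus one extra assertion on the boundary value."""
    return weak_suite(impl) and impl(18) is True


def mutation_score(suite: Callable[[Predicate], bool],
                   mutants: List[Predicate]) -> float:
    """Fraction of mutants killed, i.e. for which the suite fails."""
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


if __name__ == "__main__":
    print("weak suite:  ", mutation_score(weak_suite, MUTANTS))
    print("strong suite:", mutation_score(strong_suite, MUTANTS))
```

The weak suite lets the boundary mutant survive; adding a single assertion on the boundary value kills it, which is the kind of extra assert a recommender could suggest.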
Main results – software reuse
Release & implementation recommendations for library contributors
target = Ansible Galaxy ecosystem (reusable Infrastructure-as-Code libraries)
automated version increment recommendation (minor, major, patch) [SCAM 2020, JSS 2021, MSR 2021]
detection of 6 novel code smells related to Ansible’s semantics [MSR 2022]
Selection recommendations for library users
helping developers choose a library within vast ecosystems [SoHeal 2020, SCAM 2020, SANER 2022]
Instantiation recommendations for framework users
graph-based mining of frequent framework instantiation patterns [SANER 2019]
capturing the interplays between multiple related instantiation actions [SCAM 2022]
Dependency recommendations for library contributors
quantifying the problem of outdated dependencies [ICSME 2018, JSEP 2019, SANER 2019]
quantifying the outdatedness of packages pre-installed in DockerHub images [SCP 2021, EMSE 2021]
analyzing the adherence to semantic versioning [TSE 2019, SoHeal 2020, SCP 2021]
assessing the impact of security vulnerabilities [MSR 2018, EMSE 2022]
studying the practice of backporting fixes (including security patches) to older releases [TSE 2022]
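The major/minor/patch vocabulary used above comes from semantic versioning. As a minimal sketch, assuming plain MAJOR.MINOR.PATCH version strings without pre-release or build suffixes, the Python snippet below classifies the increment between two releases. The function names are hypothetical, and the snippet only labels an increment that has already been made; it does not recommend one from code changes as the project's tooling does.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release suffix)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Return 'major', 'minor' or 'patch' for an increment from old to new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError(f"{new} is not newer than {old}")
    if new_v[0] != old_v[0]:
        return "major"  # may contain backward-incompatible changes
    if new_v[1] != old_v[1]:
        return "minor"  # backward-compatible new functionality
    return "patch"      # backward-compatible fixes only


if __name__ == "__main__":
    for old, new in [("1.2.3", "2.0.0"), ("1.2.3", "1.3.0"), ("1.2.3", "1.2.4")]:
        print(old, "->", new, ":", classify_bump(old, new))
```

Checking adherence to semantic versioning then amounts to comparing such a label with the kind of change actually present between the two releases.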
Main results – database interactions
Static detection and analysis of SQL bad smells
static detection of bad smells in SQL queries [ICSE 2018]
prevalence and evolution of SQL code smells in data-intensive open source systems [MSR 2020]
Empirical studies on data-intensive systems
analyzing self-admitted technical debt in database access code [EMSE 2022]
investigating the (joint) use of data models and technologies [ER 2021]
Modeling, manipulating and evolving multi-database systems
HyDRa – a conceptual framework to design and manipulate hybrid polystores [ER 2021]
… and to ease their evolution [SANER 2022, BENEVOL 2022]
performance-based recommendation of polystore schema changes [BENEVOL 2022, ER 2022]
automated query adaptation to preserve system consistency [SCAM 2020]
Analyzing database-related testing practices
state-of-the-practice in testing database manipulation code [CAiSE 2021]
taxonomy of best practices for testing database code [Information Systems 2022]
Open Source Tools (1)
BoDeGHA: A command-line tool to identify development bots in GitHub repositories by analysing pull request and issue comments
https://github.com/mehdigolzadeh/BoDeGha
BoDeGiC: An (open source) command-line tool to identify bots in GitHub repositories by analysing git commit messages
https://github.com/mehdigolzadeh/BoDeGiC
SQLInspect: A static SQL analyzer with plug-in support for Eclipse to inspect database usage in Java applications
https://bitbucket.org/csnagy/sqlinspect
GAP: a command-line tool for forecasting future commit activity of contributors involved in software projects distributed through git
https://github.com/AlexandreDecan/gap
ConPan: an open source command-line tool to inspect Docker containers
https://github.com/neglectos/ConPan
Open Source Tools (2)
SmallAmp: a test amplification tool in Pharo Smalltalk to create new test methods based on manually written ones to increase mutation coverage
https://github.com/mabdi/small-amp
Small-mince: A tool to slice tests in Pharo Smalltalk
https://github.com/mabdi/small-mince
PaReco: a tool to detect missed opportunities and effort duplication in ecosystems
https://github.com/KadjelRamkisoen/PaReco
Continuous Integration Antipattern Analyzer: a command line tool to analyze CI workflows in git repositories
https://github.com/FreekDS/CIAN
portion: a Python library (with 300+ stars on GitHub) providing data structures and operations to create, manipulate and query intervals and interval sets (disjunctions of intervals) of any comparable objects
https://github.com/AlexandreDecan/portion
SISMIC: a Python library providing a tool suite to define, simulate, execute and test statecharts, supporting test-driven development, behaviour-driven development, design by contract, and property statecharts to monitor violations of behavioural properties during statechart execution
https://github.com/AlexandreDecan/sismic
Open Source Tools (3)
MUTAMA: a tool recommending MVNRepository tags for a given Java library
https://github.com/cvelazquezr/MUTAMA
RESICO: a tool for resolving the simple names of API types in incomplete code snippets (e.g., from Stack Overflow) to their fully-qualified name
https://github.com/cvelazquezr/RESICO
SCARE: a tool for extracting the structural changes between two releases of an Ansible role published on the Ansible Galaxy ecosystem
https://github.com/ROpdebee/SCARE
LiFUSO: a tool for enumerating library features from its Stack Overflow posts
https://github.com/softwarelanguageslab/lifuso
HyDRa: a framework for hybrid polystore modeling and manipulation
https://github.com/gobertm/HyDRa
npmgraph: A tool for checking license compatibilities for npm packages
https://github.com/IlyasMakari/npmgraph.an
https://zenodo.org/record/5913761
Career Perspective
(preliminary quantitative analysis)
10 PhD
3 Postdoc
3 Professor
2 Permanent researcher
6 Postdoc
8 PhD
secoassist.github.io
Visit our website for more info
