Evolving Software Ecosystems
Marktoberdorf Summer School 2014

Lecture 1
Tom Mens
Software Engineering Lab
University of Mons
informatique.umons.ac.be/genlog
July-August 2014, NATO Marktoberdorf Summer School, Dependable Software Systems Engineering
About me
• PhD obtained in 1999 @ VUB, Brussels
– « A Formal Foundation for Object-Oriented Software Evolution »
• FWO post-doc until 2003 (@ VUB)
• Lecturer/Associate professor @ UMONS, Mons since 2003
• Head of the software engineering lab
• Research in software evolution
• Teaching in software engineering
Current Research Interests
• software evolution
• software quality
• model-driven software engineering
• empirical software engineering
• multimodal human-machine interaction
• Use of formal techniques to support the above
– graph transformation
– logic-based formalisms
– model checking
– statistical analysis
• Develop automated tool support
Ongoing Research Projects
ARC Project « Ecological Studies of Open Source Software Ecosystems »
- 2012-2017
- Interdisciplinary research
- Use ideas from biological ecology to understand and improve evolution/
maintenance of software ecosystems
FRFC Project « Data-Intensive Software System Evolution »
- 2013-2017, in collaboration with A. Cleve (University of Namur)
- Empirical study of evolution of data-intensive software systems
- Co-evolution between database and code
FRIA PhD Scholarship « Executable Modeling of Gestural Interaction
Applications »
- 2011-2015
- Using domain-specific modeling languages
- Based on high-level Petri nets and graph transformation
Human-machine interaction
Executable Modeling of Gestural Interaction Applications
Evolving Software Ecosystems
Research Context
Research Context
• Study of macro-level software evolution
– evolution of large collections or distributions of software projects or packages
– E.g. forges like GitHub, SourceForge, Savannah, Google Code
J.M. Gonzalez-Barahona et al. Macro-level software evolution: a case study of a large software compilation. Empirical Software Engineering 14(3): 262-285 (2009)
M. Caneill, S. Zacchiroli. Debsources: Live and historical views on macro-level software evolution. Int. Symp. ESEM 2014
• Focus on coherent collections of projects
– a.k.a. software ecosystems
– E.g. Debian, Ubuntu, GNOME, KDE, CRAN, Eclipse, …
• Study social/community aspects of these collections
Research Context
Focus on open source software
• Free access to source code, defect data, developer and user communication
• Historical data available in open repositories
– Observable communities
– Observable activities
• Increasing popularity for personal and commercial use
• A huge range of community and software sizes
Long-term goals
• Determine the main factors that drive
the success or failure of OSS projects
within their ecosystem
• Investigate new techniques and
mechanisms to predict and improve
quality and survival of OSS projects
– Inspired by research in biological ecology
informatique.umons.ac.be/genlog/projects/ecos
Long-term goals
Research Questions
• Specific questions depend on the software
ecosystem under study
– CRAN
– Debian
– GNOME
– Eclipse
Long-term goals
Research Questions
• Specific questions for Eclipse
• Software development environment
• Plugin framework architecture
• Which plugins pose fewer problems or survive longer?
• e.g. based on their API usage
• stable and supported APIs versus unstable,
discouraged and unsupported APIs
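The distinction between supported and unsupported API usage can be illustrated with a minimal sketch. It assumes the Eclipse naming convention that a package with an `internal` segment in its name is not part of the supported API; the function name `classify_imports` and the sample source are hypothetical, and this is not the analysis used in the cited studies.

```python
import re

# Matches Java import statements, including static and wildcard imports.
IMPORT_RE = re.compile(
    r"^\s*import\s+(?:static\s+)?([\w.]+)(?:\.\*)?\s*;", re.MULTILINE
)


def classify_imports(java_source: str) -> dict:
    """Split the org.eclipse imports of one Java file by API status.

    Assumption: a package name containing an 'internal' segment is
    treated as unsupported API, everything else as supported.
    """
    result = {"supported": [], "unsupported": []}
    for name in IMPORT_RE.findall(java_source):
        if not name.startswith("org.eclipse."):
            continue  # not an Eclipse API at all
        kind = "unsupported" if "internal" in name.split(".") else "supported"
        result[kind].append(name)
    return result


# Hypothetical plug-in source fragment.
example = """
import java.util.List;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.jdt.internal.core.JavaProject;
"""
usage = classify_imports(example)
print(usage)
```

Counting the unsupported imports per plug-in release would give one possible indicator to relate to plug-in survival.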
Businge et al. “Survival of Eclipse third-party plug-ins”; “Co-evolution
of the Eclipse SDK Framework and Its Third-Party Plug-Ins”; “Analyzing
the Eclipse API Usage: Putting the Developer in the Loop”
Mens et al. “The Evolution of Eclipse”, ICSM 2008
Long-term goals
Research Questions
• Specific questions for CRAN
• R package archive network
• Which packages are more likely to cause, upon
update, problems in dependent packages?
Claes et al. “On the Maintainability of CRAN Packages”,
CSMR-WCRE 2014
German, Adams et al. “The Evolution of the R Software Ecosystem”,
CSMR 2013
Long-term goals
Research Questions
• CRAN (R package archive)
• R package description file format:
Package: pkgname
Version: 0.5-1
Date: 2004-01-01
Title: My First Collection of Functions
Authors@R: c(person("Joe", "Developer", role = c("aut", "cre"),
                    email = "Joe.Developer@some.domain.net"),
             person("Pat", "Developer", role = "aut"),
             person("A.", "User", role = "ctb",
                    email = "A.User@whereever.net"))
Author: Joe Developer and Pat Developer, with contributions from A. User
Maintainer: Joe Developer <Joe.Developer@some.domain.net>
Depends: R (>= 1.8.0), nlme
Suggests: MASS
Description: A short (one paragraph) description of what
  the package does and why it may be useful.
License: GPL (>= 2)
URL: http://www.r-project.org, http://www.another.url
BugReports: http://pkgname.bugtracker.url
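To mine dependency data from such files, the fields first have to be extracted. The sketch below is a minimal illustration, assuming only the layout visible above ("Field: value" lines, with indented lines continuing the previous field); the helper names `parse_description` and `package_names` are hypothetical, and real DESCRIPTION files may need more careful handling.

```python
import re


def parse_description(text: str) -> dict:
    """Parse 'Field: value' records; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and current is not None:
            fields[current] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def package_names(field_value: str) -> list:
    """Turn 'R (>= 1.8.0), nlme' into ['R', 'nlme'], dropping version constraints."""
    names = []
    for item in field_value.split(","):
        name = re.sub(r"\(.*?\)", "", item).strip()
        if name:
            names.append(name)
    return names


sample = """Package: pkgname
Version: 0.5-1
Depends: R (>= 1.8.0), nlme
Suggests: MASS
Description: A short (one paragraph) description of what
  the package does and why it may be useful.
"""
fields = parse_description(sample)
depends = package_names(fields.get("Depends", ""))
suggests = package_names(fields.get("Suggests", ""))
print(fields["Package"], depends, suggests)
```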
Long-term goals
Research Questions
• CRAN (R package archive)
• Types of R package dependencies:
• Strong dependencies (required to install and load a
package)
• Depends: all packages the package directly depends
upon
• Imports: all packages whose namespaces are
imported from
• Weak dependencies
• Suggests: packages that are not necessarily needed.
E.g. they may be required for running some tests or
examples, but not for installing/loading
• Enhances: packages that enhance other packages,
e.g., by providing methods for classes from these
packages
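The strong/weak distinction matters when asking which packages may be affected by an update. As a minimal sketch, assuming hypothetical package metadata already grouped by field name, the packages that strongly depend on a given package (directly or transitively) can be collected as follows; weak dependencies are deliberately ignored because they are not needed for installing or loading.

```python
from collections import defaultdict

STRONG = ("Depends", "Imports")
WEAK = ("Suggests", "Enhances")

# Hypothetical metadata: package name -> dependency fields.
packages = {
    "pkgA": {"Depends": ["nlme"], "Imports": ["MASS"], "Suggests": ["testthat"]},
    "pkgB": {"Imports": ["pkgA"]},
    "pkgC": {"Depends": ["pkgB"], "Suggests": ["pkgA"]},
}


def dependencies(meta: dict, fields=STRONG) -> list:
    """Sorted, de-duplicated dependencies listed under the given fields."""
    return sorted({dep for field in fields for dep in meta.get(field, [])})


def affected_by_update(pkg: str, pkgs: dict) -> list:
    """Packages that strongly depend on pkg, directly or transitively."""
    reverse = defaultdict(set)
    for name, meta in pkgs.items():
        for dep in dependencies(meta, STRONG):
            reverse[dep].add(name)
    seen, stack = set(), [pkg]
    while stack:
        for dependant in reverse.get(stack.pop(), ()):
            if dependant not in seen:
                seen.add(dependant)
                stack.append(dependant)
    return sorted(seen)


affected = affected_by_update("pkgA", packages)
print(affected)
```

Here pkgC is reached only through pkgB: its own link to pkgA is a weak `Suggests` dependency and is not followed.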
Long-term goals
Research Questions
• CRAN package dependencies
• Generated with miniCRAN
– Tool to create and visualise an internally consistent, mini version of CRAN with selected packages only.
Long-term goals
Research Questions
• Specific questions for Debian
• Open source Linux distribution
• Which packages are more likely to cause future co-installation conflicts with other packages?
• Can I upgrade a set of installed Debian packages
without “breaking” my installation?
• Based on a formalisation and SAT solving
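The co-installability question can be sketched on a toy repository. The example below is an illustration under stated assumptions, not the cited formalisation: the repository contents are hypothetical, each dependency is a list of alternatives of which at least one must be installed, and a brute-force search over package subsets stands in for a SAT solver.

```python
from itertools import combinations

# Hypothetical repository. Each entry in "depends" is a list of alternatives.
REPO = {
    "app":    {"depends": [["libfoo", "libbar"]], "conflicts": []},
    "game":   {"depends": [["libfoo"]],           "conflicts": []},
    "editor": {"depends": [["libold"]],           "conflicts": []},
    "libfoo": {"depends": [],                     "conflicts": ["libold"]},
    "libbar": {"depends": [],                     "conflicts": []},
    "libold": {"depends": [],                     "conflicts": []},
}


def is_healthy(installation: set, repo: dict) -> bool:
    """True if every dependency is satisfied and no conflict is present."""
    for name in installation:
        pkg = repo[name]
        for alternatives in pkg["depends"]:
            if not any(alt in installation for alt in alternatives):
                return False
        if any(other in installation for other in pkg["conflicts"]):
            return False
    return True


def co_installable(targets, repo: dict):
    """Return a smallest healthy installation containing all targets, or None.

    Brute force over subsets; a real tool would encode this as a SAT problem.
    """
    targets = set(targets)
    others = sorted(set(repo) - targets)
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            candidate = targets | set(extra)
            if is_healthy(candidate, repo):
                return candidate
    return None


print(co_installable(["app", "editor"], REPO))
print(co_installable(["game", "editor"], REPO))
```

In this toy repository, `app` and `editor` can be installed together by choosing `libbar`, whereas `game` and `editor` cannot, because `game` needs `libfoo`, which conflicts with the `libold` that `editor` requires.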
Vouillon and Di Cosmo, “Broken Sets in Software Repository Evolution”, ICSE 2013
Long-term goals
Research Questions
• Debian (open source Linux distribution)
• Historical evolution of package co-installation conflicts
[Figure: number of packages and number of conflicting packages (y-axis 0 to 45000), quarterly from 03-05 to 06-14]
[Figure: ratio of conflicting packages (y-axis 0.05 to 0.55), quarterly from 03-05 to 06-14]
Long-term goals
Research Questions
• Specific questions for GNOME
• Linux desktop environment
• Which projects have a higher chance of
survival?
• How is workload distributed over different
projects/contributors?
• What is the “bus factor” risk? Who are the top
contributors (for a specific activity type)?
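Workload distribution and "bus factor" risk can be approximated with simple measures. The sketch below uses hypothetical commit counts and two simplified proxies that are assumptions of this example, not the metrics of the cited study: the smallest number of top contributors covering more than half of the commits, and the Gini coefficient as an inequality measure (0 means perfectly even workload).

```python
def bus_factor(commits_per_contributor: dict, threshold: float = 0.5) -> int:
    """Smallest number of top contributors who together account for
    more than `threshold` of all commits (a simplified proxy)."""
    total = sum(commits_per_contributor.values())
    covered = 0
    ranked = sorted(commits_per_contributor.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered > threshold * total:
            return rank
    return len(ranked)


def gini(values) -> float:
    """Gini coefficient of non-negative values."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical commit counts for one project.
commits = {"alice": 60, "bob": 25, "carol": 10, "dave": 5}
print(bus_factor(commits), round(gini(commits.values()), 2))
```

Computing the same measures per activity type (e.g. coding, translation, documentation) would show who the top contributors are for each kind of work.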
Vasilescu et al. “On the variation and specialisation of workload: A case study of the GNOME ecosystem community”, Emp. Softw. Eng. journal 2014.
Long-term goals
Research Questions
GNOME Collaborator Network
Long-term goals
Tooling
• Develop automated tools that help
– developer (communities) improve upon their practices
– companies and users to compare and adopt OSS
projects
[Figure labels: Birth, First Release, Maintenance, Major Release]
Long-term goals
Tooling
• Develop automated tools
– E.g. Complicity (Neu et al., University of Lugano)
Fig. 1. Main View of Complicity visualizing the GNOME projects at ecosystem level, and the details of the Gimp project
Long-term goals
Tooling
• Develop automated tools
– E.g. SECONDA for GNOME
VI. WORK IN PROGRESS
The current version of the tool is under active development and changing every week. Currently, we are working on the following issues that will be integrated in future releases of SECONDA: integrate the identity matching algorithm (already implemented) as a postprocessing phase of the data extraction module; implement an incremental version of the data extraction and metrics computation to accommodate new projects’ revisions without needing to recompute everything every time, thereby saving bandwidth, time and memory; integrate the community and developer analysis; integrate the statistical analysis module; implement a reporting module; analyse other ecosystems than GNOME; support other types of version repositories, and integrate information from bug trackers, mailing lists and development fora.
IV. DATA EXTRACTION AND METRICS COMPUTATION
SECONDA provides two types of manipulation of the data extracted from GIT repositories: global analysis, which clones the GIT repository and performs a global coarse-grained analysis; and local analysis, which analyses the software projects in the local repository clone at a fine-grained level. Although both analyses can be used independently, it is advisable to first carry out the global analysis, and then request a local analysis for those projects that deserve more attention.
A. Global analysis
The data extraction module downloads and maintains a local cached copy of the GNOME repository on which the metrics module runs a coarse-grained analysis, using SLOCCOUNT for obtaining the projects’ size metrics, for the latest revision of each project, and with GIT for
Long-term goals
Tooling
• Develop automated tools
• E.g. maintaineR (for CRAN packages)
Long-term goals
Tooling
• Develop automated tools
• E.g. coinst and other tools for visualising and
resolving co-installation conflicts
coinst.irill.org
Relevant Scientific Venues
• ICSME
• Int. Conf. Software Maintenance and Evolution
• SANER (merger of CSMR and WCRE)
• Int. Conf. Software Analysis, Evolution and Reengineering
• MSR
• Working Conference on Mining Software Repositories
• IWSECO
• International Workshop on Software Ecosystems