The document discusses performing large-scale software engineering studies. It outlines how empirical research is currently done, including issues with small sample sizes, lack of experiment replication, and unavailable tools and data. The document then proposes a platform for software engineering research to address these issues. The platform would provide pre-processed data in standard formats, shared tools and results, and large-scale processing capabilities to enable more rigorous empirical studies.
17. Researcher's view
Time
1. Go to project.org
2. Download SVN, Mail, Bug; ask for IRC logs
3. .....?
4. Publish research
18. Researcher's view / (some) Project's view
Time
Researcher's view:
1. Go to project.org
2. Download SVN, Mail, Bug; ask for IRC logs
3. .....?
4. Publish research
(some) Project's view:
1. Hm, a new visitor
2. Hey, she is mirroring our bugzilla
3. .....?
4. Ban her!
28. In our research
1. We examined the current situation
2. We propose a platform for large-scale research
3. We validated its design with 2 case studies
29. Empirical Study = Model + Data + Metrics/Tools + Analysis Methods + Results analysis
33. Research methods
[Bar chart: Number of Works (0-40) per research method used: EXP, FCS, ECS, CCS, SUR]
34. What sources of data are in use?
[Bar chart: Number of Works (0-45) per data source: BTS, SRC, SF, ECT, SCM]
35. What is the examined data size (in number of projects)?
[Histogram: Number of Works (0-30) per sample size: 1, 2, 3, 4, 5, 6, 8, 10, 25, 50 projects]
36. Findings
• Sample sizes are very small
  • How can we extract generic results?
• No experiment replication
  • Do we believe each other's work or just ignore it?
• We did not check the stats...
37. Only 20% of the tools and data reported in ICSE papers could be retrieved a year after publication
41. A software engineering research platform
• Ready-made tools
• Formalised data formats
• Easily extensible
• Pre-processed data
• Large-scale processing
• Researcher community
49. Data
• Raw data: Mailing Lists, BTS, SCM
• Processed raw data
• Metadata
• Tool results
50. Mirroring Root
/
├── Project 1/
├── Project 2/
└── Project 3/
    ├── project.properties
    ├── git/    (standard GIT format)
    ├── svn/    (standard SVN format)
    ├── mails/
    │   ├── List 1/
    │   └── List 2/
    │       └── tmp/ cur/ new/   (each holding messageid.eml files)
    └── bugs/
        └── bug<id>.xml
55. Tools
[Architecture diagram: Alitheia Core services (Job Scheduler, Metadata Updater, Web Services, Cluster Service, Logging Service, Metric Plug-ins, Plug-in Activator, DB Service, Messaging, Web Admin, Plug-in Admin, Parser Service, Data Access, Security) running on SQO-OSS nodes, on top of the project mirror (svn, mails with List 1/List 2 maildirs in tmp/cur/new, bugs) and the metadata storage]
58. Tools
@MetricDeclarations(metrics = {
    @MetricDecl(mnemonic="MNOF", activators={ProjectDirectory.class},
        descr="Number of Source Code Files in Module"),
    @MetricDecl(mnemonic="MNOL", activators={ProjectDirectory.class},
        descr="Number of lines in module", dependencies={"Wc.loc"}),
    @MetricDecl(mnemonic="AMS", activators={ProjectVersion.class},
        descr="Average Module Size"),
    @MetricDecl(mnemonic="ISSRCMOD", activators={ProjectDirectory.class},
        descr="Mark for modules containing source files")
})
public class ModuleMetricsImplementation extends AbstractMetric {
    public void run(ProjectFile pf) throws AlreadyProcessingException {[...]}
    public void run(ProjectVersion pv) throws AlreadyProcessingException {[...]}
    public List<Result> getResult(ProjectFile pf, Metric m) {
        return getResult(pf, ProjectFileMeasurement.class, m, Result.ResultType.INTEGER);
    }
    public List<Result> getResult(ProjectVersion pv, Metric m) {
        return getResult(pv, ProjectVersionMeasurement.class, m, Result.ResultType.FLOAT);
    }
}
59. Tools
public void run(ProjectFile pf) {
    // We do not support directories
    if (pf.getIsDirectory()) {
        return;
    }
    // Create an input stream from the project file's content
    InputStream in = fds.getFileContents(pf);
    if (in == null) {
        return;
    }
    try {
        // Measure the number of lines in the project file
        LineNumberReader lnr =
            new LineNumberReader(new InputStreamReader(in));
        int lines = 0;
        while (lnr.readLine() != null) {
            lines++;
        }
        lnr.close();
        // Store the results
        Metric metric = Metric.getMetricByMnemonic("LOC");
        ProjectFileMeasurement locm = new ProjectFileMeasurement();
        locm.setMetric(metric);
        locm.setProjectFile(pf);
        locm.setWhenRun(new Timestamp(System.currentTimeMillis()));
        locm.setResult(String.valueOf(lines));
        db.addRecord(locm);
        markEvaluation(metric, pf.getProjectVersion().getProject());
    } catch (IOException e) {
        log.error(this.getClass().getName() + " IO Error <" + e
            + "> while measuring: " + pf.getFileName());
    }
}
67. How do we identify intense discussions?
[Two histograms: occurrences per number of messages per thread (0-25), and occurrences per level of thread depth (0-25)]
68. Hypotheses
• H1: Number of messages and thread depth are dependent variables
• H2: We can identify intense discussions by identifying threads in the top depth and messages-per-thread quartiles
• H3: Intense discussions affect the repository's source line intake
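H2 boils down to a simple quartile rule. As a hedged sketch (not the actual plug-in code; the class and method names, and the nearest-rank percentile choice, are assumptions), the rule can be expressed in a few lines of Java:

```java
import java.util.Arrays;

// Illustrative sketch of the H2 rule: a thread counts as an "intense
// discussion" when both its message count and its depth fall in the
// top quartile of their respective distributions.
class IntenseThreads {
    // 75th-percentile threshold (nearest-rank method) of a sample.
    static int topQuartile(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.75 * sorted.length) - 1;
        return sorted[rank];
    }

    static boolean isIntense(int messages, int depth,
                             int msgThreshold, int depthThreshold) {
        return messages >= msgThreshold && depth >= depthThreshold;
    }

    public static void main(String[] args) {
        // Toy data: (messages, depth) per thread; the real input would
        // come from the mailing-list metadata stored in Alitheia Core.
        int[] msgs  = {2, 3, 5, 40, 4, 35, 6, 3};
        int[] depth = {1, 2, 3, 12, 2, 10, 4, 2};
        int mq = topQuartile(msgs);
        int dq = topQuartile(depth);
        for (int i = 0; i < msgs.length; i++) {
            if (isIntense(msgs[i], depth[i], mq, dq)) {
                System.out.println("thread " + i + " is intense");
            }
        }
    }
}
```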
69. Method
• Import projects in Alitheia Core
• Develop metric plug-ins to count the variables we are interested in (3 metrics)
• Emails from 60 projects: ~1.2 × 10^6 emails, 679,427 threads
• Plug-in size: 270 lines of code
70. [Scatter plot: Number of Messages (0-100) vs. Thread depth (0-50), with fit function 0.55 + 1.59x]
H1: R^2 = 0.70
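The fitted line on this slide is an ordinary least-squares regression of message count on thread depth. A minimal sketch of how such coefficients and R^2 are computed (toy data below; the slide's 0.55 + 1.59x and R^2 = 0.70 come from the real 679,427-thread data set, and whether the study used exactly this procedure is an assumption):

```java
// Ordinary least squares for y ~ a + b*x, plus the coefficient of
// determination R^2, computed from the closed-form normal equations.
class DepthFit {
    // Returns {intercept, slope, r2}.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;
        // R^2 = 1 - SS_res / SS_tot
        double ssRes = 0, ssTot = 0, mean = sy / n;
        for (int i = 0; i < n; i++) {
            double e = y[i] - (a + b * x[i]);
            ssRes += e * e;
            ssTot += (y[i] - mean) * (y[i] - mean);
        }
        return new double[]{a, b, 1 - ssRes / ssTot};
    }

    public static void main(String[] args) {
        // Toy (depth, messages) pairs, not the study's data.
        double[] depth = {1, 2, 3, 5, 8, 13};
        double[] msgs  = {2, 4, 5, 9, 13, 22};
        double[] f = fit(depth, msgs);
        System.out.printf("y = %.2f + %.2fx, R^2 = %.2f%n", f[0], f[1], f[2]);
    }
}
```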
76. Hypotheses
• H1: Number of programmers affects code maintainability at the project level
• H2: Number of programmers affects code maintainability at the directory level
77. Method
• Per language
  • C & Java (risky)
• Plug-ins that calculate:
  • Number of developers per period of time
  • Halstead & McCabe (700 lines)
  • Oman's Maintainability Index (240 lines)
• Data from 213 projects
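For reference, the classic Oman/Hagemeister Maintainability Index combines Halstead volume, cyclomatic complexity, and module size. Whether the plug-in mentioned above implements exactly this 3-metric variant is an assumption; the formula itself is the standard one:

```java
// The classic 3-metric Maintainability Index (one common variant):
//   MI = 171 - 5.2*ln(V) - 0.23*G - 16.2*ln(LOC)
// where V is mean Halstead volume, G mean cyclomatic complexity, and
// LOC mean lines of code per module. Higher MI means more maintainable.
class MaintainabilityIndex {
    static double mi(double v, double g, double loc) {
        return 171 - 5.2 * Math.log(v) - 0.23 * g - 16.2 * Math.log(loc);
    }

    public static void main(String[] args) {
        // Toy module profile, not data from the study.
        System.out.printf("MI = %.1f%n", mi(1000.0, 10.0, 200.0));
    }
}
```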
Software engineering is an empirical science, as it tries to explain phenomena that occur in software development using data that result from it.
This statement is not mine; it belongs to Vic Basili and the people who said or implied it before him, worked with him, or followed his recommendation and started a new and exciting research area, MSR. Being an empirical science, software engineering has to follow the steps of the scientific method.
As required by the scientific method, we observe the behaviour of the data...
We formulate hypotheses...
And we build models...
We validate or invalidate those hypotheses by running them against data from the real world. As an empirical science, we are in constant need of data.
While in other empirical fields it is difficult and expensive to get empirical data (consider medicine, for example), in software engineering we have the OSS movement, which produces vast quantities of it. So the question that comes to mind is: "can we use all that free data to do research with?"
This may sound like a rhetorical question, but the data shows that it is not.
In a systematic review we conducted, most software engineering papers (even those in good journals and conferences) validate their hypotheses with data from just a couple of projects.
Trying to explain why, let's look at some project numbers first. The largest project repository that one can work with is KDE, at ~50 GB of data. In a talk, Audris Mockus said that he had collected more than 1 TB of data.
If we compare that to other empirical sciences...
The other great problem is that of original data disparity. Most research so far was done with CVS and Bugzilla; in reality, there are many tools that store process data.
Then there is the problem of non-cooperating projects, which is understandable: the projects run and maintain the infrastructures, and researchers almost never give back to the community.
So let's see how research is done in more mature disciplines.
They have large data sets that are pre-processed: most of the time researchers do not have to download new data and massage it before conducting experiments. The Flossmole project has shown the way.
Researchers share their results in the form of workshops and conferences, competitions and, more significantly, the tools that produce them.
They also do not take everything published for granted: they replicate the studies and the findings. Medicine is the example here.
And most important of all, most other empirical research disciplines have research platforms: shared tools that let researchers conduct research using standardised input and output formats on standardised datasets.
But most of them are already using machines like this (and getting funding) to do their experiments on shared research infrastructures.
Or requesting huge amounts of funding to build machines like this.
By reading more than 300 papers and conducting a systematic review of more than 70 randomly selected papers...
Our results are reinforced by a similar study that Carlo Ghezzi did for his ICSE 2009 keynote.
When we say "platform", several requirements appear in front of us.
All those reasons led our research group to the SQO-OSS project, which produced the Alitheia Core tool. The project's aim was to produce software quality analysis tools, but the original targets strayed towards creating infrastructure rather than the tools themselves.
This is the interface each metric plug-in must satisfy.
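A hedged sketch of what such a contract might look like (the interface, type, and method names below are illustrative, not the exact Alitheia Core API): a plug-in declares what it measures, runs on an activator object, and hands back stored results on demand.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative plug-in contract, parameterised on the activator type
// (file, version, mail thread, ...).
interface MetricPlugin<T> {
    String mnemonic();                    // short unique metric name, e.g. "LOC"
    void run(T activator);                // measure one artifact and store the result
    List<Integer> getResult(T activator); // fetch previously stored results
}

// Toy implementation: counts the lines of an in-memory "file" (a String).
class LineCount implements MetricPlugin<String> {
    private final List<Integer> store = new ArrayList<>();
    public String mnemonic() { return "LOC"; }
    public void run(String file) { store.add(file.split("\n", -1).length); }
    public List<Integer> getResult(String file) { return store; }
}

class PluginSketch {
    public static void main(String[] args) {
        LineCount lc = new LineCount();
        lc.run("a\nb\nc");
        System.out.println(lc.mnemonic() + " = " + lc.getResult("a\nb\nc"));
        // prints "LOC = [3]"
    }
}
```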
This is what a metric plug-in looks like.
This is the implementation of the line-counting plug-in I've presented earlier, minus some bureaucracy (constructors, imports etc.): just about 20 LOC to retrieve a file from a revision, read its lines, and store the results in a database. This is comparable to a shell or Python script, but much faster. The abstractions Alitheia Core provides are very high level and cross-platform, which is why the overhead it adds to algorithm implementation is minimal.
This is our cluster: on the left, a 3-thread processing node plus the project server; in the middle, a 6-thread processing core; at the bottom, mirroring and storage of raw data, the file server, the database, and 8 processing threads to keep the CPUs busy during I/O; on the right, the web server plus 16 slow processing threads. To scale, we just need more processing nodes.
Another example of the cluster on its knees, and a display of linear scalability: the screen of the database server running just the database while other nodes start connecting. Queries per second increase almost linearly and the load equals the number of processor cores: the machine is saturated.