Academia-to-Industry Transition of Search and Learning-Based Software Engineering: Opportunities and Challenges

Bestoun S. Ahmed, Ph.D.
Department of Computer Science and Engineering
Faculty of Electrical Engineering
Czech Technical University in Prague
Karlovo náměstí 13, 121 35 Praha 2
@bestoon82
albeybes@fel.cvut.cz
www.bestoun.net

The Two Cultures
• 1959: the clash of "the two cultures"
• The humanities and the sciences.
• A similar cleavage between the academy and industry.

“We are publishing many great solutions for today's problems”
– But why are most of them not used by industry?

Academia-Industry Collaboration
TRANSACTIONS OF THE AMERICAN CLINICAL AND CLIMATOLOGICAL ASSOCIATION, VOL. 113, 2002
ACADEMIC-INDUSTRIAL COLLABORATION: THE GOOD, THE BAD, AND THE UGLY
JOSEPH B. MARTIN
BOSTON, MA
ABSTRACT
Academic-industrial collaborations and technology transfer have over the past 50 years played an increasingly prominent role in the biomedical sciences. University partnerships with industry can expedite the availability of innovative drugs and other medical technologies, bringing both important public health benefits and a source of income for universities and their faculty through a variety of financial arrangements. However, these relationships raise ethical concerns, particularly when research involves human subjects in clinical trials. Lapses in oversight of industry-sponsored clinical trials at universities, and especially patient deaths in a number of trials, have brought these issues into the public spotlight and have led the federal government to intensify its oversight of clinical research. The leadership of Harvard Medical School convened a group of leaders in academic

Academia-Industry: Two Different Missions
• Academic mission:
• Education and discovery driven by intellectual curiosity, what we in academia like to regard as "pure motives."
• Industry mission:
• Translational research, commercialization, and profit making.
Breaches between the two missions?

Breach The Wall
• Science, Technology, Engineering, Computer Science (19th century onwards)
Patents > Licensing > Royalties
• Medical Devices and Biotechnology (1950 onwards)
• Basic science support from industry (1980 onwards)
ACADEMIC-INDUSTRIAL COLLABORATION
FIG. 1. The two cultures. Academia's mission: education and discovery driven by intellectual curiosity ("pure motives"). Industry's mission: translational research, commercialization, profit making.
wall separating these two activities, rendering an increasingly porous interface, admired by many and abhorred by some. The first breach in the wall developed around technology, engineering, and computer science, which led to a very deliberate process of patenting, licensing, and royalty income by major research universities engaged in the fundamental sciences (Table 1). During the last 50 years, with the National
TABLE 1
Breaches in the Wall
1. Science, Technology, Engineering, Computer Science (19th century onwards)
   Patents > Licensing > Royalties
2. Medical Devices and Biotechnology (1950-2000): same process
   New ethical issue: agents or devices to be used in humans

Information and Software Technology 79 (2016) 106–127
Contents lists available at ScienceDirect
Information and Software Technology
journal homepage: www.elsevier.com/locate/infsof
Challenges and best practices in industry-academia collaborations in
software engineering: A systematic literature review
Vahid Garousi a,b,*, Kai Petersen c, Baris Ozkan d
a Software Engineering Research Group, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
b Maral Software Engineering Consulting Corporation, Calgary, Canada
c Department of Software Engineering, School of Engineering, Blekinge Institute of Technology, Sweden
d Department of Information Systems Engineering, Atilim University, Ankara, Turkey
Article info
Article history:
Received 31 December 2015
Revised 5 May 2016
Accepted 23 July 2016
Available online 30 July 2016
Keywords:
Software engineering
Industry-academia collaborations
Industry
Universities
Challenges
Success patterns
Best practices
Systematic literature review
Abstract
Context: The global software industry and the software engineering (SE) academia are two large communities. However, unfortunately, the level of joint industry-academia collaborations in SE is still relatively very low, compared to the amount of activity in each of the two communities. It seems that the two 'camps' show only limited interest/motivation to collaborate with one other. Many researchers and practitioners have written about the challenges, success patterns (what to do, i.e., how to collaborate) and anti-patterns (what not to do) for industry-academia collaborations.
Objective: To identify (a) the challenges to avoid risks to the collaboration by being aware of the challenges, (b) the best practices to provide an inventory of practices (patterns) allowing for an informed choice of practices to use when planning and conducting collaborative projects.
Method: A systematic review has been conducted. Synthesis has been done using grounded-theory based coding procedures.
Results: Through thematic analysis we identified 10 challenge themes and 17 best practice themes. A key outcome was the inventory of best practices, the most common ones recommended in different contexts
*Study period 2003-2016

The ratio of authors from academia, industry, and
joint authorships (Published Research Papers)
SE topic areas of the projects studied
Types of contributions Research types

“Keep in mind!”
– Some companies have NDAs, and it is hard to convince them to publish papers.

The Relationship: Experience
• Universities are changing the management vision to funding-oriented
management.
• Many requests for collaboration from academia.
• Fewer responses from industry.
Academia | Industry
Many collaboration requests
Fewer collaboration requests

Theory versus practice
Industry versus academe

Challenges To Collaboration In Software Engineering
• Garousi et al.* mention many challenges.
• Most relevant to us:
• Results produced through research are not relevant for practice
• Researchers do not understand the relevant problems from an industry point of view
• Different interests and objectives
• Different reward systems
• Lack of prior relationships between a company and academia
• Lack of resources, because collaboration demands a high investment
• Licensing restrictions on tools
*V. Garousi, K. Petersen, and B. Ozkan, "Challenges and best practices in industry-academia collaborations in software engineering: A systematic literature review," Inf. Softw. Technol., vol. 79, pp. 106–127, Nov. 2016.

Barriers: Our Experience
• Companies are interested in fast output.
• Academia is interested in publication, which industry generally does not prefer (it avoids disclosing information).
• Bureaucracy on the university side (especially the lawyers).
• Bureaucracy on the industry side, especially in finding the contact person.
• Size of the company is important.
• Ease of collaboration, and of finding a contact, is inversely related to company size.

Another chance to collaborate?
“Search and Learning-based Software Engineering”

One Of My Meetings With Industry
- Look, I don’t think search or learning-based algorithms will contribute to our work.
- But, let us do the meeting.
• That is basically the end of the meeting from the beginning.

SBSE
“The application of metaheuristic search-based
optimization techniques to ﬁnd near-optimal solutions
in software engineering problems”

• The term SBSE was first used in 2001 by Harman and Jones.
• However, applying optimization to a software engineering problem was already reported by Webb Miller and David Spooner in 1976, in the area of software testing.
Search-based software engineering
Mark Harman a,*, Bryan F. Jones b,1
a Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
b School of Computing, University of Glamorgan, Pontypridd, CF37 1DL, UK
Abstract
This paper claims that a new field of software engineering research and practice is emerging: search-based software engineering. The paper argues that software engineering is ideal for the application of metaheuristic search techniques, such as genetic algorithms, simulated annealing and tabu search. Such search-based techniques could provide solutions to the difficult problems of balancing competing (and some times inconsistent) constraints and may suggest ways of finding acceptable solutions in situations where perfect solutions are either theoretically impossible or practically infeasible.
In order to develop the field of search-based software engineering, a reformulation of classic software engineering problems as search problems is required. The paper briefly sets out key ingredients for successful reformulation and evaluation criteria for search-based software engineering. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Software engineering; Metaheuristic; Genetic algorithm
1. Introduction
Software engineers often face problems associated with the balancing of competing constraints, trade-offs between concerns and requirement imprecision. Perfect solutions are often either impossible or impractical and the nature of the problems often makes the definition of analytical algorithms problematic.
Like other engineering disciplines, software engineering is typically concerned with near optimal solutions or those which fall within a specified acceptable tolerance. It is precisely these factors which make robust metaheuristic search-based optimisation techniques readily applicable.
Metaheuristic algorithms, such as genetic algorithms (GA) [17], simulated annealing [37] and tabu search [16] have been applied successfully to a number of engineering

GA research and researchers have even received interest from observers in the field of social science. Though GA practitioners may not agree with the findings of sociologists [18], it is an indication of the wide appreciation of the significance of these search-based technologies that they should have penetrated the collective consciousness of even 'non-technical' disciplines such as social science.
However, the discipline of software engineering appears to be unique with regard to the application of genetic algorithms (and similar search-based, metaheuristic optimisation techniques); metaheuristic algorithms have received comparatively little attention from software engineers in comparison with that which they have received from researchers and practitioners in the more established fields of engineering.
Information and Software Technology 43 (2001) 833–839
www.elsevier.com/locate/infsof

The first paper to use a meta-heuristic search technique was probably the work of Boyer, Elspas and Levitt on the SELECT system [16]. The paper is remarkable in many ways. Consider the following paragraph, quoted from the paper:
“The limitation of the above algorithms to linear combinations is an unacceptable, and vexing, one. For example, they could not handle an inequality like X∗Y + 10∗Z − W ≥ 5 among its constraints, unless one were prepared to assign to X a trial value, and then attempt a solution (assuming the other inequalities are linear). We therefore considered various alternatives that would not be subject to this limitation. The most promising of these alternatives appears to be a conjugate gradient algorithm (‘hill climbing’ program) that seeks to minimise a potential function constructed from the inequalities.” [16]
Here we can see, not only the first use of computational search (hill climbing) in software engineering, but also a hint at the idea (assignment of concrete values) that was subsequently to become Dynamic Symbolic Execution (DSE) [21]. Within this single paragraph we therefore may arguably find the origins of both DSE and SBST (and, by extension, SBSE too).
The SELECT paper is also remarkable in its sober and prescient assessment of the relative merits of testing and verification. Shortly after its publication, these two closely related research communities entered into a protracted and unhelpful ‘feud’ that generated a great deal more heat than light [29], [31], [35], [60]. Fortunately, we have more recently witnessed an accommodation between the two communities [61], and greater degree of welcome collaboration at their intersection [59]. We really ought to ruefully reflect on the delay in this rapprochement given the ‘understanding’ already
Fig. 1: Cumulative number of Search Based Software Testing papers. As can be seen, the overall trend continues to suggest a polynomial yearly rise in the number of papers, highlighting the breadth of interest and strong health of SBST.
Unlike Boyer et al. [16], Miller and Spooner used concrete execution of the program rather than symbolic execution, making their approach more similar to the techniques that ultimately became SBST, while the work of Boyer et al. followed a closely-related (but different) evolutionary path, which ultimately led to DSE. Current research develops both these techniques, and also hybrids that combine the best features of both [9], [63], [71], [110].
It appears that SBST research lay dormant for approximately a decade until the work of Korel [68], which introduced a practical test data generation approach, the Alternating
http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/
The number of publications per year from 1976 to 2012
Fig. 2: The changing ratio of SBSE papers that are SBST papers. Initially, SBST dominated SBSE. Over the years, this
*M. Harman, Y. Jia, and Y. Zhang, “Achievements, open problems and challenges for search based software testing,” in Proc. 8th IEEE Int. Conf. Softw. Testing, Verification and Validation, Apr. 2015, pp. 1–12.

How Does It Work? Simply
• There are a few general steps that are common among all the tools and algorithms:
1. Choose a search-based optimization technique.
2. Decide on an objective function (fitness function) that will be a benchmark for the optimization process.
3. Search for the best candidate among a list of candidates.
4. Based on the chosen optimization technique, update the list of candidates.
5. Iterate until no better candidates are found.
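The five steps above can be sketched with the simplest search technique, hill climbing. This is only an illustration under stated assumptions: the names `hill_climb`, `toy_fitness`, and `toy_neighbours` are hypothetical, and the toy problem (find the integer that maximizes a one-dimensional function) stands in for a real software engineering problem.

```python
def hill_climb(fitness, initial, neighbours, max_iters=1000):
    """Step 1: the chosen technique is plain hill climbing (an assumption)."""
    best = initial
    best_score = fitness(best)              # step 2: fitness is the benchmark
    for _ in range(max_iters):
        candidates = neighbours(best)       # step 4: update the candidate list
        challenger = max(candidates, key=fitness)  # step 3: best of the list
        score = fitness(challenger)
        if score <= best_score:             # step 5: stop when nothing is better
            break
        best, best_score = challenger, score
    return best, best_score


def toy_fitness(x):
    # Hypothetical objective: highest (zero) at x == 7.
    return -(x - 7) ** 2


def toy_neighbours(x):
    # Hypothetical move operator: step one unit left or right.
    return [x - 1, x + 1]


best, score = hill_climb(toy_fitness, initial=0, neighbours=toy_neighbours)
print(best, score)
```

Swapping in another technique (simulated annealing, a genetic algorithm, PSO) changes only how the candidate list is produced and accepted; the fitness function stays the same.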

PSO As An Example
• PSO = Particle Swarm Optimization
• "Particles" fly in this hyperspace and try to find the global minima/maxima, their movement being governed by a simple mathematical equation:
\[
\begin{cases}
v_{t+1} = c_1 v_t + c_2 (p_{i,t} - x_t) + c_3 (p_{g,t} - x_t) \\
x_{t+1} = x_t + v_{t+1}
\end{cases}
\]
where
v_t = velocity at time step t
x_t = position at time step t
p_{i,t} = the particle's personal best previous position, at time step t
p_{g,t} = the neighbours' best previous position, at time step t
c_1, c_2, c_3 = cognitive/social confidence coefficients
[Diagram: from position x_t the particle moves to x_{t+1} under three pulls: its own velocity v_t (the particle itself), its personal best p_i, and its neighbours' best p_g.]
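A minimal sketch of the update rule follows, minimizing a toy sphere function. Several things here are assumptions rather than part of the slide: the random factors `r2` and `r3` (common in PSO variants but not shown in the equation), treating the whole swarm as each particle's neighbourhood, the coefficient values, and the names `pso` and `sphere`.

```python
import random


def pso(fitness, dim, n_particles=20, iters=200,
        c1=0.7, c2=1.5, c3=1.5, lo=-5.0, hi=5.0, seed=1):
    """Minimize `fitness` with a basic particle swarm (illustrative only)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]                  # personal bests p_i
    p_best_val = [fitness(x) for x in xs]
    g_idx = min(range(n_particles), key=p_best_val.__getitem__)
    g_best, g_best_val = p_best[g_idx][:], p_best_val[g_idx]  # neighbours' best p_g

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r2, r3 = rng.random(), rng.random()  # assumed random scaling
                # v_{t+1} = c1*v_t + c2*(p_i - x_t) + c3*(p_g - x_t)
                vs[i][d] = (c1 * vs[i][d]
                            + c2 * r2 * (p_best[i][d] - xs[i][d])
                            + c3 * r3 * (g_best[d] - xs[i][d]))
                # x_{t+1} = x_t + v_{t+1}
                xs[i][d] += vs[i][d]
            val = fitness(xs[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = xs[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = xs[i][:], val
    return g_best, g_best_val


def sphere(x):
    # Toy objective with its minimum (0) at the origin.
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2)
print(best, best_val)
```

In a testing context the position vector would encode a candidate test case and `fitness` would measure, for example, how many interactions it covers.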

“Remember”
Your problem must be a non-deterministic problem!
• A non-deterministic algorithm can exhibit different behaviors on different runs even for the same input, as opposed to a deterministic algorithm.
• There should not be an exact algorithm to solve your problem.
• Nowadays many approaches apply search-based algorithms to deterministic problems just to catch attention and use some fancy representations.
• But they are the wrong algorithms for such problems.

“Two important components”
1. The problem representation
2. The fitness function

Fitness Function: The Key Ingredient
• In Software Engineering: the fitness function converts some requirement of the system into some measurable number.
Requirements
• Functional: like the coverage of some attribute of the system
• Non-functional: security, usability, safety, efficiency, energy consumption, scalability, availability

Test Case Generation And Minimization
• 34 switches = 2^34 ≈ 1.7 × 10^10 possible inputs = 1.7 × 10^10 tests?
• Every possible combination of inputs? Which interactions cause faults?
• How many test cases do we need?

The combinatorial interaction approach

Payment Server   Smart Phone   Web Server   User Browser   Business Database
Master           iPhone        iPlanet      Chrome         SQL
Visa             Blackberry    Apache       Explorer       Oracle
                                            Firefox        Access
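To make the size of the problem concrete, the sketch below counts the exhaustive configurations for the table above and the value pairs that a 2-way (pairwise) test suite would have to cover. The dictionary layout and variable names are illustrative choices, not taken from any tool.

```python
from itertools import combinations, product
from math import prod

# Parameters and values as listed in the table above.
parameters = {
    "Payment Server": ["Master", "Visa"],
    "Smart Phone": ["iPhone", "Blackberry"],
    "Web Server": ["iPlanet", "Apache"],
    "User Browser": ["Chrome", "Explorer", "Firefox"],
    "Business Database": ["SQL", "Oracle", "Access"],
}

# Exhaustive testing: one test per full combination of values.
exhaustive = prod(len(values) for values in parameters.values())

# Pairwise testing: every value pair of every two parameters must appear
# in at least one test.
pairs = {
    ((p1, v1), (p2, v2))
    for p1, p2 in combinations(parameters, 2)
    for v1, v2 in product(parameters[p1], parameters[p2])
}

# No pairwise suite can be smaller than the product of the two largest
# value counts, because each test covers only one pair of those two.
sizes = sorted((len(v) for v in parameters.values()), reverse=True)
lower_bound = sizes[0] * sizes[1]

print(exhaustive, len(pairs), lower_bound)
```

A single test covers ten of these pairs at once (one per pair of parameters), which is why far fewer tests than the exhaustive set can still cover every pair.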

Test Case Minimization Approach
Red color tuples 25% of the t-tuples
Green color tuples 50% of the t-tuples
Blue color tuples 75% of the t-tuples
Brown color tuples 100% of the t-tuples

How To Use The Fitness Function?
PSTG Strategy: how to use the fitness function to choose the best test case, the one that covers most of the interactions
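A hedged sketch of this idea: the fitness of a candidate test is the number of still-uncovered t-tuples it covers, and the suite grows by repeatedly adding the fittest candidate. Note the substitution: PSTG searches for candidates with particle swarm optimization, whereas this sketch simply scores every possible test, which is feasible only for small inputs. The function names and the `levels` example are hypothetical.

```python
from itertools import combinations, product


def tuples_of(test, t=2):
    """All t-way interactions (parameter index, value) contained in one test."""
    return {tuple((i, test[i]) for i in idx)
            for idx in combinations(range(len(test)), t)}


def fitness(test, uncovered, t=2):
    """Number of not-yet-covered t-tuples that this test would cover."""
    return len(tuples_of(test, t) & uncovered)


def greedy_suite(levels, t=2):
    """Build a t-way suite by always adding the fittest candidate test."""
    all_tests = list(product(*(range(n) for n in levels)))
    uncovered = set().union(*(tuples_of(c, t) for c in all_tests))
    suite = []
    while uncovered:
        best = max(all_tests, key=lambda c: fitness(c, uncovered, t))
        suite.append(best)
        uncovered -= tuples_of(best, t)
    return suite


# Five parameters with 2, 2, 2, 3 and 3 values (values encoded as integers).
levels = [2, 2, 2, 3, 3]
suite = greedy_suite(levels)
covered = set().union(*(tuples_of(s) for s in suite))
print(len(suite), "tests cover", len(covered), "pairs")
```

Replacing the exhaustive `max` over `all_tests` with a metaheuristic search is what makes the approach scale to inputs where enumerating every test is impossible.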

Some applications of combinatorial interaction

GUI Testing
The use of event based modeling in GUI testing

Configuration Generation
nest to be added to the FTS
es in the d-tuples list (Step
g as n-tuples remain in the
dure of the strategy.
enerated test suite, the strat-
tudy on a reliable artifact
correctness of the strategy
m. The generated test suite
internal structure of the ar-
ation, the test suite is filtered
d detected faults. In making
ency of the first stage (i.e.,
at of other strategies.
licly as tools to be down-
strategies are unavailable
n the same environment is
n. The proposed strategy is
gies, namely Jenny, TConfig,
experimental environment
ws 7 operating system, 64-
of RAM. The algorithms are
ents
ed by the size of the con-
pared with those strategies
and criteria to be converted into a weighted number. Each criterion has an effect on the final result, which decides the rank and monthly wage of the officer. The final number is the resulting point. The program is selected because it has a nontrivial code base and different configurations. Fig. 12 shows the main window of the program.
The program regards different configurations as input factors. Each input factor has different levels. For example, the user can choose “No Degree,” “Primary,” “Secondary,” “Diploma,” “Bachelor,” “Master,” and “Doctorate” levels for the “Degree” factor. Table 3 summarizes the factors and levels for the program.
To this end, the input configuration of the program can be represented by one factor with seven levels, one factor with six levels, eight factors with two levels each, and two factors with three levels each. Thus, this input configuration can be notated in an MCA notation as MCA(N; d, 7^1 6^1 2^8 3^2). We need 96,768 test cases to test
Table 3
Summary of the input factors and levels for the case study program.
No.  Factors         Levels
1    Degree          [No Degree, Primary, Secondary, Diploma, Bachelor, Master, Doctor]
2    Children        [Non, 1, 2, 3, 4, More_than_4]
3    Read            [Checked, unchecked]
4    Write           [Checked, unchecked]
5    Speak           [Checked, unchecked]
6    Understand      [Checked, unchecked]
7    New graduate    [Checked, unchecked]
8    Experience      [Checked, unchecked]
9    English         [Checked, unchecked]
10   Disability      [Checked, unchecked]
11   Marital status  [Single, Married, Widow]
12   Resident        [Local, Outsider, Foreigner]
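The exhaustive figure of 96,768 quoted in the excerpt can be checked directly from the factor/level counts in Table 3. The dictionary name `mca` is an illustrative choice; it maps the number of levels to how many factors have that many levels.

```python
from math import prod

# levels -> number of factors with that many levels (from Table 3):
# one 7-level factor, one 6-level factor, eight 2-level, two 3-level.
mca = {7: 1, 6: 1, 2: 8, 3: 2}

factor_count = sum(mca.values())
exhaustive = prod(levels ** count for levels, count in mca.items())

print(factor_count, exhaustive)
```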

Control Engineering
Application of combinatorial test design
• Application of combinatorial interaction to PID controller tuning.

Generating Conﬁguration Tests For SPL

Avocado tool + CIT plugin
• The goal is to add the CIT plugin to the other plugin set
[Diagram labels: Development team; Avocado + CIT plugin; Quality Engineering team; Problem specification; CIT; Test cases]

CIT
Context and Structure
"Varianter" object (all combinations)
"Varianter" object (resulting combinations)
- CIT is hooked in the Avocado Job, after
the action of the YAML_TO_MUX plugin,
before the call to the Avocado Runner.
- The hook is triggered by the command
line option "--combinatorial N".
- CIT plugin receives the Varianter object,
serializes it, filters out the unneeded
variants, creates a new Varianter object
out of the remaining variants and returns
the object to the Avocado Job.

Avocado Multiplexer + CIT : Benefits
• Consider the example on the left. There are three branches: machine (5 leaves), image (6 leaves), and architecture (29 leaves).
• With the current Avocado approach, there will be 5 × 6 × 29 = 870 test cases.
• Using our CIT algorithm, we can reduce the number of test cases to 175 while keeping the test coverage at a very similar level.
• By testing the effectiveness of CIT, we found that the reduced test set is as effective as the exhaustive test set (i.e., 870 test cases).
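The numbers above can be put in perspective with a short calculation. One assumption is made loudly here: the slide does not state the interaction strength N that produced 175 tests. If it were 2-way (pairwise) coverage, no suite could be smaller than the product of the two largest branches, so 175 would sit just above that floor.

```python
from math import prod

# Leaves per branch, as given on the slide.
leaves = {"machine": 5, "image": 6, "architecture": 29}

exhaustive = prod(leaves.values())

# Lower bound for a pairwise suite: every (architecture, image) pair
# needs its own test. Pairwise strength is an assumption, not a stated fact.
sizes = sorted(leaves.values(), reverse=True)
pairwise_lower_bound = sizes[0] * sizes[1]

reported_reduced = 175  # figure quoted on the slide
reduction = 1 - reported_reduced / exhaustive

print(exhaustive, pairwise_lower_bound, f"{reduction:.0%}")
```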

Input Parameter Analysis
*D. Richard Kuhn, Raghu N. Kacker, Yu Lei, Introduction to Combinatorial Testing, Chapman & Hall/CRC, 2013

Other Applications
• Input testing: A systematic sampling of the input
parameters for the system-under-test instead of random
selection.
• Model-based testing: Generating test cases for
many models, like class diagrams, state charts.
• Test suite prioritization: Arrange the test cases in
the most effective way.

Preamble
• In line with customer demand for new functionality, software size in lines of code (LOC) has increased tremendously in the last 10 years.
• From kilobytes to megabytes to gigabytes to terabytes ...

Software Becomes More Complex
• How to do maintenance?
• Test engineers need to test more and more code!

Clustering
• Clustering = the process of organizing objects into groups
whose members are similar in some way.
• A cluster is therefore a collection of objects which are “similar” between
them and are “dissimilar” to the objects belonging to other clusters.

Clustering And Impact Analysis

Objectives
• Maximizing cohesion
• Describes the functional strength of a module, i.e., similar functions are grouped into a specific module.
• Cohesion is a desirable design
component attribute as when
a change has to be made, it is
localized in a single
component.
• Minimizing coupling
• Describes interdependencies among modules
• Loose coupling means component
changes are unlikely to affect other
components
• Shared variables or control information
exchange lead to tight coupling
• Loose coupling can be achieved by
state decentralization (as in objects)
and component communication via
parameters or message passing

Modularization Quality And Modularization Factor
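The slide gives only the title, so the following is a sketch of one commonly used formulation from the software clustering literature, stated here as an assumption: each cluster gets a modularization factor MF = i / (i + j/2), where i is the weight of its internal edges (cohesion) and j the weight of edges crossing its boundary (coupling), with MF = 0 when i = 0; the modularization quality MQ is the sum of the MFs. The function name and the toy dependency graph are hypothetical.

```python
def modularization_quality(edges, cluster_of):
    """MQ as the sum of per-cluster modularization factors (assumed formula)."""
    intra, inter = {}, {}
    for (src, dst), weight in edges.items():
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + weight   # cohesion: edge inside a cluster
        else:
            inter[a] = inter.get(a, 0) + weight   # coupling: counted for both ends
            inter[b] = inter.get(b, 0) + weight
    mq = 0.0
    for cluster in set(cluster_of.values()):
        i, j = intra.get(cluster, 0), inter.get(cluster, 0)
        if i > 0:
            mq += i / (i + 0.5 * j)
    return mq


# Toy dependency graph: module -> module with unit weights.
edges = {("a", "b"): 1, ("c", "d"): 1, ("b", "c"): 1}

good = modularization_quality(edges, {"a": 0, "b": 0, "c": 1, "d": 1})
bad = modularization_quality(edges, {"a": 0, "c": 0, "b": 1, "d": 1})
print(good, bad)
```

A search-based clustering tool would use a measure like this as its fitness function, so that maximizing it rewards high cohesion and low coupling at the same time.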

Energy Optimization
• Automatically transforming the color scheme of a mobile web application by rewriting its server-side code and templates, so that the resulting web application generates pages that are more energy efficient when displayed on a smartphone.
Figure 1: the architecture of Nyx
Figure 2: Example HARG for Program 1.
To build the HARG, our approach parses the HOG to aggregate the individual characters in each node’s FSA into HTML tags. The traversal begins by traversing all of the edges in the FSA associated with the root node of the HOG and then following all of the outgoing edges of the root node and repeating this process until all nodes in the HOG have been traversed. During the traversal, the approach maintains a parse state that allows it to determine if it is currently parsing an HTML tag, attributes, or text. When the
More broadly, the techniques for obtaining the HOG can lead to an over approximation of an application’s possible HTML pages. In turn, this can lead to the identification of spurious visual relationships that correspond to infeasible paths. This does not cause a problem for the approach as this merely introduces additional color relationship constraints that must be accounted for while generating the Color Transformation Scheme in Section 5.3.
5. COLOR TRANSFORMATION
In the second phase, the approach calculates the new energy efficient color scheme for the application – the Color Transformation Scheme (CTS). There are two requirements for the CTS, it must: (1) use energy efficient colors as the basis for the new color scheme, and (2) maintain the color relationships between neighboring HTML elements. The first requirement serves the general goal of the approach and the second ensures that the color-transformed pages are readable and, ideally, as visually appealing as the original pages. To address the first requirement, the CTS should replace large light colored background areas with dark colors (preferably black), as mentioned in Section 2. To address the second
*Ding Li, Angelica Huyen Tran, and William G. J. Halfond, "Making Web Applications More Energy Efficient for OLED Smartphones," ICSE 2014.
HTML Output Graph (HOG)
HTMLAdjacency Relationship Graph (HARG)
Color Conflict Graph (CCG)
Color Transformation Scheme (CTS)

Learning-Based Software Engineering
• The use of machine learning techniques to learn from a set of data or from an information retrieval system.
A Learning-Based Software Engineering Environment
Sidney C. Bailin, Robert H. Gattis, and Walt Truszkowski*
Abstract
We describe an initial prototype of a software engineering environment that combines case-based reasoning (CBR) and explanation-based learning (EBL) functions. CBR and EBL are used to evolve the environment's understanding of software principles as it is used. The case base serves as a repository for reusable solutions to software engineering problems. New solutions are synthesized from the case base through a process of adaptation, evaluation, and repair. When a new solution is returned, the user has the option of modifying it through a series of primitive edit operations. The environment is capable of abstracting from these operations using explanation-based generalization, and synthesizing a new repair rule on the basis of the abstraction. We have successfully taught the environment a non-trivial design repair rule by means of a single example, and have observed the environment apply this learned rule to the solution of a new input problem.

expertise on engineering software grows and develops over time. A knowledge-based computer system capable of offloading non-trivial engineering tasks from the human must also possess such an ability. If a KBSEE is ever to be able to move beyond "toy" problems, it must be able to learn.
Our analysis of alternative machine learning approaches led us to propose a combination of two techniques: case-based reasoning (CBR) and explanation-based learning (EBL). Case-based reasoning is, in essence, an approach to reusing previously acquired knowledge. New situations are recognized as having characteristics in common with situations previously encountered. Whatever has been learned through the previous encounter is then applied, with some adaptation if necessary, to the new situation. In the KBSEE, these situations (cases) are software engineering problems (for example, a set of requirements), and the reusable knowledge consists of solutions to such problems (e.g., a design meeting the requirements).
Explanation-based learning is an approach to learning

Two Main Applications
• Classiﬁcation learning
• Learning from crowdsourcing
• Adaptive feedback system learning
[Diagram labels: SUT; Algo; Learn; Ver.]

Applied Soft Computing 62 (2018) 579–591
Contents lists available at ScienceDirect
Applied Soft Computing
journal homepage: www.elsevier.com/locate/asoc
Learning to classify software defects from crowds: A novel approach
Jerónimo Hernández-González a,*, Daniel Rodriguez c, Iñaki Inza a, Rachel Harrison d, Jose A. Lozano a,b
a Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Donostia, Spain
b Basque Center for Applied Mathematics BCAM, Bilbao, Spain
c Department of Computer Science, University of Alcala, Madrid, Spain
d Department of Computing, Oxford Brookes University, Oxford, UK
Article info
Article history:
Received 1 April 2017
Received in revised form 27 October 2017
Accepted 31 October 2017
Available online 7 November 2017
Keywords:
Learning from crowds
Orthogonal defect classification
Missing ground truth
Bayesian network classifiers
Abstract
In software engineering, associating each reported defect with a category allows, among many other things, for the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, the categorization of defects for model training requires expert knowledge, which is not always available. To circumvent this dependency, we propose to apply the learning from crowds paradigm, where training categories are obtained from multiple non-expert annotators (and so may be incomplete, noisy or erroneous) and, dealing with this subjective class information, classifiers are efficiently learnt. To illustrate our proposal, we present two real applications of the IBM’s orthogonal defect classification working on the issue tracking systems from two different real domains. Bayesian network classifiers learnt using two state-of-the-art methodologies from data labeled by a crowd of annotators are used to predict the category (impact) of reported software defects. The considered methodologies show enhanced performance regarding the straightforward solution (majority voting) according to different metrics. This shows the possibilities of using non-expert knowledge aggregation techniques when expert knowledge is unavailable.
© 2017 Elsevier B.V. All rights reserved.

Table 3
Examples of defects and the corresponding labelings provided by the annotators (L1 to L5).

Compendium dataset

Summary: Error Launching Compendium LD after install
Description: Hi team, Error message launching Compendium LD after initial install: Java Virtual Machine Launcher Could not find the main class: com.compendium.ProjectCompendium. Program will exit. I have run through the suggestion on the forums of adding the path to javaw in the.bat, and verified the path through a command prompt is successful, same error. Any other tips? Regards, Eric
L1: Installability | L2: Other | L3: Installability | L4: Installability | L5: Installability

Summary: Spell Checker
Description: Add a spelling checker to Compendium with the ability to switch on and off auto-spell checking.
L1: Requirements | L2: Requirements | L3: Requirements | L4: Requirements | L5: Requirements

Summary: Can small icons also work with images? Make small images?
Description: Right now when you choose small icons, it shrinks the normal Compendium icons but not any reference node images, so they stay really big. Can we add an option to shrink those proportionally as well?
L1: Requirements | L2: Requirements | L3: Usability | L4: Requirements | L5: Other

Summary: Text find/replace
Description: Global search/find/replace functionality. Maybe coupled with existing search parameters. Ability to change text in found nodes without having to open the nodes, edit the label/detail, etc.
L1: Requirements | L2: Usability | L3: Usability | L4: Requirements | L5: Requirements

Mozilla dataset

Summary: NSS autoconf does not include IRIX
Description: –enable-crypto does not work on IRIX as security/nss/configure.in does not define XP UNIX and friends on IRIX.
L1: Requirements | L2: Reliability | L3: Maintenance | L4: Requirements | L5: Maintenance

Summary: Mozilla automatically checks the “Reassign bug to” radio button
Description: Mozilla automatically checks the “Reassing bug to” radio button in Bugzilla causing unintentional changes to bugs. Tested with win32 051404 mozilla win32 build on NT. More to come.
L1: Other | L2: Other | L3: Reliability | L4: Reliability | L5: Reliability

Summary: Installation failed with error -214 due to empty flash.xpi
Description: seen on mac commercial build 2001-05-09-04-trunk. The installers, both full and stub, failed with a -214 error. Though the installation “appears” complete, when launched, it crashes at the
L1: Installability | L2: Installability | L3: Installability | L4: Installability | L5: Installability
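The paper's abstract names majority voting as the straightforward baseline for combining annotator labels. The sketch below applies it to three of the Table 3 labelings; it illustrates only that baseline, not the Bayesian network methods the paper proposes. The function name and the choice to report ties as a list are assumptions.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label(s); more than one entry means a tie."""
    counts = Counter(labels)
    top = max(counts.values())
    return sorted(label for label, count in counts.items() if count == top)


# Annotator labels L1..L5 copied from Table 3.
labelings = {
    "Error Launching Compendium LD after install":
        ["Installability", "Other", "Installability",
         "Installability", "Installability"],
    "Text find/replace":
        ["Requirements", "Usability", "Usability",
         "Requirements", "Requirements"],
    "NSS autoconf does not include IRIX":
        ["Requirements", "Reliability", "Maintenance",
         "Requirements", "Maintenance"],
}

for summary, labels in labelings.items():
    print(summary, "->", majority_vote(labels))
```

The third defect shows why the baseline is fragile: two categories tie at two votes each, so a plain vote cannot decide, which is the kind of noisy, subjective labeling that learning-from-crowds methods are designed to handle.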
roblem. A graphical description of the training process is
n Fig. 3. The use of the training crowd-labeled data by the
earning techniques is compared.
se experiments were carried out using our own imple-
n of the different learning algorithms and evaluation
. As our developments are written in Java, we can make
of several data management features currently imple-
n the popular software Weka [52]. In this way, the text
by users to describe defects (in two text fields, summary
ption) was processed. In a pre-processing stage, standard
niques have been used to extract a relevant set of vari-
m the text fields and transform the original database into
which can be handled by ML techniques. Specifically, the
tringToWordVector filter implemented in Weka [52] was
p-words were removed based on Rainbow [53], text was
to lowercase; the iterated version of the Lovins stem-
was applied as well as an alphabetic tokenizer where
the relatively recent emergence of the learning
paradigm, the model evaluation in this scenario is st
explored. In this paper, the evaluation strategy foll
on the same idea that confers its characteristic rob
majority voting: the combination of multiple indep
ments [9,40]. That is, the mean value of the perfor
calculated using the labels of one annotator at a t
truth is considered. In practice, all the experiments
are evaluated as follows (see Fig. 4): (1) after mode
performance of the model is estimated using the
each labeler, one at a time, as ground truth, and (2) t
of all the estimates is the final performance value.
imental results are obtained with a 10 × 5-fold cr
procedure [55].

Learn from your testing output!
What is next?
SUT
Search&Learn-based
Testing Strategy
System’s input Test output

On The Challenges Of Search Algorithm
• Changing the optimization algorithm will not necessarily improve the output. (My experience)
• It is the coverage criteria and the fitness function that lead to better results. (My experience)

SBSE And LBSE: Challenges With The Industry
• The non-deterministic output due to randomness.
• Expensive computation.
• Modeling the system or the problem space.
• Realizing that you need one of them.
We probably need to change the way we communicate with industry.
